Benchmark API
============

lrdbenchmark provides a comprehensive benchmarking framework for evaluating and comparing all 18 estimators of long-range dependence.

Comprehensive Benchmark
-----------------------

.. autoclass:: lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark
   :members:
   :undoc-members:
   :show-inheritance:

   .. automethod:: __init__
   .. automethod:: run_comprehensive_benchmark
   .. automethod:: run_classical_benchmark
   .. automethod:: run_ml_benchmark
   .. automethod:: run_neural_benchmark

Benchmark Results
-----------------

.. autoclass:: lrdbenchmark.analysis.benchmark.BenchmarkResult
   :members:
   :undoc-members:
   :show-inheritance:

.. autoclass:: lrdbenchmark.analysis.benchmark.EstimatorResult
   :members:
   :undoc-members:
   :show-inheritance:

Benchmark Configuration
-----------------------

.. autoclass:: lrdbenchmark.analysis.benchmark.BenchmarkConfig
   :members:
   :undoc-members:
   :show-inheritance:

Usage Examples
-------------

Basic Benchmark
~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import pandas as pd

   # Create benchmark instance
   benchmark = ComprehensiveBenchmark()

   print("Running comprehensive benchmark...")
   print("This will test multiple estimators on various data models")
   
   # Run comprehensive benchmark
   results = benchmark.run_comprehensive_benchmark(
       data_length=1000,
       n_runs=10
   )

   # Access results
   print(f"\n=== BENCHMARK RESULTS ===")
   print(f"Number of estimators tested: {len(results.estimators)}")
   print(f"Number of datasets generated: {len(results.datasets)}")
   print(f"Total runs completed: {len(results.estimators) * len(results.datasets) * 10}")

   # Get summary statistics
   summary = results.get_summary()
   print(f"\n=== SUMMARY STATISTICS ===")
   print(summary)
   
   # Convert to DataFrame for detailed analysis
   df = results.to_dataframe()
   print(f"\n=== DETAILED RESULTS ===")
   print(f"DataFrame shape: {df.shape}")
   print(f"Columns: {list(df.columns)}")
   
   # Show top performing estimators
   estimator_performance = df.groupby('estimator')['estimated_H'].agg(['mean', 'std', 'count'])
   print(f"\n=== ESTIMATOR PERFORMANCE ===")
   print(estimator_performance.round(3))
   
   # Show results by data model
   model_performance = df.groupby('data_model')['estimated_H'].agg(['mean', 'std', 'count'])
   print(f"\n=== DATA MODEL PERFORMANCE ===")
   print(model_performance.round(3))

Classical Estimators Only
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   
   benchmark = ComprehensiveBenchmark()
   
   # Run only classical estimators
   results = benchmark.run_classical_benchmark(
       data_length=1000,
       estimators=['dfa', 'rs', 'gph', 'wavelet_variance'],
       n_runs=5
   )
   
   # Get results for specific estimator
   dfa_results = results.get_estimator_results('dfa')
   print(f"DFA mean H estimate: {dfa_results.mean_estimate:.3f}")
   print(f"DFA standard error: {dfa_results.std_error:.3f}")

Machine Learning Estimators
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   
   benchmark = ComprehensiveBenchmark()
   
   # Run ML estimators with custom parameters
   results = benchmark.run_ml_benchmark(
       data_length=1000,
       estimators=['random_forest', 'gradient_boosting', 'svr'],
       n_runs=3,
       train_test_split=0.8
   )
   
   # Get performance metrics
   for estimator_name, result in results.estimators.items():
       print(f"{estimator_name}:")
       print(f"  Mean H estimate: {result.mean_estimate:.3f}")
       print(f"  RMSE: {result.rmse:.3f}")
       print(f"  MAE: {result.mae:.3f}")

Neural Network Estimators
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   
   benchmark = ComprehensiveBenchmark()
   
   # Run neural network estimators
   results = benchmark.run_neural_benchmark(
       data_length=1000,
       estimators=['cnn', 'lstm', 'transformer'],
       n_runs=2,
       epochs=50,
       batch_size=32
   )
   
   # Get training history
   for estimator_name, result in results.estimators.items():
       if hasattr(result, 'training_history'):
           print(f"{estimator_name} training completed")
           print(f"  Final loss: {result.training_history['loss'][-1]:.4f}")

Custom Configuration
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark, BenchmarkConfig
   
   # Create custom configuration
   config = BenchmarkConfig(
       data_models=['fbm', 'fgn', 'arfima'],
       estimators=['dfa', 'gph', 'random_forest'],
       data_lengths=[500, 1000, 2000],
       n_runs=5,
       random_seed=42
   )
   
   # Create benchmark with custom config
   benchmark = ComprehensiveBenchmark(config=config)
   
   # Run benchmark
   results = benchmark.run_comprehensive_benchmark()
   
   # Get results for specific data length
   results_1000 = results.get_results_by_length(1000)
   print(f"Results for length 1000: {len(results_1000.estimators)} estimators")

Advanced Usage
--------------

Parallel Processing
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import multiprocessing as mp
   
   # Set number of processes
   mp.set_start_method('spawn', force=True)
   
   benchmark = ComprehensiveBenchmark()
   
   # Run benchmark with parallel processing
   results = benchmark.run_comprehensive_benchmark(
       data_length=1000,
       n_runs=20,
       n_jobs=4  # Use 4 parallel processes
   )

Custom Data Models
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark, FBMModel, FGNModel
   
   # Create custom data models
   custom_models = {
       'fbm_high': FBMModel(H=0.8, sigma=1.0),
       'fbm_low': FBMModel(H=0.3, sigma=1.0),
       'fgn_medium': FGNModel(H=0.6, sigma=1.0)
   }
   
   benchmark = ComprehensiveBenchmark()
   
   # Run benchmark with custom models
   results = benchmark.run_comprehensive_benchmark(
       data_length=1000,
       custom_models=custom_models,
       n_runs=5
   )

Custom Estimators
~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   from lrdbenchmark import DFAEstimator
   
   # Create custom estimator
   custom_dfa = DFAEstimator(
       min_scale=4,
       max_scale=100,
       num_scales=20,
       polynomial_order=2
   )
   
   custom_estimators = {
       'custom_dfa': custom_dfa
   }
   
   benchmark = ComprehensiveBenchmark()
   
   # Run benchmark with custom estimator
   results = benchmark.run_comprehensive_benchmark(
       data_length=1000,
       custom_estimators=custom_estimators,
       n_runs=5
   )

Results Analysis
----------------

Statistical Analysis
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import pandas as pd
   
   benchmark = ComprehensiveBenchmark()
   results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
   
   # Convert to pandas DataFrame for analysis
   df = results.to_dataframe()
   
   # Group by estimator and calculate statistics
   stats = df.groupby('estimator')['estimated_H'].agg([
       'mean', 'std', 'min', 'max', 'count'
   ]).round(3)
   
   print("Estimator Statistics:")
   print(stats)
   
   # Calculate bias for each estimator
   true_H = df['true_H'].iloc[0]  # Assuming same true H for all
   bias = df.groupby('estimator')['estimated_H'].mean() - true_H
   
   print(f"\nBias (estimated - true H = {true_H}):")
   print(bias.round(3))

Visualisation
~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import matplotlib.pyplot as plt
   import seaborn as sns
   
   benchmark = ComprehensiveBenchmark()
   results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
   
   # Create box plot
   df = results.to_dataframe()
   
   plt.figure(figsize=(12, 6))
   sns.boxplot(data=df, x='estimator', y='estimated_H')
   plt.axhline(y=df['true_H'].iloc[0], color='red', linestyle='--', label='True H')
   plt.title('Hurst Parameter Estimates by Estimator')
   plt.xticks(rotation=45)
   plt.legend()
   plt.tight_layout()
   plt.show()
   
   # Create scatter plot
   plt.figure(figsize=(10, 6))
   for estimator in df['estimator'].unique():
       subset = df[df['estimator'] == estimator]
       plt.scatter(subset['true_H'], subset['estimated_H'], 
                  label=estimator, alpha=0.6)
   
   plt.plot([0.3, 0.9], [0.3, 0.9], 'k--', label='Perfect Estimation')
   plt.xlabel('True Hurst Parameter')
   plt.ylabel('Estimated Hurst Parameter')
   plt.title('True vs Estimated Hurst Parameters')
   plt.legend()
   plt.grid(True)
   plt.show()

Performance Comparison
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import time
   
   benchmark = ComprehensiveBenchmark()
   
   # Measure execution time
   estimators = ['dfa', 'rs', 'gph', 'wavelet_variance']
   execution_times = {}
   
   for estimator in estimators:
       start_time = time.time()
       results = benchmark.run_classical_benchmark(
           data_length=1000,
           estimators=[estimator],
           n_runs=5
       )
       execution_time = time.time() - start_time
       execution_times[estimator] = execution_time
   
   print("Execution Times:")
   for estimator, time_taken in execution_times.items():
       print(f"{estimator}: {time_taken:.2f} seconds")

Error Analysis
~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import numpy as np
   
   benchmark = ComprehensiveBenchmark()
   results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
   
   df = results.to_dataframe()
   
   # Calculate errors
   df['error'] = df['estimated_H'] - df['true_H']
   df['abs_error'] = np.abs(df['error'])
   df['squared_error'] = df['error']**2
   
   # Error statistics by estimator
   error_stats = df.groupby('estimator').agg({
       'error': ['mean', 'std'],
       'abs_error': 'mean',
       'squared_error': 'mean'
   }).round(4)
   
   error_stats.columns = ['Bias', 'Bias_Std', 'MAE', 'MSE']
   print("Error Statistics:")
   print(error_stats)
   
   # Identify outliers
   Q1 = df.groupby('estimator')['error'].quantile(0.25)
   Q3 = df.groupby('estimator')['error'].quantile(0.75)
   IQR = Q3 - Q1
   
   outliers = df[
       (df['error'] < (Q1 - 1.5 * IQR).loc[df['estimator']]) |
       (df['error'] > (Q3 + 1.5 * IQR).loc[df['estimator']])
   ]
   
   print(f"\nNumber of outliers: {len(outliers)}")

Confidence Intervals
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import scipy.stats as stats
   
   benchmark = ComprehensiveBenchmark()
   results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=20)
   
   df = results.to_dataframe()
   
   # Calculate confidence intervals
   confidence_level = 0.95
   alpha = 1 - confidence_level
   
   ci_results = {}
   for estimator in df['estimator'].unique():
       subset = df[df['estimator'] == estimator]
       estimates = subset['estimated_H'].values
       
       # Bootstrap confidence interval
       n_bootstrap = 1000
       bootstrap_means = []
       
       for _ in range(n_bootstrap):
           bootstrap_sample = np.random.choice(estimates, size=len(estimates), replace=True)
           bootstrap_means.append(np.mean(bootstrap_sample))
       
       lower_ci = np.percentile(bootstrap_means, alpha/2 * 100)
       upper_ci = np.percentile(bootstrap_means, (1-alpha/2) * 100)
       
       ci_results[estimator] = {
           'mean': np.mean(estimates),
           'lower_ci': lower_ci,
           'upper_ci': upper_ci,
           'width': upper_ci - lower_ci
       }
   
   print("Confidence Intervals (95%):")
   for estimator, ci in ci_results.items():
       print(f"{estimator}: {ci['mean']:.3f} [{ci['lower_ci']:.3f}, {ci['upper_ci']:.3f}]")

Export and Reporting
--------------------

Export Results
~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   import json
   import pandas as pd
   
   benchmark = ComprehensiveBenchmark()
   results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=5)
   
   # Export to JSON
   results.save_json('benchmark_results.json')
   
   # Export to CSV
   df = results.to_dataframe()
   df.to_csv('benchmark_results.csv', index=False)
   
   # Export to Excel
   with pd.ExcelWriter('benchmark_results.xlsx') as writer:
       df.to_excel(writer, sheet_name='Results', index=False)
       
       # Create summary sheet
       summary = results.get_summary()
       summary_df = pd.DataFrame([summary])
       summary_df.to_excel(writer, sheet_name='Summary', index=False)

Generate Reports
~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark
   
   benchmark = ComprehensiveBenchmark()
   results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
   
   # Generate comprehensive report
   report = results.generate_report(
       include_plots=True,
       include_statistics=True,
       include_recommendations=True
   )
   
   # Save report
   with open('benchmark_report.html', 'w') as f:
       f.write(report)
   
   # Print summary
   print(results.get_summary())

Best Practices
-------------

1. **Sample Size**: Use at least 1000 data points for reliable estimates
2. **Number of Runs**: Use 10-20 runs for stable statistics
3. **Multiple Estimators**: Compare results from different estimator types
4. **Data Models**: Test on various synthetic data models
5. **Error Handling**: Always handle potential estimation failures
6. **Performance Monitoring**: Track execution times for large-scale benchmarks
7. **Result Validation**: Cross-validate results with known theoretical values

.. note::
   The benchmark system automatically handles parallel processing, error recovery,
   and result aggregation. For large-scale benchmarks, consider using the
   parallel processing capabilities.

.. warning::
   Some estimators may fail on certain data types or parameter combinations.
   The benchmark system will report failures but continue with other estimators.