Benchmark API
lrdbenchmark provides a comprehensive benchmarking framework for evaluating and comparing all 18 estimators of long-range dependence.
Comprehensive Benchmark
- class lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark(output_dir: str | None = None, runtime_profile: str = 'auto')[source]
Bases: object
Comprehensive benchmark class for testing all estimators and data models.
- __init__(output_dir: str | None = None, runtime_profile: str = 'auto')[source]
Initialize the benchmark system.
- Parameters:
output_dir (str, optional) – Directory to save benchmark results
runtime_profile (str, optional) – Runtime profile controlling computational intensity. Options:
  - "auto": determine automatically (default)
  - "quick": minimise expensive diagnostics (useful for tests)
  - "full": enable all diagnostics and resampling routines
- run_comprehensive_benchmark(data_length: int = 1000, benchmark_type: str = 'comprehensive', contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run comprehensive benchmark across all estimators and data models.
- Parameters:
data_length (int) – Length of test data to generate
benchmark_type (str) – Type of benchmark to run:
  - 'comprehensive': all estimators (default)
  - 'classical': only classical statistical estimators
  - 'ML': only machine learning estimators (non-neural)
  - 'neural': only neural network estimators
contamination_type (str, optional) – Type of contamination to apply to the data
contamination_level (float) – Level/intensity of contamination (0.0 to 1.0)
save_results (bool) – Whether to save results to file
- Returns:
Comprehensive benchmark results
- Return type:
Dict[str, Any]
- run_classical_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run benchmark with only classical statistical estimators.
- run_ml_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run benchmark with only machine learning estimators (non-neural).
- run_neural_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run benchmark with only neural network estimators.
- _resolve_runtime_profile(runtime_profile: str | None) str[source]
Determine the runtime profile controlling benchmark intensity.
- _load_protocol_config(path: Path) Dict[str, Any][source]
Load benchmark protocol configuration from YAML/JSON file.
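Choosing the parser from the file extension is the obvious way to support both formats. The sketch below assumes JSON files end in .json and YAML files in .yaml or .yml, and that PyYAML is installed for the YAML branch. The name load_protocol_config is illustrative; this is not the library's private method.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_protocol_config(path: Path) -> Dict[str, Any]:
    """Load a protocol file, choosing the parser from its suffix (illustrative sketch)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in {".json", ".yaml", ".yml"}:
        raise ValueError(f"Unsupported protocol file type: {suffix!r}")
    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        config = json.loads(text)
    else:
        import yaml  # PyYAML, imported lazily so JSON-only setups need no extra dependency
        config = yaml.safe_load(text)
    # An empty YAML file parses to None; treat it as an empty configuration.
    if config is None:
        return {}
    if not isinstance(config, dict):
        raise ValueError("Protocol configuration must be a mapping at the top level")
    return config
```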
- _deep_merge_dicts(base: Dict[str, Any], updates: Dict[str, Any]) Dict[str, Any][source]
Recursively merge dictionaries without mutating inputs.
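A non-mutating recursive merge can be sketched as follows: nested dictionaries are merged key by key, any other value in updates replaces the base value, and both inputs are left untouched because the result is built from deep copies. The function name is illustrative.

```python
import copy
from typing import Any, Dict


def deep_merge_dicts(base: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `updates` merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge_dicts(merged[key], value)
        else:
            # Scalars, lists, or type mismatches: the update wins.
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, merging {"dfa": {"order": 2}} into {"dfa": {"order": 1, "min_scale": 4}} yields {"dfa": {"order": 2, "min_scale": 4}}.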
- _initialize_all_estimators() Dict[str, Dict[str, Any]][source]
Initialize all available estimators organized by category.
- _apply_estimator_overrides(estimators: Dict[str, Dict[str, Any]], overrides: Dict[str, Dict[str, Any]]) Dict[str, Dict[str, Any]][source]
Apply protocol-defined parameter overrides to initialized estimators.
- _initialize_contamination_models() Dict[str, Any][source]
Initialize all available contamination models.
- get_estimators_by_type(benchmark_type: str = 'comprehensive', data_length: int = 1000) Dict[str, Any][source]
Get estimators based on the specified benchmark type.
- Parameters:
benchmark_type (str) – Type of benchmark to run:
  - 'comprehensive': all estimators (default)
  - 'classical': only classical statistical estimators
  - 'ML': only machine learning estimators (non-neural)
  - 'neural': only neural network estimators
data_length (int) – Length of data to be tested (used for adaptive wavelet estimators)
- Returns:
Dictionary of estimators for the specified type
- Return type:
Dict[str, Any]
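Selecting estimators by benchmark type amounts to a lookup in a registry keyed by category. The registry layout, the select_estimators name, and the strings standing in for estimator objects are assumptions for illustration. The library's internal structure may differ, and this sketch ignores the data_length argument that the real method uses for adaptive wavelet estimators.

```python
from typing import Any, Dict

# Hypothetical registry: category -> {estimator name: estimator object}.
# Strings stand in for real estimator instances.
REGISTRY: Dict[str, Dict[str, Any]] = {
    "classical": {"dfa": "DFA", "rs": "R/S", "gph": "GPH"},
    "ML": {"random_forest": "RandomForest", "svr": "SVR"},
    "neural": {"cnn": "CNN", "lstm": "LSTM"},
}


def select_estimators(registry: Dict[str, Dict[str, Any]],
                      benchmark_type: str = "comprehensive") -> Dict[str, Any]:
    """Return a flat {name: estimator} dict for one benchmark type."""
    if benchmark_type == "comprehensive":
        selected: Dict[str, Any] = {}
        for category in registry.values():
            selected.update(category)  # later categories win on name clashes
        return selected
    if benchmark_type not in registry:
        raise ValueError(f"Unknown benchmark_type: {benchmark_type!r}")
    return dict(registry[benchmark_type])  # copy so callers cannot mutate the registry
```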
- generate_test_data(model_name: str, data_length: int = 1000, **kwargs) Tuple[ndarray, Dict[str, Any]][source]
Generate test data using specified model.
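To illustrate the kind of data a model behind generate_test_data produces, here is a sketch that generates fractional Gaussian noise (fGn) exactly, by Cholesky factorisation of its autocovariance gamma(k) = (sigma^2 / 2)(|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)). The generate_fgn function and the keys of the returned parameter dictionary are hypothetical; they only mirror the documented (data, params) return shape. The Cholesky approach costs O(n^3) and is practical only for a few thousand points; circulant-embedding methods such as Davies-Harte scale better.

```python
from typing import Any, Dict, Optional, Tuple

import numpy as np


def fgn_autocovariance(k: np.ndarray, hurst: float, sigma: float = 1.0) -> np.ndarray:
    """Autocovariance of fractional Gaussian noise at integer lags k."""
    k = np.abs(k).astype(float)
    h2 = 2.0 * hurst
    return 0.5 * sigma**2 * ((k + 1) ** h2 - 2 * k**h2 + np.abs(k - 1) ** h2)


def generate_fgn(data_length: int = 1000, hurst: float = 0.7, sigma: float = 1.0,
                 seed: Optional[int] = None) -> Tuple[np.ndarray, Dict[str, Any]]:
    """Exact fGn sample via Cholesky factorisation of the covariance matrix."""
    if not 0.0 < hurst < 1.0:
        raise ValueError("hurst must lie in (0, 1)")
    idx = np.arange(data_length)
    lags = np.abs(np.subtract.outer(idx, idx))        # Toeplitz lag matrix |i - j|
    cov = fgn_autocovariance(lags, hurst, sigma)
    chol = np.linalg.cholesky(cov)                     # cov = L @ L.T
    rng = np.random.default_rng(seed)
    data = chol @ rng.standard_normal(data_length)     # correlated Gaussian sample
    params = {"model": "fgn", "H": hurst, "sigma": sigma,
              "data_length": data_length, "seed": seed}
    return data, params
```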
- apply_contamination(data: ndarray, contamination_type: str, contamination_level: float = 0.1, **kwargs) Tuple[ndarray, Dict[str, Any]][source]
Apply contamination to the data.
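- Parameters:
data (np.ndarray) – Input time series to contaminate
contamination_type (str) – Type of contamination to apply to the data
contamination_level (float) – Level/intensity of contamination (0.0 to 1.0)
**kwargs – Additional keyword arguments
- Returns:
(contaminated_data, contamination_info)
- Return type:
Tuple[ndarray, Dict[str, Any]]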
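The (contaminated_data, contamination_info) contract can be sketched for a 1-D array with two hypothetical contamination types, "additive_noise" and "outliers". The type names, scaling by the data's standard deviation, and the 5-standard-deviation spike size are illustrative choices, not lrdbenchmark's contamination models.

```python
from typing import Any, Dict, Optional, Tuple

import numpy as np


def apply_contamination(data: np.ndarray, contamination_type: str,
                        contamination_level: float = 0.1,
                        seed: Optional[int] = None) -> Tuple[np.ndarray, Dict[str, Any]]:
    """Return a contaminated copy of a 1-D series plus a description of what was done."""
    if not 0.0 <= contamination_level <= 1.0:
        raise ValueError("contamination_level must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    clean = np.asarray(data, dtype=float)
    contaminated = clean.copy()                     # never modify the caller's array
    scale = float(np.std(clean)) if clean.size else 0.0
    info: Dict[str, Any] = {"type": contamination_type, "level": contamination_level}
    if contamination_type == "additive_noise":
        # White Gaussian noise with standard deviation level * std(data).
        contaminated += rng.normal(0.0, contamination_level * scale, size=clean.shape)
    elif contamination_type == "outliers":
        # Add +/- 5 std spikes to a fraction `level` of randomly chosen points.
        n_out = int(round(contamination_level * clean.size))
        idx = rng.choice(clean.size, size=n_out, replace=False)
        contaminated[idx] += rng.choice([-5.0, 5.0], size=n_out) * scale
        info["outlier_indices"] = np.sort(idx)
    else:
        raise ValueError(f"Unknown contamination_type: {contamination_type!r}")
    return contaminated, info
```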
- run_single_estimator_test(estimator_name: str, data: ndarray, true_params: Dict[str, Any]) Dict[str, Any][source]
Run a single estimator test.
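A single estimator test typically times the call, records the estimate and its error against the known H, and converts failures into a recorded result instead of an exception, so a benchmark loop can continue. The sketch assumes the estimator is a plain callable returning H and that true_params stores the true value under "H"; the result keys are hypothetical.

```python
import math
import time
from typing import Any, Callable, Dict


def run_single_estimator_test(name: str, estimate_fn: Callable[[Any], float],
                              data: Any, true_params: Dict[str, Any]) -> Dict[str, Any]:
    """Run one estimator on one dataset and return a flat result record."""
    record: Dict[str, Any] = {"estimator": name, "true_H": true_params.get("H")}
    start = time.perf_counter()
    try:
        estimate = float(estimate_fn(data))
        if not math.isfinite(estimate):
            raise ValueError("non-finite estimate")
    except Exception as exc:  # record the failure and let the caller move on
        record.update(success=False, estimated_H=None, error=None,
                      failure=f"{type(exc).__name__}: {exc}")
    else:
        true_h = record["true_H"]
        record.update(success=True, estimated_H=estimate, failure=None,
                      error=None if true_h is None else estimate - true_h)
    record["execution_time"] = time.perf_counter() - start
    return record
```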
- _calculate_monte_carlo_mse(estimator, data: ndarray, true_value: float, n_simulations: int = 50) Dict[str, Any][source]
Calculate mean signed error using Monte Carlo simulations.
- Parameters:
estimator (BaseEstimator) – Estimator instance
data (np.ndarray) – Original dataset
true_value (float) – True parameter value
n_simulations (int) – Number of Monte Carlo simulations
- Returns:
Mean signed error analysis results
- Return type:
Dict[str, Any]
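The Monte Carlo idea can be sketched as repeatedly simulating a fresh series with a known parameter, estimating it, and summarising the errors. How the library derives its simulations from the original dataset is not documented here, so this version takes a hypothetical simulate_fn(rng) callable instead. It reports the mean signed error (bias) alongside the MSE; with population variance the two satisfy MSE = bias^2 + variance.

```python
from typing import Any, Callable, Dict, Optional

import numpy as np


def monte_carlo_error(estimate_fn: Callable[[np.ndarray], float],
                      simulate_fn: Callable[[np.random.Generator], np.ndarray],
                      true_value: float, n_simulations: int = 50,
                      seed: Optional[int] = None) -> Dict[str, Any]:
    """Bias, MSE and variance of an estimator over simulated series."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_simulations):
        series = simulate_fn(rng)
        try:
            estimates.append(float(estimate_fn(series)))
        except Exception:
            continue  # skip failed fits; they are reflected in n_valid
    values = np.asarray(estimates, dtype=float)
    values = values[np.isfinite(values)]
    if values.size == 0:
        return {"n_valid": 0, "mean_signed_error": float("nan"),
                "mse": float("nan"), "variance": float("nan")}
    errors = values - true_value
    return {
        "n_valid": int(values.size),
        "mean_signed_error": float(errors.mean()),   # bias
        "mse": float(np.mean(errors**2)),
        "variance": float(values.var(ddof=0)),       # so mse == bias**2 + variance
    }
```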
- _compute_significance_tests(results: Dict[str, Any], alpha: float = 0.05) Dict[str, Any][source]
Compute omnibus and post-hoc significance tests across estimators.
- _compute_stratified_metrics(results: Dict[str, Any], data_length: int, contamination_type: str | None, contamination_level: float) Dict[str, Any][source]
Produce stratified summaries across H bands, tail classes, data length, and contamination regime.
- _categorise_hurst_band(hurst_value: float | None) str[source]
Assign H estimates to qualitative persistence bands.
- _categorise_length_band(data_length: int | None) str[source]
Bucket data length into interpretable regimes.
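Hypothetical versions of the two categorisers are shown below. The labels and cut-offs (a tolerance of 0.05 around H = 0.5; length boundaries at 512 and 4096) are assumptions chosen for illustration, not the thresholds lrdbenchmark uses. The grounded part is the interpretation: H < 0.5 indicates anti-persistence and H > 0.5 persistence (long-range dependence).

```python
import math
from typing import Optional


def categorise_hurst_band(hurst_value: Optional[float], tol: float = 0.05) -> str:
    """Map an H estimate to a qualitative persistence band (hypothetical cut-offs)."""
    if hurst_value is None or not math.isfinite(hurst_value):
        return "unknown"
    if hurst_value < 0.5 - tol:
        return "anti_persistent"
    if hurst_value <= 0.5 + tol:
        return "near_0.5"        # little or no long-range dependence
    return "persistent"


def categorise_length_band(data_length: Optional[int]) -> str:
    """Bucket a series length into a coarse regime (hypothetical boundaries)."""
    if data_length is None or data_length <= 0:
        return "unknown"
    if data_length < 512:
        return "short"
    if data_length < 4096:
        return "medium"
    return "long"
```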
- _extract_scale_data(result: Dict[str, Any], estimator: Any) Tuple[ndarray | None, ndarray | None][source]
Extract scale and statistics data from estimator result for diagnostics.
- Parameters:
result (dict) – Estimator result dictionary
estimator (BaseEstimator) – Estimator instance
- Returns:
(scales, statistics) arrays or (None, None) if unavailable
- Return type:
Tuple[ndarray | None, ndarray | None]
- _infer_estimator_family(estimator_name: str) str[source]
Infer the family (classical, ML, neural) from estimator name.
- _infer_tail_class(model_name: str | None, data_params: Dict[str, Any] | None = None) str[source]
Infer a qualitative tail/heaviness class based on the data model.
- _build_provenance_bundle(summary: Dict[str, Any]) Dict[str, Any][source]
Construct a comprehensive provenance bundle using ProvenanceTracker.
This bundle includes all settings needed to reproduce the experiment:
  - Data generation parameters
  - Estimator configuration
  - Preprocessing settings
  - Scale selection parameters
  - Analytics configuration
  - Environment information
- _attach_uncertainty_calibration_summary(summary: Dict[str, Any], lookback_days: int = 90) None[source]
Augment benchmark summaries with uncertainty calibration diagnostics.
- _build_result_row_provenance(result: Dict[str, Any], data_params: Dict[str, Any]) Dict[str, Any][source]
Build provenance bundle for a single result row.
This creates a lightweight provenance artifact per result that includes:
  - Experiment-level provenance (reference)
  - Row-specific parameters (data model, estimator, etc.)
  - Result metadata
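An experiment-level provenance bundle and a per-row record that refers back to it can be sketched with a content hash. The function names, keys, and the choice of SHA-256 over canonical JSON are assumptions for illustration; ProvenanceTracker itself is not reproduced here. Sorting keys before hashing makes the reference independent of dictionary insertion order, so identical settings always produce the same reference.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from typing import Any, Dict


def build_provenance_bundle(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Experiment-level bundle: settings, a stable hash of them, and the environment."""
    canonical = json.dumps(settings, sort_keys=True, default=str)
    return {
        "settings": settings,
        "settings_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "environment": {"python": sys.version.split()[0], "platform": platform.platform()},
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def build_result_row_provenance(experiment_bundle: Dict[str, Any],
                                row_params: Dict[str, Any],
                                result_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Lightweight per-row record that references the experiment by hash."""
    return {
        "experiment_ref": experiment_bundle["settings_sha256"],
        "row": dict(row_params),
        "result": dict(result_meta),
    }
```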
- _record_uncertainty_event(estimator_name: str, data_model: str | None, uncertainty: Any, estimate: float | None, true_value: float | None, data_length: int, estimator_family: str | None) None[source]
Persist uncertainty calibration data via the error analyzer.
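A common uncertainty calibration diagnostic is empirical coverage: the fraction of reported intervals that contain the true value, compared with the nominal confidence level. The sketch assumes each uncertainty is a (lower, upper) interval; the function name and output keys are hypothetical and not the error analyzer's API.

```python
import math
from typing import Any, Dict, Sequence, Tuple


def calibration_summary(intervals: Sequence[Tuple[float, float]],
                        true_values: Sequence[float],
                        nominal: float = 0.95) -> Dict[str, Any]:
    """Empirical coverage of (lower, upper) intervals versus the nominal level."""
    if len(intervals) != len(true_values):
        raise ValueError("intervals and true_values must have the same length")
    if len(intervals) == 0:
        return {"n": 0, "nominal": nominal, "coverage": math.nan, "coverage_gap": math.nan}
    hits = sum(lower <= truth <= upper
               for (lower, upper), truth in zip(intervals, true_values))
    coverage = hits / len(intervals)
    # A negative gap means intervals are too narrow (over-confident).
    return {"n": len(intervals), "nominal": nominal,
            "coverage": coverage, "coverage_gap": coverage - nominal}
```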
- run_classical_estimators(data_models: list | None = None, n_samples: int = 1000, n_trials: int = 10, save_results: bool = True) Dict[str, Any][source]
Backward-compatible alias for run_classical_benchmark.
This method maintains the old API for compatibility with existing code.
- run_advanced_metrics_benchmark(data_length: int = 1000, benchmark_type: str = 'comprehensive', n_monte_carlo: int = 100, convergence_threshold: float = 1e-06, save_results: bool = True) Dict[str, Any][source]
Run advanced metrics benchmark focusing on convergence and bias analysis.
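- Parameters:
data_length (int) – Length of test data to generate
benchmark_type (str) – Type of benchmark to run (same options as run_comprehensive_benchmark)
n_monte_carlo (int) – Number of Monte Carlo simulations
convergence_threshold (float) – Threshold used in the convergence analysis
save_results (bool) – Whether to save results to file
- Returns:
Advanced metrics benchmark results
- Return type:
Dict[str, Any]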
Benchmark Results
Benchmark Configuration
Usage Examples
Basic Benchmark
from lrdbenchmark import ComprehensiveBenchmark
import pandas as pd
# Create benchmark instance
benchmark = ComprehensiveBenchmark()
print("Running comprehensive benchmark...")
print("This will test multiple estimators on various data models")
# Number of repetitions passed to the benchmark
n_runs = 10
# Run comprehensive benchmark
results = benchmark.run_comprehensive_benchmark(
    data_length=1000,
    n_runs=n_runs
)
# Access results
print("\n=== BENCHMARK RESULTS ===")
print(f"Number of estimators tested: {len(results.estimators)}")
print(f"Number of datasets generated: {len(results.datasets)}")
print(f"Total runs scheduled: {len(results.estimators) * len(results.datasets) * n_runs}")
# Get summary statistics
summary = results.get_summary()
print("\n=== SUMMARY STATISTICS ===")
print(summary)
# Convert to DataFrame for detailed analysis
df = results.to_dataframe()
print("\n=== DETAILED RESULTS ===")
print(f"DataFrame shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
# Summarise H estimates per estimator
estimator_performance = df.groupby('estimator')['estimated_H'].agg(['mean', 'std', 'count'])
print("\n=== ESTIMATOR PERFORMANCE ===")
print(estimator_performance.round(3))
# Summarise H estimates per data model
model_performance = df.groupby('data_model')['estimated_H'].agg(['mean', 'std', 'count'])
print("\n=== DATA MODEL PERFORMANCE ===")
print(model_performance.round(3))
Classical Estimators Only
from lrdbenchmark import ComprehensiveBenchmark
benchmark = ComprehensiveBenchmark()
# Run only classical estimators
results = benchmark.run_classical_benchmark(
data_length=1000,
estimators=['dfa', 'rs', 'gph', 'wavelet_variance'],
n_runs=5
)
# Get results for specific estimator
dfa_results = results.get_estimator_results('dfa')
print(f"DFA mean H estimate: {dfa_results.mean_estimate:.3f}")
print(f"DFA standard error: {dfa_results.std_error:.3f}")
Machine Learning Estimators
from lrdbenchmark import ComprehensiveBenchmark
benchmark = ComprehensiveBenchmark()
# Run ML estimators with custom parameters
results = benchmark.run_ml_benchmark(
data_length=1000,
estimators=['random_forest', 'gradient_boosting', 'svr'],
n_runs=3,
train_test_split=0.8
)
# Get performance metrics
for estimator_name, result in results.estimators.items():
    print(f"{estimator_name}:")
    print(f"  Mean H estimate: {result.mean_estimate:.3f}")
    print(f"  RMSE: {result.rmse:.3f}")
    print(f"  MAE: {result.mae:.3f}")
Neural Network Estimators
from lrdbenchmark import ComprehensiveBenchmark
benchmark = ComprehensiveBenchmark()
# Run neural network estimators
results = benchmark.run_neural_benchmark(
data_length=1000,
estimators=['cnn', 'lstm', 'transformer'],
n_runs=2,
epochs=50,
batch_size=32
)
# Get training history
for estimator_name, result in results.estimators.items():
    if hasattr(result, 'training_history'):
        print(f"{estimator_name} training completed")
        print(f"  Final loss: {result.training_history['loss'][-1]:.4f}")
Custom Configuration
from lrdbenchmark import ComprehensiveBenchmark, BenchmarkConfig
# Create custom configuration
config = BenchmarkConfig(
data_models=['fbm', 'fgn', 'arfima'],
estimators=['dfa', 'gph', 'random_forest'],
data_lengths=[500, 1000, 2000],
n_runs=5,
random_seed=42
)
# Create benchmark with custom config
benchmark = ComprehensiveBenchmark(config=config)
# Run benchmark
results = benchmark.run_comprehensive_benchmark()
# Get results for specific data length
results_1000 = results.get_results_by_length(1000)
print(f"Results for length 1000: {len(results_1000.estimators)} estimators")
Advanced Usage
Parallel Processing
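from lrdbenchmark import ComprehensiveBenchmark
import multiprocessing as mp

# The 'spawn' start method re-imports the main module in each worker,
# so the entry point must be protected by a __main__ guard.
if __name__ == "__main__":
    # Start worker processes with 'spawn' (this does not set the number of processes)
    mp.set_start_method('spawn', force=True)
    benchmark = ComprehensiveBenchmark()
    # Run benchmark with parallel processing
    results = benchmark.run_comprehensive_benchmark(
        data_length=1000,
        n_runs=20,
        n_jobs=4  # Use 4 parallel processes
    )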
Custom Data Models
from lrdbenchmark import ComprehensiveBenchmark, FBMModel, FGNModel
# Create custom data models
custom_models = {
'fbm_high': FBMModel(H=0.8, sigma=1.0),
'fbm_low': FBMModel(H=0.3, sigma=1.0),
'fgn_medium': FGNModel(H=0.6, sigma=1.0)
}
benchmark = ComprehensiveBenchmark()
# Run benchmark with custom models
results = benchmark.run_comprehensive_benchmark(
data_length=1000,
custom_models=custom_models,
n_runs=5
)
Custom Estimators
from lrdbenchmark import ComprehensiveBenchmark
from lrdbenchmark import DFAEstimator
# Create custom estimator
custom_dfa = DFAEstimator(
min_scale=4,
max_scale=100,
num_scales=20,
polynomial_order=2
)
custom_estimators = {
'custom_dfa': custom_dfa
}
benchmark = ComprehensiveBenchmark()
# Run benchmark with custom estimator
results = benchmark.run_comprehensive_benchmark(
data_length=1000,
custom_estimators=custom_estimators,
n_runs=5
)
Results Analysis
Statistical Analysis
from lrdbenchmark import ComprehensiveBenchmark
import pandas as pd
benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
# Convert to pandas DataFrame for analysis
df = results.to_dataframe()
# Group by estimator and calculate statistics
stats = df.groupby('estimator')['estimated_H'].agg([
'mean', 'std', 'min', 'max', 'count'
]).round(3)
print("Estimator Statistics:")
print(stats)
# Calculate bias per estimator from row-wise errors,
# which stays correct when true H differs across datasets
df['error'] = df['estimated_H'] - df['true_H']
bias = df.groupby('estimator')['error'].mean()
print("\nBias (mean of estimated H - true H):")
print(bias.round(3))
Visualisation
from lrdbenchmark import ComprehensiveBenchmark
import matplotlib.pyplot as plt
import seaborn as sns
benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
# Create box plot
df = results.to_dataframe()
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='estimator', y='estimated_H')
plt.axhline(y=df['true_H'].iloc[0], color='red', linestyle='--', label='True H')
plt.title('Hurst Parameter Estimates by Estimator')
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()
# Create scatter plot
plt.figure(figsize=(10, 6))
for estimator in df['estimator'].unique():
    subset = df[df['estimator'] == estimator]
    plt.scatter(subset['true_H'], subset['estimated_H'],
                label=estimator, alpha=0.6)
plt.plot([0.3, 0.9], [0.3, 0.9], 'k--', label='Perfect Estimation')
plt.xlabel('True Hurst Parameter')
plt.ylabel('Estimated Hurst Parameter')
plt.title('True vs Estimated Hurst Parameters')
plt.legend()
plt.grid(True)
plt.show()
Performance Comparison
from lrdbenchmark import ComprehensiveBenchmark
import time
benchmark = ComprehensiveBenchmark()
# Measure execution time per estimator
estimators = ['dfa', 'rs', 'gph', 'wavelet_variance']
execution_times = {}
for estimator in estimators:
    # perf_counter is a monotonic, high-resolution clock suited to timing
    start_time = time.perf_counter()
    results = benchmark.run_classical_benchmark(
        data_length=1000,
        estimators=[estimator],
        n_runs=5
    )
    execution_times[estimator] = time.perf_counter() - start_time
print("Execution Times:")
for estimator, time_taken in execution_times.items():
    print(f"{estimator}: {time_taken:.2f} seconds")
Error Analysis
from lrdbenchmark import ComprehensiveBenchmark
import numpy as np
benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
df = results.to_dataframe()
# Calculate errors
df['error'] = df['estimated_H'] - df['true_H']
df['abs_error'] = np.abs(df['error'])
df['squared_error'] = df['error']**2
# Error statistics by estimator
error_stats = df.groupby('estimator').agg({
'error': ['mean', 'std'],
'abs_error': 'mean',
'squared_error': 'mean'
}).round(4)
error_stats.columns = ['Bias', 'Bias_Std', 'MAE', 'MSE']
print("Error Statistics:")
print(error_stats)
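# Identify outliers with per-estimator IQR fences (Tukey's 1.5 x IQR rule)
grouped_error = df.groupby('estimator')['error']
Q1 = grouped_error.quantile(0.25)
Q3 = grouped_error.quantile(0.75)
IQR = Q3 - Q1
# Map each row's estimator to its fences so the comparison aligns row by row
lower_fence = df['estimator'].map(Q1 - 1.5 * IQR)
upper_fence = df['estimator'].map(Q3 + 1.5 * IQR)
outliers = df[(df['error'] < lower_fence) | (df['error'] > upper_fence)]
print(f"\nNumber of outliers: {len(outliers)}")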
Confidence Intervals
from lrdbenchmark import ComprehensiveBenchmark
import numpy as np
benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=20)
df = results.to_dataframe()
# Percentile bootstrap confidence intervals for each estimator's mean H
confidence_level = 0.95
alpha = 1 - confidence_level
n_bootstrap = 1000
rng = np.random.default_rng(42)  # seeded so the intervals are reproducible
ci_results = {}
for estimator in df['estimator'].unique():
    # Drop failed runs (missing estimates) before resampling
    estimates = df.loc[df['estimator'] == estimator, 'estimated_H'].dropna().to_numpy()
    # Draw all bootstrap resamples at once and take the mean of each
    resamples = rng.choice(estimates, size=(n_bootstrap, len(estimates)), replace=True)
    bootstrap_means = resamples.mean(axis=1)
    lower_ci, upper_ci = np.percentile(
        bootstrap_means, [alpha / 2 * 100, (1 - alpha / 2) * 100]
    )
    ci_results[estimator] = {
        'mean': estimates.mean(),
        'lower_ci': lower_ci,
        'upper_ci': upper_ci,
        'width': upper_ci - lower_ci
    }
print(f"Confidence Intervals ({confidence_level:.0%}):")
for estimator, ci in ci_results.items():
    print(f"{estimator}: {ci['mean']:.3f} [{ci['lower_ci']:.3f}, {ci['upper_ci']:.3f}]")
Export and Reporting
Export Results
from lrdbenchmark import ComprehensiveBenchmark
import json
import pandas as pd
benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=5)
# Export to JSON
results.save_json('benchmark_results.json')
# Export to CSV
df = results.to_dataframe()
df.to_csv('benchmark_results.csv', index=False)
# Export to Excel (pandas needs an .xlsx engine such as openpyxl installed)
with pd.ExcelWriter('benchmark_results.xlsx') as writer:
    df.to_excel(writer, sheet_name='Results', index=False)
    # Create summary sheet
    summary = results.get_summary()
    summary_df = pd.DataFrame([summary])
    summary_df.to_excel(writer, sheet_name='Summary', index=False)
Generate Reports
from lrdbenchmark import ComprehensiveBenchmark
benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)
# Generate comprehensive report
report = results.generate_report(
include_plots=True,
include_statistics=True,
include_recommendations=True
)
# Save report
with open('benchmark_report.html', 'w', encoding='utf-8') as f:
    f.write(report)
# Print summary
print(results.get_summary())
Best Practices
Sample Size: Use at least 1000 data points for reliable estimates
Number of Runs: Use 10-20 runs for stable statistics
Multiple Estimators: Compare results from different estimator types
Data Models: Test on various synthetic data models
Error Handling: Always handle potential estimation failures
Performance Monitoring: Track execution times for large-scale benchmarks
Result Validation: Cross-validate results with known theoretical values
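One concrete form of result validation is to check an estimator on a case whose answer is known: independent Gaussian noise has H = 0.5. The sketch uses the aggregated variance method, a classical estimator written from scratch here for brevity rather than taken from lrdbenchmark. It relies on Var(block mean) scaling as m^(2H - 2) with block size m, so the log-log slope s gives H = 1 + s/2. The block-size range is an illustrative choice.

```python
import numpy as np


def aggregated_variance_hurst(x, min_block: int = 4, n_scales: int = 12) -> float:
    """Estimate H from how the variance of block means shrinks with block size."""
    x = np.asarray(x, dtype=float)
    max_block = x.size // 32  # keep at least 32 blocks at the coarsest scale
    if max_block <= min_block:
        raise ValueError("series too short for the requested block sizes")
    # Geometrically spaced block sizes, rounded to integers without duplicates
    block_sizes = np.unique(np.round(np.geomspace(min_block, max_block, n_scales)).astype(int))
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = x.size // m
        block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(block_means.var(ddof=1)))
    slope, _intercept = np.polyfit(log_m, log_var, 1)  # slope estimates 2H - 2
    return 1.0 + slope / 2.0


# Validation against a known answer: i.i.d. Gaussian noise has H = 0.5.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(2**14)
h_hat = aggregated_variance_hurst(white_noise)
print(f"Estimated H for white noise: {h_hat:.3f} (theory: 0.5)")
```

A large deviation from 0.5 on this sanity check points to a bug in the estimator or its scale selection rather than to a property of real data.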
Note
The benchmark system automatically handles parallel processing, error recovery, and result aggregation. For large-scale benchmarks, consider using the parallel processing capabilities.
Warning
Some estimators may fail on certain data types or parameter combinations. The benchmark system will report failures but continue with other estimators.