Benchmark API

lrdbenchmark provides a comprehensive benchmarking framework for evaluating and comparing all 18 estimators of long-range dependence.

Comprehensive Benchmark

class lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark(output_dir: str | None = None, runtime_profile: str = 'auto')[source]

Bases: object

Comprehensive benchmark class for testing all estimators and data models.

__init__(output_dir: str | None = None, runtime_profile: str = 'auto')[source]

Initialize the benchmark system.

Parameters:

output_dir (str, optional) – Directory to save benchmark results
runtime_profile (str, optional) – Runtime profile to control computational intensity. Options:
- “auto”: determine automatically (default)
- “quick”: minimise expensive diagnostics (useful for tests)
- “full”: enable all diagnostics and resampling routines

run_comprehensive_benchmark(data_length: int = 1000, benchmark_type: str = 'comprehensive', contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) → Dict[str, Any][source]

Run comprehensive benchmark across all estimators and data models.

Parameters:

data_length (int) – Length of test data to generate
benchmark_type (str) – Type of benchmark to run:
- ‘comprehensive’: All estimators (default)
- ‘classical’: Only classical statistical estimators
- ‘ML’: Only machine learning estimators (non-neural)
- ‘neural’: Only neural network estimators
contamination_type (str, optional) – Type of contamination to apply to the data
contamination_level (float) – Level/intensity of contamination (0.0 to 1.0)
save_results (bool) – Whether to save results to file

Returns:

Comprehensive benchmark results

Return type:

dict

run_classical_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) → Dict[str, Any][source]: Run benchmark with only classical statistical estimators.

run_ml_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) → Dict[str, Any][source]: Run benchmark with only machine learning estimators (non-neural).

run_neural_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) → Dict[str, Any][source]: Run benchmark with only neural network estimators.

_resolve_runtime_profile(runtime_profile: str | None) → str[source]: Determine the runtime profile controlling benchmark intensity.

_load_protocol_config(path: Path) → Dict[str, Any][source]: Load benchmark protocol configuration from YAML/JSON file.
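The description above says only that the protocol file may be YAML or JSON. As a minimal sketch of how such a loader might dispatch on the file suffix (the names `parse_config_text` and `load_config` are hypothetical, not the library's implementation, and the YAML branch assumes the third-party PyYAML package is installed):

```python
import json
from pathlib import Path
from typing import Any, Dict


def parse_config_text(text: str, suffix: str) -> Dict[str, Any]:
    """Hypothetical parser: choose JSON or YAML from the file suffix."""
    suffix = suffix.lower()
    if suffix == ".json":
        loaded = json.loads(text)
    elif suffix in (".yaml", ".yml"):
        import yaml  # PyYAML; imported lazily so JSON-only use needs no extra dependency
        loaded = yaml.safe_load(text)
    else:
        raise ValueError(f"Unsupported config format: {suffix!r}")
    if not isinstance(loaded, dict):
        raise ValueError("Protocol config must be a mapping at the top level")
    return loaded


def load_config(path: Path) -> Dict[str, Any]:
    """Read the file and hand its text to the suffix-based parser."""
    return parse_config_text(path.read_text(encoding="utf-8"), path.suffix)


# Example with an in-memory JSON document
config = parse_config_text('{"estimators": {"dfa": {"order": 2}}}', ".json")
```

Rejecting non-mapping documents early keeps later override logic from having to guard against lists or scalars.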

_deep_merge_dicts(base: Dict[str, Any], updates: Dict[str, Any]) → Dict[str, Any][source]: Recursively merge dictionaries without mutating inputs.
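To illustrate what "recursively merge without mutating inputs" can look like, here is a small sketch. The function name `deep_merge` and the sample keys are assumptions for illustration; the library's own helper may differ in detail.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `updates` is merged into `base` recursively."""
    merged = copy.deepcopy(base)  # work on a copy so the caller's `base` is untouched
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the update replaces the existing value
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"dfa": {"min_scale": 4, "order": 1}, "seed": 0}
overrides = {"dfa": {"order": 2}}
merged = deep_merge(defaults, overrides)
```

Only the overridden leaf changes; sibling keys such as `min_scale` survive, and `defaults` itself is left as it was.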

_initialize_all_estimators() → Dict[str, Dict[str, Any]][source]: Initialize all available estimators organized by category.

_apply_estimator_overrides(estimators: Dict[str, Dict[str, Any]], overrides: Dict[str, Dict[str, Any]]) → Dict[str, Dict[str, Any]][source]: Apply protocol-defined parameter overrides to initialized estimators.

_initialize_data_models() → Dict[str, Any][source]: Initialize all available data models.

_initialize_contamination_models() → Dict[str, Any][source]: Initialize all available contamination models.

get_estimators_by_type(benchmark_type: str = 'comprehensive', data_length: int = 1000) → Dict[str, Any][source]

Get estimators based on the specified benchmark type.

Parameters:

benchmark_type (str) – Type of benchmark to run:
- ‘comprehensive’: All estimators (default)
- ‘classical’: Only classical statistical estimators
- ‘ML’: Only machine learning estimators (non-neural)
- ‘neural’: Only neural network estimators
data_length (int) – Length of data to be tested (used for adaptive wavelet estimators)

Returns:

Dictionary of estimators for the specified type

Return type:

dict

generate_test_data(model_name: str, data_length: int = 1000, **kwargs) → Tuple[ndarray, Dict[str, Any]][source]

Generate test data using specified model.

Parameters:

model_name (str) – Name of the data model to use
data_length (int) – Length of data to generate
**kwargs (dict) – Additional parameters for the data model

Returns:

(data, parameters)

Return type:

tuple

apply_contamination(data: ndarray, contamination_type: str, contamination_level: float = 0.1, **kwargs) → Tuple[ndarray, Dict[str, Any]][source]

Apply contamination to the data.

Parameters:

data (np.ndarray) – Original clean data
contamination_type (str) – Type of contamination to apply
contamination_level (float) – Level/intensity of contamination (0.0 to 1.0)
**kwargs (dict) – Additional parameters for specific contamination types

Returns:

(contaminated_data, contamination_info)

Return type:

tuple
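The available contamination types are not listed in this reference. Purely as an illustration of the (contaminated_data, contamination_info) return shape, the sketch below adds Gaussian noise whose standard deviation is the contamination level times the standard deviation of the clean series. The function name, the noise model, and the info keys are all assumptions, and plain lists stand in for np.ndarray to keep the example dependency-free.

```python
import random
import statistics
from typing import Any, Dict, List, Sequence, Tuple


def add_gaussian_noise(
    data: Sequence[float], contamination_level: float = 0.1, seed: int = 0
) -> Tuple[List[float], Dict[str, Any]]:
    """Hypothetical contamination: additive noise scaled by the data's spread."""
    if not 0.0 <= contamination_level <= 1.0:
        raise ValueError("contamination_level must lie in [0.0, 1.0]")
    rng = random.Random(seed)  # seeded for reproducible examples
    noise_std = contamination_level * statistics.pstdev(data)
    contaminated = [x + rng.gauss(0.0, noise_std) for x in data]
    info = {
        "type": "additive_gaussian",
        "level": contamination_level,
        "noise_std": noise_std,
    }
    return contaminated, info


clean = [0.0, 1.0, 0.5, 1.5, 1.0, 2.0]
noisy, info = add_gaussian_noise(clean, contamination_level=0.1)
```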

run_single_estimator_test(estimator_name: str, data: ndarray, true_params: Dict[str, Any]) → Dict[str, Any][source]

Run a single estimator test.

Parameters:

estimator_name (str) – Name of the estimator to test
data (np.ndarray) – Test data
true_params (dict) – True parameters of the data

Returns:

Test results

Return type:

dict

_calculate_monte_carlo_mse(estimator, data: ndarray, true_value: float, n_simulations: int = 50) → Dict[str, Any][source]

Calculate mean signed error using Monte Carlo simulations.

Parameters:

estimator (BaseEstimator) – Estimator instance
data (np.ndarray) – Original dataset
true_value (float) – True parameter value
n_simulations (int) – Number of Monte Carlo simulations

Returns:

Mean signed error analysis results

Return type:

dict
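How the simulations are drawn from the original dataset is not described here. As a generic sketch of the idea, the code below repeats a simulate-then-estimate step and averages the signed error (estimate minus true value). The function name, the `simulate` callback, and the returned keys are assumptions for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List


def monte_carlo_signed_error(
    simulate: Callable[[random.Random], List[float]],
    estimate: Callable[[List[float]], float],
    true_value: float,
    n_simulations: int = 50,
    seed: int = 0,
) -> Dict[str, float]:
    """Average the signed and squared estimation errors over repeated simulations."""
    rng = random.Random(seed)
    errors = [estimate(simulate(rng)) - true_value for _ in range(n_simulations)]
    return {
        "mean_signed_error": statistics.fmean(errors),
        "mean_squared_error": statistics.fmean([e * e for e in errors]),
        "error_std": statistics.stdev(errors) if len(errors) > 1 else 0.0,
    }


# Toy check: estimate the mean of Gaussian samples whose true mean is 0.7
summary = monte_carlo_signed_error(
    simulate=lambda rng: [rng.gauss(0.7, 0.1) for _ in range(200)],
    estimate=statistics.fmean,
    true_value=0.7,
)
```

A mean signed error near zero indicates an unbiased estimator, while the mean squared error also penalises variance; reporting both separates the two effects.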

_compute_significance_tests(results: Dict[str, Any], alpha: float = 0.05) → Dict[str, Any][source]

Compute omnibus and post-hoc significance tests across estimators.

Parameters:

results (Dict[str, Any]) – Raw benchmark results grouped by data model.
alpha (float) – Significance level for hypothesis testing.

Returns:

Significance testing outcomes including Friedman statistics and Holm-adjusted pairwise Wilcoxon tests.

Return type:

Dict[str, Any]
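The Holm adjustment mentioned above can be sketched in a few lines. This is a generic implementation of the Holm step-down procedure, not the library's code; `holm_adjust` is a hypothetical name, and the raw p-values are made up for illustration (in practice they might come from pairwise tests such as `scipy.stats.wilcoxon`).

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # illustrative pairwise p-values
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

With three comparisons at alpha = 0.05, only the smallest raw p-value stays significant after adjustment, which shows why uncorrected pairwise tests overstate differences between estimators.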

_compute_stratified_metrics(results: Dict[str, Any], data_length: int, contamination_type: str | None, contamination_level: float) → Dict[str, Any][source]: Produce stratified summaries across H bands, tail classes, data length, and contamination regime.

_categorise_hurst_band(hurst_value: float | None) → str[source]: Assign H estimates to qualitative persistence bands.
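The band boundaries used by the library are not given in this reference. The sketch below uses the standard interpretation of the Hurst exponent (below 0.5 anti-persistent, above 0.5 persistent); the function name, the band labels, and the 0.05 tolerance around 0.5 are assumptions.

```python
from typing import Optional


def hurst_band(hurst_value: Optional[float], tolerance: float = 0.05) -> str:
    """Map an H value to a qualitative persistence band (illustrative thresholds)."""
    if hurst_value is None:
        return "unknown"
    if hurst_value < 0.5 - tolerance:
        return "anti-persistent"
    if hurst_value > 0.5 + tolerance:
        return "persistent"
    return "near-random"


bands = [hurst_band(h) for h in (0.3, 0.5, 0.8, None)]
```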

_categorise_length_band(data_length: int | None) → str[source]: Bucket data length into interpretable regimes.

_extract_scale_data(result: Dict[str, Any], estimator: Any) → Tuple[ndarray | None, ndarray | None][source]

Extract scale and statistics data from estimator result for diagnostics.

Parameters:

result (dict) – Estimator result dictionary
estimator (BaseEstimator) – Estimator instance

Returns:

(scales, statistics) arrays or (None, None) if unavailable

Return type:

tuple

_infer_estimator_family(estimator_name: str) → str[source]

Infer the family (classical, ML, neural) from estimator name.

Parameters:

estimator_name (str) – Name of the estimator

Returns:

Estimator family

Return type:

str
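One plausible way to infer a family from a name is keyword matching. The sketch below is an assumption about the approach, not the library's rule set; the marker lists reuse estimator names that appear in the usage examples on this page, and anything unmatched falls back to the classical family.

```python
def infer_family(estimator_name: str) -> str:
    """Guess the estimator family from substrings of its name (illustrative)."""
    name = estimator_name.lower()
    neural_markers = ("cnn", "lstm", "transformer")
    ml_markers = ("random_forest", "gradient_boosting", "svr")
    if any(marker in name for marker in neural_markers):
        return "neural"
    if any(marker in name for marker in ml_markers):
        return "ML"
    return "classical"  # default for names such as 'dfa', 'rs', 'gph'


families = {name: infer_family(name) for name in ("dfa", "LSTM", "random_forest")}
```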

_infer_tail_class(model_name: str | None, data_params: Dict[str, Any] | None = None) → str[source]: Infer a qualitative tail/heaviness class based on the data model.

_build_provenance_bundle(summary: Dict[str, Any]) → Dict[str, Any][source]

Construct a comprehensive provenance bundle using ProvenanceTracker.

This bundle includes all settings needed to reproduce the experiment:
- Data generation parameters
- Estimator configuration
- Preprocessing settings
- Scale selection parameters
- Analytics configuration
- Environment information

_attach_uncertainty_calibration_summary(summary: Dict[str, Any], lookback_days: int = 90) → None[source]: Augment benchmark summaries with uncertainty calibration diagnostics.

_build_result_row_provenance(result: Dict[str, Any], data_params: Dict[str, Any]) → Dict[str, Any][source]

Build provenance bundle for a single result row.

This creates a lightweight provenance artifact per result that includes:
- Experiment-level provenance (reference)
- Row-specific parameters (data model, estimator, etc.)
- Result metadata

_record_uncertainty_event(estimator_name: str, data_model: str | None, uncertainty: Any, estimate: float | None, true_value: float | None, data_length: int, estimator_family: str | None) → None[source]: Persist uncertainty calibration data via the error analyzer.

run_classical_estimators(data_models: list | None = None, n_samples: int = 1000, n_trials: int = 10, save_results: bool = True) → Dict[str, Any][source]

Backward-compatible alias for run_classical_benchmark.

This method maintains the old API for compatibility with existing code.

run_advanced_metrics_benchmark(data_length: int = 1000, benchmark_type: str = 'comprehensive', n_monte_carlo: int = 100, convergence_threshold: float = 1e-06, save_results: bool = True) → Dict[str, Any][source]

Run advanced metrics benchmark focusing on convergence and bias analysis.

Parameters:

data_length (int) – Length of test data to generate
benchmark_type (str) – Type of benchmark to run
n_monte_carlo (int) – Number of Monte Carlo simulations for bias analysis
convergence_threshold (float) – Threshold for convergence detection
save_results (bool) – Whether to save results to file

Returns:

Advanced metrics benchmark results

Return type:

dict

save_advanced_results(results: Dict[str, Any]) → None[source]: Save advanced benchmark results to files.

print_advanced_summary(summary: Dict[str, Any]) → None[source]: Print advanced benchmark summary.

save_results(results: Dict[str, Any]) → None[source]: Save benchmark results to files.

print_summary(summary: Dict[str, Any]) → None[source]: Print benchmark summary.

export_results(results: Dict[str, Any], output_path: str) → None[source]

Export benchmark results to a file.

Parameters:

results (dict) – Benchmark results dictionary
output_path (str) – Path to save the results (JSON format)

Benchmark Results

Benchmark Configuration

Usage Examples

Basic Benchmark

from lrdbenchmark import ComprehensiveBenchmark

# Create benchmark instance
benchmark = ComprehensiveBenchmark()

print("Running comprehensive benchmark...")
print("This will test multiple estimators on various data models")

# Run comprehensive benchmark using only the documented parameters
results = benchmark.run_comprehensive_benchmark(
    data_length=1000,
    benchmark_type='comprehensive',
    save_results=False
)

# The return value is a plain dict (see the signature above). Its keys are
# not listed in this reference, so inspect them before relying on any field.
print("\n=== BENCHMARK RESULTS ===")
for key, value in results.items():
    print(f"{key}: {type(value).__name__}")

# Write the results to a JSON file with the documented export method
benchmark.export_results(results, 'benchmark_results.json')

Classical Estimators Only

from lrdbenchmark import ComprehensiveBenchmark

benchmark = ComprehensiveBenchmark()

# Run only classical estimators. The documented signature has no
# per-estimator filter, so the whole classical family is benchmarked.
results = benchmark.run_classical_benchmark(
    data_length=1000,
    save_results=False
)

# The return value is a dict; list its top-level keys to locate the
# per-estimator entries (for example, the DFA results)
print(f"Result keys: {sorted(results)}")

Machine Learning Estimators

from lrdbenchmark import ComprehensiveBenchmark

benchmark = ComprehensiveBenchmark()

# List the estimators in the ML family (documented helper)
ml_estimators = benchmark.get_estimators_by_type(benchmark_type='ML', data_length=1000)
print(f"ML estimators: {list(ml_estimators)}")

# Run the ML (non-neural) estimators with the documented parameters
results = benchmark.run_ml_benchmark(
    data_length=1000,
    save_results=False
)

# The return value is a dict; inspect its keys to find the performance metrics
print(f"Result keys: {sorted(results)}")

Neural Network Estimators

from lrdbenchmark import ComprehensiveBenchmark

benchmark = ComprehensiveBenchmark()

# List the estimators in the neural family (documented helper)
neural_estimators = benchmark.get_estimators_by_type(benchmark_type='neural', data_length=1000)
print(f"Neural estimators: {list(neural_estimators)}")

# Run neural network estimators. Training options such as epochs or batch
# size are not part of the documented signature.
results = benchmark.run_neural_benchmark(
    data_length=1000,
    save_results=False
)

# The return value is a dict; inspect its keys to find the per-estimator entries
print(f"Result keys: {sorted(results)}")

Custom Configuration

from lrdbenchmark import ComprehensiveBenchmark

# Configure the benchmark through the documented constructor arguments
benchmark = ComprehensiveBenchmark(
    output_dir='benchmark_results',
    runtime_profile='quick'
)

# Run the benchmark for several data lengths
data_lengths = [500, 1000, 2000]
results_by_length = {}
for data_length in data_lengths:
    results_by_length[data_length] = benchmark.run_comprehensive_benchmark(
        data_length=data_length,
        save_results=False
    )

# Get results for specific data length
results_1000 = results_by_length[1000]
print(f"Results for length 1000: {len(results_1000)} top-level entries")

Advanced Usage

Parallel Processing

from lrdbenchmark import ComprehensiveBenchmark
import multiprocessing as mp

if __name__ == '__main__':
    # Select the 'spawn' start method. This controls how worker processes
    # are started, not how many are used.
    mp.set_start_method('spawn', force=True)

    benchmark = ComprehensiveBenchmark()

    # The documented signature has no worker-count argument; parallelism is
    # handled by the benchmark system itself
    results = benchmark.run_comprehensive_benchmark(data_length=1000)

Custom Data Models

from lrdbenchmark import ComprehensiveBenchmark, FBMModel, FGNModel

# Create custom data models
custom_models = {
    'fbm_high': FBMModel(H=0.8, sigma=1.0),
    'fbm_low': FBMModel(H=0.3, sigma=1.0),
    'fgn_medium': FGNModel(H=0.6, sigma=1.0)
}

benchmark = ComprehensiveBenchmark()

# Run benchmark with custom models
results = benchmark.run_comprehensive_benchmark(
    data_length=1000,
    custom_models=custom_models,
    n_runs=5
)

Custom Estimators

from lrdbenchmark import ComprehensiveBenchmark
from lrdbenchmark import DFAEstimator

# Create custom estimator
custom_dfa = DFAEstimator(
    min_scale=4,
    max_scale=100,
    num_scales=20,
    polynomial_order=2
)

custom_estimators = {
    'custom_dfa': custom_dfa
}

benchmark = ComprehensiveBenchmark()

# Run benchmark with custom estimator
results = benchmark.run_comprehensive_benchmark(
    data_length=1000,
    custom_estimators=custom_estimators,
    n_runs=5
)

Results Analysis

Statistical Analysis

from lrdbenchmark import ComprehensiveBenchmark
import pandas as pd

benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)

# Convert to pandas DataFrame for analysis
df = results.to_dataframe()

# Group by estimator and calculate statistics
stats = df.groupby('estimator')['estimated_H'].agg([
    'mean', 'std', 'min', 'max', 'count'
]).round(3)

print("Estimator Statistics:")
print(stats)

# Calculate bias for each estimator
true_H = df['true_H'].iloc[0]  # Assuming same true H for all
bias = df.groupby('estimator')['estimated_H'].mean() - true_H

print(f"\nBias (estimated - true H = {true_H}):")
print(bias.round(3))

Visualisation

from lrdbenchmark import ComprehensiveBenchmark
import matplotlib.pyplot as plt
import seaborn as sns

benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)

# Create box plot
df = results.to_dataframe()

plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='estimator', y='estimated_H')
plt.axhline(y=df['true_H'].iloc[0], color='red', linestyle='--', label='True H')
plt.title('Hurst Parameter Estimates by Estimator')
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()

# Create scatter plot
plt.figure(figsize=(10, 6))
for estimator in df['estimator'].unique():
    subset = df[df['estimator'] == estimator]
    plt.scatter(subset['true_H'], subset['estimated_H'],
               label=estimator, alpha=0.6)

plt.plot([0.3, 0.9], [0.3, 0.9], 'k--', label='Perfect Estimation')
plt.xlabel('True Hurst Parameter')
plt.ylabel('Estimated Hurst Parameter')
plt.title('True vs Estimated Hurst Parameters')
plt.legend()
plt.grid(True)
plt.show()

Performance Comparison

from lrdbenchmark import ComprehensiveBenchmark
import time

benchmark = ComprehensiveBenchmark()

# Measure execution time for each estimator family
runners = {
    'classical': benchmark.run_classical_benchmark,
    'ML': benchmark.run_ml_benchmark,
    'neural': benchmark.run_neural_benchmark
}
execution_times = {}

for family, run in runners.items():
    # perf_counter is a monotonic clock intended for timing intervals
    start_time = time.perf_counter()
    results = run(data_length=1000, save_results=False)
    execution_times[family] = time.perf_counter() - start_time

print("Execution Times:")
for family, time_taken in execution_times.items():
    print(f"{family}: {time_taken:.2f} seconds")

Error Analysis

from lrdbenchmark import ComprehensiveBenchmark
import numpy as np

benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)

df = results.to_dataframe()

# Calculate errors
df['error'] = df['estimated_H'] - df['true_H']
df['abs_error'] = np.abs(df['error'])
df['squared_error'] = df['error']**2

# Error statistics by estimator
error_stats = df.groupby('estimator').agg({
    'error': ['mean', 'std'],
    'abs_error': 'mean',
    'squared_error': 'mean'
}).round(4)

error_stats.columns = ['Bias', 'Bias_Std', 'MAE', 'MSE']
print("Error Statistics:")
print(error_stats)

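# Identify outliers with the 1.5 * IQR rule, computed per estimator
Q1 = df.groupby('estimator')['error'].quantile(0.25)
Q3 = df.groupby('estimator')['error'].quantile(0.75)
IQR = Q3 - Q1

# Map each row's estimator to its own bounds so the comparison is row-aligned
lower = df['estimator'].map(Q1 - 1.5 * IQR)
upper = df['estimator'].map(Q3 + 1.5 * IQR)
outliers = df[(df['error'] < lower) | (df['error'] > upper)]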

print(f"\nNumber of outliers: {len(outliers)}")

Confidence Intervals

from lrdbenchmark import ComprehensiveBenchmark
import numpy as np

benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=20)

df = results.to_dataframe()

# Calculate confidence intervals
confidence_level = 0.95
alpha = 1 - confidence_level

ci_results = {}
for estimator in df['estimator'].unique():
    subset = df[df['estimator'] == estimator]
    estimates = subset['estimated_H'].values

    # Bootstrap confidence interval
    n_bootstrap = 1000
    bootstrap_means = []

    for _ in range(n_bootstrap):
        bootstrap_sample = np.random.choice(estimates, size=len(estimates), replace=True)
        bootstrap_means.append(np.mean(bootstrap_sample))

    lower_ci = np.percentile(bootstrap_means, alpha/2 * 100)
    upper_ci = np.percentile(bootstrap_means, (1-alpha/2) * 100)

    ci_results[estimator] = {
        'mean': np.mean(estimates),
        'lower_ci': lower_ci,
        'upper_ci': upper_ci,
        'width': upper_ci - lower_ci
    }

print("Confidence Intervals (95%):")
for estimator, ci in ci_results.items():
    print(f"{estimator}: {ci['mean']:.3f} [{ci['lower_ci']:.3f}, {ci['upper_ci']:.3f}]")

Export and Reporting

Export Results

from lrdbenchmark import ComprehensiveBenchmark
import json
import pandas as pd

benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=5)

# Export to JSON with the documented export method
benchmark.export_results(results, 'benchmark_results.json')

# Export to CSV
df = results.to_dataframe()
df.to_csv('benchmark_results.csv', index=False)

# Export to Excel
with pd.ExcelWriter('benchmark_results.xlsx') as writer:
    df.to_excel(writer, sheet_name='Results', index=False)

    # Create summary sheet
    summary = results.get_summary()
    summary_df = pd.DataFrame([summary])
    summary_df.to_excel(writer, sheet_name='Summary', index=False)

Generate Reports

from lrdbenchmark import ComprehensiveBenchmark

benchmark = ComprehensiveBenchmark()
results = benchmark.run_comprehensive_benchmark(data_length=1000, n_runs=10)

# Generate comprehensive report
report = results.generate_report(
    include_plots=True,
    include_statistics=True,
    include_recommendations=True
)

# Save report
with open('benchmark_report.html', 'w') as f:
    f.write(report)

# Print summary
print(results.get_summary())

Best Practices

Sample Size: Use at least 1000 data points for reliable estimates
Number of Runs: Use 10-20 runs for stable statistics
Multiple Estimators: Compare results from different estimator types
Data Models: Test on various synthetic data models
Error Handling: Always handle potential estimation failures
Performance Monitoring: Track execution times for large-scale benchmarks
Result Validation: Cross-validate results with known theoretical values
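The error-handling advice above can be sketched as a small wrapper that records a failure instead of aborting the whole run. The function name `safe_estimate` and the result keys are hypothetical, and the estimator is represented by any callable that takes the data and returns a number.

```python
from typing import Any, Callable, Dict, Sequence


def safe_estimate(
    name: str, estimate: Callable[[Sequence[float]], float], data: Sequence[float]
) -> Dict[str, Any]:
    """Run one estimator and capture any exception as part of the result."""
    try:
        value = float(estimate(data))
        return {"estimator": name, "success": True, "estimate": value, "error": None}
    except Exception as exc:  # broad on purpose: any failure should be recorded
        message = f"{type(exc).__name__}: {exc}"
        return {"estimator": name, "success": False, "estimate": None, "error": message}


data = [0.1, 0.4, 0.2, 0.5]
ok = safe_estimate("constant", lambda d: 0.7, data)
failed = safe_estimate("broken", lambda d: 1 / 0, data)
```

Keeping failed runs in the result set, rather than dropping them, makes the failure rate of each estimator visible in later summaries.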

Note

The benchmark system automatically handles parallel processing, error recovery, and result aggregation. For large-scale benchmarks, consider using the parallel processing capabilities.

Warning

Some estimators may fail on certain data types or parameter combinations. The benchmark system will report failures but continue with other estimators.