Benchmark API
lrdbenchmark provides a comprehensive benchmarking framework for evaluating and comparing all 20 long-range dependence estimators (13 classical, 3 machine-learning, 4 neural), plus optional entropy-based estimators in the classical set.
Comprehensive benchmark engine
The primary entry point for publication-style runs (runtime profiles,
stratified metrics, significance tests, optional JSON export) is
ComprehensiveBenchmark.
- class lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark(output_dir: str | None = None, runtime_profile: str = 'auto')[source]
Bases: object
Comprehensive benchmark class for testing all estimators and data models.
- __init__(output_dir: str | None = None, runtime_profile: str = 'auto')[source]
Initialize the benchmark system.
- Parameters:
output_dir (str, optional) – Directory to save benchmark results
runtime_profile (str, optional) – Runtime profile to control computational intensity. Options:
  - “auto”: determine automatically (default)
  - “quick”: minimise expensive diagnostics (useful for tests)
  - “full”: enable all diagnostics and resampling routines
- _resolve_runtime_profile(runtime_profile: str | None) str[source]
Determine the runtime profile controlling benchmark intensity.
- _load_protocol_config(path: Path) Dict[str, Any][source]
Load benchmark protocol configuration from YAML/JSON file.
- _deep_merge_dicts(base: Dict[str, Any], updates: Dict[str, Any]) Dict[str, Any][source]
Recursively merge dictionaries without mutating inputs.
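The merge semantics described above can be illustrated with a minimal sketch. This is not the engine's implementation: the function name deep_merge_dicts and the sample protocol fragments are assumptions for illustration, and the rule that a non-dict update replaces the base value is one plausible choice among several.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge_dicts(base: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `updates` merged into `base`, recursing into nested dicts."""
    merged = deepcopy(base)  # work on a copy so the caller's `base` is never mutated
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold dicts: merge them key by key.
            merged[key] = deep_merge_dicts(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the update replaces the base value.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical protocol fragments, for illustration only.
defaults = {"data": {"length": 1000, "models": ["fbm"]}, "alpha": 0.05}
overrides = {"data": {"length": 256}}

merged = deep_merge_dicts(defaults, overrides)
print(merged)
print(defaults["data"]["length"])  # the input is left untouched
```

Copying both the base and each inserted value keeps the result independent of its inputs, so later edits to a loaded protocol cannot leak back into the defaults.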
- _initialize_all_estimators() Dict[str, Dict[str, Any]][source]
Initialize all available estimators organized by category.
- _apply_estimator_overrides(estimators: Dict[str, Dict[str, Any]], overrides: Dict[str, Dict[str, Any]]) Dict[str, Dict[str, Any]][source]
Apply protocol-defined parameter overrides to initialized estimators.
- _initialize_contamination_models() Dict[str, Any][source]
Initialize all available contamination models.
- get_estimators_by_type(benchmark_type: str = 'comprehensive', data_length: int = 1000) Dict[str, Any][source]
Get estimators based on the specified benchmark type.
- Parameters:
benchmark_type (str) – Type of benchmark to run:
  - ‘comprehensive’: All estimators (default)
  - ‘classical’: Only classical statistical estimators
  - ‘ML’: Only machine learning estimators (non-neural)
  - ‘neural’: Only neural network estimators
data_length (int) – Length of data to be tested (used for adaptive wavelet estimators)
- Returns:
Dictionary of estimators for the specified type
- Return type:
Dict[str, Any]
- generate_test_data(model_name: str, data_length: int = 1000, **kwargs) Tuple[ndarray, Dict[str, Any]][source]
Generate test data using specified model.
- apply_contamination(data: ndarray, contamination_type: str, contamination_level: float = 0.1, **kwargs) Tuple[ndarray, Dict[str, Any]][source]
Apply contamination to the data.
- Returns:
(contaminated_data, contamination_info)
- Return type:
Tuple[ndarray, Dict[str, Any]]
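The (contaminated_data, contamination_info) return shape can be sketched with a standalone helper. Everything here is an assumption for illustration: the helper name add_gaussian_noise, the "additive_gaussian" label, the seed argument, and the convention that the noise standard deviation is contamination_level times the data's standard deviation. The contamination models shipped with the package may behave differently.

```python
import random
import statistics
from typing import Any, Dict, List, Tuple


def add_gaussian_noise(
    data: List[float], contamination_level: float = 0.1, seed: int = 0
) -> Tuple[List[float], Dict[str, Any]]:
    """Add zero-mean Gaussian noise scaled relative to the spread of `data`."""
    rng = random.Random(seed)  # local generator keeps the example reproducible
    noise_std = contamination_level * statistics.pstdev(data)
    contaminated = [x + rng.gauss(0.0, noise_std) for x in data]
    info = {
        "contamination_type": "additive_gaussian",  # hypothetical label
        "contamination_level": contamination_level,
        "noise_std": noise_std,
    }
    return contaminated, info


# Toy input series; a real run would pass generated benchmark data instead.
clean = [float(i % 10) for i in range(100)]
noisy, info = add_gaussian_noise(clean, contamination_level=0.1)
print(info)
```

Returning the metadata alongside the data lets each result row record exactly which perturbation it was computed under.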
- run_single_estimator_test(estimator_name: str, data: ndarray, true_params: Dict[str, Any]) Dict[str, Any][source]
Run a single estimator test.
- _calculate_monte_carlo_mse(estimator, data: ndarray, true_value: float, n_simulations: int = 50) Dict[str, Any][source]
Calculate mean signed error using Monte Carlo simulations.
- Parameters:
estimator (BaseEstimator) – Estimator instance
data (np.ndarray) – Original dataset
true_value (float) – True parameter value
n_simulations (int) – Number of Monte Carlo simulations
- Returns:
Mean signed error analysis results
- Return type:
Dict[str, Any]
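A rough sketch of a Monte Carlo signed-error calculation follows. The resampling scheme used by the engine is not described on this page, so this example assumes each simulation perturbs the original series with small Gaussian noise. The function name, the noise_scale and seed arguments, the result keys, and the toy estimator (a sample mean standing in for a Hurst estimator) are all hypothetical.

```python
import random
import statistics
from typing import Any, Callable, Dict, List, Sequence


def monte_carlo_signed_error(
    estimator: Callable[[Sequence[float]], float],
    data: Sequence[float],
    true_value: float,
    n_simulations: int = 50,
    noise_scale: float = 0.01,
    seed: int = 0,
) -> Dict[str, Any]:
    """Average (estimate - true_value) over perturbed copies of `data`."""
    rng = random.Random(seed)
    sigma = noise_scale * statistics.pstdev(data)  # perturbation size (assumed scheme)
    errors: List[float] = []
    for _ in range(n_simulations):
        perturbed = [x + rng.gauss(0.0, sigma) for x in data]
        errors.append(estimator(perturbed) - true_value)  # signed, not squared
    return {
        "mean_signed_error": statistics.fmean(errors),
        "error_std": statistics.pstdev(errors),
        "n_simulations": n_simulations,
    }


# Toy setup: estimate a known mean of 0.7 with the sample mean.
data_rng = random.Random(1)
series = [data_rng.gauss(0.7, 0.1) for _ in range(500)]
mc_result = monte_carlo_signed_error(statistics.fmean, series, true_value=0.7)
print(mc_result)
```

Keeping the sign of each error makes systematic over- or under-estimation visible, which a squared-error summary would hide.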
- _compute_significance_tests(results: Dict[str, Any], alpha: float = 0.05) Dict[str, Any][source]
Compute omnibus and post-hoc significance tests across estimators.
- _compute_stratified_metrics(results: Dict[str, Any], data_length: int, contamination_type: str | None, contamination_level: float) Dict[str, Any][source]
Produce stratified summaries across H bands, tail classes, data length, and contamination regime.
- _categorise_hurst_band(hurst_value: float | None) str[source]
Assign H estimates to qualitative persistence bands.
- _categorise_length_band(data_length: int | None) str[source]
Bucket data length into interpretable regimes.
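The two banding helpers can be pictured with the sketch below. The cut-offs (0.45 and 0.55 for H; 512 and 4096 for length) and the band labels are assumptions chosen for illustration, not the values used by the engine. The only fixed fact is the standard interpretation that H below 0.5 indicates anti-persistence and H above 0.5 indicates persistence.

```python
from typing import Optional


def categorise_hurst_band(hurst_value: Optional[float]) -> str:
    """Map an H estimate to a qualitative persistence band (assumed cut-offs)."""
    if hurst_value is None:
        return "unknown"
    if hurst_value < 0.45:
        return "anti_persistent"
    if hurst_value <= 0.55:
        return "near_random"
    return "persistent"


def categorise_length_band(data_length: Optional[int]) -> str:
    """Bucket a series length into a coarse regime (assumed cut-offs)."""
    if data_length is None:
        return "unknown"
    if data_length < 512:
        return "short"
    if data_length < 4096:
        return "medium"
    return "long"


# Print the band pair for a few representative inputs.
for h, n in [(0.3, 256), (0.5, 1000), (0.8, 8192), (None, None)]:
    print(h, categorise_hurst_band(h), n, categorise_length_band(n))
```

Mapping missing inputs to an explicit "unknown" band keeps failed estimator rows in the stratified tables instead of dropping them silently.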
- _extract_scale_data(result: Dict[str, Any], estimator: Any) Tuple[ndarray | None, ndarray | None][source]
Extract scale and statistics data from estimator result for diagnostics.
- Parameters:
result (dict) – Estimator result dictionary
estimator (BaseEstimator) – Estimator instance
- Returns:
(scales, statistics) arrays or (None, None) if unavailable
- Return type:
Tuple[ndarray | None, ndarray | None]
- _infer_estimator_family(estimator_name: str) str[source]
Infer the family (classical, ML, neural) from estimator name.
- _infer_tail_class(model_name: str | None, data_params: Dict[str, Any] | None = None) str[source]
Infer a qualitative tail/heaviness class based on the data model.
- _build_provenance_bundle(summary: Dict[str, Any]) Dict[str, Any][source]
Construct a comprehensive provenance bundle using ProvenanceTracker.
This bundle includes all settings needed to reproduce the experiment:
  - Data generation parameters
  - Estimator configuration
  - Preprocessing settings
  - Scale selection parameters
  - Analytics configuration
  - Environment information
- _attach_uncertainty_calibration_summary(summary: Dict[str, Any], lookback_days: int = 90) None[source]
Augment benchmark summaries with uncertainty calibration diagnostics.
- _build_result_row_provenance(result: Dict[str, Any], data_params: Dict[str, Any]) Dict[str, Any][source]
Build provenance bundle for a single result row.
This creates a lightweight provenance artifact per result that includes:
  - Experiment-level provenance (reference)
  - Row-specific parameters (data model, estimator, etc.)
  - Result metadata
- _record_uncertainty_event(estimator_name: str, data_model: str | None, uncertainty: Any, estimate: float | None, true_value: float | None, data_length: int, estimator_family: str | None) None[source]
Persist uncertainty calibration data via the error analyzer.
- run_comprehensive_benchmark(data_length: int = 1000, benchmark_type: str = 'comprehensive', contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run comprehensive benchmark across all estimators and data models.
- Parameters:
data_length (int) – Length of test data to generate
benchmark_type (str) – Type of benchmark to run:
  - ‘comprehensive’: All estimators (default)
  - ‘classical’: Only classical statistical estimators
  - ‘ML’: Only machine learning estimators (non-neural)
  - ‘neural’: Only neural network estimators
contamination_type (str, optional) – Type of contamination to apply to the data
contamination_level (float) – Level/intensity of contamination (0.0 to 1.0)
save_results (bool) – Whether to save results to file
- Returns:
Comprehensive benchmark results
- Return type:
Dict[str, Any]
- run_classical_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run benchmark with only classical statistical estimators.
- run_ml_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run benchmark with only machine learning estimators (non-neural).
- run_neural_benchmark(data_length: int = 1000, contamination_type: str | None = None, contamination_level: float = 0.1, save_results: bool = True) Dict[str, Any][source]
Run benchmark with only neural network estimators.
- run_classical_estimators(data_models: list | None = None, n_samples: int = 1000, n_trials: int = 10, save_results: bool = True) Dict[str, Any][source]
Backward-compatible alias for run_classical_benchmark.
This method maintains the old API for compatibility with existing code.
- run_advanced_metrics_benchmark(data_length: int = 1000, benchmark_type: str = 'comprehensive', n_monte_carlo: int = 100, convergence_threshold: float = 1e-06, save_results: bool = True) Dict[str, Any][source]
Run advanced metrics benchmark focusing on convergence and bias analysis.
- Returns:
Advanced metrics benchmark results
- Return type:
Dict[str, Any]
Public package import
from lrdbenchmark import ComprehensiveBenchmark resolves to the same class
documented above.
Multi-category sweep benchmark
For lighter-weight sweeps that delegate to the classical, ML, and NN benchmark runners and return a list of result rows rather than the engine’s summary dict, use:
- class lrdbenchmark.benchmarks.MultiCategoryBenchmark(output_dir: str | None = None, seed: int | None = None)[source]
Bases: BaseBenchmark
Run classical, ML, and NN sweep benchmarks behind one entry point.
This coordinates ClassicalBenchmark, MLBenchmark, and NNBenchmark. For the full diagnostic engine (runtime profiles, stratified metrics, significance tests), use ComprehensiveBenchmark.
Usage examples
Basic run (returns a summary dict)
from lrdbenchmark import ComprehensiveBenchmark

benchmark = ComprehensiveBenchmark(runtime_profile="quick")
summary = benchmark.run_comprehensive_benchmark(
    data_length=256,
    benchmark_type="classical",
    save_results=False,
)
print(summary["random_state"])
print(summary.get("stratified_metrics", {}))
Classical-only and profiles
from lrdbenchmark import ComprehensiveBenchmark

# Quick profile: skips heavy diagnostics (see engine docstring)
quick = ComprehensiveBenchmark(runtime_profile="quick")
out_quick = quick.run_classical_benchmark(data_length=512, save_results=False)

# Default engine profile is "auto" (defers to environment / heuristics)
full = ComprehensiveBenchmark()
out_full = full.run_comprehensive_benchmark(
    data_length=1000,
    benchmark_type="comprehensive",
    save_results=True,
)
Inspecting per-model results
run_comprehensive_benchmark returns a dictionary. Per-data-model outcomes
live under summary["results"]: keys are model names, and each value contains
an estimator_results list whose entries carry success flags, estimates, and errors.
summary = benchmark.run_comprehensive_benchmark(
    data_length=512,
    benchmark_type="classical",
    save_results=False,
)
for model_name, block in summary["results"].items():
    if block.get("error"):
        print(model_name, "failed:", block["error"])
        continue
    n_ok = sum(1 for r in block["estimator_results"] if r.get("success"))
    print(f"{model_name}: {n_ok}/{len(block['estimator_results'])} estimators OK")
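The same traversal can be wrapped in a small helper that turns a summary into per-model success rates. The sketch below runs against a hand-built mock so it does not need the package installed. The helper name success_rates and the row keys "estimator", "estimate", and "error" are hypothetical; only "results", "estimator_results", "success", and the block-level "error" come from the description above.

```python
from typing import Any, Dict


def success_rates(summary: Dict[str, Any]) -> Dict[str, float]:
    """Return the fraction of successful estimator rows for each data model."""
    rates: Dict[str, float] = {}
    for model_name, block in summary.get("results", {}).items():
        rows = block.get("estimator_results", [])
        if block.get("error") or not rows:
            # The whole model failed (or produced no rows): count it as 0.
            rates[model_name] = 0.0
            continue
        rates[model_name] = sum(1 for r in rows if r.get("success")) / len(rows)
    return rates


# Mock summary with invented values, shaped like the structure described above.
mock_summary = {
    "results": {
        "fbm": {
            "estimator_results": [
                {"estimator": "A", "success": True, "estimate": 0.71},
                {"estimator": "B", "success": False, "error": "did not converge"},
            ]
        },
        "fgn": {"error": "generation failed", "estimator_results": []},
    }
}

rates = success_rates(mock_summary)
print(rates)
```

Using .get() with defaults throughout means a partially populated summary degrades to zeros rather than raising KeyError partway through a report.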
Multi-category sweep (optional)
from lrdbenchmark.benchmarks import MultiCategoryBenchmark

runner = MultiCategoryBenchmark(output_dir="sweep_results", seed=42)
rows = runner.run(
    models=["fbm", "fgn"],
    lengths=[512],
    num_realizations=3,
    run_classical=True,
    run_ml=True,
    run_nn=False,
)
Best practices
- Use data_length ≥ 512 for stable wavelet and spectral estimates when comparing families.
- Use runtime_profile="quick" in CI or smoke tests; use "full" for exhaustive diagnostics, or keep the default "auto" to let the engine choose.
- Set LRDBENCHMARK_AUTO_CPU=1 before import to force CPU-only JAX/CUDA visibility when you need deterministic, GPU-free environments.
- Handle failed estimator rows via the success flag on each result entry.
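The ordering requirement for LRDBENCHMARK_AUTO_CPU can be made explicit in a script. In this sketch the variable is set before the first import of the package. The try/except guard is an addition so the snippet also runs where lrdbenchmark is not installed, and the printed messages are illustrative only.

```python
import os

# Set the flag before the first `import lrdbenchmark` in this process;
# setting it afterwards is assumed to have no effect.
os.environ["LRDBENCHMARK_AUTO_CPU"] = "1"

try:
    from lrdbenchmark import ComprehensiveBenchmark
except ImportError:
    # Package not available here; the environment variable is still set.
    ComprehensiveBenchmark = None

if ComprehensiveBenchmark is None:
    print("lrdbenchmark is not installed; skipping the benchmark")
else:
    print("lrdbenchmark imported with LRDBENCHMARK_AUTO_CPU =",
          os.environ["LRDBENCHMARK_AUTO_CPU"])
```

In CI, exporting the variable in the job environment achieves the same thing without touching the script.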
Note
Earlier documentation referred to BenchmarkResult, EstimatorResult,
and BenchmarkConfig helpers; the current engine returns structured
dict summaries. Prefer the keys documented on
run_comprehensive_benchmark().