Benchmark API
=============

lrdbenchmark provides a comprehensive benchmarking framework for evaluating and comparing all **20** long-range dependence estimators (13 classical, 3 machine-learning, 4 neural), plus optional entropy-based estimators in the classical set.

.. _comprehensive-benchmark-engine:

Comprehensive benchmark engine
------------------------------

The primary entry point for publication-style runs (runtime profiles, stratified metrics, significance tests, optional JSON export) is :class:`~lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark`.

.. autoclass:: lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark
   :members:
   :undoc-members:
   :show-inheritance:

Public package import
~~~~~~~~~~~~~~~~~~~~~

``from lrdbenchmark import ComprehensiveBenchmark`` resolves to the same class documented above.

Multi-category sweep benchmark
------------------------------

For lighter-weight sweeps that delegate to the classical, ML, and NN benchmark runners (list-of-row results, separate from the engine’s summary dict), use:

.. autoclass:: lrdbenchmark.benchmarks.MultiCategoryBenchmark
   :members:
   :undoc-members:
   :show-inheritance:

Usage examples
--------------

Basic run (returns a summary ``dict``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark

   benchmark = ComprehensiveBenchmark(runtime_profile="quick")
   summary = benchmark.run_comprehensive_benchmark(
       data_length=256,
       benchmark_type="classical",
       save_results=False,
   )
   print(summary["random_state"])
   print(summary.get("stratified_metrics", {}))

Classical-only and profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark

   # Quick profile: skips heavy diagnostics (see engine docstring)
   quick = ComprehensiveBenchmark(runtime_profile="quick")
   out_quick = quick.run_classical_benchmark(data_length=512, save_results=False)

   # Default engine profile is "auto" (defers to environment / heuristics)
   full = ComprehensiveBenchmark()
   out_full = full.run_comprehensive_benchmark(
       data_length=1000,
       benchmark_type="comprehensive",
       save_results=True,
   )

Inspecting per-model results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``run_comprehensive_benchmark`` returns a dictionary. Per-data-model outcomes live under ``summary["results"]``: keys are model names, and each value contains an ``estimator_results`` list whose entries carry success flags, estimates, and errors.

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark

   benchmark = ComprehensiveBenchmark(runtime_profile="quick")
   summary = benchmark.run_comprehensive_benchmark(
       data_length=512,
       benchmark_type="classical",
       save_results=False,
   )

   for model_name, block in summary["results"].items():
       # A model-level error means no estimator rows are available for it
       if block.get("error"):
           print(model_name, "failed:", block["error"])
           continue
       n_ok = sum(1 for r in block["estimator_results"] if r.get("success"))
       print(f"{model_name}: {n_ok}/{len(block['estimator_results'])} estimators OK")

Multi-category sweep (optional)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark.benchmarks import MultiCategoryBenchmark

   runner = MultiCategoryBenchmark(output_dir="sweep_results", seed=42)
   rows = runner.run(
       models=["fbm", "fgn"],
       lengths=[512],
       num_realizations=3,
       run_classical=True,
       run_ml=True,
       run_nn=False,
   )

Best practices
--------------

1. Use ``data_length`` ≥ 512 for stable wavelet and spectral estimates when comparing families.
2. Use ``runtime_profile="quick"`` in CI or smoke tests; use ``"full"`` or default ``"auto"`` for exhaustive diagnostics.
3. Set ``LRDBENCHMARK_AUTO_CPU=1`` before import to force CPU-only JAX/CUDA visibility when you need deterministic, GPU-free environments.
4. Handle failed estimator rows via the ``success`` flag on each result entry.

.. note::

   Earlier documentation referred to ``BenchmarkResult``, ``EstimatorResult``, and ``BenchmarkConfig`` helpers; the current engine returns structured ``dict`` summaries. Prefer the keys documented on :meth:`~lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark.run_comprehensive_benchmark`.
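
The advice on checking the ``success`` flag can be wrapped in a small helper. The following is a minimal sketch, not part of lrdbenchmark: ``count_successes`` is a hypothetical function, and ``mock_summary`` is a hand-built stand-in that only mimics the documented ``results`` / ``estimator_results`` / ``success`` / ``error`` keys. The ``estimator`` field and the estimator names inside it are illustrative assumptions; check the actual row keys against the engine docstring.

.. code-block:: python

   from typing import Any, Dict, Tuple


   def count_successes(summary: Dict[str, Any]) -> Dict[str, Tuple[int, int]]:
       """Map each data-model name to (successful, total) estimator counts.

       Models whose block carries a truthy "error" entry are reported as (0, 0).
       """
       counts: Dict[str, Tuple[int, int]] = {}
       for model_name, block in summary.get("results", {}).items():
           if block.get("error"):
               counts[model_name] = (0, 0)
               continue
           rows = block.get("estimator_results", [])
           n_ok = sum(1 for row in rows if row.get("success"))
           counts[model_name] = (n_ok, len(rows))
       return counts


   # Hand-built stand-in for a real summary dict (assumed shape, see above).
   mock_summary = {
       "results": {
           "fbm": {
               "estimator_results": [
                   {"estimator": "rs", "success": True},
                   {"estimator": "dfa", "success": False},
               ]
           },
           "fgn": {"error": "generation failed"},
       }
   }

   counts = count_successes(mock_summary)
   for model_name, (n_ok, n_total) in counts.items():
       print(f"{model_name}: {n_ok}/{n_total} estimators OK")

Passing the dictionary returned by ``run_comprehensive_benchmark`` in place of ``mock_summary`` should work as long as the real summary follows the shape described under "Inspecting per-model results".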