Benchmark API
=============

``lrdbenchmark`` provides a benchmarking framework for evaluating and comparing
all **20** long-range dependence estimators (13 classical, 3 machine-learning,
4 neural), plus optional entropy-based estimators in the classical set.

.. _comprehensive-benchmark-engine:

Comprehensive benchmark engine
------------------------------

The primary entry point for publication-style runs (runtime profiles,
stratified metrics, significance tests, optional JSON export) is
:class:`~lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark`.

.. autoclass:: lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark
   :members:
   :undoc-members:
   :show-inheritance:

Public package import
~~~~~~~~~~~~~~~~~~~~~

``from lrdbenchmark import ComprehensiveBenchmark`` resolves to the same class
documented above.

Multi-category sweep benchmark
------------------------------

For lighter-weight sweeps that delegate to the classical, ML, and NN benchmark
runners, use the class below. It returns a list of result rows, which is
separate from the engine's summary ``dict``:

.. autoclass:: lrdbenchmark.benchmarks.MultiCategoryBenchmark
   :members:
   :undoc-members:
   :show-inheritance:

Usage examples
--------------

Basic run (returns a summary ``dict``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark

   benchmark = ComprehensiveBenchmark(runtime_profile="quick")
   summary = benchmark.run_comprehensive_benchmark(
       data_length=256,
       benchmark_type="classical",
       save_results=False,
   )

   print(summary["random_state"])
   print(summary.get("stratified_metrics", {}))

Classical-only and profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark import ComprehensiveBenchmark

   # Quick profile: skips heavy diagnostics (see engine docstring)
   quick = ComprehensiveBenchmark(runtime_profile="quick")
   out_quick = quick.run_classical_benchmark(data_length=512, save_results=False)

   # Default engine profile is "auto" (defers to environment / heuristics)
   full = ComprehensiveBenchmark()
   out_full = full.run_comprehensive_benchmark(
       data_length=1000,
       benchmark_type="comprehensive",
       save_results=True,
   )

Inspecting per-model results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``run_comprehensive_benchmark`` returns a dictionary. Outcomes for each data
model live under ``summary["results"]``: keys are model names, and each value
holds an ``estimator_results`` list whose entries carry success flags,
estimates, and errors.

.. code-block:: python

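   from lrdbenchmark import ComprehensiveBenchmark

   benchmark = ComprehensiveBenchmark(runtime_profile="quick")
   summary = benchmark.run_comprehensive_benchmark(
       data_length=512,
       benchmark_type="classical",
       save_results=False,
   )
   for model_name, block in summary["results"].items():
       # A model-level error means the whole data model failed.
       if block.get("error"):
           print(model_name, "failed:", block["error"])
           continue
       # Count estimators that reported success for this data model.
       n_ok = sum(1 for r in block["estimator_results"] if r.get("success"))
       print(f"{model_name}: {n_ok}/{len(block['estimator_results'])} estimators OK")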
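
The same traversal can be turned into a per-estimator tally across all data
models. The sketch below runs on a hand-built ``sample_summary`` that mirrors
the shape described above. The ``"estimator"`` key on each result entry, the
helper name ``success_by_estimator``, and the sample model and estimator names
are assumptions for illustration, so check them against real output before
relying on them.

.. code-block:: python

   from collections import defaultdict


   def success_by_estimator(summary, name_key="estimator"):
       """Return {estimator_name: (n_ok, n_total)} across all data models."""
       counts = defaultdict(lambda: [0, 0])
       for block in summary.get("results", {}).values():
           if block.get("error"):
               continue  # whole data model failed; no estimator rows to count
           for row in block.get("estimator_results", []):
               name = row.get(name_key, "<unknown>")  # name_key is an assumption
               counts[name][1] += 1
               if row.get("success"):
                   counts[name][0] += 1
       return {name: tuple(pair) for name, pair in counts.items()}


   # Hand-built stand-in for a real summary (hypothetical names and values).
   sample_summary = {
       "results": {
           "fbm": {"estimator_results": [
               {"estimator": "RS", "success": True},
               {"estimator": "DFA", "success": False, "error": "too short"},
           ]},
           "fgn": {"estimator_results": [
               {"estimator": "RS", "success": True},
               {"estimator": "DFA", "success": True},
           ]},
           "arfima": {"error": "generation failed"},
       }
   }

   tally = success_by_estimator(sample_summary)
   for name, (n_ok, n_total) in sorted(tally.items()):
       print(f"{name}: {n_ok}/{n_total}")

Data models that failed as a whole are skipped rather than counted as
estimator failures, which keeps the ratio focused on estimator behaviour.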

Multi-category sweep (optional)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from lrdbenchmark.benchmarks import MultiCategoryBenchmark

   runner = MultiCategoryBenchmark(output_dir="sweep_results", seed=42)
   rows = runner.run(
       models=["fbm", "fgn"],
       lengths=[512],
       num_realizations=3,
       run_classical=True,
       run_ml=True,
       run_nn=False,
   )
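
Because the sweep returns a list of rows, it can be handed to the standard
library for export. The following is a minimal sketch that assumes each row is
a flat ``dict``. The column names in ``sample_rows`` and the helper
``rows_to_csv`` are hypothetical and only stand in for whatever ``runner.run``
actually returns.

.. code-block:: python

   import csv
   import io


   def rows_to_csv(rows, stream):
       """Write dict rows as CSV; the header is the union of keys in first-seen order."""
       fieldnames = []
       for row in rows:
           for key in row:
               if key not in fieldnames:
                   fieldnames.append(key)
       # restval fills columns that a given row does not define.
       writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
       writer.writeheader()
       writer.writerows(rows)
       return fieldnames


   # Stand-in rows (hypothetical keys); a real run would pass the list from runner.run(...).
   sample_rows = [
       {"model": "fbm", "length": 512, "estimator": "RS", "success": True},
       {"model": "fgn", "length": 512, "estimator": "RS", "success": False, "error": "nan"},
   ]

   buffer = io.StringIO()
   header = rows_to_csv(sample_rows, buffer)
   csv_text = buffer.getvalue()

Building the header from the union of keys avoids a ``ValueError`` from
``csv.DictWriter`` when only some rows carry an extra field such as an error
message.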

Best practices
--------------

1. Use ``data_length`` ≥ 512 for stable wavelet and spectral estimates when
   comparing families.
2. Use ``runtime_profile="quick"`` in CI or smoke tests; use ``"full"`` for
   exhaustive diagnostics, or keep the default ``"auto"`` to let the engine
   decide from the environment.
3. Set ``LRDBENCHMARK_AUTO_CPU=1`` before import to restrict JAX/CUDA device
   visibility to the CPU when you need a deterministic, GPU-free environment.
4. Handle failed estimator rows via the ``success`` flag on each result entry.
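
Items 2 and 3 can be combined into a small setup step that runs before the
package is imported. This is a sketch under assumptions: the helper
``configure_for_ci`` is hypothetical, and detecting CI through a ``CI``
environment variable is a common convention of CI services rather than
something ``lrdbenchmark`` defines.

.. code-block:: python

   import os


   def configure_for_ci(environ=None):
       """Pick a runtime profile and request CPU-only mode when running under CI."""
       env = os.environ if environ is None else environ
       # Assumption: the CI service exports CI=true (or 1 / yes).
       in_ci = env.get("CI", "").lower() in {"1", "true", "yes"}
       if in_ci:
           # Must be set before `import lrdbenchmark`; an existing value is kept.
           env.setdefault("LRDBENCHMARK_AUTO_CPU", "1")
       return "quick" if in_ci else "auto"


   profile = configure_for_ci()

   # Import only after the environment is configured, for example:
   #   from lrdbenchmark import ComprehensiveBenchmark
   #   benchmark = ComprehensiveBenchmark(runtime_profile=profile)

Using ``setdefault`` leaves an explicitly exported ``LRDBENCHMARK_AUTO_CPU``
value untouched, so a developer can still override the behaviour locally.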

.. note::

   Earlier documentation referred to ``BenchmarkResult``, ``EstimatorResult``,
   and ``BenchmarkConfig`` helpers; the current engine returns structured
   ``dict`` summaries. Prefer the keys documented on
   :meth:`~lrdbenchmark.analysis.benchmark.ComprehensiveBenchmark.run_comprehensive_benchmark`.