Quick Start Guide
================

This guide will get you up and running with lrdbenchmark in minutes.

**📓 For comprehensive examples, see the 5 demonstration notebooks in the `notebooks/` directory:**
- **Data Generation & Visualization**: All stochastic models with comprehensive plots
- **Estimation & Validation**: All estimator categories with statistical validation  
- **Custom Models & Estimators**: Library extensibility and custom implementations
- **Comprehensive Benchmarking**: Full benchmarking system with contamination testing
- **Leaderboard Generation**: Performance rankings and comparative analysis

Basic Usage
-----------

Generate synthetic data and run a benchmark:

.. code-block:: python

   import numpy as np
   from lrdbenchmark import FBMModel, RSEstimator
   
   # Generate Fractional Brownian Motion data
   model = FBMModel(H=0.7, sigma=1.0)
   data = model.generate(length=1000, seed=42)
   
   # Estimate Hurst parameter using R/S analysis
   rs_estimator = RSEstimator()
   result = rs_estimator.estimate(data)
   hurst_estimate = result["hurst_parameter"]
   
   print(f"True Hurst: 0.7, Estimated: {hurst_estimate:.3f}")

Neural Network Usage
--------------------

lrdbenchmark provides a comprehensive neural network factory with 4 architectures that achieve excellent speed-accuracy trade-offs:

.. code-block:: python

   from lrdbenchmark import NeuralNetworkFactory
   from lrdbenchmark.analysis.machine_learning.neural_network_factory import NNArchitecture, NNConfig, create_all_benchmark_networks
   import numpy as np

   # Create neural network factory
   factory = NeuralNetworkFactory()

   # Create a specific network
   config = NNConfig(
       architecture=NNArchitecture.TRANSFORMER,
       input_length=500,
       hidden_dims=[64, 32],
       learning_rate=0.001,
       epochs=50
   )
   network = factory.create_network(config)

   # Generate training data
   X_train = np.random.randn(100, 500)  # 100 samples of length 500
   y_train = np.random.uniform(0.2, 0.8, 100)  # True Hurst parameters

   # Train the network (train-once, apply-many workflow)
   history = network.train_model(X_train, y_train)

   # Make predictions on new data
   new_data = np.random.randn(1, 500)
   prediction = network.predict(new_data)

   print(f"Neural Network Prediction: {prediction[0]:.3f}")

   # Create all benchmark networks
   all_networks = create_all_benchmark_networks(input_length=500)
   for name, network in all_networks.items():
       print(f"Created {name} network")

Machine Learning Usage
----------------------

lrdbenchmark provides production-ready machine learning estimators:

.. code-block:: python

   from lrdbenchmark import SVREstimator, GradientBoostingEstimator, RandomForestEstimator
   import numpy as np

   # Generate training data
   X_train = np.random.randn(100, 500)  # 100 samples of length 500
   y_train = np.random.uniform(0.2, 0.8, 100)  # True Hurst parameters

   # Train ML models
   svr = SVREstimator(kernel='rbf', C=1.0)
   svr.train(X_train, y_train)

   gb = GradientBoostingEstimator(n_estimators=50, learning_rate=0.1)
   gb.train(X_train, y_train)

   rf = RandomForestEstimator(n_estimators=50, max_depth=5)
   rf.train(X_train, y_train)

   # Make predictions on new data
   new_data = np.random.randn(1, 500)
   svr_pred = svr.predict(new_data)
   gb_pred = gb.predict(new_data)
   rf_pred = rf.predict(new_data)

   print(f"SVR: {svr_pred:.3f}, Gradient Boosting: {gb_pred:.3f}, Random Forest: {rf_pred:.3f}")

Advanced Neural Network Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For production deployment with neural networks, use the Neural Network Factory:

.. code-block:: python

   from lrdbenchmark import NeuralNetworkFactory, FBMModel
   from lrdbenchmark.analysis.machine_learning.neural_network_factory import NNConfig, NNArchitecture
   import numpy as np

   # Create factory
   factory = NeuralNetworkFactory()

   # Configure network
   config = NNConfig(
       architecture=NNArchitecture.CNN,
       input_length=500,
       hidden_dims=[64, 32],
       learning_rate=0.001,
       epochs=20
   )

   # Create and train network
   network = factory.create_network(config)
   X_train = np.random.randn(100, 500)
   y_train = np.random.uniform(0.2, 0.8, 100)
   history = network.train_model(X_train, y_train)

   # Make prediction
   new_data = np.random.randn(1, 500)
   prediction = network.predict(new_data)

   print(f"CNN Prediction: {prediction[0]:.3f}")

Data Models
-----------

lrdbenchmark provides several synthetic data models:

.. code-block:: python

   from lrdbenchmark import FBMModel
   from lrdbenchmark import FGNModel, ARFIMAModel, MRWModel
   
   # Fractional Brownian Motion
   fbm = FBMModel(H=0.7, sigma=1.0)
   fbm_data = fbm.generate(1000)
   
   # Fractional Gaussian Noise
   fgn = FGNModel(H=0.6, sigma=1.0)
   fgn_data = fgn.generate(1000)
   
   # ARFIMA process
   arfima = ARFIMAModel(d=0.3, sigma=1.0)
   arfima_data = arfima.generate(1000)
   
   # Multifractal Random Walk
   mrw = MRWModel(H=0.7, lambda_param=0.1, sigma=1.0)
   mrw_data = mrw.generate(1000)

Individual Estimators
---------------------

Use specific estimators directly:

.. code-block:: python

   from lrdbenchmark import DFAEstimator, GPHEstimator
   
   # Detrended Fluctuation Analysis
   dfa = DFAEstimator()
   dfa_result = dfa.estimate(data)
   H_dfa = dfa_result["hurst_parameter"]
   
   # Geweke-Porter-Hudak estimator
   gph = GPHEstimator()
   gph_result = gph.estimate(data)
   H_gph = gph_result["hurst_parameter"]
   
   print(f"DFA H estimate: {H_dfa:.3f}")
   print(f"GPH H estimate: {H_gph:.3f}")

Analytics System
----------------

Track usage and performance:

.. code-block:: python

   from lrdbenchmark import FBMModel, RSEstimator
   
   # Generate data and run analysis
   model = FBMModel(H=0.7)
   data = model.generate(1000)
   
   # Estimate Hurst parameter
   rs_estimator = RSEstimator()
   result = rs_estimator.estimate(data)
   hurst_estimate = result["hurst_parameter"]
   
   print(f"Hurst estimate: {hurst_estimate:.3f}")

Enhanced ML and Neural Network Estimators
-----------------------------------------

Use the new enhanced estimators with pre-trained models:

.. code-block:: python

   from lrdbenchmark import (
       CNNEstimator, LSTMEstimator, GRUEstimator, TransformerEstimator,
       RandomForestEstimator, SVREstimator, GradientBoostingEstimator
   )
   
   # Enhanced CNN with residual connections and attention
   cnn = CNNEstimator()
   cnn_result = cnn.estimate(data)
   H_cnn = cnn_result["hurst_parameter"]
   
   # Enhanced LSTM with bidirectional architecture
   lstm = LSTMEstimator()
   lstm_result = lstm.estimate(data)
   H_lstm = lstm_result["hurst_parameter"]
   
   # Enhanced GRU with attention mechanisms
   gru = GRUEstimator()
   gru_result = gru.estimate(data)
   H_gru = gru_result["hurst_parameter"]
   
   # Enhanced Transformer with self-attention
   transformer = TransformerEstimator()
   transformer_result = transformer.estimate(data)
   H_transformer = transformer_result["hurst_parameter"]
   
   # Traditional ML estimators
   rf = RandomForestEstimator()
   rf_result = rf.estimate(data)
   H_rf = rf_result["hurst_parameter"]
   
   svr = SVREstimator()
   svr_result = svr.estimate(data)
   H_svr = svr_result["hurst_parameter"]
   
   gb = GradientBoostingEstimator()
   gb_result = gb.estimate(data)
   H_gb = gb_result["hurst_parameter"]
   
   print(f"CNN H estimate: {H_cnn:.3f}")
   print(f"LSTM H estimate: {H_lstm:.3f}")
   print(f"GRU H estimate: {H_gru:.3f}")
   print(f"Transformer H estimate: {H_transformer:.3f}")

Advanced Usage
--------------

Custom benchmark configuration:

.. code-block:: python

   from lrdbenchmark import FBMModel, RSEstimator, DFAEstimator
   
   # Generate data
   model = FBMModel(H=0.7)
   data = model.generate(2000)
   
   # Test multiple estimators
   rs_estimator = RSEstimator()
   dfa_estimator = DFAEstimator()
   
   rs_result = rs_estimator.estimate(data)
   dfa_result = dfa_estimator.estimate(data)
   
   print(f"R/S estimate: {rs_result['hurst_parameter']:.3f}")
   print(f"DFA estimate: {dfa_result['hurst_parameter']:.3f}")

Integration note: HPFracc
-------------------------

An updated HPFracc API is available; see `documentation_summaries/PROJECT_CLEANUP_SUMMARY.md` for the current reference and adapt example code accordingly. The integration is optional and not required for core lrdbenchmark usage.

Visualization
-------------

Plot results and data:

.. code-block:: python

   import matplotlib.pyplot as plt
   from lrdbenchmark import FBMModel
   
   # Generate data with different H values
   H_values = [0.3, 0.5, 0.7, 0.9]
   datasets = {}
   
   for H in H_values:
       model = FBMModel(H=H, sigma=1.0)
       datasets[f'H={H}'] = model.generate(1000)
   
   # Plot
   plt.figure(figsize=(12, 8))
   for name, data in datasets.items():
       plt.plot(data[:200], label=name, alpha=0.7)
   
   plt.title('Fractional Brownian Motion with Different H Values')
   plt.xlabel('Time')
   plt.ylabel('Value')
   plt.legend()
   plt.grid(True)
   plt.show()

Performance Tips
----------------

1. **Use GPU acceleration** when available
2. **Batch processing** for large datasets
3. **Enable analytics** for monitoring
4. **Use appropriate data lengths** (1000+ samples recommended)

Nonstationarity Testing
-----------------------

Test estimator robustness under nonstationarity conditions:

.. code-block:: python

   from lrdbenchmark.generation import (
       RegimeSwitchingProcess,
       ContinuousDriftProcess,
       StructuralBreakProcess
   )
   
   # Regime switching: H jumps from 0.3 to 0.8 at midpoint
   gen = RegimeSwitchingProcess(h_regimes=[0.3, 0.8], change_points=[0.5])
   result = gen.generate(1000)
   signal = result['signal']
   h_trajectory = result['h_trajectory']  # True H at each timepoint
   
   # Continuous linear drift from H=0.3 to H=0.8
   gen = ContinuousDriftProcess(h_start=0.3, h_end=0.8, drift_type='linear')
   result = gen.generate(1000)
   
   # Structural break with level shift
   gen = StructuralBreakProcess(h_before=0.7, h_after=0.4, break_severity=0.3)
   result = gen.generate(1000)

Critical Regime Models
----------------------

Test estimators in physics-motivated critical regimes:

.. code-block:: python

   from lrdbenchmark.generation import (
       OrnsteinUhlenbeckProcess,
       FractionalLevyMotion,
       SOCAvalancheModel
   )
   
   # OU with time-varying friction (transient criticality)
   gen = OrnsteinUhlenbeckProcess(theta_start=0.1, theta_end=1.0)
   result = gen.generate(1000)
   
   # Heavy-tailed fractional Lévy motion (α<2 stable)
   gen = FractionalLevyMotion(H=0.7, alpha=1.5)
   result = gen.generate(1000)
   
   # Self-organized criticality avalanche model
   gen = SOCAvalancheModel(grid_size=32)
   result = gen.generate(500)

Structural Break Detection
--------------------------

Detect stationarity violations before running classical estimators:

.. code-block:: python

   from lrdbenchmark.analysis.diagnostics import StructuralBreakDetector
   
   detector = StructuralBreakDetector(significance_level=0.05)
   result = detector.detect_all(data)
   
   if result['any_break_detected']:
       print("⚠️ Warning: Stationarity violated!")
       print(result['warnings'])
   else:
       print("Data appears stationary; proceed with classical estimation")

Surrogate Data Testing
----------------------

Generate surrogates for hypothesis testing:

.. code-block:: python

   from lrdbenchmark.generation import IAFFTSurrogate, PhaseRandomizedSurrogate
   
   # IAAFT: preserve spectrum AND amplitude distribution
   gen = IAFFTSurrogate()
   result = gen.generate(original_data, n_surrogates=100)
   surrogates = result['surrogates']
   
   # Phase randomization: preserve spectrum only
   gen = PhaseRandomizedSurrogate()
   result = gen.generate(original_data, n_surrogates=100)

Running Failure Benchmarks
--------------------------

Systematically test classical estimators under nonstationarity:

.. code-block:: bash

   # Quick screening (~5 min)
   python scripts/benchmarks/run_classical_failure_benchmark.py --profile quick
   
   # Standard analysis (~1 hour)
   python scripts/benchmarks/run_classical_failure_benchmark.py --profile standard
   
   # Full publication run (~8-10 hours)
   python scripts/benchmarks/run_classical_failure_benchmark.py --profile full

Next Steps
----------


* :doc:`notebooks/notebooks_overview` - **Start here**: Comprehensive demonstration notebooks
* :doc:`installation` - Detailed installation guide
* :doc:`api/data_models` - Learn about data models
* :doc:`api/estimators` - Explore available estimators
* :doc:`examples/comprehensive_demo` - More examples and use cases

**Recommended Learning Path**:

1. **Follow the tutorials**: Begin with :doc:`tutorials/tutorial_01_synthetic_data` (or open `notebooks/markdown/01_data_generation_and_visualisation.md`)
2. **Explore API**: Use the quickstart examples above
3. **Advanced Usage**: Try the comprehensive benchmarking examples
4. **Custom Development**: Learn extensibility from the custom models notebook