Quick Start Guide
=================

This guide will get you up and running with lrdbenchmark in minutes.

**📓 For comprehensive examples, see the five demonstration notebooks under ``notebooks/markdown/``:**

- **Data Generation & Visualization**: All stochastic models with comprehensive plots
- **Estimation & Validation**: All estimator categories with statistical validation
- **Custom Models & Estimators**: Library extensibility and custom implementations
- **Comprehensive Benchmarking**: Full benchmarking system with contamination testing
- **Leaderboard Generation**: Performance rankings and comparative analysis

Basic Usage
-----------

Generate synthetic fractional Brownian motion (FBM) data and estimate its Hurst parameter:

.. code-block:: python

   import numpy as np
   from lrdbenchmark import FBMModel, RSEstimator
   
   # Generate Fractional Brownian Motion data
   model = FBMModel(H=0.7, sigma=1.0)
   data = model.generate(length=1000, seed=42)
   
   # Estimate Hurst parameter using R/S analysis
   rs_estimator = RSEstimator()
   result = rs_estimator.estimate(data)
   hurst_estimate = result["hurst_parameter"]
   
   print(f"True Hurst: 0.7, Estimated: {hurst_estimate:.3f}")
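
For intuition about what ``RSEstimator`` computes, here is a minimal NumPy sketch of classical rescaled-range (R/S) analysis. It is an illustration, not lrdbenchmark's implementation: the helper name ``rs_hurst``, the log-spaced non-overlapping windows, and the lack of small-sample bias correction are all assumptions. It expects a stationary, noise-like series such as fractional Gaussian noise (fGn).

.. code-block:: python

   import numpy as np


   def rs_hurst(x, min_window=16, n_scales=10):
       """Rescaled-range (R/S) estimate of the Hurst exponent (illustrative sketch)."""
       x = np.asarray(x, dtype=float)
       n = len(x)
       if n < 4 * min_window:
           raise ValueError("series too short for the chosen min_window")
       # Log-spaced window sizes between min_window and n // 2
       windows = np.unique(
           np.logspace(np.log10(min_window), np.log10(n // 2), n_scales).astype(int)
       )
       log_w, log_rs = [], []
       for w in windows:
           rs_values = []
           for start in range(0, n - w + 1, w):  # non-overlapping windows
               seg = x[start:start + w]
               dev = np.cumsum(seg - seg.mean())  # cumulative deviations from the mean
               r = dev.max() - dev.min()          # range of cumulative deviations
               s = seg.std()                      # standard deviation of the window
               if s > 0:
                   rs_values.append(r / s)
           if rs_values:
               log_w.append(np.log(w))
               log_rs.append(np.log(np.mean(rs_values)))
       # H is the slope of log(R/S) against log(window size)
       slope, _ = np.polyfit(log_w, log_rs, 1)
       return float(slope)


   rng = np.random.default_rng(42)
   white_noise = rng.standard_normal(4096)  # fGn with H = 0.5
   print(f"R/S estimate for white noise: {rs_hurst(white_noise):.3f}")

Uncorrected R/S is known to be biased upward on short windows, so values somewhat above 0.5 are expected for white noise.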

Neural Network Usage
--------------------

lrdbenchmark provides a neural network factory covering four architectures. Networks follow a train-once, apply-many workflow:

.. code-block:: python

   from lrdbenchmark import NeuralNetworkFactory
   from lrdbenchmark.analysis.machine_learning.neural_network_factory import NNArchitecture, NNConfig, create_all_benchmark_networks
   import numpy as np

   # Create neural network factory
   factory = NeuralNetworkFactory()

   # Create a specific network
   config = NNConfig(
       architecture=NNArchitecture.TRANSFORMER,
       input_length=500,
       hidden_dims=[64, 32],
       learning_rate=0.001,
       epochs=50
   )
   network = factory.create_network(config)

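   # Placeholder training data: random noise with random labels (the labels are
   # not the true H of these series; use series simulated at known H in practice)
   X_train = np.random.randn(100, 500)  # 100 samples of length 500
   y_train = np.random.uniform(0.2, 0.8, 100)  # target Hurst values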

   # Train the network (train-once, apply-many workflow)
   history = network.train_model(X_train, y_train)

   # Make predictions on new data
   new_data = np.random.randn(1, 500)
   prediction = network.predict(new_data)

   print(f"Neural Network Prediction: {prediction[0]:.3f}")

   # Create all benchmark networks (loop over names so `network` above is not overwritten)
   all_networks = create_all_benchmark_networks(input_length=500)
   for name in all_networks:
       print(f"Created {name} network")
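
The random arrays above carry no long-range dependence, so their labels are placeholders. To build a training set whose labels match the inputs, one option is to simulate exact fractional Gaussian noise at each target H. The sketch below does this with a Cholesky factorization of the fGn autocovariance matrix. ``fgn_cholesky`` and ``make_training_set`` are hypothetical helpers, and whether the networks expect increments (fGn) or cumulative paths (fBm) is an assumption to check against the API reference.

.. code-block:: python

   import numpy as np


   def fgn_cholesky(n, hurst, rng, sigma=1.0):
       """Exact fGn sample via Cholesky factorization (O(n^3); fine for n around 500)."""
       k = np.arange(n)
       # fGn autocovariance: 0.5 * sigma^2 * (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})
       gamma = 0.5 * sigma**2 * (
           np.abs(k + 1) ** (2 * hurst)
           - 2 * np.abs(k) ** (2 * hurst)
           + np.abs(k - 1) ** (2 * hurst)
       )
       cov = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
       return np.linalg.cholesky(cov) @ rng.standard_normal(n)


   def make_training_set(n_samples, length, h_range=(0.2, 0.8), seed=0):
       """Hypothetical helper: series simulated at known H, so each label is the true H."""
       rng = np.random.default_rng(seed)
       y = rng.uniform(*h_range, size=n_samples)
       X = np.stack([fgn_cholesky(length, h, rng) for h in y])
       return X, y


   X_train, y_train = make_training_set(100, 500)
   print(X_train.shape, y_train.shape)  # (100, 500) (100,)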

Machine Learning Usage
----------------------

lrdbenchmark provides classical machine learning estimators (support vector regression, gradient boosting, and random forests):

.. code-block:: python

   from lrdbenchmark import SVREstimator, GradientBoostingEstimator, RandomForestEstimator
   import numpy as np

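   # Placeholder training data: random noise with random labels (the labels are
   # not the true H of these series; use series simulated at known H in practice)
   X_train = np.random.randn(100, 500)  # 100 samples of length 500
   y_train = np.random.uniform(0.2, 0.8, 100)  # target Hurst values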

   # Train ML models
   svr = SVREstimator(kernel='rbf', C=1.0)
   svr.train(X_train, y_train)

   gb = GradientBoostingEstimator(n_estimators=50, learning_rate=0.1)
   gb.train(X_train, y_train)

   rf = RandomForestEstimator(n_estimators=50, max_depth=5)
   rf.train(X_train, y_train)

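   # Make predictions on new data; np.asarray(...).item() turns a scalar or a
   # single-element array into a Python float so it can be formatted
   new_data = np.random.randn(1, 500)
   svr_pred = np.asarray(svr.predict(new_data)).item()
   gb_pred = np.asarray(gb.predict(new_data)).item()
   rf_pred = np.asarray(rf.predict(new_data)).item()

   print(f"SVR: {svr_pred:.3f}, Gradient Boosting: {gb_pred:.3f}, Random Forest: {rf_pred:.3f}")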

Advanced Neural Network Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The same factory workflow applies to the other architectures. For example, to configure and train a CNN:

.. code-block:: python

   from lrdbenchmark import NeuralNetworkFactory, FBMModel
   from lrdbenchmark.analysis.machine_learning.neural_network_factory import NNConfig, NNArchitecture
   import numpy as np

   # Create factory
   factory = NeuralNetworkFactory()

   # Configure network
   config = NNConfig(
       architecture=NNArchitecture.CNN,
       input_length=500,
       hidden_dims=[64, 32],
       learning_rate=0.001,
       epochs=20
   )

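   # Create the network
   network = factory.create_network(config)

   # Training data: FBM series generated at known Hurst values, so each label
   # is the true H of its input series
   y_train = np.random.uniform(0.2, 0.8, 100)
   X_train = np.array([FBMModel(H=h, sigma=1.0).generate(500) for h in y_train])

   # Train the network
   history = network.train_model(X_train, y_train)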

   # Make prediction
   new_data = np.random.randn(1, 500)
   prediction = network.predict(new_data)

   print(f"CNN Prediction: {prediction[0]:.3f}")

Data Models
-----------

lrdbenchmark provides several synthetic data models:

.. code-block:: python

   from lrdbenchmark import FBMModel, FGNModel, ARFIMAModel, MRWModel
   
   # Fractional Brownian Motion
   fbm = FBMModel(H=0.7, sigma=1.0)
   fbm_data = fbm.generate(1000)
   
   # Fractional Gaussian Noise
   fgn = FGNModel(H=0.6, sigma=1.0)
   fgn_data = fgn.generate(1000)
   
   # ARFIMA process
   arfima = ARFIMAModel(d=0.3, sigma=1.0)
   arfima_data = arfima.generate(1000)
   
   # Multifractal Random Walk
   mrw = MRWModel(H=0.7, lambda_param=0.1, sigma=1.0)
   mrw_data = mrw.generate(1000)
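
The models above are closely related: fGn is the increment process of fBm, and an ARFIMA process with memory parameter d (with -0.5 < d < 0.5) has Hurst exponent H = d + 1/2. The sketch below expresses these relations with NumPy; the helper names are hypothetical and not part of lrdbenchmark.

.. code-block:: python

   import numpy as np


   def fbm_from_fgn(fgn):
       """fBm path as the cumulative sum of fGn increments, starting at B_H(0) = 0."""
       return np.concatenate(([0.0], np.cumsum(fgn)))


   def fgn_from_fbm(fbm):
       """fGn as the first differences of an fBm path."""
       return np.diff(fbm)


   def hurst_from_arfima_d(d):
       """Stationary, invertible ARFIMA(p, d, q) has H = d + 0.5."""
       if not -0.5 < d < 0.5:
           raise ValueError("stationary ARFIMA requires -0.5 < d < 0.5")
       return d + 0.5


   increments = np.random.default_rng(0).standard_normal(1000)  # white noise: fGn with H = 0.5
   path = fbm_from_fgn(increments)                               # ordinary Brownian motion
   print(np.allclose(fgn_from_fbm(path), increments))
   print(hurst_from_arfima_d(0.3))  # approximately 0.8

So ``ARFIMAModel(d=0.3)`` corresponds to H = 0.8, i.e. stronger persistence than the ``FGNModel(H=0.6)`` example.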

Individual Estimators
---------------------

Use specific estimators directly:

.. code-block:: python

   from lrdbenchmark import FBMModel, DFAEstimator, GPHEstimator

   # Sample data to estimate from
   data = FBMModel(H=0.7, sigma=1.0).generate(1000)

   # Detrended Fluctuation Analysis
   dfa = DFAEstimator()
   dfa_result = dfa.estimate(data)
   H_dfa = dfa_result["hurst_parameter"]
   
   # Geweke-Porter-Hudak estimator
   gph = GPHEstimator()
   gph_result = gph.estimate(data)
   H_gph = gph_result["hurst_parameter"]
   
   print(f"DFA H estimate: {H_dfa:.3f}")
   print(f"GPH H estimate: {H_gph:.3f}")
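
The GPH estimator regresses the log-periodogram at the lowest Fourier frequencies on -log(4 sin²(λ/2)); the slope estimates the memory parameter d, and H = d + 1/2. Below is a minimal sketch, assuming the common textbook bandwidth m = n^0.5 (``GPHEstimator`` may choose differently) and a stationary input such as fGn. ``gph_hurst`` is a hypothetical name.

.. code-block:: python

   import numpy as np


   def gph_hurst(x, bandwidth_exponent=0.5):
       """Geweke-Porter-Hudak log-periodogram regression; returns (d_hat, H_hat)."""
       x = np.asarray(x, dtype=float)
       n = len(x)
       m = int(n ** bandwidth_exponent)          # number of low Fourier frequencies used
       if m < 3:
           raise ValueError("series too short for the chosen bandwidth")
       j = np.arange(1, m + 1)
       freqs = 2 * np.pi * j / n                 # Fourier frequencies lambda_j
       dft = np.fft.fft(x - x.mean())[1:m + 1]   # skip the zero frequency
       periodogram = np.abs(dft) ** 2 / (2 * np.pi * n)
       # log I(lambda_j) = c + d * (-log(4 sin^2(lambda_j / 2))) + error
       regressor = -np.log(4 * np.sin(freqs / 2) ** 2)
       d_hat, _ = np.polyfit(regressor, np.log(periodogram), 1)
       return float(d_hat), float(d_hat + 0.5)


   rng = np.random.default_rng(7)
   d_hat, h_hat = gph_hurst(rng.standard_normal(4096))  # white noise: d = 0, H = 0.5
   print(f"d = {d_hat:.3f}, H = {h_hat:.3f}")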

Analytics System
----------------

Track usage and performance:

.. code-block:: python

   from lrdbenchmark import FBMModel, RSEstimator
   
   # Generate data and run analysis
   model = FBMModel(H=0.7)
   data = model.generate(1000)
   
   # Estimate Hurst parameter
   rs_estimator = RSEstimator()
   result = rs_estimator.estimate(data)
   hurst_estimate = result["hurst_parameter"]
   
   print(f"Hurst estimate: {hurst_estimate:.3f}")
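
To track the runtime of individual estimator calls without relying on any particular analytics API, a standard-library timer is enough. The sketch below is a hypothetical helper, ``time_estimator``, that reports the median wall-clock time over several repeats using ``time.perf_counter``.

.. code-block:: python

   import statistics
   import time


   def time_estimator(estimate_fn, data, repeats=5):
       """Call estimate_fn(data) `repeats` times; return (last result, median seconds)."""
       timings = []
       result = None
       for _ in range(repeats):
           start = time.perf_counter()  # monotonic, high-resolution clock
           result = estimate_fn(data)
           timings.append(time.perf_counter() - start)
       return result, statistics.median(timings)


   # Stand-in workload; swap in an estimator's bound .estimate method
   result, seconds = time_estimator(sorted, list(range(10_000, 0, -1)))
   print(f"median runtime: {seconds * 1e3:.2f} ms")

With the objects above, ``time_estimator(rs_estimator.estimate, data)`` would return the R/S result together with its median runtime in seconds.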

Enhanced ML and Neural Network Estimators
-----------------------------------------

Use the enhanced estimators, which come with pre-trained models:

.. code-block:: python

   from lrdbenchmark import (
       FBMModel,
       CNNEstimator, LSTMEstimator, GRUEstimator, TransformerEstimator,
       RandomForestEstimator, SVREstimator, GradientBoostingEstimator
   )

   # Sample data to estimate from
   data = FBMModel(H=0.7, sigma=1.0).generate(1000)

   # Enhanced CNN with residual connections and attention
   cnn = CNNEstimator()
   cnn_result = cnn.estimate(data)
   H_cnn = cnn_result["hurst_parameter"]
   
   # Enhanced LSTM with bidirectional architecture
   lstm = LSTMEstimator()
   lstm_result = lstm.estimate(data)
   H_lstm = lstm_result["hurst_parameter"]
   
   # Enhanced GRU with attention mechanisms
   gru = GRUEstimator()
   gru_result = gru.estimate(data)
   H_gru = gru_result["hurst_parameter"]
   
   # Enhanced Transformer with self-attention
   transformer = TransformerEstimator()
   transformer_result = transformer.estimate(data)
   H_transformer = transformer_result["hurst_parameter"]
   
   # Traditional ML estimators
   rf = RandomForestEstimator()
   rf_result = rf.estimate(data)
   H_rf = rf_result["hurst_parameter"]
   
   svr = SVREstimator()
   svr_result = svr.estimate(data)
   H_svr = svr_result["hurst_parameter"]
   
   gb = GradientBoostingEstimator()
   gb_result = gb.estimate(data)
   H_gb = gb_result["hurst_parameter"]
   
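   print(f"CNN H estimate: {H_cnn:.3f}")
   print(f"LSTM H estimate: {H_lstm:.3f}")
   print(f"GRU H estimate: {H_gru:.3f}")
   print(f"Transformer H estimate: {H_transformer:.3f}")
   print(f"Random Forest H estimate: {H_rf:.3f}")
   print(f"SVR H estimate: {H_svr:.3f}")
   print(f"Gradient Boosting H estimate: {H_gb:.3f}")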

Advanced Usage
--------------

Run several estimators on the same series to compare them (a minimal custom benchmark):

.. code-block:: python

   from lrdbenchmark import FBMModel, RSEstimator, DFAEstimator
   
   # Generate data
   model = FBMModel(H=0.7)
   data = model.generate(2000)
   
   # Test multiple estimators
   rs_estimator = RSEstimator()
   dfa_estimator = DFAEstimator()
   
   rs_result = rs_estimator.estimate(data)
   dfa_result = dfa_estimator.estimate(data)
   
   print(f"R/S estimate: {rs_result['hurst_parameter']:.3f}")
   print(f"DFA estimate: {dfa_result['hurst_parameter']:.3f}")
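
A single series gives one noisy estimate per method. To compare estimators more fairly, one option is to repeat the run over several independently generated series and summarize the errors against the known H. The helper below, ``summarize_errors``, is a hypothetical sketch that computes bias, mean absolute error (MAE), and root-mean-square error (RMSE) per estimator, ranked by RMSE.

.. code-block:: python

   import math


   def summarize_errors(true_h, estimates):
       """Bias, MAE and RMSE per estimator, ranked by RMSE (smallest first).

       `estimates` maps an estimator name to a list of H estimates obtained on
       independent series generated with the same true H.
       """
       summary = {}
       for name, values in estimates.items():
           errors = [h - true_h for h in values]
           summary[name] = {
               "bias": sum(errors) / len(errors),
               "mae": sum(abs(e) for e in errors) / len(errors),
               "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
           }
       return dict(sorted(summary.items(), key=lambda item: item[1]["rmse"]))

For example, collect ``rs_result['hurst_parameter']`` and ``dfa_result['hurst_parameter']`` over several seeds into lists ``rs_list`` and ``dfa_list``, then call ``summarize_errors(0.7, {'R/S': rs_list, 'DFA': dfa_list})``.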

Integration note: HPFracc
-------------------------

Optional HPFracc integration is not required for core lrdbenchmark usage. If you rely on HPFracc, pin versions against its upstream documentation and adapt any legacy glue code to the current estimator APIs in this package.

Visualization
-------------

Plot results and data:

.. code-block:: python

   import matplotlib.pyplot as plt
   from lrdbenchmark import FBMModel
   
   # Generate data with different H values
   H_values = [0.3, 0.5, 0.7, 0.9]
   datasets = {}
   
   for H in H_values:
       model = FBMModel(H=H, sigma=1.0)
       datasets[f'H={H}'] = model.generate(1000)
   
   # Plot
   plt.figure(figsize=(12, 8))
   for name, data in datasets.items():
       plt.plot(data[:200], label=name, alpha=0.7)
   
   plt.title('Fractional Brownian Motion with Different H Values')
   plt.xlabel('Time')
   plt.ylabel('Value')
   plt.legend()
   plt.grid(True)
   plt.show()

Performance Tips
----------------

1. **Use GPU acceleration** when available
2. **Process large datasets in batches**
3. **Enable analytics** for monitoring
4. **Use sufficiently long series** (1000+ samples recommended)
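
For the batch-processing tip, one generic pattern is to map a single estimate function over many series with ``concurrent.futures``. The helper ``estimate_batch`` below is a hypothetical sketch, not an lrdbenchmark API. Threads help mainly when the estimator spends its time in code that releases the GIL, as many NumPy operations do; for pure-Python work, ``ProcessPoolExecutor`` is usually the better fit.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor


   def estimate_batch(estimate_fn, series_list, max_workers=4):
       """Apply estimate_fn to every series; results keep the input order."""
       with ThreadPoolExecutor(max_workers=max_workers) as pool:
           # Executor.map yields results in the order of series_list
           return list(pool.map(estimate_fn, series_list))


   print(estimate_batch(len, [[1, 2], [3, 4, 5]]))  # [2, 3]

With lrdbenchmark objects this would look like ``estimate_batch(rs_estimator.estimate, list_of_series)``, where ``list_of_series`` is your own collection of 1-D arrays.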

Nonstationarity Testing
-----------------------

Test estimator robustness under nonstationary conditions:

.. code-block:: python

   from lrdbenchmark.generation import (
       RegimeSwitchingProcess,
       ContinuousDriftProcess,
       StructuralBreakProcess
   )
   
   # Regime switching: H jumps from 0.3 to 0.8 at midpoint
   gen = RegimeSwitchingProcess(h_regimes=[0.3, 0.8], change_points=[0.5])
   result = gen.generate(1000)
   signal = result['signal']
   h_trajectory = result['h_trajectory']  # True H at each timepoint
   
   # Continuous linear drift from H=0.3 to H=0.8
   gen = ContinuousDriftProcess(h_start=0.3, h_end=0.8, drift_type='linear')
   result = gen.generate(1000)
   
   # Structural break with level shift
   gen = StructuralBreakProcess(h_before=0.7, h_after=0.4, break_severity=0.3)
   result = gen.generate(1000)
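
The regime-switching idea can be sketched directly: simulate independent fGn segments with the requested H values and record the true H at each time point. This is not lrdbenchmark's implementation; it assumes exact Cholesky-based fGn, independent segments (no continuity across a switch), and change points given as fractions of the series length, as in ``change_points=[0.5]`` above. The function names are hypothetical.

.. code-block:: python

   import numpy as np


   def fgn(n, hurst, rng):
       """Exact unit-variance fGn via Cholesky factorization of its autocovariance."""
       k = np.arange(n)
       gamma = 0.5 * (
           np.abs(k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + np.abs(k - 1) ** (2 * hurst)
       )
       cov = gamma[np.abs(k[:, None] - k[None, :])]
       return np.linalg.cholesky(cov) @ rng.standard_normal(n)


   def regime_switching_fgn(n, h_regimes, change_points, seed=0):
       """Concatenate independent fGn segments, one per regime.

       Returns a dict with 'signal' and 'h_trajectory' (true H at each time point).
       """
       if len(h_regimes) != len(change_points) + 1:
           raise ValueError("need exactly one more regime than change points")
       rng = np.random.default_rng(seed)
       bounds = [0] + [int(round(c * n)) for c in change_points] + [n]
       signal = np.empty(n)
       h_trajectory = np.empty(n)
       for h, start, stop in zip(h_regimes, bounds[:-1], bounds[1:]):
           signal[start:stop] = fgn(stop - start, h, rng)
           h_trajectory[start:stop] = h
       return {"signal": signal, "h_trajectory": h_trajectory}


   out = regime_switching_fgn(1000, h_regimes=[0.3, 0.8], change_points=[0.5])
   print(out["h_trajectory"][[0, 499, 500, 999]])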

Critical Regime Models
----------------------

Test estimators in physics-motivated critical regimes:

.. code-block:: python

   from lrdbenchmark.generation import (
       OrnsteinUhlenbeckProcess,
       FractionalLevyMotion,
       SOCAvalancheModel
   )
   
   # OU with time-varying friction (transient criticality)
   gen = OrnsteinUhlenbeckProcess(theta_start=0.1, theta_end=1.0)
   result = gen.generate(1000)
   
   # Heavy-tailed fractional Lévy motion (α<2 stable)
   gen = FractionalLevyMotion(H=0.7, alpha=1.5)
   result = gen.generate(1000)
   
   # Self-organized criticality avalanche model
   gen = SOCAvalancheModel(grid_size=32)
   result = gen.generate(500)
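
To see why a time-varying friction term produces transient criticality, consider the Ornstein-Uhlenbeck equation dX = -θ(t) X dt + σ dW: its relaxation time is 1/θ, so as θ approaches zero, correlations become long-lived. The sketch below integrates this with the Euler-Maruyama scheme while θ ramps linearly from ``theta_start`` to ``theta_end``. The step size ``dt``, the linear ramp, and the helper name ``ou_time_varying`` are assumptions, not necessarily what ``OrnsteinUhlenbeckProcess`` does.

.. code-block:: python

   import numpy as np


   def ou_time_varying(n, theta_start=0.1, theta_end=1.0, sigma=1.0, dt=0.01, seed=0):
       """Euler-Maruyama path of dX = -theta(t) X dt + sigma dW; returns (x, theta)."""
       rng = np.random.default_rng(seed)
       theta = np.linspace(theta_start, theta_end, n)  # linear friction ramp
       x = np.zeros(n)
       noise = sigma * np.sqrt(dt) * rng.standard_normal(n - 1)  # Brownian increments
       for t in range(n - 1):
           x[t + 1] = x[t] - theta[t] * x[t] * dt + noise[t]
       return x, theta


   x, theta = ou_time_varying(1000)
   print(x[:5])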

Structural Break Detection
--------------------------

Detect stationarity violations before running classical estimators:

.. code-block:: python

   from lrdbenchmark import FGNModel
   from lrdbenchmark.analysis.diagnostics import StructuralBreakDetector

   # Series to check (replace with your own data)
   data = FGNModel(H=0.7, sigma=1.0).generate(1000)

   detector = StructuralBreakDetector(significance_level=0.05)
   result = detector.detect_all(data)

   if result['any_break_detected']:
       print("⚠️ Warning: possible stationarity violation")
       print(result['warnings'])
   else:
       print("No break detected; proceed with classical estimation")
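
As an example of the kind of check a break detector can run, the sketch below implements a classical CUSUM test for a single mean shift. It is not ``StructuralBreakDetector``'s method; ``cusum_mean_shift`` is a hypothetical helper. It assumes short-memory data: the statistic is compared with 1.358, the approximate 5% critical value of the supremum of a Brownian bridge. Under long-range dependence (H > 0.5) the test rejects too often, which is one reason breaks and long memory are hard to tell apart.

.. code-block:: python

   import numpy as np


   def cusum_mean_shift(x, critical_value=1.358):
       """CUSUM test for one mean shift: max_k |S_k - (k/n) S_n| / (sigma_hat * sqrt(n))."""
       x = np.asarray(x, dtype=float)
       n = len(x)
       partial_sums = np.cumsum(x - x.mean())  # equals S_k - (k/n) S_n
       stat = np.max(np.abs(partial_sums)) / (x.std(ddof=1) * np.sqrt(n))
       k_hat = int(np.argmax(np.abs(partial_sums))) + 1  # last index of the first segment
       return {
           "statistic": float(stat),
           "break_detected": bool(stat > critical_value),
           "break_index": k_hat,
       }


   rng = np.random.default_rng(3)
   shifted = np.concatenate([rng.standard_normal(500), rng.standard_normal(500) + 3.0])
   print(cusum_mean_shift(shifted))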

Surrogate Data Testing
----------------------

Generate surrogates for hypothesis testing:

.. code-block:: python

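   from lrdbenchmark import FGNModel
   from lrdbenchmark.generation import IAFFTSurrogate, PhaseRandomizedSurrogate

   # Series to test (replace with your own data)
   original_data = FGNModel(H=0.7, sigma=1.0).generate(1000)
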
   # IAAFT (class name IAFFTSurrogate): preserve spectrum AND amplitude distribution
   gen = IAFFTSurrogate()
   result = gen.generate(original_data, n_surrogates=100)
   surrogates = result['surrogates']
   
   # Phase randomization: preserve spectrum only
   gen = PhaseRandomizedSurrogate()
   result = gen.generate(original_data, n_surrogates=100)
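
Phase randomization is straightforward to sketch with NumPy's real FFT: keep the magnitude of every Fourier coefficient and draw new uniform phases. Because the amplitudes are preserved, so are the periodogram and the linear autocorrelation. This is a generic illustration, not ``PhaseRandomizedSurrogate``'s code; ``phase_randomized_surrogate`` is a hypothetical name.

.. code-block:: python

   import numpy as np


   def phase_randomized_surrogate(x, rng):
       """One surrogate with the same Fourier amplitudes as x and random phases.

       The zero-frequency term (and the Nyquist term for even n) keeps its phase,
       so the inverse transform is real and the mean is preserved.
       """
       x = np.asarray(x, dtype=float)
       n = len(x)
       spectrum = np.fft.rfft(x)
       phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
       phases[0] = 0.0
       if n % 2 == 0:
           phases[-1] = 0.0
       return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


   rng = np.random.default_rng(0)
   original = rng.standard_normal(1024)
   surrogates = np.stack([phase_randomized_surrogate(original, rng) for _ in range(100)])
   print(surrogates.shape)  # (100, 1024)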

Running Failure Benchmarks
--------------------------

Systematically test classical estimators under nonstationarity:

.. code-block:: bash

   # Quick screening (~5 min)
   python scripts/benchmarks/run_classical_failure_benchmark.py --profile quick
   
   # Standard analysis (~1 hour)
   python scripts/benchmarks/run_classical_failure_benchmark.py --profile standard
   
   # Full publication run (~8-10 hours)
   python scripts/benchmarks/run_classical_failure_benchmark.py --profile full

Next Steps
----------


* :doc:`notebooks/notebooks_overview` - **Start here**: Comprehensive demonstration notebooks
* :doc:`installation` - Detailed installation guide
* :doc:`api/data_models` - Learn about data models
* :doc:`api/estimators` - Explore available estimators
* :doc:`examples/comprehensive_demo` - More examples and use cases

**Recommended Learning Path**:

1. **Follow the tutorials**: Begin with :doc:`tutorials/tutorial_01_synthetic_data` (or open ``notebooks/markdown/01_data_generation_and_visualisation.md``)
2. **Explore API**: Use the quickstart examples above
3. **Advanced Usage**: Try the comprehensive benchmarking examples
4. **Custom Development**: Learn extensibility from the custom models notebook