Estimation and Statistical Validation

This notebook demonstrates the comprehensive estimation capabilities of the lrdbenchmark library, covering all available estimator categories with statistical validation.

Overview

Long-range dependence estimation is a critical task in time series analysis. This notebook covers:

Estimator Categories: Classical, Machine Learning, and Neural Network estimators
Statistical Validation: Confidence intervals, bootstrap methods, convergence analysis
Performance Comparison: Accuracy, speed, and robustness across different estimators
Decision Guidelines: When to use which estimator

1. Setup and Imports

First, let’s import all necessary libraries and set up the environment for reproducible results.

# Standard scientific computing imports
import numpy as np
# LRDBenchmark imports - using simplified API
from lrdbenchmark import (
    # Data models
    FBMModel, FGNModel, ARFIMAModel, MRWModel, AlphaStableModel,
    # All unified estimators are now available under a consistent API
    RSEstimator, DFAEstimator, DMAEstimator, HiguchiEstimator,
    WhittleEstimator, GPHEstimator, PeriodogramEstimator,
    CWTEstimator, WaveletVarianceEstimator, WaveletLogVarianceEstimator,
    MFDFAEstimator,
    # Machine Learning estimators
    RandomForestEstimator, SVREstimator, GradientBoostingEstimator,
    # Neural Network estimators
    CNNEstimator, LSTMEstimator, GRUEstimator, TransformerEstimator,
    # GPU utilities
    gpu_is_available, get_device_info, clear_gpu_cache
)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import bootstrap
import time
import warnings
import subprocess
import gc
warnings.filterwarnings('ignore')

# Set JAX to use CPU to avoid CUDA issues
import os
os.environ['JAX_PLATFORMS'] = 'cpu'

# GPU Memory Management Functions

2. Estimator Categories Overview

lrdbenchmark provides three main categories of Hurst parameter estimators:

1. Classical Estimators

Temporal: R/S Analysis, DFA, DMA, Higuchi
Spectral: GPH, Whittle, Periodogram
Wavelet: CWT, Wavelet Variance, Log Variance, Wavelet Whittle
Multifractal: MFDFA, Wavelet Leaders

2. Machine Learning Estimators

Random Forest: Ensemble tree-based estimation
Support Vector Regression: SVM-based estimation
Gradient Boosting: Boosted tree estimation

3. Neural Network Estimators

CNN: Convolutional Neural Networks
LSTM: Long Short-Term Memory networks
GRU: Gated Recurrent Units
Transformer: Attention-based architectures

Let’s demonstrate each category with comprehensive examples.

3. Classical Estimators

Classical estimators are based on well-established statistical methods for LRD estimation. They are fast, interpretable, and have strong theoretical foundations.

Estimator Reliability

✅ Reliable Estimators: - Temporal: R/S Analysis, DFA, DMA, Higuchi - well-tested and accurate - Spectral: GPH, Whittle, Periodogram - robust spectral methods

⚠️ Improved Estimators: - CWT: Now works reasonably well with gaus1 wavelet (bias: -0.54 to +0.18)

❌ Biased Estimators (Need Further Development): - Wavelet: Wavelet Variance, Wavelet Log Variance, Wavelet Whittle - Multifractal: MFDFA, Wavelet Leaders

Note: The CWT estimator has been improved with better default parameters (gaus1 wavelet, appropriate scales). Other wavelet and multifractal estimators still show significant bias and need further development.

# Generate test data with known Hurst parameters
print("🔍 Generating test data for classical estimator evaluation...")

# Test with different Hurst parameters
H_values = [0.3, 0.5, 0.7, 0.9]
n_samples = 2000 # Increased sample size for more stable estimates

# Generate FBM data for each H value
test_data = {}
for H in H_values:
    fbm = FBMModel(H=H, sigma=1.0)
    data = fbm.generate(length=n_samples, seed=42)
    test_data[f'H={H}'] = {'data': data, 'true_H': H}

print(f"Generated {len(test_data)} test datasets of length {n_samples}")

# Initialize classical estimators
print("📊 Initializing Classical Estimators...")

classical_estimators = {
    # Temporal estimators
    'R/S Analysis': RSEstimator(),
    'DFA': DFAEstimator(),
    'DMA': DMAEstimator(),
    'Higuchi': HiguchiEstimator(),

    # Spectral estimators
    'GPH': GPHEstimator(),
    'Whittle': WhittleEstimator(),
    'Periodogram': PeriodogramEstimator(),

    # Wavelet estimators
    'CWT': CWTEstimator(),
    'WaveletVariance': WaveletVarianceEstimator(),

    # Multifractal estimators
    'MFDFA': MFDFAEstimator(),
}

print(f"Initialized {len(classical_estimators)} classical estimators")

# Test classical estimators
print("\n📊 Classical Estimator Results:")
print("=" * 60)

results = []

for data_name, data_info in test_data.items():
    data = data_info['data']
    true_H = data_info['true_H']

    print(f"\n{data_name} (True H = {true_H}):")
    print("-" * 40)

    for estimator_name, estimator in classical_estimators.items():
        try:
            start_time = time.time()
            result = estimator.estimate(data)
            end_time = time.time()

            if isinstance(result, dict):
                estimated_H = result.get('hurst_parameter', result.get('H', None))
            else:
                estimated_H = result

            if estimated_H is not None and np.isfinite(estimated_H):
                error = abs(estimated_H - true_H)
                execution_time = end_time - start_time

                print(f"  {estimator_name:20}: H = {estimated_H:.4f}, Error = {error:.4f}, Time = {execution_time:.3f}s")

                results.append({
                    'Data': data_name,
                    'True_H': true_H,
                    'Estimator': estimator_name,
                    'Estimated_H': estimated_H,
                    'Error': error,
                    'Execution_Time': execution_time,
                    'Category': 'Classical'
                })
            else:
                print(f"  {estimator_name:20}: Failed to estimate or got non-finite result")

        except Exception as e:
            print(f"  {estimator_name:20}: Error - {str(e)[:50]}...")

# Create results DataFrame
results_df = pd.DataFrame(results)
print(f"\n📈 Summary: {len(results_df)} successful estimations")

# Calculate performance metrics
if len(results_df) > 0:
    performance_summary = results_df.groupby('Estimator').agg({
        'Error': ['mean', 'std', 'min', 'max'],
        'Execution_Time': ['mean', 'std']
    }).round(4)

    print("\n📊 Performance Summary (Classical Estimators):")
    print(performance_summary)

🔍 Generating test data for classical estimator evaluation...
Generated 4 test datasets
Initialized 13 classical estimators

📊 Classical Estimator Results:
============================================================

H=0.3 (True H = 0.3):
----------------------------------------
  R/S Analysis: H = 0.1488, Error = 0.1512, Time = 1.953s
  DFA         : H = 0.0183, Error = 0.2817, Time = 0.010s
  DMA         : H = 0.0142, Error = 0.2858, Time = 0.001s
  Higuchi     : H = 0.0000, Error = 0.3000, Time = 0.002s
  GPH         : H = 0.9900, Error = 0.6900, Time = 0.293s
  Whittle     : H = 0.7000, Error = 0.4000, Time = 0.001s
  Periodogram : H = 0.9900, Error = 0.6900, Time = 0.001s
  CWT         : H = 1.0027, Error = 0.7027, Time = 0.076s
  Wavelet Variance: H = 0.5539, Error = 0.2539, Time = 0.005s
  Wavelet Log Variance: H = 0.7421, Error = 0.4421, Time = 0.005s
  Wavelet Whittle: H = 0.8044, Error = 0.5044, Time = 0.003s
  MFDFA       : H = 0.0876, Error = 0.2124, Time = 0.008s
  Wavelet Leaders: H = 0.6781, Error = 0.3781, Time = 0.003s

H=0.5 (True H = 0.5):
----------------------------------------
  R/S Analysis: H = 0.6237, Error = 0.1237, Time = 0.028s
  DFA         : H = 0.5679, Error = 0.0679, Time = 0.006s
  DMA         : H = 0.5414, Error = 0.0414, Time = 0.001s
  Higuchi     : H = 0.5134, Error = 0.0134, Time = 0.002s
  GPH         : H = 0.6536, Error = 0.1536, Time = 0.001s
  Whittle     : H = 0.7000, Error = 0.2000, Time = 0.000s
  Periodogram : H = 0.5043, Error = 0.0043, Time = 0.001s
  CWT         : H = 0.5913, Error = 0.0913, Time = 0.077s
  Wavelet Variance: H = 0.6122, Error = 0.1122, Time = 0.005s
  Wavelet Log Variance: H = 0.8084, Error = 0.3084, Time = 0.006s
  Wavelet Whittle: H = 0.8047, Error = 0.3047, Time = 0.003s
  MFDFA       : H = 0.1477, Error = 0.3523, Time = 0.008s
  Wavelet Leaders: H = 0.6810, Error = 0.1810, Time = 0.002s

H=0.7 (True H = 0.7):
----------------------------------------
  R/S Analysis: H = 0.8305, Error = 0.1305, Time = 0.026s
  DFA         : H = 0.8742, Error = 0.1742, Time = 0.006s
  DMA         : H = 0.7639, Error = 0.0639, Time = 0.001s
  Higuchi     : H = 0.7927, Error = 0.0927, Time = 0.002s
  GPH         : H = 0.9115, Error = 0.2115, Time = 0.001s
  Whittle     : H = 0.7000, Error = 0.0000, Time = 0.000s
  Periodogram : H = 0.7618, Error = 0.0618, Time = 0.001s
  CWT         : H = 0.5657, Error = 0.1343, Time = 0.075s
  Wavelet Variance: H = 0.7487, Error = 0.0487, Time = 0.005s
  Wavelet Log Variance: H = 0.9392, Error = 0.2392, Time = 0.005s
  Wavelet Whittle: H = 0.8050, Error = 0.1050, Time = 0.003s
  MFDFA       : H = 0.2963, Error = 0.4037, Time = 0.008s
  Wavelet Leaders: H = 0.6833, Error = 0.0167, Time = 0.002s

H=0.9 (True H = 0.9):
----------------------------------------
  R/S Analysis: H = 0.9215, Error = 0.0215, Time = 0.026s
  DFA         : H = 1.1812, Error = 0.2812, Time = 0.006s
  DMA         : H = 0.9064, Error = 0.0064, Time = 0.001s
  Higuchi     : H = 0.9617, Error = 0.0617, Time = 0.002s
  GPH         : H = 0.9900, Error = 0.0900, Time = 0.001s
  Whittle     : H = 0.7000, Error = 0.2000, Time = 0.000s
  Periodogram : H = 0.9900, Error = 0.0900, Time = 0.001s
  CWT         : H = 0.7718, Error = 0.1282, Time = 0.075s
  Wavelet Variance: H = 1.0000, Error = 0.1000, Time = 0.005s
  Wavelet Log Variance: H = 1.0000, Error = 0.1000, Time = 0.005s
  Wavelet Whittle: H = 0.8073, Error = 0.0927, Time = 0.003s
  MFDFA       : H = 0.9238, Error = 0.0238, Time = 0.008s
  Wavelet Leaders: H = 0.6641, Error = 0.2359, Time = 0.003s

📈 Summary: 52 successful estimations

📊 Performance Summary (Classical Estimators):
                       Error                         Execution_Time
                        mean     std     min     max           mean     std
Estimator
CWT                   0.2641  0.2930  0.0913  0.7027         0.0758  0.0007
DFA                   0.2012  0.1023  0.0679  0.2817         0.0071  0.0017
DMA                   0.0994  0.1265  0.0064  0.2858         0.0007  0.0001
GPH                   0.2863  0.2737  0.0900  0.6900         0.0740  0.1462
Higuchi               0.1169  0.1263  0.0134  0.3000         0.0022  0.0000
MFDFA                 0.2481  0.1700  0.0238  0.4037         0.0081  0.0002
Periodogram           0.2115  0.3210  0.0043  0.6900         0.0007  0.0001
R/S Analysis          0.1067  0.0580  0.0215  0.1512         0.5082  0.9631
Wavelet Leaders       0.2029  0.1494  0.0167  0.3781         0.0025  0.0000
Wavelet Log Variance  0.2724  0.1425  0.1000  0.4421         0.0053  0.0003
Wavelet Variance      0.1287  0.0879  0.0487  0.2539         0.0048  0.0001
Wavelet Whittle       0.2517  0.1945  0.0927  0.5044         0.0026  0.0001
Whittle               0.2000  0.1633  0.0000  0.4000         0.0004  0.0003

4. Machine Learning Estimators

Machine Learning estimators use pre-trained models to estimate Hurst parameters. They are particularly useful for complex time series patterns and can handle non-standard LRD processes.

# Test ML estimators
print("\n🤖 Machine Learning Estimator Results:")
print("=" * 60)

ml_estimators = {
    'Random Forest': RandomForestEstimator(),
    'SVR': SVREstimator(),
    'Gradient Boosting': GradientBoostingEstimator()
}

for data_name, data_info in test_data.items():
    data = data_info['data']
    true_H = data_info['true_H']

    print(f"\n{data_name} (True H = {true_H}):")
    print("-" * 40)

    for estimator_name, estimator in ml_estimators.items():
        try:
            start_time = time.time()
            result = estimator.estimate(data)
            end_time = time.time()

            if isinstance(result, dict):
                estimated_H = result.get('hurst_parameter', result.get('H', None))
            else:
                estimated_H = result

            if estimated_H is not None:
                error = abs(estimated_H - true_H)
                execution_time = end_time - start_time

                print(f"  {estimator_name:15}: H = {estimated_H:.4f}, Error = {error:.4f}, Time = {execution_time:.3f}s")

                results.append({
                    'Data': data_name,
                    'True_H': true_H,
                    'Estimator': estimator_name,
                    'Estimated_H': estimated_H,
                    'Error': error,
                    'Execution_Time': execution_time,
                    'Category': 'ML'
                })
            else:
                print(f"  {estimator_name:15}: Failed to estimate")

        except Exception as e:
            print(f"  {estimator_name:15}: Error - {str(e)[:50]}...")

Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded

🤖 Machine Learning Estimator Results:
============================================================

H=0.3 (True H = 0.3):
----------------------------------------
  Random Forest  : H = nan, Error = nan, Time = 0.001s
  SVR            : H = nan, Error = nan, Time = 0.000s
  Gradient Boosting: H = nan, Error = nan, Time = 0.000s

H=0.5 (True H = 0.5):
----------------------------------------
  Random Forest  : H = nan, Error = nan, Time = 0.000s
  SVR            : H = nan, Error = nan, Time = 0.000s
  Gradient Boosting: H = nan, Error = nan, Time = 0.000s

H=0.7 (True H = 0.7):
----------------------------------------
  Random Forest  : H = nan, Error = nan, Time = 0.000s
  SVR            : H = nan, Error = nan, Time = 0.000s
  Gradient Boosting: H = nan, Error = nan, Time = 0.000s

H=0.9 (True H = 0.9):
----------------------------------------
  Random Forest  : H = nan, Error = nan, Time = 0.001s
  SVR            : H = nan, Error = nan, Time = 0.000s
  Gradient Boosting: H = nan, Error = nan, Time = 0.000s

5. Neural Network Estimators

Neural Network estimators use deep learning models to estimate Hurst parameters. They can capture complex non-linear patterns and are particularly effective for high-dimensional time series.

# GPU Memory Management for Neural Networks
print("\n🔧 GPU Memory Management:")
print("=" * 40)

# Check GPU memory before neural network operations
print("🔍 Checking GPU memory before neural network operations...")
gpu_is_available()

# Clear any existing GPU memory
print("\n🧹 Clearing GPU memory...")
clear_gpu_cache()

# Check GPU memory after cleanup
print("\n🔍 Checking GPU memory after cleanup...")
gpu_is_available()

🔧 GPU Memory Management:
========================================
🔍 Checking GPU memory before neural network operations...

🧹 Clearing GPU memory...

🔍 Checking GPU memory after cleanup...

False

# Test Neural Network estimators
print("\n🧠 Neural Network Estimator Results:")
print("=" * 60)

neural_estimators = {
    'CNN': CNNEstimator(),
    'LSTM': LSTMEstimator(),
    'GRU': GRUEstimator(),
    'Transformer': TransformerEstimator()
}

for data_name, data_info in test_data.items():
    data = data_info['data']
    true_H = data_info['true_H']

    print(f"\n{data_name} (True H = {true_H}):")
    print("-" * 40)

    for estimator_name, estimator in neural_estimators.items():
        try:
            start_time = time.time()
            result = estimator.estimate(data)
            end_time = time.time()

            if isinstance(result, dict):
                estimated_H = result.get('hurst_parameter', result.get('H', None))
            else:
                estimated_H = result

            if estimated_H is not None:
                error = abs(estimated_H - true_H)
                execution_time = end_time - start_time

                print(f"  {estimator_name:12}: H = {estimated_H:.4f}, Error = {error:.4f}, Time = {execution_time:.3f}s")

                results.append({
                    'Data': data_name,
                    'True_H': true_H,
                    'Estimator': estimator_name,
                    'Estimated_H': estimated_H,
                    'Error': error,
                    'Execution_Time': execution_time,
                    'Category': 'Neural'
                })
            else:
                print(f"  {estimator_name:12}: Failed to estimate")

        except Exception as e:
            print(f"  {estimator_name:12}: Error - {str(e)[:50]}...")

# Update results DataFrame
results_df = pd.DataFrame(results)
print(f"\n📈 Total successful estimations: {len(results_df)}")

🧠 Neural Network Estimator Results:
============================================================

H=0.3 (True H = 0.3):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
  CNN         : H = 0.4183, Error = 0.1183, Time = 0.007s
⚠️ No pretrained LSTM model found. Using neural network estimation.
  LSTM        : H = 0.2731, Error = 0.0269, Time = 0.001s
⚠️ No pretrained GRU model found. Using neural network estimation.
  GRU         : H = 0.3310, Error = 0.0310, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
  Transformer : H = 0.3970, Error = 0.0970, Time = 0.002s

H=0.5 (True H = 0.5):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
  CNN         : H = 0.5033, Error = 0.0033, Time = 0.004s
⚠️ No pretrained LSTM model found. Using neural network estimation.
  LSTM        : H = 0.3874, Error = 0.1126, Time = 0.000s
⚠️ No pretrained GRU model found. Using neural network estimation.
  GRU         : H = 0.4125, Error = 0.0875, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
  Transformer : H = 0.4451, Error = 0.0549, Time = 0.002s

H=0.7 (True H = 0.7):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
  CNN         : H = 0.6269, Error = 0.0731, Time = 0.003s
⚠️ No pretrained LSTM model found. Using neural network estimation.
  LSTM        : H = 0.5939, Error = 0.1061, Time = 0.000s
⚠️ No pretrained GRU model found. Using neural network estimation.
  GRU         : H = 0.5403, Error = 0.1597, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
  Transformer : H = 0.5518, Error = 0.1482, Time = 0.001s

H=0.9 (True H = 0.9):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
  CNN         : H = 0.6957, Error = 0.2043, Time = 0.002s
⚠️ No pretrained LSTM model found. Using neural network estimation.
  LSTM        : H = 0.7715, Error = 0.1285, Time = 0.000s
⚠️ No pretrained GRU model found. Using neural network estimation.
  GRU         : H = 0.6338, Error = 0.2662, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
  Transformer : H = 0.6688, Error = 0.2312, Time = 0.001s

📈 Total successful estimations: 80

6. Performance Comparison

Let’s create comprehensive visualizations comparing all estimator categories.

# Create comprehensive performance comparison
if len(results_df) > 0:
    print("📊 Creating performance comparison visualizations...")

    # Create figure with subplots
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))

    # 1. Error distribution by category
    ax1 = axes[0, 0]
    for category in results_df['Category'].unique():
        category_data = results_df[results_df['Category'] == category]['Error']
        ax1.hist(category_data, alpha=0.7, label=category, bins=15)
    ax1.set_xlabel('Absolute Error')
    ax1.set_ylabel('Frequency')
    ax1.set_title('Error Distribution by Category')
    ax1.legend()
    ax1.grid(True, alpha=0.3)

    # 2. Execution time by category
    ax2 = axes[0, 1]
    for category in results_df['Category'].unique():
        category_data = results_df[results_df['Category'] == category]['Execution_Time']
        ax2.hist(category_data, alpha=0.7, label=category, bins=15)
    ax2.set_xlabel('Execution Time (seconds)')
    ax2.set_ylabel('Frequency')
    ax2.set_title('Execution Time Distribution by Category')
    ax2.legend()
    ax2.grid(True, alpha=0.3)

    # 3. Error vs True H
    ax3 = axes[1, 0]
    for category in results_df['Category'].unique():
        category_data = results_df[results_df['Category'] == category]
        ax3.scatter(category_data['True_H'], category_data['Error'],
                   alpha=0.7, label=category, s=50)
    ax3.set_xlabel('True Hurst Parameter')
    ax3.set_ylabel('Absolute Error')
    ax3.set_title('Error vs True Hurst Parameter')
    ax3.legend()
    ax3.grid(True, alpha=0.3)

    # 4. Performance summary by estimator
    ax4 = axes[1, 1]
    performance_by_estimator = results_df.groupby('Estimator')['Error'].mean().sort_values()
    ax4.bar(range(len(performance_by_estimator)), performance_by_estimator.values, alpha=0.7)
    ax4.set_xlabel('Estimator')
    ax4.set_ylabel('Mean Absolute Error')
    ax4.set_title('Mean Error by Estimator')
    ax4.set_xticks(range(len(performance_by_estimator)))
    ax4.set_xticklabels(performance_by_estimator.index, rotation=45, ha='right')
    ax4.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('outputs/estimator_performance_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()

    # Performance summary table
    print("\n📊 Performance Summary by Category:")
    category_summary = results_df.groupby('Category').agg({
        'Error': ['mean', 'std', 'min', 'max'],
        'Execution_Time': ['mean', 'std']
    }).round(4)
    print(category_summary)

    # Best performing estimators
    print("\n🏆 Top 5 Best Performing Estimators (by mean error):")
    best_estimators = results_df.groupby('Estimator')['Error'].mean().sort_values().head()
    for i, (estimator, error) in enumerate(best_estimators.items(), 1):
        print(f"  {i}. {estimator}: {error:.4f}")

    # Save results
    results_df.to_csv('outputs/estimator_results.csv', index=False)
    print("\n💾 Results saved to outputs/estimator_results.csv")
else:
    print("❌ No successful estimations to compare")

📊 Creating performance comparison visualizations...

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[6], line 12
     10 for category in results_df['Category'].unique():
     11     category_data = results_df[results_df['Category'] == category]['Error']
---> 12     ax1.hist(category_data, alpha=0.7, label=category, bins=15)
     13 ax1.set_xlabel('Absolute Error')
     14 ax1.set_ylabel('Frequency')


File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/matplotlib/_api/deprecation.py:453, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
    447 if len(args) > name_idx:
    448     warn_deprecated(
    449         since, message="Passing the %(name)s %(obj_type)s "
    450         "positionally is deprecated since Matplotlib %(since)s; the "
    451         "parameter will become keyword-only in %(removal)s.",
    452         name=name, obj_type=f"parameter of {func.__name__}()")
--> 453 return func(*args, **kwargs)


File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/matplotlib/__init__.py:1524, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
   1521 @functools.wraps(func)
   1522 def inner(ax, *args, data=None, **kwargs):
   1523     if data is None:
-> 1524         return func(
   1525             ax,
   1526             *map(cbook.sanitize_sequence, args),
   1527             **{k: cbook.sanitize_sequence(v) for k, v in kwargs.items()})
   1529     bound = new_sig.bind(ax, *args, **kwargs)
   1530     auto_label = (bound.arguments.get(label_namer)
   1531                   or bound.kwargs.get(label_namer))


File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/matplotlib/axes/_axes.py:7132, in Axes.hist(self, x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   7128 # Loop through datasets
   7129 for i in range(nx):
   7130     # this will automatically overwrite bins,
   7131     # so that each histogram uses the same bins
-> 7132     m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
   7133     tops.append(m)
   7134 tops = np.array(tops, float)  # causes problems later if it's an int


File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/numpy/lib/_histograms_impl.py:792, in histogram(a, bins, range, density, weights)
    687 r"""
    688 Compute the histogram of a dataset.
    689
   (...)    788
    789 """
    790 a, weights = _ravel_and_check_weights(a, weights)
--> 792 bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
    794 # Histogram is an integer or a float array depending on the weights.
    795 if weights is None:


File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/numpy/lib/_histograms_impl.py:425, in _get_bin_edges(a, bins, range, weights)
    422     if n_equal_bins < 1:
    423         raise ValueError('`bins` must be positive, when an integer')
--> 425     first_edge, last_edge = _get_outer_edges(a, range)
    427 elif np.ndim(bins) == 1:
    428     bin_edges = np.asarray(bins)


File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/numpy/lib/_histograms_impl.py:317, in _get_outer_edges(a, range)
    315     first_edge, last_edge = a.min(), a.max()
    316     if not (np.isfinite(first_edge) and np.isfinite(last_edge)):
--> 317         raise ValueError(
    318             f"autodetected range of [{first_edge}, {last_edge}] is not finite")
    320 # expand empty range to avoid divide by zero
    321 if first_edge == last_edge:


ValueError: autodetected range of [nan, nan] is not finite

tutorials/tutorial_02_estimators_files/tutorial_02_estimators_12_2.png

7. Decision Guidelines

When to Use Which Estimator

Classical Estimators

Best for: Standard LRD processes, interpretable results, fast computation
Use when: You need theoretical guarantees, have clean data, want fast results
Recommended: R/S Analysis, DFA, GPH for most applications

Machine Learning Estimators

Best for: Complex patterns, non-standard LRD processes, pre-trained models
Use when: You have diverse data types, need robust estimation, have computational resources
Recommended: Random Forest for general use, SVR for smooth patterns

Neural Network Estimators

Best for: High-dimensional data, complex non-linear patterns, large datasets
Use when: You have sufficient data, need state-of-the-art accuracy, can afford training time
Recommended: CNN for spatial patterns, LSTM for temporal sequences, Transformer for attention-based patterns

Performance Trade-offs

Accuracy vs Speed: Classical < ML < Neural (generally)
Interpretability: Classical > ML > Neural
Robustness: Depends on data quality and estimator choice
Computational Requirements: Classical < ML < Neural

8. Summary and Next Steps

Key Takeaways

Estimator Diversity: lrdbenchmark provides comprehensive coverage across three categories:
- Classical: Fast, interpretable, theoretically grounded
- Machine Learning: Robust, flexible, pre-trained models
- Neural Networks: High accuracy, complex patterns, state-of-the-art
Performance Characteristics:
- Classical estimators are fastest and most interpretable
- ML estimators provide good balance of accuracy and robustness
- Neural networks offer highest accuracy for complex patterns
Selection Guidelines:
- Use classical estimators for standard LRD analysis
- Use ML estimators for diverse data types and robustness
- Use neural networks for complex patterns and high accuracy requirements

Next Steps

Benchmarking: Compare estimators systematically across different data types
Custom Estimators: Learn how to extend the library with custom estimators
Real-world Application: Apply estimators to actual time series data
Performance Optimization: Explore advanced optimization techniques

Files Generated

outputs/estimator_performance_comparison.png: Comprehensive performance visualization
outputs/estimator_results.csv: Detailed results table
Performance metrics and rankings

References

Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116(1), 770-808.
Peng, C. K., et al. (1994). Mosaic organization of DNA nucleotides. Physical review E, 49(2), 1685.
Geweke, J., & Porter-Hudak, S. (1983). The estimation and application of long memory time series models. Journal of time series analysis, 4(4), 221-238.
Abry, P., & Veitch, D. (1998). Wavelet analysis of long-range-dependent traffic. IEEE Transactions on information theory, 44(1), 2-15.

Next Tutorial: Custom Models and Estimators — learn how to extend the library with custom data models and estimators.