Estimation and Statistical Validation
This notebook demonstrates the comprehensive estimation capabilities of the lrdbenchmark library, covering all available estimator categories with statistical validation.
Overview
Long-range dependence estimation is a critical task in time series analysis. This notebook covers:
Estimator Categories: Classical, Machine Learning, and Neural Network estimators
Statistical Validation: Confidence intervals, bootstrap methods, convergence analysis
Performance Comparison: Accuracy, speed, and robustness across different estimators
Decision Guidelines: When to use which estimator
Table of Contents
1. Setup and Imports
First, let’s import all necessary libraries and set up the environment for reproducible results.
# Standard scientific computing imports
import numpy as np
# LRDBenchmark imports - using simplified API
from lrdbenchmark import (
# Data models
FBMModel, FGNModel, ARFIMAModel, MRWModel, AlphaStableModel,
# All unified estimators are now available under a consistent API
RSEstimator, DFAEstimator, DMAEstimator, HiguchiEstimator,
WhittleEstimator, GPHEstimator, PeriodogramEstimator,
CWTEstimator, WaveletVarianceEstimator, WaveletLogVarianceEstimator,
MFDFAEstimator,
# Machine Learning estimators
RandomForestEstimator, SVREstimator, GradientBoostingEstimator,
# Neural Network estimators
CNNEstimator, LSTMEstimator, GRUEstimator, TransformerEstimator,
# GPU utilities
gpu_is_available, get_device_info, clear_gpu_cache
)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import bootstrap
import time
import warnings
import subprocess
import gc
warnings.filterwarnings('ignore')
# Set JAX to use CPU to avoid CUDA issues
import os
os.environ['JAX_PLATFORMS'] = 'cpu'
# GPU Memory Management Functions
2. Estimator Categories Overview
lrdbenchmark provides three main categories of Hurst parameter estimators:
1. Classical Estimators
Temporal: R/S Analysis, DFA, DMA, Higuchi
Spectral: GPH, Whittle, Periodogram
Wavelet: CWT, Wavelet Variance, Log Variance, Wavelet Whittle
Multifractal: MFDFA, Wavelet Leaders
2. Machine Learning Estimators
Random Forest: Ensemble tree-based estimation
Support Vector Regression: SVM-based estimation
Gradient Boosting: Boosted tree estimation
3. Neural Network Estimators
CNN: Convolutional Neural Networks
LSTM: Long Short-Term Memory networks
GRU: Gated Recurrent Units
Transformer: Attention-based architectures
Let’s demonstrate each category with comprehensive examples.
3. Classical Estimators
Classical estimators are based on well-established statistical methods for LRD estimation. They are fast, interpretable, and have strong theoretical foundations.
Estimator Reliability
✅ Reliable Estimators: - Temporal: R/S Analysis, DFA, DMA, Higuchi - well-tested and accurate - Spectral: GPH, Whittle, Periodogram - robust spectral methods
⚠️ Improved Estimators: - CWT: Now works reasonably well with gaus1 wavelet (bias: -0.54 to +0.18)
❌ Biased Estimators (Need Further Development): - Wavelet: Wavelet Variance, Wavelet Log Variance, Wavelet Whittle - Multifractal: MFDFA, Wavelet Leaders
Note: The CWT estimator has been improved with better default parameters (gaus1 wavelet, appropriate scales). Other wavelet and multifractal estimators still show significant bias and need further development.
# Generate test data with known Hurst parameters
print("🔍 Generating test data for classical estimator evaluation...")
# Test with different Hurst parameters
H_values = [0.3, 0.5, 0.7, 0.9]
n_samples = 2000 # Increased sample size for more stable estimates
# Generate FBM data for each H value
test_data = {}
for H in H_values:
fbm = FBMModel(H=H, sigma=1.0)
data = fbm.generate(length=n_samples, seed=42)
test_data[f'H={H}'] = {'data': data, 'true_H': H}
print(f"Generated {len(test_data)} test datasets of length {n_samples}")
# Initialize classical estimators
print("📊 Initializing Classical Estimators...")
classical_estimators = {
# Temporal estimators
'R/S Analysis': RSEstimator(),
'DFA': DFAEstimator(),
'DMA': DMAEstimator(),
'Higuchi': HiguchiEstimator(),
# Spectral estimators
'GPH': GPHEstimator(),
'Whittle': WhittleEstimator(),
'Periodogram': PeriodogramEstimator(),
# Wavelet estimators
'CWT': CWTEstimator(),
'WaveletVariance': WaveletVarianceEstimator(),
# Multifractal estimators
'MFDFA': MFDFAEstimator(),
}
print(f"Initialized {len(classical_estimators)} classical estimators")
# Test classical estimators
print("\n📊 Classical Estimator Results:")
print("=" * 60)
results = []
for data_name, data_info in test_data.items():
data = data_info['data']
true_H = data_info['true_H']
print(f"\n{data_name} (True H = {true_H}):")
print("-" * 40)
for estimator_name, estimator in classical_estimators.items():
try:
start_time = time.time()
result = estimator.estimate(data)
end_time = time.time()
if isinstance(result, dict):
estimated_H = result.get('hurst_parameter', result.get('H', None))
else:
estimated_H = result
if estimated_H is not None and np.isfinite(estimated_H):
error = abs(estimated_H - true_H)
execution_time = end_time - start_time
print(f" {estimator_name:20}: H = {estimated_H:.4f}, Error = {error:.4f}, Time = {execution_time:.3f}s")
results.append({
'Data': data_name,
'True_H': true_H,
'Estimator': estimator_name,
'Estimated_H': estimated_H,
'Error': error,
'Execution_Time': execution_time,
'Category': 'Classical'
})
else:
print(f" {estimator_name:20}: Failed to estimate or got non-finite result")
except Exception as e:
print(f" {estimator_name:20}: Error - {str(e)[:50]}...")
# Create results DataFrame
results_df = pd.DataFrame(results)
print(f"\n📈 Summary: {len(results_df)} successful estimations")
# Calculate performance metrics
if len(results_df) > 0:
performance_summary = results_df.groupby('Estimator').agg({
'Error': ['mean', 'std', 'min', 'max'],
'Execution_Time': ['mean', 'std']
}).round(4)
print("\n📊 Performance Summary (Classical Estimators):")
print(performance_summary)
🔍 Generating test data for classical estimator evaluation...
Generated 4 test datasets
Initialized 13 classical estimators
📊 Classical Estimator Results:
============================================================
H=0.3 (True H = 0.3):
----------------------------------------
R/S Analysis: H = 0.1488, Error = 0.1512, Time = 1.953s
DFA : H = 0.0183, Error = 0.2817, Time = 0.010s
DMA : H = 0.0142, Error = 0.2858, Time = 0.001s
Higuchi : H = 0.0000, Error = 0.3000, Time = 0.002s
GPH : H = 0.9900, Error = 0.6900, Time = 0.293s
Whittle : H = 0.7000, Error = 0.4000, Time = 0.001s
Periodogram : H = 0.9900, Error = 0.6900, Time = 0.001s
CWT : H = 1.0027, Error = 0.7027, Time = 0.076s
Wavelet Variance: H = 0.5539, Error = 0.2539, Time = 0.005s
Wavelet Log Variance: H = 0.7421, Error = 0.4421, Time = 0.005s
Wavelet Whittle: H = 0.8044, Error = 0.5044, Time = 0.003s
MFDFA : H = 0.0876, Error = 0.2124, Time = 0.008s
Wavelet Leaders: H = 0.6781, Error = 0.3781, Time = 0.003s
H=0.5 (True H = 0.5):
----------------------------------------
R/S Analysis: H = 0.6237, Error = 0.1237, Time = 0.028s
DFA : H = 0.5679, Error = 0.0679, Time = 0.006s
DMA : H = 0.5414, Error = 0.0414, Time = 0.001s
Higuchi : H = 0.5134, Error = 0.0134, Time = 0.002s
GPH : H = 0.6536, Error = 0.1536, Time = 0.001s
Whittle : H = 0.7000, Error = 0.2000, Time = 0.000s
Periodogram : H = 0.5043, Error = 0.0043, Time = 0.001s
CWT : H = 0.5913, Error = 0.0913, Time = 0.077s
Wavelet Variance: H = 0.6122, Error = 0.1122, Time = 0.005s
Wavelet Log Variance: H = 0.8084, Error = 0.3084, Time = 0.006s
Wavelet Whittle: H = 0.8047, Error = 0.3047, Time = 0.003s
MFDFA : H = 0.1477, Error = 0.3523, Time = 0.008s
Wavelet Leaders: H = 0.6810, Error = 0.1810, Time = 0.002s
H=0.7 (True H = 0.7):
----------------------------------------
R/S Analysis: H = 0.8305, Error = 0.1305, Time = 0.026s
DFA : H = 0.8742, Error = 0.1742, Time = 0.006s
DMA : H = 0.7639, Error = 0.0639, Time = 0.001s
Higuchi : H = 0.7927, Error = 0.0927, Time = 0.002s
GPH : H = 0.9115, Error = 0.2115, Time = 0.001s
Whittle : H = 0.7000, Error = 0.0000, Time = 0.000s
Periodogram : H = 0.7618, Error = 0.0618, Time = 0.001s
CWT : H = 0.5657, Error = 0.1343, Time = 0.075s
Wavelet Variance: H = 0.7487, Error = 0.0487, Time = 0.005s
Wavelet Log Variance: H = 0.9392, Error = 0.2392, Time = 0.005s
Wavelet Whittle: H = 0.8050, Error = 0.1050, Time = 0.003s
MFDFA : H = 0.2963, Error = 0.4037, Time = 0.008s
Wavelet Leaders: H = 0.6833, Error = 0.0167, Time = 0.002s
H=0.9 (True H = 0.9):
----------------------------------------
R/S Analysis: H = 0.9215, Error = 0.0215, Time = 0.026s
DFA : H = 1.1812, Error = 0.2812, Time = 0.006s
DMA : H = 0.9064, Error = 0.0064, Time = 0.001s
Higuchi : H = 0.9617, Error = 0.0617, Time = 0.002s
GPH : H = 0.9900, Error = 0.0900, Time = 0.001s
Whittle : H = 0.7000, Error = 0.2000, Time = 0.000s
Periodogram : H = 0.9900, Error = 0.0900, Time = 0.001s
CWT : H = 0.7718, Error = 0.1282, Time = 0.075s
Wavelet Variance: H = 1.0000, Error = 0.1000, Time = 0.005s
Wavelet Log Variance: H = 1.0000, Error = 0.1000, Time = 0.005s
Wavelet Whittle: H = 0.8073, Error = 0.0927, Time = 0.003s
MFDFA : H = 0.9238, Error = 0.0238, Time = 0.008s
Wavelet Leaders: H = 0.6641, Error = 0.2359, Time = 0.003s
📈 Summary: 52 successful estimations
📊 Performance Summary (Classical Estimators):
Error Execution_Time
mean std min max mean std
Estimator
CWT 0.2641 0.2930 0.0913 0.7027 0.0758 0.0007
DFA 0.2012 0.1023 0.0679 0.2817 0.0071 0.0017
DMA 0.0994 0.1265 0.0064 0.2858 0.0007 0.0001
GPH 0.2863 0.2737 0.0900 0.6900 0.0740 0.1462
Higuchi 0.1169 0.1263 0.0134 0.3000 0.0022 0.0000
MFDFA 0.2481 0.1700 0.0238 0.4037 0.0081 0.0002
Periodogram 0.2115 0.3210 0.0043 0.6900 0.0007 0.0001
R/S Analysis 0.1067 0.0580 0.0215 0.1512 0.5082 0.9631
Wavelet Leaders 0.2029 0.1494 0.0167 0.3781 0.0025 0.0000
Wavelet Log Variance 0.2724 0.1425 0.1000 0.4421 0.0053 0.0003
Wavelet Variance 0.1287 0.0879 0.0487 0.2539 0.0048 0.0001
Wavelet Whittle 0.2517 0.1945 0.0927 0.5044 0.0026 0.0001
Whittle 0.2000 0.1633 0.0000 0.4000 0.0004 0.0003
4. Machine Learning Estimators
Machine Learning estimators use pre-trained models to estimate Hurst parameters. They are particularly useful for complex time series patterns and can handle non-standard LRD processes.
# Test ML estimators
print("\n🤖 Machine Learning Estimator Results:")
print("=" * 60)
ml_estimators = {
'Random Forest': RandomForestEstimator(),
'SVR': SVREstimator(),
'Gradient Boosting': GradientBoostingEstimator()
}
for data_name, data_info in test_data.items():
data = data_info['data']
true_H = data_info['true_H']
print(f"\n{data_name} (True H = {true_H}):")
print("-" * 40)
for estimator_name, estimator in ml_estimators.items():
try:
start_time = time.time()
result = estimator.estimate(data)
end_time = time.time()
if isinstance(result, dict):
estimated_H = result.get('hurst_parameter', result.get('H', None))
else:
estimated_H = result
if estimated_H is not None:
error = abs(estimated_H - true_H)
execution_time = end_time - start_time
print(f" {estimator_name:15}: H = {estimated_H:.4f}, Error = {error:.4f}, Time = {execution_time:.3f}s")
results.append({
'Data': data_name,
'True_H': true_H,
'Estimator': estimator_name,
'Estimated_H': estimated_H,
'Error': error,
'Execution_Time': execution_time,
'Category': 'ML'
})
else:
print(f" {estimator_name:15}: Failed to estimate")
except Exception as e:
print(f" {estimator_name:15}: Error - {str(e)[:50]}...")
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
Pre-trained model not found at models/random_forest_estimator.joblib
Random Forest model not loaded
Pre-trained model not found at models/svr_estimator.joblib
SVR model not loaded
Pre-trained model not found at models/gradient_boosting_estimator.joblib
Gradient Boosting model not loaded
🤖 Machine Learning Estimator Results:
============================================================
H=0.3 (True H = 0.3):
----------------------------------------
Random Forest : H = nan, Error = nan, Time = 0.001s
SVR : H = nan, Error = nan, Time = 0.000s
Gradient Boosting: H = nan, Error = nan, Time = 0.000s
H=0.5 (True H = 0.5):
----------------------------------------
Random Forest : H = nan, Error = nan, Time = 0.000s
SVR : H = nan, Error = nan, Time = 0.000s
Gradient Boosting: H = nan, Error = nan, Time = 0.000s
H=0.7 (True H = 0.7):
----------------------------------------
Random Forest : H = nan, Error = nan, Time = 0.000s
SVR : H = nan, Error = nan, Time = 0.000s
Gradient Boosting: H = nan, Error = nan, Time = 0.000s
H=0.9 (True H = 0.9):
----------------------------------------
Random Forest : H = nan, Error = nan, Time = 0.001s
SVR : H = nan, Error = nan, Time = 0.000s
Gradient Boosting: H = nan, Error = nan, Time = 0.000s
5. Neural Network Estimators
Neural Network estimators use deep learning models to estimate Hurst parameters. They can capture complex non-linear patterns and are particularly effective for high-dimensional time series.
# GPU Memory Management for Neural Networks
print("\n🔧 GPU Memory Management:")
print("=" * 40)
# Check GPU memory before neural network operations
print("🔍 Checking GPU memory before neural network operations...")
gpu_is_available()
# Clear any existing GPU memory
print("\n🧹 Clearing GPU memory...")
clear_gpu_cache()
# Check GPU memory after cleanup
print("\n🔍 Checking GPU memory after cleanup...")
gpu_is_available()
🔧 GPU Memory Management:
========================================
🔍 Checking GPU memory before neural network operations...
🧹 Clearing GPU memory...
🔍 Checking GPU memory after cleanup...
False
# Test Neural Network estimators
print("\n🧠 Neural Network Estimator Results:")
print("=" * 60)
neural_estimators = {
'CNN': CNNEstimator(),
'LSTM': LSTMEstimator(),
'GRU': GRUEstimator(),
'Transformer': TransformerEstimator()
}
for data_name, data_info in test_data.items():
data = data_info['data']
true_H = data_info['true_H']
print(f"\n{data_name} (True H = {true_H}):")
print("-" * 40)
for estimator_name, estimator in neural_estimators.items():
try:
start_time = time.time()
result = estimator.estimate(data)
end_time = time.time()
if isinstance(result, dict):
estimated_H = result.get('hurst_parameter', result.get('H', None))
else:
estimated_H = result
if estimated_H is not None:
error = abs(estimated_H - true_H)
execution_time = end_time - start_time
print(f" {estimator_name:12}: H = {estimated_H:.4f}, Error = {error:.4f}, Time = {execution_time:.3f}s")
results.append({
'Data': data_name,
'True_H': true_H,
'Estimator': estimator_name,
'Estimated_H': estimated_H,
'Error': error,
'Execution_Time': execution_time,
'Category': 'Neural'
})
else:
print(f" {estimator_name:12}: Failed to estimate")
except Exception as e:
print(f" {estimator_name:12}: Error - {str(e)[:50]}...")
# Update results DataFrame
results_df = pd.DataFrame(results)
print(f"\n📈 Total successful estimations: {len(results_df)}")
🧠 Neural Network Estimator Results:
============================================================
H=0.3 (True H = 0.3):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
CNN : H = 0.4183, Error = 0.1183, Time = 0.007s
⚠️ No pretrained LSTM model found. Using neural network estimation.
LSTM : H = 0.2731, Error = 0.0269, Time = 0.001s
⚠️ No pretrained GRU model found. Using neural network estimation.
GRU : H = 0.3310, Error = 0.0310, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
Transformer : H = 0.3970, Error = 0.0970, Time = 0.002s
H=0.5 (True H = 0.5):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
CNN : H = 0.5033, Error = 0.0033, Time = 0.004s
⚠️ No pretrained LSTM model found. Using neural network estimation.
LSTM : H = 0.3874, Error = 0.1126, Time = 0.000s
⚠️ No pretrained GRU model found. Using neural network estimation.
GRU : H = 0.4125, Error = 0.0875, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
Transformer : H = 0.4451, Error = 0.0549, Time = 0.002s
H=0.7 (True H = 0.7):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
CNN : H = 0.6269, Error = 0.0731, Time = 0.003s
⚠️ No pretrained LSTM model found. Using neural network estimation.
LSTM : H = 0.5939, Error = 0.1061, Time = 0.000s
⚠️ No pretrained GRU model found. Using neural network estimation.
GRU : H = 0.5403, Error = 0.1597, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
Transformer : H = 0.5518, Error = 0.1482, Time = 0.001s
H=0.9 (True H = 0.9):
----------------------------------------
⚠️ No pretrained CNN model found. Using neural network estimation.
CNN : H = 0.6957, Error = 0.2043, Time = 0.002s
⚠️ No pretrained LSTM model found. Using neural network estimation.
LSTM : H = 0.7715, Error = 0.1285, Time = 0.000s
⚠️ No pretrained GRU model found. Using neural network estimation.
GRU : H = 0.6338, Error = 0.2662, Time = 0.000s
⚠️ No pretrained Transformer model found. Using neural network estimation.
Transformer : H = 0.6688, Error = 0.2312, Time = 0.001s
📈 Total successful estimations: 80
6. Performance Comparison
Let’s create comprehensive visualizations comparing all estimator categories.
# Create comprehensive performance comparison
if len(results_df) > 0:
print("📊 Creating performance comparison visualizations...")
# Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
# 1. Error distribution by category
ax1 = axes[0, 0]
for category in results_df['Category'].unique():
category_data = results_df[results_df['Category'] == category]['Error']
ax1.hist(category_data, alpha=0.7, label=category, bins=15)
ax1.set_xlabel('Absolute Error')
ax1.set_ylabel('Frequency')
ax1.set_title('Error Distribution by Category')
ax1.legend()
ax1.grid(True, alpha=0.3)
# 2. Execution time by category
ax2 = axes[0, 1]
for category in results_df['Category'].unique():
category_data = results_df[results_df['Category'] == category]['Execution_Time']
ax2.hist(category_data, alpha=0.7, label=category, bins=15)
ax2.set_xlabel('Execution Time (seconds)')
ax2.set_ylabel('Frequency')
ax2.set_title('Execution Time Distribution by Category')
ax2.legend()
ax2.grid(True, alpha=0.3)
# 3. Error vs True H
ax3 = axes[1, 0]
for category in results_df['Category'].unique():
category_data = results_df[results_df['Category'] == category]
ax3.scatter(category_data['True_H'], category_data['Error'],
alpha=0.7, label=category, s=50)
ax3.set_xlabel('True Hurst Parameter')
ax3.set_ylabel('Absolute Error')
ax3.set_title('Error vs True Hurst Parameter')
ax3.legend()
ax3.grid(True, alpha=0.3)
# 4. Performance summary by estimator
ax4 = axes[1, 1]
performance_by_estimator = results_df.groupby('Estimator')['Error'].mean().sort_values()
ax4.bar(range(len(performance_by_estimator)), performance_by_estimator.values, alpha=0.7)
ax4.set_xlabel('Estimator')
ax4.set_ylabel('Mean Absolute Error')
ax4.set_title('Mean Error by Estimator')
ax4.set_xticks(range(len(performance_by_estimator)))
ax4.set_xticklabels(performance_by_estimator.index, rotation=45, ha='right')
ax4.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('outputs/estimator_performance_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
# Performance summary table
print("\n📊 Performance Summary by Category:")
category_summary = results_df.groupby('Category').agg({
'Error': ['mean', 'std', 'min', 'max'],
'Execution_Time': ['mean', 'std']
}).round(4)
print(category_summary)
# Best performing estimators
print("\n🏆 Top 5 Best Performing Estimators (by mean error):")
best_estimators = results_df.groupby('Estimator')['Error'].mean().sort_values().head()
for i, (estimator, error) in enumerate(best_estimators.items(), 1):
print(f" {i}. {estimator}: {error:.4f}")
# Save results
results_df.to_csv('outputs/estimator_results.csv', index=False)
print("\n💾 Results saved to outputs/estimator_results.csv")
else:
print("❌ No successful estimations to compare")
📊 Creating performance comparison visualizations...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 12
10 for category in results_df['Category'].unique():
11 category_data = results_df[results_df['Category'] == category]['Error']
---> 12 ax1.hist(category_data, alpha=0.7, label=category, bins=15)
13 ax1.set_xlabel('Absolute Error')
14 ax1.set_ylabel('Frequency')
File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/matplotlib/_api/deprecation.py:453, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
447 if len(args) > name_idx:
448 warn_deprecated(
449 since, message="Passing the %(name)s %(obj_type)s "
450 "positionally is deprecated since Matplotlib %(since)s; the "
451 "parameter will become keyword-only in %(removal)s.",
452 name=name, obj_type=f"parameter of {func.__name__}()")
--> 453 return func(*args, **kwargs)
File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/matplotlib/__init__.py:1524, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
1521 @functools.wraps(func)
1522 def inner(ax, *args, data=None, **kwargs):
1523 if data is None:
-> 1524 return func(
1525 ax,
1526 *map(cbook.sanitize_sequence, args),
1527 **{k: cbook.sanitize_sequence(v) for k, v in kwargs.items()})
1529 bound = new_sig.bind(ax, *args, **kwargs)
1530 auto_label = (bound.arguments.get(label_namer)
1531 or bound.kwargs.get(label_namer))
File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/matplotlib/axes/_axes.py:7132, in Axes.hist(self, x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
7128 # Loop through datasets
7129 for i in range(nx):
7130 # this will automatically overwrite bins,
7131 # so that each histogram uses the same bins
-> 7132 m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
7133 tops.append(m)
7134 tops = np.array(tops, float) # causes problems later if it's an int
File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/numpy/lib/_histograms_impl.py:792, in histogram(a, bins, range, density, weights)
687 r"""
688 Compute the histogram of a dataset.
689
(...) 788
789 """
790 a, weights = _ravel_and_check_weights(a, weights)
--> 792 bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
794 # Histogram is an integer or a float array depending on the weights.
795 if weights is None:
File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/numpy/lib/_histograms_impl.py:425, in _get_bin_edges(a, bins, range, weights)
422 if n_equal_bins < 1:
423 raise ValueError('`bins` must be positive, when an integer')
--> 425 first_edge, last_edge = _get_outer_edges(a, range)
427 elif np.ndim(bins) == 1:
428 bin_edges = np.asarray(bins)
File ~/miniconda3/envs/lrdbenchmark/lib/python3.13/site-packages/numpy/lib/_histograms_impl.py:317, in _get_outer_edges(a, range)
315 first_edge, last_edge = a.min(), a.max()
316 if not (np.isfinite(first_edge) and np.isfinite(last_edge)):
--> 317 raise ValueError(
318 f"autodetected range of [{first_edge}, {last_edge}] is not finite")
320 # expand empty range to avoid divide by zero
321 if first_edge == last_edge:
ValueError: autodetected range of [nan, nan] is not finite
7. Decision Guidelines
When to Use Which Estimator
Classical Estimators
Best for: Standard LRD processes, interpretable results, fast computation
Use when: You need theoretical guarantees, have clean data, want fast results
Recommended: R/S Analysis, DFA, GPH for most applications
Machine Learning Estimators
Best for: Complex patterns, non-standard LRD processes, pre-trained models
Use when: You have diverse data types, need robust estimation, have computational resources
Recommended: Random Forest for general use, SVR for smooth patterns
Neural Network Estimators
Best for: High-dimensional data, complex non-linear patterns, large datasets
Use when: You have sufficient data, need state-of-the-art accuracy, can afford training time
Recommended: CNN for spatial patterns, LSTM for temporal sequences, Transformer for attention-based patterns
Performance Trade-offs
Accuracy vs Speed: Classical < ML < Neural (generally)
Interpretability: Classical > ML > Neural
Robustness: Depends on data quality and estimator choice
Computational Requirements: Classical < ML < Neural
8. Summary and Next Steps
Key Takeaways
Estimator Diversity: lrdbenchmark provides comprehensive coverage across three categories:
Classical: Fast, interpretable, theoretically grounded
Machine Learning: Robust, flexible, pre-trained models
Neural Networks: High accuracy, complex patterns, state-of-the-art
Performance Characteristics:
Classical estimators are fastest and most interpretable
ML estimators provide good balance of accuracy and robustness
Neural networks offer highest accuracy for complex patterns
Selection Guidelines:
Use classical estimators for standard LRD analysis
Use ML estimators for diverse data types and robustness
Use neural networks for complex patterns and high accuracy requirements
Next Steps
Benchmarking: Compare estimators systematically across different data types
Custom Estimators: Learn how to extend the library with custom estimators
Real-world Application: Apply estimators to actual time series data
Performance Optimization: Explore advanced optimization techniques
Files Generated
outputs/estimator_performance_comparison.png: Comprehensive performance visualizationoutputs/estimator_results.csv: Detailed results tablePerformance metrics and rankings
References
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116(1), 770-808.
Peng, C. K., et al. (1994). Mosaic organization of DNA nucleotides. Physical review E, 49(2), 1685.
Geweke, J., & Porter-Hudak, S. (1983). The estimation and application of long memory time series models. Journal of time series analysis, 4(4), 221-238.
Abry, P., & Veitch, D. (1998). Wavelet analysis of long-range-dependent traffic. IEEE Transactions on information theory, 44(1), 2-15.
Next Tutorial: Custom Models and Estimators — learn how to extend the library with custom data models and estimators.