Machine Learning Estimators

LRDBenchmark provides production-ready machine learning estimators for Long-Range Dependence (LRD) estimation. In the benchmark reported on this page, every estimator completed with a 100% success rate, and Gradient Boosting was the most accurate of the feature-based ML estimators at 0.193 MAE.

Overview

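The machine learning estimators are built on engineered features. This overview cites 50-70 features per model, while the class descriptions below state that the pre-trained SVR, Gradient Boosting, and Random Forest models expect 29, 54, and 76 features respectively. The feature groups include: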

Statistical Features: Mean, standard deviation, skewness, kurtosis
Time Series Features: Autocorrelation at multiple lags, variance of increments
Spectral Features: Power spectrum analysis, frequency band ratios, spectral slope
DFA Features: Detrended fluctuation analysis with slope calculation
Wavelet Features: Wavelet variance at different scales, wavelet slope
R/S Analysis Features: Rescaled range analysis with slope calculation
Additional Features: Trend analysis, seasonality detection, entropy measures
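
To make the feature groups above more concrete, the following is a minimal sketch of a few statistical and time-series features (mean, standard deviation, autocorrelation at several lags, variance of increments). The function names `autocorrelation` and `basic_features`, the chosen lags, and the dictionary keys are assumptions made for illustration; they are not taken from LRDBenchmark's feature extractor. The sketch uses only the Python standard library.

```python
import random
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return num / denom


def basic_features(series, lags=(1, 2, 5, 10)):
    """Return a small dict of statistical and time-series features."""
    increments = [b - a for a, b in zip(series, series[1:])]
    features = {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "increment_variance": statistics.pvariance(increments),
    }
    for lag in lags:
        features[f"acf_lag_{lag}"] = autocorrelation(series, lag)
    return features


# Illustrative inputs: white noise and its cumulative sum (a random walk).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk, total = [], 0.0
for step in noise:
    total += step
    walk.append(total)

noise_features = basic_features(noise)
walk_features = basic_features(walk)
```

The random walk shows a lag-1 autocorrelation close to 1, while white noise stays near 0. This contrast in persistence is the kind of signal such features pass to a regressor.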

SVR Estimator

Support Vector Regression estimator with RBF kernel and comprehensive feature engineering.

class lrdbenchmark.analysis.machine_learning.svr_estimator_unified.SVREstimator(model_path: str | None = None)

Bases: BaseEstimator

SVR estimator using unified feature extraction. Works with pre-trained models expecting 29 features.

__init__(model_path: str | None = None)

Initialize the SVR estimator.

Parameters: model_path – Path to the pre-trained model. If None, uses default path.

_validate_parameters() → None: Validate estimator parameters.

_get_default_model_path() → str | None: Resolve the default pretrained model path, downloading if required.

_load_model(): Load the pre-trained SVR model.

_update_feature_metadata(): Update expected feature count and default feature names.

_extract_features_for_model(data: ndarray) → ndarray: Extract feature vector that matches the pretrained model.

estimate(data: ndarray) → Dict[str, Any]

Estimate Hurst parameter using SVR with unified features.

Parameters: data – Time series data
Returns: Dictionary containing estimation results

get_support_vectors() → ndarray | None

Get support vectors from the SVR model.

Returns: Array of support vectors or None if model not loaded

get_feature_names() → List[str] | None

Get feature names used by the model.

Returns: List of feature names or None if not available

is_model_available() → bool

Check if the pre-trained model is available.

Returns: True if model is available and loaded
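
Because `estimate()` depends on a pre-trained model that may not be available, callers may want to check `is_model_available()` first. The sketch below shows one way to do so with a duck-typed helper. The helper `safe_estimate`, the stand-in class `_StubEstimator`, and the result key `"hurst_parameter"` are hypothetical: the documentation above only states that `estimate()` returns a dictionary of estimation results.

```python
from typing import Any, Callable, Dict, Sequence


def safe_estimate(estimator: Any, data: Sequence[float],
                  fallback: Callable[[Sequence[float]], float]) -> Dict[str, Any]:
    """Use the pre-trained model when available, otherwise call `fallback`."""
    if estimator.is_model_available():
        result = dict(estimator.estimate(data))
        result["source"] = "pretrained_model"
        return result
    # Assumed key name; the real result dictionary may use a different one.
    return {"hurst_parameter": fallback(data), "source": "fallback"}


class _StubEstimator:
    """Hypothetical stand-in that mimics the two documented methods."""

    def __init__(self, available: bool):
        self._available = available

    def is_model_available(self) -> bool:
        return self._available

    def estimate(self, data):
        return {"hurst_parameter": 0.7}


series = [0.1, -0.3, 0.2, 0.5, -0.1]
with_model = safe_estimate(_StubEstimator(True), series, lambda d: 0.5)
without_model = safe_estimate(_StubEstimator(False), series, lambda d: 0.5)
```

Any object exposing the two documented methods can be passed in place of the stub.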

Performance: 0.202 MAE, 100% success rate, 0.009s execution time

Key Features:
* RBF kernel with configurable parameters (C, gamma, epsilon)
* 50+ engineered features including spectral and DFA analysis
* Model persistence with save/load functionality
* Robust error handling with fallback to R/S analysis
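
The fallback mentioned above is rescaled range (R/S) analysis. As a rough illustration of that technique, the sketch below estimates the Hurst parameter as the slope of log(mean R/S) against log(window size). This is a generic textbook formulation written with the standard library only. The function names, window sizes, and lack of small-sample bias correction are assumptions, and it is not the library's own fallback code.

```python
import math
import random
import statistics


def rescaled_range(window):
    """R/S statistic of one window: range of cumulative deviations over std."""
    mean = statistics.fmean(window)
    cumulative, running = [], 0.0
    for value in window:
        running += value - mean
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)
    std = statistics.pstdev(window)
    return spread / std if std > 0 else 0.0


def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for size in window_sizes:
        # Non-overlapping windows of the given size.
        windows = [series[i:i + size] for i in range(0, len(series) - size + 1, size)]
        values = [rs for rs in map(rescaled_range, windows) if rs > 0]
        if values:
            xs.append(math.log(size))
            ys.append(math.log(statistics.fmean(values)))
    if len(xs) < 2:
        raise ValueError("series too short for the chosen window sizes")
    # Ordinary least-squares slope of ys on xs.
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
h_estimate = hurst_rs(white_noise)
```

For uncorrelated noise the estimate should land in the neighborhood of 0.5, typically somewhat above it because plain R/S is biased upward for short windows.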

Example Usage:

from lrdbenchmark import SVREstimator
import numpy as np

# Initialize the estimator; with no model_path, the default pre-trained model is used
svr = SVREstimator()

# Placeholder input: one series of 500 samples (white noise, for illustration only)
data = np.random.randn(500)

# Estimate the Hurst parameter only if the pre-trained model loaded
if svr.is_model_available():
    result = svr.estimate(data)  # dictionary containing estimation results
    print(result)

Gradient Boosting Estimator

Gradient Boosting Regressor with comprehensive feature engineering. It has the lowest MAE (0.193) of the three feature-based ML estimators.

class lrdbenchmark.analysis.machine_learning.gradient_boosting_estimator_unified.GradientBoostingEstimator(model_path: str | None = None)

Bases: BaseEstimator

Gradient Boosting estimator using unified feature extraction. Works with pre-trained models expecting 54 features.

__init__(model_path: str | None = None)

Initialize the Gradient Boosting estimator.

Parameters: model_path – Path to the pre-trained model. If None, uses default path.

_validate_parameters() → None: Validate estimator parameters.

_get_default_model_path() → str | None: Resolve the default pretrained model path, downloading if required.

_load_model(): Load the pre-trained Gradient Boosting model.

_update_feature_metadata(): Update expected feature count and default feature names.

_extract_features_for_model(data: ndarray) → ndarray: Extract feature vector that matches the pretrained model.

estimate(data: ndarray) → Dict[str, Any]

Estimate Hurst parameter using Gradient Boosting with unified features.

Parameters: data – Time series data
Returns: Dictionary containing estimation results

get_feature_importance() → ndarray | None

Get feature importance from the Gradient Boosting model.

Returns: Array of feature importances or None if model not loaded

get_feature_names() → List[str] | None

Get feature names used by the model.

Returns: List of feature names or None if not available

is_model_available() → bool

Check if the pre-trained model is available.

Returns: True if model is available and loaded

Performance: 0.193 MAE (Best ML), 100% success rate, 0.013s execution time
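
For readers who want to reproduce figures of this kind on their own data, the sketch below shows how a mean absolute error (MAE) and a success rate can be computed from true and estimated Hurst values. The rule that a run "fails" when its estimate is None or NaN is an assumption made for this example. The page does not define how LRDBenchmark counts successes, and the sample values are invented for illustration.

```python
import math


def summarize(true_values, estimates):
    """Return (MAE over successful runs, success rate).

    A run counts as failed when its estimate is None or NaN (assumed rule).
    """
    errors = []
    for truth, estimate in zip(true_values, estimates):
        if estimate is None or math.isnan(estimate):
            continue
        errors.append(abs(estimate - truth))
    success_rate = len(errors) / len(true_values)
    mae = sum(errors) / len(errors) if errors else float("nan")
    return mae, success_rate


# Invented example values; the third run is treated as a failure.
true_h = [0.3, 0.5, 0.7, 0.9]
estimated_h = [0.35, 0.45, None, 0.8]
mae, success_rate = summarize(true_h, estimated_h)
```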

Key Features:
* Configurable parameters (n_estimators, learning_rate, max_depth)
* 60+ engineered features including advanced spectral and DFA analysis
* Feature importance analysis
* Model persistence with save/load functionality
* Robust error handling with fallback to R/S analysis

Example Usage:

from lrdbenchmark import GradientBoostingEstimator
import numpy as np

# Initialize the estimator; with no model_path, the default pre-trained model is used
gb = GradientBoostingEstimator()

# Placeholder input: one series of 500 samples (white noise, for illustration only)
data = np.random.randn(500)

if gb.is_model_available():
    # Estimate the Hurst parameter; returns a dictionary of estimation results
    result = gb.estimate(data)

    # Get feature importance (array, or None if the model is not loaded)
    importance = gb.get_feature_importance()
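
Feature importances are easier to read when paired with their names. The sketch below ranks (name, importance) pairs and handles the documented case where either accessor returns None. The helper `top_features` and the example names and values are hypothetical stand-ins for the outputs of `get_feature_names()` and `get_feature_importance()`.

```python
from typing import List, Optional, Sequence, Tuple


def top_features(names: Optional[Sequence[str]],
                 importances: Optional[Sequence[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most important (name, importance) pairs, largest first."""
    if names is None or importances is None:
        return []  # model not loaded or names unavailable
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [(name, float(value)) for name, value in ranked[:k]]


# Hypothetical values standing in for get_feature_names() / get_feature_importance().
example_names = ["spectral_slope", "dfa_slope", "acf_lag_1", "kurtosis"]
example_importances = [0.40, 0.35, 0.20, 0.05]
best = top_features(example_names, example_importances, k=2)
```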

Random Forest Estimator

Random Forest Regressor with comprehensive feature engineering and feature importance analysis.

class lrdbenchmark.analysis.machine_learning.random_forest_estimator_unified.RandomForestEstimator(model_path: str | None = None)

Bases: BaseEstimator

Random Forest estimator using unified feature extraction. Works with pre-trained models expecting 76 features.

__init__(model_path: str | None = None)

Initialize the Random Forest estimator.

Parameters: model_path – Path to the pre-trained model. If None, uses default path.

_validate_parameters() → None: Validate estimator parameters.

_get_default_model_path() → str | None: Resolve the default pretrained model path, downloading if required.

_load_model(): Load the pre-trained Random Forest model.

_update_feature_metadata(): Update expected feature count and default feature names.

_extract_features_for_model(data: ndarray) → ndarray: Extract feature vector that matches the pretrained model.

estimate(data: ndarray) → Dict[str, Any]

Estimate Hurst parameter using Random Forest with unified features.

Parameters: data – Time series data
Returns: Dictionary containing estimation results

get_feature_importance() → ndarray | None

Get feature importance from the Random Forest model.

Returns: Array of feature importances or None if model not loaded

get_feature_names() → List[str] | None

Get feature names used by the model.

Returns: List of feature names or None if not available

is_model_available() → bool

Check if the pre-trained model is available.

Returns: True if model is available and loaded

Performance: 0.202 MAE, 100% success rate, 2.099s execution time

Key Features:
* Configurable parameters (n_estimators, max_depth, min_samples_split)
* 70+ engineered features including fractal dimension and approximate entropy
* Feature importance analysis
* Model persistence with save/load functionality
* Robust error handling with fallback to R/S analysis
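
Approximate entropy, one of the features named above, measures how unpredictable a series is. The sketch below follows the common textbook definition ApEn(m, r) = phi(m) - phi(m + 1), with self-matches counted and the tolerance r set to a fraction of the standard deviation. The defaults `m=2` and `r_factor=0.2` are conventional choices assumed here. The library's own implementation may differ.

```python
import math
import random
import statistics


def approximate_entropy(series, m=2, r_factor=0.2):
    """Approximate entropy ApEn(m, r) with r = r_factor * std(series)."""
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def phi(length):
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


rng = random.Random(1)
regular = [float(i % 2) for i in range(200)]           # strictly alternating
irregular = [rng.gauss(0.0, 1.0) for _ in range(200)]  # white noise
apen_regular = approximate_entropy(regular)
apen_irregular = approximate_entropy(irregular)
```

The alternating series yields a value near zero and the noise series a clearly larger one. The double loop is quadratic in the series length, so this form suits short series only.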

Example Usage:

from lrdbenchmark import RandomForestEstimator
import numpy as np

# Initialize the estimator; with no model_path, the default pre-trained model is used
rf = RandomForestEstimator()

# Placeholder input: one series of 500 samples (white noise, for illustration only)
data = np.random.randn(500)

if rf.is_model_available():
    # Estimate the Hurst parameter; returns a dictionary of estimation results
    result = rf.estimate(data)

    # Get feature importance (array, or None if the model is not loaded)
    importance = rf.get_feature_importance()

Neural Network Factory

For advanced neural network configuration and training, use the Neural Network Factory, which provides a framework for creating and managing several neural network architectures.

Note

See Neural Network Factory API for complete documentation, including NNConfig, NNArchitecture, and the create_all_benchmark_networks function.

Example Usage:

from lrdbenchmark import NeuralNetworkFactory
from lrdbenchmark.analysis.machine_learning.neural_network_factory import NNConfig, NNArchitecture
import numpy as np

# Create factory
factory = NeuralNetworkFactory()

# Configure network
config = NNConfig(
    architecture=NNArchitecture.CNN,
    input_length=500,
    hidden_dims=[64, 32],
    learning_rate=0.001,
    epochs=20
)

# Create network
network = factory.create_network(config)

# Generate placeholder training data (random noise with random labels, for illustration only)
X_train = np.random.randn(100, 500)
y_train = np.random.uniform(0.2, 0.8, 100)

# Train model
history = network.train_model(X_train, y_train)

# Make prediction
new_data = np.random.randn(1, 500)
prediction = network.predict(new_data)

Performance Comparison

| Estimator | MAE | Execution Time | Success Rate | Category |
|-----------|-----|----------------|--------------|----------|
| LSTM | 0.097 | 0.0012s | 100% | Neural Networks |
| CNN | 0.103 | 0.0064s | 100% | Neural Networks |
| Transformer | 0.106 | 0.0026s | 100% | Neural Networks |
| GRU | 0.108 | 0.0007s | 100% | Neural Networks |
| R/S | 0.099 | 0.348s | 100% | Classical |
| GradientBoosting | 0.193 | 0.013s | 100% | ML |
| SVR | 0.202 | 0.009s | 100% | ML |
| Whittle | 0.200 | 0.0002s | 100% | Classical |
| Periodogram | 0.205 | 0.0005s | 100% | Classical |
| CWT | 0.269 | 0.063s | 100% | Classical |

Key Advantages

Excellent Performance: 0.193-0.202 MAE for the feature-based ML estimators, with a 100% success rate in the reported benchmark
Advanced Feature Engineering: 50-70 engineered features per model
Production Ready: Model persistence, error handling, and deployment capabilities
Comprehensive Testing: 100% success rate across all test cases
Research Quality: Publication-ready results with detailed performance metrics

Best Practices

For Highest Accuracy: Use LSTM Neural Network (0.097 MAE)
For Fast ML Performance: Use SVR (0.009s execution time)
For Feature Analysis: Use Random Forest or Gradient Boosting (both expose get_feature_importance())
For Production Deployment: Use Production ML System with train-once, apply-many workflow
For Real-time Applications: Use GRU Neural Network (0.0007s execution time)
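
The recommendations above follow from the figures in the Performance Comparison section. As a small illustration, the sketch below copies those figures into a list and picks the lowest-MAE estimator under an optional time budget or category filter. The `BENCHMARKS` structure and the `most_accurate` helper are hypothetical conveniences for this example and are not part of LRDBenchmark.

```python
# (name, MAE, execution time in seconds, category), copied from the comparison table.
BENCHMARKS = [
    ("LSTM", 0.097, 0.0012, "Neural Networks"),
    ("CNN", 0.103, 0.0064, "Neural Networks"),
    ("Transformer", 0.106, 0.0026, "Neural Networks"),
    ("GRU", 0.108, 0.0007, "Neural Networks"),
    ("R/S", 0.099, 0.348, "Classical"),
    ("GradientBoosting", 0.193, 0.013, "ML"),
    ("SVR", 0.202, 0.009, "ML"),
    ("Whittle", 0.200, 0.0002, "Classical"),
    ("Periodogram", 0.205, 0.0005, "Classical"),
    ("CWT", 0.269, 0.063, "Classical"),
]


def most_accurate(max_seconds=None, category=None):
    """Name of the lowest-MAE estimator meeting the optional constraints."""
    candidates = [
        row for row in BENCHMARKS
        if (max_seconds is None or row[2] <= max_seconds)
        and (category is None or row[3] == category)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row[1])[0]


overall_best = most_accurate()
best_ml = most_accurate(category="ML")
best_under_1ms = most_accurate(max_seconds=0.001)
```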
