Machine Learning Estimators

LRDBenchmark provides production-ready machine learning estimators for Long-Range Dependence (LRD) estimation. These estimators achieve excellent performance with perfect robustness, with Gradient Boosting achieving the best ML performance at 0.193 MAE.

Overview

The machine learning estimators use advanced feature engineering with 50-70 engineered features per model, including:

  • Statistical Features: Mean, standard deviation, skewness, kurtosis

  • Time Series Features: Autocorrelation at multiple lags, variance of increments

  • Spectral Features: Power spectrum analysis, frequency band ratios, spectral slope

  • DFA Features: Detrended fluctuation analysis with slope calculation

  • Wavelet Features: Wavelet variance at different scales, wavelet slope

  • R/S Analysis Features: Rescaled range analysis with slope calculation

  • Additional Features: Trend analysis, seasonality detection, entropy measures

SVR Estimator

Support Vector Regression estimator with RBF kernel and comprehensive feature engineering.

class lrdbenchmark.analysis.machine_learning.svr_estimator_unified.SVREstimator(model_path: str | None = None)[source]

Bases: BaseEstimator

SVR estimator using unified feature extraction. Works with pre-trained models expecting 29 features.

__init__(model_path: str | None = None)[source]

Initialize the SVR estimator.

Parameters:

model_path – Path to the pre-trained model. If None, uses default path.

_validate_parameters() None[source]

Validate estimator parameters.

_get_default_model_path() str | None[source]

Resolve the default pretrained model path, downloading if required.

_load_model()[source]

Load the pre-trained SVR model.

_update_feature_metadata()[source]

Update expected feature count and default feature names.

_extract_features_for_model(data: ndarray) ndarray[source]

Extract feature vector that matches the pretrained model.

estimate(data: ndarray) Dict[str, Any][source]

Estimate Hurst parameter using SVR with unified features.

Parameters:

data – Time series data

Returns:

Dictionary containing estimation results

get_support_vectors() ndarray | None[source]

Get support vectors from the SVR model.

Returns:

Array of support vectors or None if model not loaded

get_feature_names() List[str] | None[source]

Get feature names used by the model.

Returns:

List of feature names or None if not available

is_model_available() bool[source]

Check if the pre-trained model is available.

Returns:

True if model is available and loaded

Performance: 0.202 MAE, 100% success rate, 0.009s execution time

Key Features: * RBF kernel with configurable parameters (C, gamma, epsilon) * 50+ engineered features including spectral and DFA analysis * Model persistence with save/load functionality * Robust error handling with fallback to R/S analysis

Example Usage:

from lrdbenchmark import SVREstimator
import numpy as np

# Initialize estimator
svr = SVREstimator(kernel='rbf', C=1.0, gamma='scale')

# Generate training data
X_train = np.random.randn(100, 500)
y_train = np.random.uniform(0.2, 0.8, 100)

# Train model
svr.train(X_train, y_train)

# Make prediction
new_data = np.random.randn(1, 500)
prediction = svr.predict(new_data)

Gradient Boosting Estimator

Gradient Boosting Regressor with comprehensive feature engineering - Best Overall Performance.

class lrdbenchmark.analysis.machine_learning.gradient_boosting_estimator_unified.GradientBoostingEstimator(model_path: str | None = None)[source]

Bases: BaseEstimator

Gradient Boosting estimator using unified feature extraction. Works with pre-trained models expecting 54 features.

__init__(model_path: str | None = None)[source]

Initialize the Gradient Boosting estimator.

Parameters:

model_path – Path to the pre-trained model. If None, uses default path.

_validate_parameters() None[source]

Validate estimator parameters.

_get_default_model_path() str | None[source]

Resolve the default pretrained model path, downloading if required.

_load_model()[source]

Load the pre-trained Gradient Boosting model.

_update_feature_metadata()[source]

Update expected feature count and default feature names.

_extract_features_for_model(data: ndarray) ndarray[source]

Extract feature vector that matches the pretrained model.

estimate(data: ndarray) Dict[str, Any][source]

Estimate Hurst parameter using Gradient Boosting with unified features.

Parameters:

data – Time series data

Returns:

Dictionary containing estimation results

get_feature_importance() ndarray | None[source]

Get feature importance from the Gradient Boosting model.

Returns:

Array of feature importances or None if model not loaded

get_feature_names() List[str] | None[source]

Get feature names used by the model.

Returns:

List of feature names or None if not available

is_model_available() bool[source]

Check if the pre-trained model is available.

Returns:

True if model is available and loaded

Performance: 0.193 MAE (Best ML), 100% success rate, 0.013s execution time

Key Features: * Configurable parameters (n_estimators, learning_rate, max_depth) * 60+ engineered features including advanced spectral and DFA analysis * Feature importance analysis * Model persistence with save/load functionality * Robust error handling with fallback to R/S analysis

Example Usage:

from lrdbenchmark import GradientBoostingEstimator
import numpy as np

# Initialize estimator
gb = GradientBoostingEstimator(n_estimators=50, learning_rate=0.1)

# Generate training data
X_train = np.random.randn(100, 500)
y_train = np.random.uniform(0.2, 0.8, 100)

# Train model
gb.train(X_train, y_train)

# Make prediction
new_data = np.random.randn(1, 500)
prediction = gb.predict(new_data)

# Get feature importance
importance = gb.get_feature_importance()

Random Forest Estimator

Random Forest Regressor with comprehensive feature engineering and feature importance analysis.

class lrdbenchmark.analysis.machine_learning.random_forest_estimator_unified.RandomForestEstimator(model_path: str | None = None)[source]

Bases: BaseEstimator

Random Forest estimator using unified feature extraction. Works with pre-trained models expecting 76 features.

__init__(model_path: str | None = None)[source]

Initialize the Random Forest estimator.

Parameters:

model_path – Path to the pre-trained model. If None, uses default path.

_validate_parameters() None[source]

Validate estimator parameters.

_get_default_model_path() str | None[source]

Resolve the default pretrained model path, downloading if required.

_load_model()[source]

Load the pre-trained Random Forest model.

_update_feature_metadata()[source]

Update expected feature count and default feature names.

_extract_features_for_model(data: ndarray) ndarray[source]

Extract feature vector that matches the pretrained model.

estimate(data: ndarray) Dict[str, Any][source]

Estimate Hurst parameter using Random Forest with unified features.

Parameters:

data – Time series data

Returns:

Dictionary containing estimation results

get_feature_importance() ndarray | None[source]

Get feature importance from the Random Forest model.

Returns:

Array of feature importances or None if model not loaded

get_feature_names() List[str] | None[source]

Get feature names used by the model.

Returns:

List of feature names or None if not available

is_model_available() bool[source]

Check if the pre-trained model is available.

Returns:

True if model is available and loaded

Performance: 0.202 MAE, 100% success rate, 2.099s execution time

Key Features: * Configurable parameters (n_estimators, max_depth, min_samples_split) * 70+ engineered features including fractal dimension and approximate entropy * Feature importance analysis * Model persistence with save/load functionality * Robust error handling with fallback to R/S analysis

Example Usage:

from lrdbenchmark import RandomForestEstimator
import numpy as np

# Initialize estimator
rf = RandomForestEstimator(n_estimators=50, max_depth=5)

# Generate training data
X_train = np.random.randn(100, 500)
y_train = np.random.uniform(0.2, 0.8, 100)

# Train model
rf.train(X_train, y_train)

# Make prediction
new_data = np.random.randn(1, 500)
prediction = rf.predict(new_data)

# Get feature importance
importance = rf.get_feature_importance()

Neural Network Factory

For advanced neural network configuration and training, use the Neural Network Factory which provides a comprehensive framework for creating and managing various neural network architectures.

Note

See Neural Network Factory API for complete documentation of the Neural Network Factory API, including NNConfig, NNArchitecture, and create_all_benchmark_networks functions.

Example Usage:

from lrdbenchmark import NeuralNetworkFactory, FBMModel
from lrdbenchmark.analysis.machine_learning.neural_network_factory import NNConfig, NNArchitecture
import numpy as np

# Create factory
factory = NeuralNetworkFactory()

# Configure network
config = NNConfig(
    architecture=NNArchitecture.CNN,
    input_length=500,
    hidden_dims=[64, 32],
    learning_rate=0.001,
    epochs=20
)

# Create network
network = factory.create_network(config)

# Generate training data
X_train = np.random.randn(100, 500)
y_train = np.random.uniform(0.2, 0.8, 100)

# Train model
history = network.train_model(X_train, y_train)

# Make prediction
new_data = np.random.randn(1, 500)
prediction = network.predict(new_data)

Performance Comparison

Method | Mean Error | Execution Time | Success Rate | Category |

|--------|————|----------------|————–|----------| | LSTM | 0.097 | 0.0012s | 100% | Neural Networks | | CNN | 0.103 | 0.0064s | 100% | Neural Networks | | Transformer | 0.106 | 0.0026s | 100% | Neural Networks | | GRU | 0.108 | 0.0007s | 100% | Neural Networks | | R/S | 0.099 | 0.348s | 100% | Classical | | GradientBoosting | 0.193 | 0.013s | 100% | ML | | SVR | 0.202 | 0.009s | 100% | ML | | Whittle | 0.200 | 0.0002s | 100% | Classical | | Periodogram | 0.205 | 0.0005s | 100% | Classical | | CWT | 0.269 | 0.063s | 100% | Classical |

Key Advantages

  • Excellent Performance: Strong performance with perfect robustness

  • Advanced Feature Engineering: 50-70 engineered features per model

  • Production Ready: Model persistence, error handling, and deployment capabilities

  • Comprehensive Testing: 100% success rate across all test cases

  • Research Quality: Publication-ready results with detailed performance metrics

Best Practices

  1. For Highest Accuracy: Use LSTM Neural Network (0.097 MAE)

  2. For Fast ML Performance: Use SVR (0.009s execution time)

  3. For Feature Analysis: Use Random Forest (feature importance available)

  4. For Production Deployment: Use Production ML System with train-once, apply-many workflow

  5. For Real-time Applications: Use GRU Neural Network (0.0007s execution time)

See Also