Data Models API
lrdbenchmark provides several synthetic data models for generating time series with known long-range dependence properties.
Base Model
- class lrdbenchmark.models.data_models.base_model.BaseModel(**kwargs)[source]
Bases: ABC
Abstract base class for all stochastic models.
This class defines the interface that all stochastic models must implement, including methods for parameter validation, data generation, and model information retrieval.
- __init__(**kwargs)[source]
Initialize the base model.
- Parameters:
**kwargs (dict) – Model-specific parameters
- abstractmethod _validate_parameters() None[source]
Validate model parameters.
This method should be implemented by each model to ensure that the provided parameters are valid for the specific model.
- abstractmethod generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) ndarray[source]
Generate synthetic data from the model.
- Parameters:
length (int, optional) – Length of the time series to generate (preferred parameter name)
seed (int, optional) – Random seed for reproducibility
n (int, optional) – Alternate parameter name for length (for backward compatibility)
rng (numpy.random.Generator, optional) – Pre-configured generator to use. When provided, it takes precedence over seed and is used directly to drive the simulation.
- Returns:
Generated time series
- Return type:
np.ndarray
Notes
Either ‘length’ or ‘n’ must be provided. If both are provided, ‘length’ takes precedence.
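The precedence rule in the note above can be sketched as a small helper. The function name resolve_length is hypothetical and is not part of lrdbenchmark; it only illustrates the documented behaviour of the two argument names.

```python
from typing import Optional


def resolve_length(length: Optional[int] = None, n: Optional[int] = None) -> int:
    """Return the series length, preferring `length` over the legacy `n`.

    Hypothetical helper, not a lrdbenchmark function.
    """
    if length is None and n is None:
        raise ValueError("Either 'length' or 'n' must be provided")
    # 'length' wins when both are given, matching the note above.
    return length if length is not None else n


print(resolve_length(length=1000))         # 1000
print(resolve_length(n=500))               # 500
print(resolve_length(length=1000, n=500))  # 1000
```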
- _analyze_convergence(data: ndarray, window_size: int = 500) int[source]
Analyze convergence of statistical properties to find optimal starting point.
- generate_converged(length: int, seed: int | None = None, convergence_factor: float = 2.0, rng: Generator | None = None) ndarray[source]
Generate converged data by generating extra data and discarding initial transients.
- Parameters:
- Returns:
Generated time series of the requested length with converged behavior
- Return type:
np.ndarray
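The idea of generating extra data and discarding the initial transient can be illustrated with a toy AR(1) recursion. This is a sketch only: it assumes convergence_factor acts as a multiplier on the number of samples generated, which the documentation above does not state, and the function names are hypothetical stand-ins rather than the library's implementation.

```python
import random
from typing import List


def ar1_path(n: int, phi: float, x0: float, rng: random.Random) -> List[float]:
    """Simulate an AR(1) recursion started from an arbitrary value x0."""
    path, x = [], x0
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def generate_converged_sketch(length: int, seed: int = 0,
                              convergence_factor: float = 2.0) -> List[float]:
    """Generate extra samples, then keep only the last `length` of them.

    Assumption: convergence_factor scales the total number of samples.
    """
    total = max(length, int(length * convergence_factor))
    rng = random.Random(seed)
    # Start far from the stationary mean (0) so a transient exists to discard.
    full = ar1_path(total, phi=0.9, x0=50.0, rng=rng)
    return full[total - length:]


settled = generate_converged_sketch(1000, seed=42)
print(len(settled))  # 1000
```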
- generate_analysis_ready(length: int, seed: int | None = None, rng: Generator | None = None) ndarray[source]
Generate data ready for analysis (converged by default).
This is the recommended method for generating data for analysis, as it automatically handles convergence and returns settled data.
- expected_hurst() float | None[source]
Return the theoretical Hurst exponent implied by the current parameters.
By default this method looks for an H entry in self.parameters. Models where H is derived from other parameters (e.g., ARFIMA with fractional differencing d) should override this method accordingly.
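A minimal sketch of this lookup-and-override pattern follows. SketchModel and SketchARFIMA are hypothetical stand-ins written for illustration, not the library's classes; only the parameters attribute, the H key, and the relation H = d + 0.5 come from this page.

```python
from typing import Optional


class SketchModel:
    """Hypothetical stand-in for BaseModel; only the parts needed here."""

    def __init__(self, **kwargs):
        self.parameters = dict(kwargs)

    def expected_hurst(self) -> Optional[float]:
        # Default behaviour described above: look for an "H" entry.
        return self.parameters.get("H")


class SketchARFIMA(SketchModel):
    """Hypothetical ARFIMA-like model where H is derived from d."""

    def expected_hurst(self) -> Optional[float]:
        d = self.parameters.get("d")
        return None if d is None else d + 0.5


print(SketchModel(H=0.7).expected_hurst())      # 0.7
print(SketchARFIMA(d=0.25).expected_hurst())    # 0.75
print(SketchModel(sigma=1.0).expected_hurst())  # None
```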
- static _resolve_generator(seed: int | None, rng: Generator | None) Generator[source]
Choose or construct the generator to use for simulation.
Priority: explicit rng argument > seed > fresh generator.
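The priority order can be sketched with NumPy's public random API. The function below is an illustration under that stated priority, not the library's source; it relies only on numpy.random.default_rng, which returns a reproducible generator for an integer seed and a freshly seeded one for None.

```python
from typing import Optional

import numpy as np


def resolve_generator(seed: Optional[int] = None,
                      rng: Optional[np.random.Generator] = None) -> np.random.Generator:
    """Pick the generator: explicit rng first, then seed, then fresh entropy."""
    if rng is not None:
        return rng  # used as-is; any seed argument is ignored
    # default_rng(None) draws fresh OS entropy; default_rng(int) is reproducible.
    return np.random.default_rng(seed)


shared = np.random.default_rng(123)
print(resolve_generator(seed=5, rng=shared) is shared)  # True
```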
- generate_batch(n_series: int, length: int, seed: int | None = None, rng: Generator | None = None) ndarray[source]
Generate multiple time series from the model.
- generate_streaming(length: int, chunk_size: int = 1000, seed: int | None = None, rng: Generator | None = None)[source]
Generate data in streaming fashion for very large datasets.
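The chunking pattern behind a streaming interface can be sketched with a Python generator. White noise is used here only because independent samples can be produced chunk by chunk without carrying state; long-memory processes need more care, and how generate_streaming handles that is not described on this page. The function name is hypothetical.

```python
from typing import Iterator, Optional

import numpy as np


def stream_white_noise(length: int, chunk_size: int = 1000,
                       seed: Optional[int] = None) -> Iterator[np.ndarray]:
    """Yield `length` samples in consecutive chunks of at most `chunk_size`."""
    rng = np.random.default_rng(seed)
    produced = 0
    while produced < length:
        size = min(chunk_size, length - produced)
        yield rng.standard_normal(size)
        produced += size


# Consume chunk by chunk without holding the whole series in memory.
running_sum, count = 0.0, 0
for chunk in stream_white_noise(2500, chunk_size=1000, seed=0):
    running_sum += float(chunk.sum())
    count += chunk.size
print(count)  # 2500
```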
- abstractmethod get_theoretical_properties() Dict[str, Any][source]
Get theoretical properties of the model.
- Returns:
Dictionary containing theoretical properties such as autocorrelation function, power spectral density, etc.
- Return type:
Dict[str, Any]
Fractional Brownian Motion (FBM)
- class lrdbenchmark.models.data_models.fbm_model.FractionalBrownianMotion(H: float = 0.7, sigma: float = 1.0, **kwargs)[source]
Bases: BaseModel
Fractional Brownian Motion (fBm) generator.
Generates fBm by cumulatively summing fGn increments.
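The relationship between the increments and the path can be shown in a few lines of NumPy. In this sketch white noise stands in for the fGn increments (it is the H = 0.5 special case), and whether the library's path starts at zero or at the first increment is not stated here, so that detail is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder increments: white noise, i.e. fGn with H = 0.5.
increments = rng.standard_normal(1000)
# The path is the running sum of the increments.
fbm_path = np.cumsum(increments)

# Differencing the path recovers the increments (after the first sample).
print(np.allclose(np.diff(fbm_path), increments[1:]))  # True
```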
- generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) ndarray[source]
Generate fBm time series.
Fractional Gaussian Noise (FGN)
- class lrdbenchmark.models.data_models.fgn_model.FractionalGaussianNoise(H: float = 0.7, sigma: float = 1.0, **kwargs)[source]
Bases: BaseModel
Fractional Gaussian Noise (fGn) generator using the Davies-Harte method.
Generates exact fGn with Hurst parameter H using the Davies-Harte algorithm, which exploits the property that the circulant matrix of covariances can be diagonalized by the Fourier transform.
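For orientation, here is a compact textbook-style sketch of circulant embedding for fGn. It is not taken from the lrdbenchmark source; the function names and the choice of a 2n-point embedding are assumptions made for illustration.

```python
import numpy as np


def fgn_autocovariance(k, H: float, sigma: float = 1.0) -> np.ndarray:
    """Autocovariance of fGn at integer lags k."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * sigma**2 * (
        (k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H)
    )


def davies_harte_fgn(n: int, H: float, sigma: float = 1.0, rng=None) -> np.ndarray:
    """Draw n samples of fGn via circulant embedding (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = fgn_autocovariance(np.arange(n + 1), H, sigma)
    # First row of the 2n x 2n circulant matrix that embeds the covariance.
    row = np.concatenate([gamma, gamma[-2:0:-1]])
    # A circulant matrix is diagonalized by the DFT; its eigenvalues are fft(row).
    eigenvalues = np.fft.fft(row).real
    if eigenvalues.min() < -1e-8:
        raise ValueError("circulant embedding is not non-negative definite")
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    m = row.size
    # Complex Gaussian noise, scaled by the square root of the eigenvalues.
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    y = np.fft.fft(np.sqrt(eigenvalues / m) * z)
    return y.real[:n]


x = davies_harte_fgn(4096, H=0.7, rng=np.random.default_rng(0))
print(x.shape)  # (4096,)
```

The real part of the transformed noise has exactly the target covariance over the first n samples, which is why the method is described as exact.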
- generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) ndarray[source]
Generate fGn time series.
ARFIMA Model
- class lrdbenchmark.models.data_models.arfima_model.ARFIMAModel(d: float, ar_params: List[float] | None = None, ma_params: List[float] | None = None, sigma: float = 1.0, method: str = 'spectral')[source]
Bases: BaseModel
Autoregressive Fractionally Integrated Moving Average (ARFIMA) model.
An ARFIMA(p,d,q) process combines autoregressive (AR), fractionally integrated (FI), and moving average (MA) components. The fractional integration parameter d controls long-range dependence and implies a Hurst index H = d + 0.5.
- Parameters:
d (float) – Fractional integration parameter (-0.5 < d < 0.5). The implied Hurst exponent is d + 0.5.
ar_params (List[float], optional) – Autoregressive parameters (default: [])
ma_params (List[float], optional) – Moving average parameters (default: [])
sigma (float, optional) – Standard deviation of innovations (default: 1.0)
method (str, optional) – Generation method (default: ‘spectral’)
- __init__(d: float, ar_params: List[float] | None = None, ma_params: List[float] | None = None, sigma: float = 1.0, method: str = 'spectral')[source]
Initialize the ARFIMA model.
- Parameters:
- generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) ndarray[source]
Generate ARFIMA time series.
- Parameters:
- Returns:
Generated ARFIMA time series
- Return type:
np.ndarray
Notes
Either ‘length’ or ‘n’ must be provided. If both are provided, ‘length’ takes precedence.
- _spectral_method(length: int, d: float, ar_params: List[float], ma_params: List[float], sigma: float) ndarray[source]
Generate ARFIMA using efficient spectral method.
This method generates the process in the frequency domain using FFT, which is much more efficient than time-domain simulation.
- _simulation_method(length: int, d: float, ar_params: List[float], ma_params: List[float], sigma: float) ndarray[source]
Generate ARFIMA using efficient simulation method.
This method uses FFT-based fractional differencing and efficient AR/MA filtering with scipy.
- _fractional_differencing_fft(data: ndarray, d: float) ndarray[source]
Apply fractional differencing operator (1-L)^d using FFT.
This is much more efficient than the recursive method.
- Parameters:
data (np.ndarray) – Input time series
d (float) – Fractional integration parameter
- Returns:
Fractionally differenced series
- Return type:
np.ndarray
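One common way to apply (1-L)^d with an FFT is to build the binomial weights of the operator and convolve them with the data in the frequency domain. The sketch below follows that approach under the assumption that values before the start of the sample are zero; the function names are hypothetical and the library's actual routine may differ.

```python
import numpy as np


def frac_diff_weights(d: float, n: int) -> np.ndarray:
    """Coefficients of (1 - L)^d up to lag n - 1, via the binomial recursion."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff_fft(data: np.ndarray, d: float) -> np.ndarray:
    """Apply (1 - L)^d to `data`, treating pre-sample values as zero."""
    n = data.size
    w = frac_diff_weights(d, n)
    m = 2 * n  # zero-pad so circular convolution equals linear convolution
    out = np.fft.irfft(np.fft.rfft(data, m) * np.fft.rfft(w, m), m)
    return out[:n]


x = np.array([1.0, 4.0, 9.0, 16.0])
# With d = 1 the operator reduces to an ordinary first difference.
print(np.round(frac_diff_fft(x, 1.0), 6))
```

Passing -d applies the inverse operator (fractional integration), so differencing and then integrating with the same d recovers the input up to floating-point error.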
- _apply_ar_filter_efficient(data: ndarray, ar_params: List[float]) ndarray[source]
Apply autoregressive filter efficiently using scipy.
- Parameters:
data (np.ndarray) – Input time series
ar_params (List[float]) – AR parameters
- Returns:
AR filtered series
- Return type:
np.ndarray
- _apply_ma_filter_efficient(data: ndarray, ma_params: List[float]) ndarray[source]
Apply moving average filter efficiently using scipy.
- Parameters:
data (np.ndarray) – Input time series
ma_params (List[float]) – MA parameters
- Returns:
MA filtered series
- Return type:
np.ndarray
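The AR and MA filters above amount to two short recursions. The sketch below writes them out in plain Python to make the arithmetic explicit. It assumes zero initial state and the sign convention y[t] = x[t] + sum(phi_i * y[t-i]) for AR and y[t] = x[t] + sum(theta_j * x[t-j]) for MA; the library's convention is not stated on this page. Under that convention, scipy.signal.lfilter([1.0], [1.0, -phi_1, ...], x) computes the same AR recursion.

```python
from typing import List, Sequence


def apply_ar_filter(data: Sequence[float], ar_params: Sequence[float]) -> List[float]:
    """y[t] = x[t] + sum_i ar_params[i] * y[t-1-i], with zero initial state."""
    out: List[float] = []
    for t, x in enumerate(data):
        acc = x
        for i, phi in enumerate(ar_params):
            if t - 1 - i >= 0:
                acc += phi * out[t - 1 - i]
        out.append(acc)
    return out


def apply_ma_filter(data: Sequence[float], ma_params: Sequence[float]) -> List[float]:
    """y[t] = x[t] + sum_j ma_params[j] * x[t-1-j], with zero initial state."""
    out: List[float] = []
    for t, x in enumerate(data):
        acc = x
        for j, theta in enumerate(ma_params):
            if t - 1 - j >= 0:
                acc += theta * data[t - 1 - j]
        out.append(acc)
    return out


print(apply_ar_filter([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
print(apply_ma_filter([1.0, 0.0, 0.0], [0.5]))       # [1.0, 0.5, 0.0]
```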
- _compute_spectral_density(freqs: ndarray, d: float, ar_params: List[float], ma_params: List[float], sigma: float) ndarray[source]
Compute spectral density of ARFIMA process.
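As a reference point, the standard ARFIMA spectral density is f(w) = sigma^2 / (2 pi) * |theta(z)|^2 / |phi(z)|^2 * |1 - z|^(-2d) with z = exp(-i w). The sketch below evaluates that formula for a single frequency. It assumes the polynomial conventions phi(z) = 1 - sum(phi_k z^k) and theta(z) = 1 + sum(theta_k z^k) and the 1/(2 pi) normalization, none of which is confirmed by this page.

```python
import cmath
import math
from typing import Sequence


def arfima_spectral_density(freq: float, d: float,
                            ar_params: Sequence[float] = (),
                            ma_params: Sequence[float] = (),
                            sigma: float = 1.0) -> float:
    """Evaluate the ARFIMA spectral density at one frequency in (0, pi]."""
    z = cmath.exp(-1j * freq)
    # AR and MA polynomials evaluated on the unit circle (assumed conventions).
    phi = 1.0 - sum(a * z ** (k + 1) for k, a in enumerate(ar_params))
    theta = 1.0 + sum(b * z ** (k + 1) for k, b in enumerate(ma_params))
    # Fractional-integration factor; diverges as freq -> 0 when d > 0.
    frac = abs(1.0 - z) ** (-2.0 * d)
    return sigma**2 / (2.0 * math.pi) * abs(theta) ** 2 / abs(phi) ** 2 * frac


# With d = 0 and no AR/MA terms the process is white noise: a flat spectrum.
print(math.isclose(arfima_spectral_density(1.0, d=0.0), 1.0 / (2.0 * math.pi)))  # True
```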
- get_theoretical_properties() Dict[str, Any][source]
Get theoretical properties of ARFIMA process.
- Returns:
Dictionary containing theoretical properties
- Return type:
Dict[str, Any]
Multifractal Random Walk (MRW)
- class lrdbenchmark.models.data_models.mrw_model.MultifractalRandomWalk(H: float, lambda_param: float, sigma: float = 1.0, method: str = 'cascade')[source]
Bases: BaseModel
Multifractal Random Walk (MRW) model.
MRW is a multifractal process that exhibits scale-invariant properties and is characterized by a log-normal volatility cascade. It is defined by the Hurst parameter H and the intermittency parameter λ.
- Parameters:
- __init__(H: float, lambda_param: float, sigma: float = 1.0, method: str = 'cascade')[source]
Initialize the Multifractal Random Walk model.
- generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) ndarray[source]
Generate multifractal random walk.
- Parameters:
- Returns:
Generated MRW time series
- Return type:
np.ndarray
Notes
Either ‘length’ or ‘n’ must be provided. If both are provided, ‘length’ takes precedence.
- _cascade_method(length: int, H: float, lambda_param: float, sigma: float) ndarray[source]
Generate MRW using volatility cascade method.
This method constructs a log-normal volatility cascade and applies it to a fractional Brownian motion.
- _generate_volatility_cascade(length: int, lambda_param: float) ndarray[source]
Generate log-normal volatility cascade.
- _generate_fbm(length: int, H: float, sigma: float) ndarray[source]
Generate fractional Brownian motion using circulant embedding.
- _direct_method(length: int, H: float, lambda_param: float, sigma: float) ndarray[source]
Generate MRW using direct method.
This method directly generates the MRW process using the multifractal formalism.
Convenience Aliases
For easier usage, lrdbenchmark provides shortened aliases for all data models:
from lrdbenchmark import FBMModel, FGNModel, ARFIMAModel, MRWModel
# These are equivalent to the full class names
fbm = FBMModel(H=0.7, sigma=1.0)
fgn = FGNModel(H=0.6, sigma=1.0)
arfima = ARFIMAModel(d=0.3, sigma=1.0)
mrw = MRWModel(H=0.7, lambda_param=0.1, sigma=1.0)
Convenience Functions
- lrdbenchmark.models.data_models.create_fbm_model(H=0.7, sigma=1.0)[source]
Create FBMModel with default parameters
- lrdbenchmark.models.data_models.create_fgn_model(H=0.6, sigma=1.0)[source]
Create FGNModel with default parameters
Usage Examples
Basic Usage
from lrdbenchmark import FBMModel, FGNModel
# Generate FBM data
fbm_model = FBMModel(H=0.7, sigma=1.0)
fbm_data = fbm_model.generate(1000, seed=42)
# Generate FGN data
fgn_model = FGNModel(H=0.6, sigma=1.0)
fgn_data = fgn_model.generate(1000, seed=42)
Multiple Realizations
from lrdbenchmark import FBMModel
import numpy as np
# Generate multiple realizations
model = FBMModel(H=0.7, sigma=1.0)
realizations = []
for i in range(10):
    data = model.generate(1000, seed=i)
    realizations.append(data)
realizations = np.array(realizations)
Parameter Sweeps
from lrdbenchmark import FBMModel
import matplotlib.pyplot as plt
# Generate data with different H values
H_values = [0.3, 0.5, 0.7, 0.9]
datasets = {}
for H in H_values:
    model = FBMModel(H=H, sigma=1.0)
    datasets[f'H={H}'] = model.generate(1000, seed=42)
# Plot results
plt.figure(figsize=(12, 8))
for name, data in datasets.items():
    plt.plot(data[:200], label=name, alpha=0.7)
plt.title('FBM with Different H Values')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()
Model Comparison
from lrdbenchmark import FBMModel, FGNModel, ARFIMAModel, MRWModel
import matplotlib.pyplot as plt
# Generate data from different models
models = {
    'FBM': FBMModel(H=0.7, sigma=1.0),
    'FGN': FGNModel(H=0.7, sigma=1.0),
    'ARFIMA': ARFIMAModel(d=0.3, sigma=1.0),
    'MRW': MRWModel(H=0.7, lambda_param=0.1, sigma=1.0)
}
datasets = {}
for name, model in models.items():
    datasets[name] = model.generate(1000, seed=42)
# Plot comparison
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes = axes.flatten()
for i, (name, data) in enumerate(datasets.items()):
    axes[i].plot(data[:200])
    axes[i].set_title(name)
    axes[i].grid(True)
plt.tight_layout()
plt.show()
Error Handling
from lrdbenchmark import FBMModel
try:
    # Invalid H value
    model = FBMModel(H=1.5, sigma=1.0)
except ValueError as e:
    print(f"Error: {e}")
try:
    # Invalid sigma value
    model = FBMModel(H=0.7, sigma=-1.0)
except ValueError as e:
    print(f"Error: {e}")
Performance Considerations
- Memory Usage: Large datasets may require significant memory.
- GPU Acceleration: Some models support GPU acceleration when available.
- Parallel Generation: Use multiple processes for generating many realizations.
- Seed Management: Use different seeds for independent realizations.
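One way to obtain independent streams for many realizations is NumPy's SeedSequence.spawn, which derives statistically independent child seeds from a single root seed. The sketch below uses white noise as a stand-in for a model's output. Passing each generator to a model through the rng argument documented above is the intended use, but that combination is shown only as a comment because it is untested here.

```python
import numpy as np

root = np.random.SeedSequence(42)
# One independent child seed per realization (or per worker process).
children = root.spawn(10)
generators = [np.random.default_rng(child) for child in children]

# Stand-in for model output; with a model one might call
# model.generate(1000, rng=g) for each generator g (assumption, not run here).
realizations = np.stack([g.standard_normal(1000) for g in generators])
print(realizations.shape)  # (10, 1000)
```

Spawned children can also be handed to separate processes, so each worker gets its own stream without coordinating seed values by hand.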
Note
All models generate numpy arrays by default. For GPU acceleration, the data can be converted to the appropriate tensor format.
Warning
Very large datasets (>1M samples) may cause memory issues. Consider generating data in chunks or using streaming approaches.