Data Models API

lrdbenchmark provides several synthetic data models for generating time series with known long-range dependence properties.

Base Model

class lrdbenchmark.models.data_models.base_model.BaseModel(**kwargs)[source]

Bases: ABC

Abstract base class for all stochastic models.

This class defines the interface that all stochastic models must implement, including methods for parameter validation, data generation, and model information retrieval.

__init__(**kwargs)[source]

Initialize the base model.

Parameters: **kwargs (dict) – Model-specific parameters

abstractmethod _validate_parameters() → None[source]

Validate model parameters.

This method should be implemented by each model to ensure that the provided parameters are valid for the specific model.

abstractmethod generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) → ndarray[source]

Generate synthetic data from the model.

Parameters:

length (int, optional) – Length of the time series to generate (preferred parameter name)
seed (int, optional) – Random seed for reproducibility
n (int, optional) – Alternate parameter name for length (for backward compatibility)
rng (numpy.random.Generator, optional) – Pre-configured generator to use. When provided it takes precedence over seed and is used directly to drive the simulation.

Returns:

Generated time series

Return type:

np.ndarray

Notes

Either ‘length’ or ‘n’ must be provided. If both are provided, ‘length’ takes precedence.
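The precedence rule in the note above can be illustrated with a small helper. This is a minimal sketch, not the library's code: resolve_length is a hypothetical name, and the positivity check is an added assumption.

```python
from typing import Optional


def resolve_length(length: Optional[int] = None, n: Optional[int] = None) -> int:
    """Pick the series length, preferring `length` over the legacy `n`.

    Hypothetical helper for illustration; not part of lrdbenchmark.
    """
    if length is None and n is None:
        raise ValueError("Either 'length' or 'n' must be provided.")
    # `length` wins whenever it is given, even if `n` is also set.
    chosen = length if length is not None else n
    # Assumption: a non-positive length is treated as an error.
    if chosen <= 0:
        raise ValueError("Series length must be a positive integer.")
    return chosen
```

With this rule, a call that passes both names is resolved silently in favour of length, so legacy callers using n keep working.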

_analyze_convergence(data: ndarray, window_size: int = 500) → int[source]

Analyze convergence of statistical properties to find optimal starting point.

Parameters:

data (np.ndarray) – Time series data to analyze
window_size (int, default=500) – Size of sliding window for analysis

Returns:

Optimal starting point for analysis

Return type:

int

generate_converged(length: int, seed: int | None = None, convergence_factor: float = 2.0, rng: Generator | None = None) → ndarray[source]

Generate converged data by generating extra data and discarding initial transients.

Parameters:

length (int) – Desired length of the final time series
seed (int, optional) – Random seed for reproducibility
convergence_factor (float, default=2.0) – Factor to multiply length by for convergence analysis

Returns:

Generated time series of the requested length with converged behavior

Return type:

np.ndarray
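The idea of over-generating and discarding the initial transient can be sketched as follows. This is an illustration under stated assumptions: generate_with_burn_in and generate_fn are hypothetical names, and the sketch simply keeps the tail of the series, whereas the library may choose the cut point with _analyze_convergence.

```python
import math
from typing import Callable, List, Sequence


def generate_with_burn_in(
    generate_fn: Callable[[int], Sequence[float]],
    length: int,
    convergence_factor: float = 2.0,
) -> List[float]:
    """Generate `convergence_factor * length` samples and keep the last `length`.

    `generate_fn` is a hypothetical callable that returns a series of the
    requested size; it stands in for a model's generate method.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if convergence_factor < 1.0:
        raise ValueError("convergence_factor must be at least 1.0")
    total = math.ceil(length * convergence_factor)
    raw = generate_fn(total)
    # Drop the initial transient; only the final `length` samples survive.
    return list(raw[-length:])
```

With the default factor of 2.0, half of the generated samples are discarded, so the cost of a call roughly doubles.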

generate_analysis_ready(length: int, seed: int | None = None, rng: Generator | None = None) → ndarray[source]

Generate data ready for analysis (converged by default).

This is the recommended method for generating data for analysis, as it automatically handles convergence and returns settled data.

Parameters:

length (int) – Desired length of the final time series
seed (int, optional) – Random seed for reproducibility

Returns:

Generated time series of the requested length with converged behavior

Return type:

np.ndarray

expected_hurst() → float | None[source]

Return the theoretical Hurst exponent implied by the current parameters.

By default this method looks for an H entry in self.parameters. Models where H is derived from other parameters (e.g., ARFIMA with fractional differencing d) should override this method accordingly.

static _resolve_generator(seed: int | None, rng: Generator | None) → Generator[source]

Choose or construct the generator to use for simulation.

Priority: explicit rng argument > seed > fresh generator.
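The stated priority can be sketched with NumPy's generator API. This is a minimal illustration, not the library's implementation; resolve_generator is a hypothetical stand-in for the private method.

```python
from typing import Optional

import numpy as np


def resolve_generator(
    seed: Optional[int] = None,
    rng: Optional[np.random.Generator] = None,
) -> np.random.Generator:
    """Return the generator to use: explicit rng > seed > fresh generator."""
    if rng is not None:
        # An explicit generator is used as-is; the seed is ignored.
        return rng
    # default_rng(None) draws fresh entropy from the operating system,
    # while an integer seed gives a reproducible stream.
    return np.random.default_rng(seed)
```

Passing the same rng object to several calls advances one shared stream, which is the usual way to obtain independent draws without managing seeds by hand.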

generate_batch(n_series: int, length: int, seed: int | None = None, rng: Generator | None = None) → ndarray[source]

Generate multiple time series from the model.

Parameters:

n_series (int) – Number of time series to generate
length (int) – Length of each time series
seed (int, optional) – Random seed for reproducibility

Returns:

Generated time series array of shape (n_series, length)

Return type:

np.ndarray
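One way to build a reproducible batch of independent series is to spawn child seeds from a single parent seed. The sketch below assumes this strategy for illustration only; how generate_batch derives per-series randomness is not documented here, and generate_batch_sketch, generate_one, and white_noise are hypothetical names.

```python
import numpy as np


def generate_batch_sketch(generate_one, n_series, length, seed=None):
    """Stack `n_series` series, each driven by its own child generator.

    `generate_one(length, rng)` is a hypothetical callable standing in
    for a model's generate method.
    """
    # SeedSequence.spawn yields statistically independent child seeds.
    children = np.random.SeedSequence(seed).spawn(n_series)
    rows = [generate_one(length, np.random.default_rng(child)) for child in children]
    return np.stack(rows)  # shape: (n_series, length)


def white_noise(length, rng):
    """Placeholder model: independent standard normal samples."""
    return rng.standard_normal(length)


batch = generate_batch_sketch(white_noise, n_series=3, length=8, seed=42)
```

The returned array has one row per series, matching the (n_series, length) shape described above.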

generate_streaming(length: int, chunk_size: int = 1000, seed: int | None = None, rng: Generator | None = None)[source]

Generate data in streaming fashion for very large datasets.

Parameters:

length (int) – Total length of the time series to generate
chunk_size (int, default=1000) – Size of each chunk
seed (int, optional) – Random seed for reproducibility

Yields:

np.ndarray – Chunks of generated data
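The chunking arithmetic behind a streaming generator can be sketched as below. chunk_bounds is a hypothetical helper; it only computes index ranges and makes no claim about how the library produces the values inside each chunk.

```python
from typing import Iterator, Tuple


def chunk_bounds(length: int, chunk_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover `length` samples.

    The final chunk is shorter when `length` is not a multiple of
    `chunk_size`.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)
```

Note that chunks drawn independently of one another would not carry correlations across chunk boundaries, which matters for long-range dependent processes; whether generate_streaming preserves that dependence is not stated in this reference.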

abstractmethod get_theoretical_properties() → Dict[str, Any][source]

Get theoretical properties of the model.

Returns: Dictionary containing theoretical properties such as autocorrelation function, power spectral density, etc.
Return type: dict

get_parameters() → Dict[str, Any][source]

Get current model parameters.

Returns: Current model parameters
Return type: dict

set_parameters(**kwargs) → None[source]

Set model parameters.

Parameters: **kwargs (dict) – New parameter values

Fractional Brownian Motion (FBM)

class lrdbenchmark.models.data_models.fbm_model.FractionalBrownianMotion(H: float = 0.7, sigma: float = 1.0, **kwargs)[source]

Bases: BaseModel

Fractional Brownian Motion (fBm) generator.

Generates fBm by cumulatively summing fGn increments.

__init__(H: float = 0.7, sigma: float = 1.0, **kwargs)[source]

Initialize the fBm model.

Parameters:

H (float) – Hurst parameter (0 < H < 1)
sigma (float) – Standard deviation of the increments (fGn)

generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) → ndarray[source]: Generate fBm time series.

_validate_parameters() → None[source]: Validate model parameters.

get_theoretical_properties() → Dict[str, Any][source]: Get theoretical properties of fBm.

get_increments(data: ndarray) → ndarray[source]: Get increments (fGn).
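The relationship between fBm and fGn described for this class (cumulative sum in one direction, differencing in the other) can be sketched in plain Python. The function names are hypothetical, and the convention that the path starts from an implicit 0 is an assumption; the library's get_increments may instead return one sample fewer, as a plain first difference would.

```python
from itertools import accumulate
from typing import List, Sequence


def fbm_from_increments(increments: Sequence[float]) -> List[float]:
    """Build a path by cumulatively summing its increments."""
    return list(accumulate(increments))


def increments_from_path(path: Sequence[float], initial: float = 0.0) -> List[float]:
    """Recover increments by differencing, assuming the path starts at `initial`."""
    previous = [initial] + list(path[:-1])
    return [current - prev for current, prev in zip(path, previous)]


fgn = [0.5, -0.25, 1.0]
path = fbm_from_increments(fgn)        # [0.5, 0.25, 1.25]
recovered = increments_from_path(path)  # [0.5, -0.25, 1.0]
```

Estimators that expect a stationary input are normally applied to the increments rather than to the path itself.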

Fractional Gaussian Noise (FGN)

class lrdbenchmark.models.data_models.fgn_model.FractionalGaussianNoise(H: float = 0.7, sigma: float = 1.0, **kwargs)[source]

Bases: BaseModel

Fractional Gaussian Noise (fGn) generator using the Davies-Harte method.

Generates exact fGn with Hurst parameter H using the Davies-Harte algorithm, which embeds the covariance matrix of the process in a circulant matrix and exploits the fact that a circulant matrix is diagonalized by the discrete Fourier transform.

__init__(H: float = 0.7, sigma: float = 1.0, **kwargs)[source]

Initialize the fGn model.

Parameters:

H (float) – Hurst parameter (0 < H < 1)
sigma (float) – Standard deviation of the process

generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) → ndarray[source]: Generate fGn time series.

_validate_parameters() → None[source]: Validate model parameters.

_davies_harte(N: int, rng: Generator) → ndarray[source]: Internal Davies-Harte implementation.
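For orientation, the Davies-Harte (circulant embedding) approach named for this class can be sketched with NumPy as follows. This is a textbook-style sketch and not the library's _davies_harte; the function names, the tolerance, and the error message are assumptions.

```python
import numpy as np


def fgn_autocovariance(lags, H, sigma=1.0):
    """fGn autocovariance: sigma^2/2 * (|k+1|^2H - 2|k|^2H + |k-1|^2H)."""
    k = np.abs(np.asarray(lags, dtype=float))
    return 0.5 * sigma**2 * (
        (k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H)
    )


def davies_harte_fgn(n, H, sigma=1.0, rng=None):
    """Draw n samples of fGn via circulant embedding and the FFT."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = fgn_autocovariance(np.arange(n + 1), H, sigma)
    # First row of the 2n x 2n circulant matrix that embeds the covariance.
    first_row = np.concatenate([gamma, gamma[-2:0:-1]])
    # The eigenvalues of a circulant matrix are the DFT of its first row.
    eigenvalues = np.fft.fft(first_row).real
    if np.any(eigenvalues < -1e-8):
        raise ValueError("circulant embedding is not non-negative definite")
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    m = first_row.size
    # Complex Gaussian noise, coloured by the square-root eigenvalues.
    noise = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    spectrum = np.sqrt(eigenvalues / m) * noise
    # The real part of the transform has the target covariance.
    return np.fft.fft(spectrum).real[:n]


sample = davies_harte_fgn(256, H=0.7, rng=np.random.default_rng(0))
```

The cost is dominated by two FFTs of size 2n, which is why this route is preferred over a Cholesky factorization for long series.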

get_theoretical_properties() → Dict[str, Any][source]: Get theoretical properties of fGn.

ARFIMA Model

class lrdbenchmark.models.data_models.arfima_model.ARFIMAModel(d: float, ar_params: List[float] | None = None, ma_params: List[float] | None = None, sigma: float = 1.0, method: str = 'spectral')[source]

Bases: BaseModel

Autoregressive Fractionally Integrated Moving Average (ARFIMA) model.

ARFIMA(p,d,q) process combines autoregressive (AR), fractionally integrated (FI), and moving average (MA) components. The fractional integration parameter d controls long-range dependence and implies a Hurst index H = d + 0.5.

Parameters:

d (float) – Fractional integration parameter (-0.5 < d < 0.5). The implied Hurst exponent is d + 0.5.
ar_params (List[float], optional) – Autoregressive parameters (default: [])
ma_params (List[float], optional) – Moving average parameters (default: [])
sigma (float, optional) – Standard deviation of innovations (default: 1.0)
method (str, optional) – Generation method (default: ‘spectral’)

__init__(d: float, ar_params: List[float] | None = None, ma_params: List[float] | None = None, sigma: float = 1.0, method: str = 'spectral')[source]

Initialize the ARFIMA model.

Parameters:

d (float) – Fractional integration parameter (-0.5 < d < 0.5)
ar_params (List[float], optional) – Autoregressive parameters
ma_params (List[float], optional) – Moving average parameters
sigma (float, optional) – Standard deviation of innovations
method (str, optional) – Generation method

generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) → ndarray[source]

Generate ARFIMA time series.

Parameters:

length (int, optional) – Length of the time series to generate
seed (int, optional) – Random seed for reproducibility
n (int, optional) – Alternate parameter name for length (for backward compatibility)

Returns:

Generated ARFIMA time series

Return type:

np.ndarray

Notes

Either ‘length’ or ‘n’ must be provided. If both are provided, ‘length’ takes precedence.

_validate_parameters() → None[source]: Validate model parameters.

_spectral_method(length: int, d: float, ar_params: List[float], ma_params: List[float], sigma: float) → ndarray[source]

Generate ARFIMA using efficient spectral method.

This method generates the process in the frequency domain using FFT, which is much more efficient than time-domain simulation.

_simulation_method(length: int, d: float, ar_params: List[float], ma_params: List[float], sigma: float) → ndarray[source]

Generate ARFIMA using efficient simulation method.

This method uses FFT-based fractional differencing and efficient AR/MA filtering with scipy.

_fractional_differencing_fft(data: ndarray, d: float) → ndarray[source]

Apply fractional differencing operator (1-L)^d using FFT.

This is much more efficient than the recursive method.

Parameters:

data (np.ndarray) – Input time series
d (float) – Fractional integration parameter

Returns:

Fractionally differenced series

Return type:

np.ndarray
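As a reference point for the operator (1-L)^d, the sketch below applies it directly in the time domain using its binomial expansion. This is the slow recursive form that an FFT-based version accelerates, not the library's FFT code; the function names are hypothetical.

```python
from typing import List, Sequence


def frac_diff_weights(d: float, n_weights: int) -> List[float]:
    """Coefficients of (1 - L)^d: w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k."""
    if n_weights <= 0:
        return []
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(data: Sequence[float], d: float) -> List[float]:
    """Apply (1 - L)^d by direct convolution, treating pre-sample values as 0."""
    w = frac_diff_weights(d, len(data))
    return [
        sum(w[k] * data[t - k] for k in range(t + 1))
        for t in range(len(data))
    ]
```

For d = 1 the weights reduce to 1, -1, 0, 0, ..., so the operator becomes an ordinary first difference. The direct convolution costs O(n^2) operations, compared with O(n log n) for an FFT-based convolution.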

_apply_ar_filter_efficient(data: ndarray, ar_params: List[float]) → ndarray[source]

Apply autoregressive filter efficiently using scipy.

Parameters:

data (np.ndarray) – Input time series
ar_params (List[float]) – AR parameters

Returns:

AR filtered series

Return type:

np.ndarray

_apply_ma_filter_efficient(data: ndarray, ma_params: List[float]) → ndarray[source]

Apply moving average filter efficiently using scipy.

Parameters:

data (np.ndarray) – Input time series
ma_params (List[float]) – MA parameters

Returns:

MA filtered series

Return type:

np.ndarray
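The AR and MA filters described above can be written as plain recursions. The sketch below assumes the sign convention y_t = x_t + sum_k phi_k y_{t-k} for the AR part and y_t = x_t + sum_k theta_k x_{t-k} for the MA part; the library's convention is not stated in this reference, and the function names are hypothetical. Under this convention the AR recursion matches scipy.signal.lfilter([1], [1, -phi_1, ..., -phi_p], x).

```python
from typing import List, Sequence


def ar_filter(data: Sequence[float], ar_params: Sequence[float]) -> List[float]:
    """Recursive AR filter: y_t = x_t + sum_k phi_k * y_{t-k}."""
    out: List[float] = []
    for t, x in enumerate(data):
        # Only past outputs that exist contribute (zero initial conditions).
        feedback = sum(
            phi * out[t - k]
            for k, phi in enumerate(ar_params, start=1)
            if t - k >= 0
        )
        out.append(x + feedback)
    return out


def ma_filter(data: Sequence[float], ma_params: Sequence[float]) -> List[float]:
    """Moving-average filter: y_t = x_t + sum_k theta_k * x_{t-k}."""
    return [
        x + sum(
            theta * data[t - k]
            for k, theta in enumerate(ma_params, start=1)
            if t - k >= 0
        )
        for t, x in enumerate(data)
    ]
```

Feeding a unit impulse through ar_filter with a single coefficient 0.5 gives the geometric impulse response 1, 0.5, 0.25, and so on.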

_compute_spectral_density(freqs: ndarray, d: float, ar_params: List[float], ma_params: List[float], sigma: float) → ndarray[source]

Compute spectral density of ARFIMA process.

Parameters:

freqs (np.ndarray) – Frequencies
d (float) – Fractional integration parameter
ar_params (List[float]) – AR parameters
ma_params (List[float]) – MA parameters
sigma (float) – Standard deviation

Returns:

Spectral density

Return type:

np.ndarray
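For reference, the standard ARFIMA(p,d,q) spectral density is f(w) = sigma^2 / (2 pi) * |theta(e^{-iw})|^2 / |phi(e^{-iw})|^2 * |1 - e^{-iw}|^{-2d}. The sketch below evaluates it at a single frequency. The polynomial sign conventions (phi(z) = 1 - sum phi_k z^k, theta(z) = 1 + sum theta_k z^k) and the 1/(2 pi) normalization are assumptions, and the library's private method may differ in either.

```python
import cmath
import math
from typing import Sequence


def arfima_spectral_density(
    freq: float,
    d: float,
    ar_params: Sequence[float] = (),
    ma_params: Sequence[float] = (),
    sigma: float = 1.0,
) -> float:
    """Evaluate the ARFIMA spectral density at one frequency in (0, pi]."""
    if not 0.0 < freq <= math.pi:
        raise ValueError("freq must lie in (0, pi]")
    z = cmath.exp(-1j * freq)
    # AR and MA polynomials evaluated on the unit circle.
    ar_poly = 1 - sum(phi * z ** (k + 1) for k, phi in enumerate(ar_params))
    ma_poly = 1 + sum(theta * z ** (k + 1) for k, theta in enumerate(ma_params))
    # Long-memory factor; it diverges as freq -> 0 when d > 0.
    long_memory = abs(1 - z) ** (-2 * d)
    return sigma**2 / (2 * math.pi) * abs(ma_poly) ** 2 / abs(ar_poly) ** 2 * long_memory
```

The frequency 0 is excluded because the long-memory factor has a pole there for d > 0, which is the spectral signature of long-range dependence.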

get_theoretical_properties() → Dict[str, Any][source]

Get theoretical properties of ARFIMA process.

Returns: Dictionary containing theoretical properties
Return type: dict

get_increments(arfima: ndarray) → ndarray[source]

Get the increments of ARFIMA process.

Parameters: arfima (np.ndarray) – ARFIMA time series
Returns: Increments (differences)
Return type: np.ndarray

_require_scipy() → None[source]: Ensure SciPy is available before running simulation-heavy code.

expected_hurst() → float[source]

Return the implied Hurst exponent H = d + 0.5.

The fractional differencing parameter d lives in (-0.5, 0.5), so the resulting H is always in (0, 1).
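The mapping between d and H stated above is simple enough to write out. The helpers below are hypothetical illustrations of that relation, including the range check on d.

```python
def hurst_from_d(d: float) -> float:
    """Implied Hurst exponent H = d + 0.5 for d in (-0.5, 0.5)."""
    if not -0.5 < d < 0.5:
        raise ValueError("d must lie strictly between -0.5 and 0.5")
    return d + 0.5


def d_from_hurst(H: float) -> float:
    """Inverse mapping d = H - 0.5 for H in (0, 1)."""
    if not 0.0 < H < 1.0:
        raise ValueError("H must lie strictly between 0 and 1")
    return H - 0.5
```

Positive d therefore corresponds to H above 0.5 (persistent behavior) and negative d to H below 0.5 (anti-persistent behavior).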

Multifractal Random Walk (MRW)

class lrdbenchmark.models.data_models.mrw_model.MultifractalRandomWalk(H: float, lambda_param: float, sigma: float = 1.0, method: str = 'cascade')[source]

Bases: BaseModel

Multifractal Random Walk (MRW) model.

MRW is a multifractal process that exhibits scale-invariant properties and is characterized by a log-normal volatility cascade. It is defined by the Hurst parameter H and the intermittency parameter λ.

Parameters:

H (float) – Hurst parameter (0 < H < 1)
lambda_param (float) – Intermittency parameter (λ > 0)
sigma (float, optional) – Base volatility (default: 1.0)
method (str, optional) – Generation method (default: ‘cascade’)

__init__(H: float, lambda_param: float, sigma: float = 1.0, method: str = 'cascade')[source]

Initialize the Multifractal Random Walk model.

Parameters:

H (float) – Hurst parameter (0 < H < 1)
lambda_param (float) – Intermittency parameter (λ > 0)
sigma (float, optional) – Base volatility (default: 1.0)
method (str, optional) – Generation method (default: ‘cascade’)

generate(length: int | None = None, seed: int | None = None, n: int | None = None, rng: Generator | None = None) → ndarray[source]

Generate multifractal random walk.

Parameters:

length (int, optional) – Length of the time series to generate
seed (int, optional) – Random seed for reproducibility
n (int, optional) – Alternate parameter name for length (for backward compatibility)

Returns:

Generated MRW time series

Return type:

np.ndarray

Notes

Either ‘length’ or ‘n’ must be provided. If both are provided, ‘length’ takes precedence.

_validate_parameters() → None[source]: Validate model parameters.

_cascade_method(length: int, H: float, lambda_param: float, sigma: float) → ndarray[source]

Generate MRW using volatility cascade method.

This method constructs a log-normal volatility cascade and applies it to a fractional Brownian motion.
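One common reading of "applying a volatility cascade" is to scale each increment of the underlying process by the exponential of a log-volatility value and then integrate. The sketch below shows only that combination step. It is an assumption about the construction, not the library's _cascade_method, and modulate_and_integrate is a hypothetical name.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def modulate_and_integrate(
    increments: Sequence[float],
    log_volatility: Sequence[float],
) -> List[float]:
    """Scale each increment by exp(log-volatility) and cumulatively sum.

    Assumption: the cascade acts multiplicatively on the increments of
    the underlying process.
    """
    if len(increments) != len(log_volatility):
        raise ValueError("increments and log_volatility must have equal length")
    scaled = [dx * math.exp(w) for dx, w in zip(increments, log_volatility)]
    return list(accumulate(scaled))
```

With an all-zero log-volatility the result is the plain cumulative sum of the increments, so the intermittency comes entirely from the cascade term.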

_generate_volatility_cascade(length: int, lambda_param: float) → ndarray[source]

Generate log-normal volatility cascade.

Parameters:

length (int) – Length of the time series
lambda_param (float) – Intermittency parameter

Returns:

Log-volatility cascade

Return type:

np.ndarray

expected_hurst() → float[source]: Return the configured large-scale Hurst exponent.

_generate_fbm(length: int, H: float, sigma: float) → ndarray[source]

Generate fractional Brownian motion using circulant embedding.

Parameters:

length (int) – Length of the time series
H (float) – Hurst parameter
sigma (float) – Standard deviation

Returns:

Fractional Brownian motion

Return type:

np.ndarray

_direct_method(length: int, H: float, lambda_param: float, sigma: float) → ndarray[source]

Generate MRW using direct method.

This method directly generates the MRW process using the multifractal formalism.

get_theoretical_properties() → Dict[str, Any][source]

Get theoretical properties of MRW.

Returns: Dictionary containing theoretical properties
Return type: dict

get_increments(mrw: ndarray) → ndarray[source]

Get the increments of MRW.

Parameters: mrw (np.ndarray) – Multifractal random walk time series
Returns: Increments
Return type: np.ndarray

Convenience Aliases

For easier usage, lrdbenchmark provides shortened aliases for all data models:

from lrdbenchmark import FBMModel, FGNModel, ARFIMAModel, MRWModel

# These are equivalent to the full class names
fbm = FBMModel(H=0.7, sigma=1.0)
fgn = FGNModel(H=0.6, sigma=1.0)
arfima = ARFIMAModel(d=0.3, sigma=1.0)
mrw = MRWModel(H=0.7, lambda_param=0.1, sigma=1.0)

Convenience Functions

lrdbenchmark.models.data_models.create_fbm_model(H=0.7, sigma=1.0)[source]: Create FBMModel with default parameters

lrdbenchmark.models.data_models.create_fgn_model(H=0.6, sigma=1.0)[source]: Create FGNModel with default parameters

lrdbenchmark.models.data_models.create_arfima_model(d=0.2, sigma=1.0)[source]: Create ARFIMAModel with default parameters

lrdbenchmark.models.data_models.create_mrw_model(H=0.7, lambda_param=0.1, sigma=1.0)[source]: Create MRWModel with default parameters

Usage Examples

Basic Usage

from lrdbenchmark import FBMModel, FGNModel

# Generate FBM data
fbm_model = FBMModel(H=0.7, sigma=1.0)
fbm_data = fbm_model.generate(1000, seed=42)

# Generate FGN data
fgn_model = FGNModel(H=0.6, sigma=1.0)
fgn_data = fgn_model.generate(1000, seed=42)

Multiple Realizations

from lrdbenchmark import FBMModel
import numpy as np

# Generate multiple realizations
model = FBMModel(H=0.7, sigma=1.0)
realizations = []

for i in range(10):
    data = model.generate(1000, seed=i)
    realizations.append(data)

realizations = np.array(realizations)

Parameter Sweeps

from lrdbenchmark import FBMModel
import matplotlib.pyplot as plt

# Generate data with different H values
H_values = [0.3, 0.5, 0.7, 0.9]
datasets = {}

for H in H_values:
    model = FBMModel(H=H, sigma=1.0)
    datasets[f'H={H}'] = model.generate(1000, seed=42)

# Plot results
plt.figure(figsize=(12, 8))
for name, data in datasets.items():
    plt.plot(data[:200], label=name, alpha=0.7)

plt.title('FBM with Different H Values')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

Model Comparison

from lrdbenchmark import FBMModel, FGNModel, ARFIMAModel, MRWModel
import matplotlib.pyplot as plt

# Generate data from different models
models = {
    'FBM': FBMModel(H=0.7, sigma=1.0),
    'FGN': FGNModel(H=0.7, sigma=1.0),
    'ARFIMA': ARFIMAModel(d=0.3, sigma=1.0),
    'MRW': MRWModel(H=0.7, lambda_param=0.1, sigma=1.0)
}

datasets = {}
for name, model in models.items():
    datasets[name] = model.generate(1000, seed=42)

# Plot comparison
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes = axes.flatten()

for i, (name, data) in enumerate(datasets.items()):
    axes[i].plot(data[:200])
    axes[i].set_title(name)
    axes[i].grid(True)

plt.tight_layout()
plt.show()

Error Handling

from lrdbenchmark import FBMModel

try:
    # Invalid H value
    model = FBMModel(H=1.5, sigma=1.0)
except ValueError as e:
    print(f"Error: {e}")

try:
    # Invalid sigma value
    model = FBMModel(H=0.7, sigma=-1.0)
except ValueError as e:
    print(f"Error: {e}")

Performance Considerations

Memory Usage: Large datasets may require significant memory
GPU Acceleration: Some models support GPU acceleration when available
Parallel Generation: Use multiple processes for generating many realizations
Seed Management: Use different seeds for independent realizations

Note

All models generate NumPy arrays by default. For GPU acceleration, the data can be converted to the appropriate tensor format.

Warning

Very large datasets (>1M samples) may cause memory issues. Consider generating data in chunks, for example with generate_streaming.