Generation Module API
This module provides data generators for testing LRD estimators under various conditions.
Time Series Generator
- class lrdbenchmark.generation.TimeSeriesGenerator(random_state: int | None = None)
Bases: object
Unified Time Series Generator for LRD Benchmark.
This class handles the end-to-end generation process:
1. Base signal generation (FBM, FGN, ARFIMA, MRW)
2. Contamination application (Noise, Trends, Artifacts)
3. Preprocessing (Detrending, Winsorizing, Normalization), which is “baked in”: it is applied by default unless preprocess=False.
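The three stages can be pictured with a minimal NumPy sketch. It is not the library's implementation: white Gaussian noise (fGn with H = 0.5) stands in for the base model, an additive linear trend stands in for one contamination scenario, and preprocessing is a least-squares linear detrend, percentile winsorizing, and z-score normalization. The function `toy_pipeline` and its arguments are hypothetical; only the returned keys mirror the documented result dictionary.

```python
import numpy as np


def toy_pipeline(length, trend_slope=0.01, clip_pct=1.0, seed=0):
    """Hypothetical generate -> contaminate -> preprocess pipeline."""
    rng = np.random.default_rng(seed)

    # 1. Base signal: white Gaussian noise, i.e. fGn with H = 0.5 (stand-in model).
    clean = rng.standard_normal(length)

    # 2. Contamination: add a linear trend (stand-in for one scenario).
    t = np.arange(length)
    contaminated = clean + trend_slope * t

    # 3a. Preprocessing: remove a least-squares linear trend.
    slope, intercept = np.polyfit(t, contaminated, deg=1)
    processed = contaminated - (slope * t + intercept)
    # 3b. Winsorize to the [clip_pct, 100 - clip_pct] percentile range.
    lo, hi = np.percentile(processed, [clip_pct, 100.0 - clip_pct])
    processed = np.clip(processed, lo, hi)
    # 3c. Z-score normalize.
    processed = (processed - processed.mean()) / processed.std()

    return {
        "signal": processed,
        "clean_signal": clean,
        "contaminated_signal": contaminated,
        "metadata": {"trend_slope": trend_slope, "clip_pct": clip_pct, "seed": seed},
    }


result = toy_pipeline(1000)
```

Keeping the clean and contaminated intermediates alongside the final signal is what makes it possible to attribute an estimator's error to contamination or to preprocessing separately.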
- __init__(random_state: int | None = None)
Initialize the generator.
- Parameters:
random_state (int, optional) – Global random seed.
- generate(model: str, length: int, params: Dict[str, Any], contamination: List[Dict[str, Any]] | None = None, preprocess: bool = True, preprocess_params: Dict[str, Any] | None = None, seed: int | None = None) → Dict[str, Any]
Generate a processed time series.
- Parameters:
model (str) – Model name (‘fbm’, ‘fgn’, ‘arfima’, ‘mrw’). Case-insensitive.
length (int) – Length of the time series.
params (dict) – Parameters for the model (e.g., {‘H’: 0.7}).
contamination (list of dicts, optional) –
List of contamination specs to apply sequentially. Each dict should have:
- ‘scenario’: ConfoundingScenario enum member or str name
- ‘intensity’: float (0.0 to 1.0)
- ‘params’: dict (scenario-specific parameters)
preprocess (bool, default=True) – Whether to apply the baked-in preprocessing pipeline.
preprocess_params (dict, optional) – Overrides for preprocessing configuration (e.g. {‘enable_detrend’: False}).
seed (int, optional) – Specific seed for this generation.
- Returns:
Result dictionary containing:
- ‘signal’: the final processed NumPy array
- ‘clean_signal’: the clean signal before contamination
- ‘contaminated_signal’: the signal after contamination but before preprocessing
- ‘metadata’: full generation metadata (true parameters, contamination info, preprocessing info)
- Return type:
Dict[str, Any]
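A contamination list can be checked against the documented shape before it is passed to `generate`. The helper below is hypothetical, and the scenario strings are placeholders rather than actual `ConfoundingScenario` names; it only verifies the three documented keys and the 0.0 to 1.0 intensity range.

```python
from typing import Any, Dict, List


def validate_contamination(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Check each spec has the documented keys and an intensity in [0.0, 1.0]."""
    checked = []
    for i, spec in enumerate(specs):
        missing = {"scenario", "intensity", "params"} - set(spec)
        if missing:
            raise ValueError(f"spec {i} is missing keys: {sorted(missing)}")
        intensity = float(spec["intensity"])
        if not 0.0 <= intensity <= 1.0:
            raise ValueError(f"spec {i}: intensity {intensity} outside [0.0, 1.0]")
        if not isinstance(spec["params"], dict):
            raise TypeError(f"spec {i}: 'params' must be a dict")
        checked.append({**spec, "intensity": intensity})
    return checked


# Placeholder scenario names; the real ConfoundingScenario members may differ.
specs = [
    {"scenario": "additive_noise", "intensity": 0.3, "params": {}},
    {"scenario": "linear_trend", "intensity": 0.5, "params": {"slope": 0.01}},
]
checked = validate_contamination(specs)
```

Because the specs are applied sequentially, their order in the list matters; the check above deliberately preserves it.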
Nonstationarity Generators
Generators for time-varying Hurst parameter testing.
RegimeSwitchingProcess
- class lrdbenchmark.generation.RegimeSwitchingProcess(h_regimes: List[float], change_points: List[float] | None = None, sigma: float = 1.0, random_state: int | None = None)
Generate time series with abrupt H transitions at specified change points.
Useful for testing estimator behavior under broken stationarity conditions.
Example
>>> gen = RegimeSwitchingProcess(h_regimes=[0.3, 0.8], change_points=[0.5])
>>> result = gen.generate(1000)
>>> # First half has H=0.3, second half has H=0.8
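A piecewise-constant H trajectory matching the example can be sketched as follows. It assumes that `change_points` are relative positions in (0, 1) and that each boundary is rounded to the nearest sample; the helper `regime_h_trajectory` is hypothetical, and the library may assign boundary samples differently. Only H(t) is built here; synthesizing the signal within each regime is left out.

```python
import numpy as np


def regime_h_trajectory(h_regimes, change_points, length):
    """Piecewise-constant H(t) for relative change points in (0, 1)."""
    if len(change_points) != len(h_regimes) - 1:
        raise ValueError("need exactly one change point between consecutive regimes")
    # Convert relative positions to sample indices (rounding rule is an assumption).
    boundaries = [int(round(cp * length)) for cp in change_points]
    starts = [0] + boundaries
    ends = boundaries + [length]
    h = np.empty(length)
    for h_value, start, end in zip(h_regimes, starts, ends):
        h[start:end] = h_value
    return h


h = regime_h_trajectory([0.3, 0.8], [0.5], 1000)
```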
ContinuousDriftProcess
- class lrdbenchmark.generation.ContinuousDriftProcess(h_start: float = 0.3, h_end: float = 0.8, drift_type: str | DriftType = DriftType.LINEAR, drift_params: Dict[str, Any] | None = None, sigma: float = 1.0, random_state: int | None = None)
Generate time series with smoothly varying H(t).
Supports linear, sinusoidal, logistic, and exponential drift patterns. Useful for testing estimator behavior under gradual nonstationarity.
Example
>>> gen = ContinuousDriftProcess(h_start=0.3, h_end=0.8, drift_type='linear')
>>> result = gen.generate(1000)
>>> # H increases linearly from 0.3 to 0.8
- __init__(h_start: float = 0.3, h_end: float = 0.8, drift_type: str | DriftType = DriftType.LINEAR, drift_params: Dict[str, Any] | None = None, sigma: float = 1.0, random_state: int | None = None)
Initialize continuous drift process.
- Parameters:
h_start (float) – Initial H value
h_end (float) – Final H value
drift_type (str or DriftType) – Type of drift: ‘linear’, ‘sinusoidal’, ‘logistic’, ‘exponential’
drift_params (dict, optional) – Additional parameters for the drift function:
- sinusoidal: {‘frequency’: float, ‘phase’: float}
- logistic: {‘steepness’: float, ‘midpoint’: float}
- exponential: {‘rate’: float}
sigma (float) – Standard deviation scaling
random_state (int, optional) – Random seed
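The four drift types can be illustrated as H(t) curves over normalized time u in [0, 1]. The exact formulas and the default values used below (frequency 1, phase 0, steepness 10, midpoint 0.5, rate 3) are assumptions chosen for illustration, not the library's definitions; only the parameter names come from `drift_params` above, and `drift_h_trajectory` is hypothetical.

```python
import numpy as np


def drift_h_trajectory(h_start, h_end, length, drift_type="linear", **drift_params):
    """Hypothetical H(t) curves for the four documented drift types."""
    u = np.linspace(0.0, 1.0, length)  # normalized time in [0, 1]
    span = h_end - h_start
    if drift_type == "linear":
        return h_start + span * u
    if drift_type == "sinusoidal":
        freq = drift_params.get("frequency", 1.0)
        phase = drift_params.get("phase", 0.0)
        # Oscillates between h_start and h_end; with phase 0 it starts at h_start.
        return h_start + 0.5 * span * (1.0 - np.cos(2.0 * np.pi * freq * u + phase))
    if drift_type == "logistic":
        k = drift_params.get("steepness", 10.0)
        m = drift_params.get("midpoint", 0.5)
        # S-shaped transition centred at u = m; endpoints only approach h_start and h_end.
        return h_start + span / (1.0 + np.exp(-k * (u - m)))
    if drift_type == "exponential":
        r = drift_params.get("rate", 3.0)
        # Normalized so that H(0) = h_start and H(1) = h_end exactly.
        return h_start + span * (1.0 - np.exp(-r * u)) / (1.0 - np.exp(-r))
    raise ValueError(f"unknown drift_type: {drift_type!r}")


h = drift_h_trajectory(0.3, 0.8, 1000, "logistic", steepness=12.0, midpoint=0.4)
```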
StructuralBreakProcess
- class lrdbenchmark.generation.StructuralBreakProcess(h_before: float = 0.7, h_after: float = 0.4, break_position: float = 0.5, break_severity: float = 0.0, variance_change: float = 1.0, n_breaks: int = 1, sigma: float = 1.0, random_state: int | None = None)
Generate time series with structural breaks (abrupt level/variance shifts).
Combines H regime switching with additional level shifts and variance changes to create more realistic nonstationary scenarios.
Example
>>> gen = StructuralBreakProcess(
...     h_before=0.7, h_after=0.4,
...     break_position=0.5, break_severity=0.3
... )
>>> result = gen.generate(1000)
- __init__(h_before: float = 0.7, h_after: float = 0.4, break_position: float = 0.5, break_severity: float = 0.0, variance_change: float = 1.0, n_breaks: int = 1, sigma: float = 1.0, random_state: int | None = None)
Initialize structural break process.
- Parameters:
h_before (float) – H value before break(s)
h_after (float) – H value after break(s)
break_position (float) – Relative position of the break in (0, 1). With multiple breaks, positions are evenly distributed.
break_severity (float) – Magnitude of level shift at break (0 = no shift)
variance_change (float) – Multiplicative factor for variance after break (1 = no change)
n_breaks (int) – Number of structural breaks
sigma (float) – Standard deviation scaling
random_state (int, optional) – Random seed
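The level and variance part of a structural break can be sketched on top of any base series; the H switch itself is left aside. Several points are assumptions, not documented behavior: "evenly distributed" is read as positions k / (n_breaks + 1), each break adds `break_severity * sigma` to the level, and each break multiplies the variance by `variance_change` (so the standard deviation by its square root). `apply_structural_breaks` is a hypothetical helper.

```python
import numpy as np


def apply_structural_breaks(base, n_breaks=1, break_position=0.5,
                            break_severity=0.0, variance_change=1.0, sigma=1.0):
    """Apply level shifts and variance changes at break points (hypothetical helper)."""
    base = np.asarray(base, dtype=float)
    n = len(base)
    if n_breaks == 1:
        positions = [break_position]
    else:
        # Assumption: "evenly distributed" means k / (n_breaks + 1) for k = 1..n_breaks.
        positions = [k / (n_breaks + 1) for k in range(1, n_breaks + 1)]
    break_idx = np.array([int(p * n) for p in positions])
    # Number of breaks already passed at each sample (a break applies from its index on).
    k = np.searchsorted(break_idx, np.arange(n), side="right")
    # Variance scales by variance_change per break, so the std scales by its square root;
    # the level shifts by break_severity * sigma per break (assumed scaling).
    shifted = base * np.sqrt(variance_change) ** k + break_severity * sigma * k
    return shifted, break_idx


out, idx = apply_structural_breaks(np.ones(10), break_severity=0.3, variance_change=4.0)
```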
EnsembleTimeAverageProcess
- class lrdbenchmark.generation.EnsembleTimeAverageProcess(H: float = 0.7, aging_exponent: float = 0.5, aging_type: str = 'power_law', sigma: float = 1.0, random_state: int | None = None)
Generate data for testing ergodicity violation.
In ergodic (for example, equilibrium) systems, time averages over a single long realization equal ensemble averages. This process generates data that breaks this equivalence, as is characteristic of aging and other nonequilibrium systems, where classical estimators fail.
Example
>>> gen = EnsembleTimeAverageProcess(H=0.7, aging_exponent=0.5)
>>> result = gen.generate(1000)
>>> # Signal exhibits aging behavior
- __init__(H: float = 0.7, aging_exponent: float = 0.5, aging_type: str = 'power_law', sigma: float = 1.0, random_state: int | None = None)
Initialize ensemble-time average process.
- generate_ensemble(n_realizations: int, length: int, seed: int | None = None) → Dict[str, Any]
Generate an ensemble of realizations for ergodicity testing.
- Parameters:
n_realizations (int) – Number of realizations.
length (int) – Length of each realization.
seed (int, optional) – Random seed for this ensemble.
- Returns:
Dictionary containing:
- ‘ensemble’: array of shape (n_realizations, length)
- ‘ensemble_mean’: mean across realizations at each time step
- ‘time_mean’: mean across time for each realization
- ‘h_trajectory’: true H trajectory
- Return type:
Dict[str, Any]
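The two averages in the returned dictionary correspond to reducing the ensemble array along different axes. The sketch below computes both and contrasts an ergodic ensemble (white noise) with a toy non-ergodic one in which each realization keeps a frozen random offset. That toy is an assumption for illustration only; it is not the aging model used by this class, and `ensemble_vs_time_means` is hypothetical.

```python
import numpy as np


def ensemble_vs_time_means(ensemble):
    """Return (ensemble_mean, time_mean) for an array of shape (n_realizations, length)."""
    ensemble = np.asarray(ensemble, dtype=float)
    ensemble_mean = ensemble.mean(axis=0)  # average over realizations, per time step
    time_mean = ensemble.mean(axis=1)      # average over time, per realization
    return ensemble_mean, time_mean


rng = np.random.default_rng(1)
ergodic = rng.standard_normal((200, 2000))
# Toy non-ergodic ensemble: each realization carries its own frozen random offset,
# so a single long time average never converges to the ensemble value.
non_ergodic = ergodic + rng.standard_normal((200, 1))
_, tm_erg = ensemble_vs_time_means(ergodic)
_, tm_non = ensemble_vs_time_means(non_ergodic)
```

In the ergodic case the spread of `time_mean` across realizations shrinks roughly like 1/sqrt(length); in the toy non-ergodic case it stays near the offset's standard deviation however long the series is.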
Critical Regime Models
Physics-motivated generators for critical and nonequilibrium regimes.
OrnsteinUhlenbeckProcess
- class lrdbenchmark.generation.OrnsteinUhlenbeckProcess(theta_start: float = 0.1, theta_end: float = 1.0, sigma: float = 1.0, transition_type: str = 'linear', dt: float = 0.01, random_state: int | None = None)
Ornstein-Uhlenbeck process with time-varying friction coefficient.
Models transient criticality where the system transitions between different relaxation regimes.
The SDE is:
dX_t = -θ(t) X_t dt + σ dW_t
where θ(t) is the time-varying friction coefficient and W_t is a standard Wiener process.
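The SDE can be integrated with an Euler-Maruyama step, X_{i+1} = X_i - θ_i X_i dt + σ √dt ξ_i with ξ_i standard normal. This is a sketch under assumptions: the scheme, the initial value `x0`, and the helper name `ou_time_varying` are not taken from the library, and only the 'linear' transition of θ is shown.

```python
import numpy as np


def ou_time_varying(length, theta_start=0.1, theta_end=1.0, sigma=1.0,
                    dt=0.01, x0=0.0, seed=None):
    """Euler-Maruyama integration of dX = -theta(t) X dt + sigma dW."""
    rng = np.random.default_rng(seed)
    # Linear transition of the friction coefficient from theta_start to theta_end.
    theta = np.linspace(theta_start, theta_end, length)
    x = np.empty(length)
    x[0] = x0
    dw = rng.standard_normal(length - 1) * np.sqrt(dt)  # Brownian increments
    for i in range(1, length):
        x[i] = x[i - 1] - theta[i - 1] * x[i - 1] * dt + sigma * dw[i - 1]
    return x, theta


x, theta = ou_time_varying(1000, seed=0)
```

With σ = 0 the recursion reduces to deterministic decay, X_n = x0 (1 - θ dt)^n for constant θ, which is a convenient sanity check on the drift term.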
Example
>>> gen = OrnsteinUhlenbeckProcess(theta_start=0.1, theta_end=1.0)
>>> result = gen.generate(1000)
- __init__(theta_start: float = 0.1, theta_end: float = 1.0, sigma: float = 1.0, transition_type: str = 'linear', dt: float = 0.01, random_state: int | None = None)
Initialize OU process with time-varying friction.
- Parameters:
theta_start (float) – Initial friction coefficient (low values give slow, critical-like relaxation)
theta_end (float) – Final friction coefficient
sigma (float) – Noise intensity
transition_type (str) – How θ transitions: ‘linear’, ‘exponential’, ‘step’
dt (float) – Time step for simulation
random_state (int, optional) – Random seed
SubordinatedProcess
- class lrdbenchmark.generation.SubordinatedProcess(alpha: float = 0.7, sigma: float = 1.0, random_state: int | None = None)
Subordinated Brownian motion for modeling nonequilibrium phenomena.
The process is X(S(t)), where X is Brownian motion and S(t) is an inverse stable subordinator; the result is subdiffusive.
This models systems with trapping events and anomalous diffusion where classical ergodicity breaks down.
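One common way to simulate X(S(t)) on a grid is sketched below under stated assumptions: a one-sided α-stable subordinator is built from Kanter's representation (Laplace transform exp(-s^α), valid for 0 < α < 1), inverted by a first-passage search, and a Brownian path is read off at the resulting operational times. The horizon `t_max`, step `dtau`, and both helper names are hypothetical, and the library may use a different scheme (for example, a continuous-time random walk).

```python
import numpy as np


def positive_stable(alpha, size, rng):
    """One-sided alpha-stable samples, 0 < alpha < 1 (Kanter's representation).

    The samples have Laplace transform E[exp(-s * S)] = exp(-s**alpha).
    """
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.sin(u) ** (1.0 / alpha)
            * (np.sin((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def subordinated_bm(length, alpha=0.7, sigma=1.0, t_max=10.0, dtau=1e-3, seed=None):
    """Approximate B(E(t)) on a uniform t-grid, E being the inverse stable subordinator."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, t_max, length)
    # Stable subordinator S on the tau-grid k * dtau, extended until it passes t_max.
    s_path = np.zeros(1)
    while s_path[-1] <= t_max:
        incr = dtau ** (1.0 / alpha) * positive_stable(alpha, 10_000, rng)
        s_path = np.concatenate([s_path, s_path[-1] + np.cumsum(incr)])
    # Inverse subordinator: E(t) is the first grid tau with S(tau) > t.
    idx = np.searchsorted(s_path, t, side="right")
    e_t = idx * dtau
    # Brownian motion on the same tau-grid, read off at the operational times E(t).
    steps = sigma * np.sqrt(dtau) * rng.standard_normal(len(s_path) - 1)
    b = np.concatenate([np.zeros(1), np.cumsum(steps)])
    return b[idx], e_t


x, e_t = subordinated_bm(2000, alpha=0.7, seed=0)
```

Large jumps of S become flat stretches of E(t), during which the output does not move; these are the trapping events mentioned above.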
Example
>>> gen = SubordinatedProcess(alpha=0.7)
>>> result = gen.generate(1000)
- __init__(alpha: float = 0.7, sigma: float = 1.0, random_state: int | None = None)
Initialize subordinated process.
FractionalLevyMotion
- class lrdbenchmark.generation.FractionalLevyMotion(H: float = 0.7, alpha: float = 1.5, beta: float = 0.0, scale: float = 1.0, use_hpfracc: bool = True, random_state: int | None = None)
Linear Fractional Stable Motion (LFSM) via FFT-based spectral method.
Generates heavy-tailed, non-Gaussian processes with long-range dependence by applying fractional integration to symmetric α-stable noise in the frequency domain.
The algorithm:
1. Generate symmetric α-stable noise Z.
2. Transform to the frequency domain: Z̃ = FFT(Z).
3. Apply the spectral kernel: X̃ = Z̃ · |ω|^{-d}, where d = H - 1/α.
4. Transform back to the time domain: X = IFFT(X̃).
When α = 2 (Gaussian case), d = H - 0.5, recovering fractional Brownian motion.
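Steps 2 to 4 can be sketched with NumPy's real FFT. Assumptions: angular frequencies are taken as 2πk/n, and the zero-frequency bin is set to zero because |ω|^{-d} diverges there for d > 0; the library may treat the DC term, padding, and normalization differently. Gaussian input noise corresponds to the α = 2 case with d = H - 0.5, and `spectral_fractional_integration` is a hypothetical helper.

```python
import numpy as np


def spectral_fractional_integration(noise, d):
    """Filter noise by |omega|^(-d) in the frequency domain (DC bin set to zero)."""
    n = len(noise)
    z = np.fft.rfft(noise)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n)  # angular frequencies in [0, pi]
    kernel = np.zeros_like(omega)
    kernel[1:] = omega[1:] ** (-d)  # |omega|^{-d}; omega = 0 would diverge for d > 0
    return np.fft.irfft(z * kernel, n)


# Gaussian case (alpha = 2): d = H - 0.5.
rng = np.random.default_rng(0)
H = 0.7
x = spectral_fractional_integration(rng.standard_normal(4096), d=H - 0.5)
```

For white input the filtered series has a power spectrum proportional to |ω|^{-2d}, which is the low-frequency divergence that signals long-range dependence.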
- Parameters:
H (float) – Hurst (self-similarity) parameter, 0 < H < 1
alpha (float) – Stability index, 0 < alpha <= 2. α=2 is Gaussian (fBm).
beta (float) – Skewness parameter, -1 <= beta <= 1. Use 0 for symmetric.
scale (float) – Scale parameter for the stable distribution
use_hpfracc (bool) – If True, attempt to use hpfracc library for optimized operations. Falls back to NumPy if hpfracc is not available.
random_state (int, optional) – Random seed for reproducibility
Example
>>> gen = FractionalLevyMotion(H=0.7, alpha=1.5)
>>> result = gen.generate(1000)
>>> signal = result['signal']
Notes
The relationship between H (Hurst parameter), α (stability index), and d (fractional integration order) is: d = H - 1/α
For LFSM, the valid parameter range is 1/α < H < 1 (which is only possible when α > 1), so that d > 0 and the kernel performs fractional integration rather than differentiation.
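The range condition can be made concrete with a small check; the helper name is hypothetical. With the defaults H = 0.7 and α = 1.5, d = 0.7 - 2/3 ≈ 0.033, so the default kernel is only a weak fractional integration; with α = 1.2 the condition would require H > 1/1.2 ≈ 0.833.

```python
def lfsm_fractional_order(H, alpha):
    """Return d = H - 1/alpha after checking the range 1/alpha < H < 1."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 2")
    if not 1.0 / alpha < H < 1.0:
        raise ValueError(f"need 1/alpha = {1.0 / alpha:.3f} < H < 1, got H = {H}")
    return H - 1.0 / alpha


d_default = lfsm_fractional_order(0.7, 1.5)  # about 0.0333
d_gauss = lfsm_fractional_order(0.7, 2.0)    # Gaussian case: H - 0.5 = 0.2
```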
References
Samorodnitsky, G. & Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall.
- __init__(H: float = 0.7, alpha: float = 1.5, beta: float = 0.0, scale: float = 1.0, use_hpfracc: bool = True, random_state: int | None = None)
Initialize Linear Fractional Stable Motion generator.
- Parameters:
H (float) – Hurst-like self-similarity parameter (0 < H < 1)
alpha (float) – Stability index (0 < alpha <= 2). α=2 is Gaussian.
beta (float) – Skewness parameter (-1 <= beta <= 1)
scale (float) – Scale parameter
use_hpfracc (bool) – Whether to use hpfracc library if available
random_state (int, optional) – Random seed
- _generate_stable_rv(size: int, rng: Generator) → ndarray
Generate stable random variables using Chambers-Mallows-Stuck algorithm.
For symmetric α-stable (beta=0), this produces the Lévy driver noise.
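For β = 0 the Chambers-Mallows-Stuck method needs one uniform angle U on (-π/2, π/2) and one unit exponential W per sample. The sketch below is a generic implementation of that formula, not the library's `_generate_stable_rv`; the helper name and the `scale` handling are illustrative. Two standard checks apply: α = 2 gives N(0, 2) at unit scale, and α = 1 reduces to the Cauchy case tan(U).

```python
import numpy as np


def symmetric_stable_cms(alpha, size, rng, scale=1.0):
    """Symmetric alpha-stable samples (beta = 0) via Chambers-Mallows-Stuck."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 2")
    u = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)  # uniform angle
    w = rng.exponential(1.0, size)                    # unit-mean exponential
    if alpha == 1.0:
        x = np.tan(u)  # Cauchy special case
    else:
        x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
             * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


rng = np.random.default_rng(7)
z = symmetric_stable_cms(1.5, 10_000, rng)
```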
- _apply_spectral_kernel(noise: ndarray) → ndarray
Apply fractional integration kernel |ω|^{-d} in frequency domain.
This is the core of the spectral method for LFSM generation.
- _apply_spectral_kernel_hpfracc(noise: ndarray) → ndarray
Apply fractional integration using hpfracc’s optimized methods.
Uses the Riemann-Liouville fractional integral when available.
SOCAvalancheModel
- class lrdbenchmark.generation.SOCAvalancheModel(grid_size: int = 32, threshold: int = 4, random_state: int | None = None)
Self-Organized Criticality avalanche model (Bak-Tang-Wiesenfeld).
Simulates a sandpile model producing scale-free avalanche dynamics. The resulting time series of avalanche sizes exhibits power-law correlations characteristic of critical systems.
Example
>>> gen = SOCAvalancheModel(grid_size=64)
>>> result = gen.generate(1000)
- __init__(grid_size: int = 32, threshold: int = 4, random_state: int | None = None)
Initialize SOC sandpile model.
- _run_sandpile(n_avalanches: int, rng: Generator) → ndarray
Run sandpile simulation and record avalanche sizes.
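A minimal version of the classic Bak-Tang-Wiesenfeld rule is sketched below: drop one grain at a random site, and while any site holds at least `threshold` grains, it sends one grain to each of its four neighbours, with grains falling off the open boundary lost. The avalanche size is the number of topplings. The open boundary, the four-grain toppling (hence threshold ≥ 4), and the helper name are assumptions, not details confirmed for `_run_sandpile`.

```python
import numpy as np


def btw_avalanche_sizes(n_avalanches, grid_size=32, threshold=4, seed=None):
    """Bak-Tang-Wiesenfeld sandpile with open boundaries; returns (sizes, final grid)."""
    if threshold < 4:
        raise ValueError("this sketch assumes threshold >= 4 (4 grains shed per toppling)")
    rng = np.random.default_rng(seed)
    grid = np.zeros((grid_size, grid_size), dtype=int)
    sizes = np.zeros(n_avalanches, dtype=int)
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    for a in range(n_avalanches):
        i, j = rng.integers(0, grid_size, size=2)
        grid[i, j] += 1  # drive: add one grain at a random site
        stack = [(i, j)]
        topplings = 0
        while stack:
            r, c = stack.pop()
            if grid[r, c] < threshold:
                continue  # stale entry: this site already relaxed
            grid[r, c] -= 4  # shed one grain to each of the four neighbours
            topplings += 1
            if grid[r, c] >= threshold:
                stack.append((r, c))  # still unstable, topple again later
            for dr, dc in neighbours:
                nr, nc = r + dr, c + dc
                if 0 <= nr < grid_size and 0 <= nc < grid_size:  # edge grains are lost
                    grid[nr, nc] += 1
                    if grid[nr, nc] >= threshold:
                        stack.append((nr, nc))
        sizes[a] = topplings
    return sizes, grid


sizes, grid = btw_avalanche_sizes(3000, grid_size=8, seed=0)
```

Because the sandpile is abelian, the order in which unstable sites are processed does not change the final configuration or the toppling count, so a simple stack is enough.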
Surrogate Data Generators
Generators for hypothesis testing of LRD and nonlinearity.
IAFFTSurrogate
- class lrdbenchmark.generation.IAFFTSurrogate(max_iterations: int = 100, convergence_tol: float = 1e-06, random_state: int | None = None)
Iterative Amplitude Adjusted Fourier Transform (IAAFT) surrogate generator.
Generates surrogates that preserve both the power spectrum and the amplitude distribution of the original time series, while destroying any nonlinear temporal structure.
Reference: Schreiber & Schmitz (1996)
Example
>>> gen = IAFFTSurrogate()
>>> surrogate = gen.generate(original_data)
- __init__(max_iterations: int = 100, convergence_tol: float = 1e-06, random_state: int | None = None)
Initialize IAAFT surrogate generator.
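The IAAFT iteration alternates two projections: impose the original Fourier amplitudes while keeping the current phases, then rank-order remap onto the original values. The sketch below follows that scheme; the stopping rule (stop when the normalized spectral mismatch changes by less than `convergence_tol`) is an assumption about how `max_iterations` and `convergence_tol` interact, and `iaaft_surrogate` is a hypothetical stand-in for `IAFFTSurrogate.generate`.

```python
import numpy as np


def iaaft_surrogate(x, max_iterations=100, convergence_tol=1e-6, seed=None):
    """IAAFT surrogate: match the Fourier amplitudes and the value distribution of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    sorted_x = np.sort(x)
    target_amp = np.abs(np.fft.rfft(x))
    s = rng.permutation(x)  # start from a random shuffle of the data
    prev_err = np.inf
    for _ in range(max_iterations):
        # Projection 1: impose the original Fourier amplitudes, keep the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n)
        # Projection 2: rank-order remap onto the original values.
        s = sorted_x[np.argsort(np.argsort(s))]
        # Normalized spectral mismatch; stop once it stops improving noticeably.
        err = np.sum((np.abs(np.fft.rfft(s)) - target_amp) ** 2) / np.sum(target_amp ** 2)
        if abs(prev_err - err) < convergence_tol:
            break
        prev_err = err
    return s
```

Ending on the rank-order step means the value distribution is matched exactly, while the power spectrum is matched only approximately.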
PhaseRandomizedSurrogate
- class lrdbenchmark.generation.PhaseRandomizedSurrogate(random_state: int | None = None)
Phase randomization surrogate generator.
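Generates surrogates by randomizing the Fourier phases while preserving the Fourier amplitudes, and hence the power spectrum. The linear autocorrelation is therefore retained, while any temporal structure not captured by the power spectrum (such as nonlinear dependence) is destroyed.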
Example
>>> gen = PhaseRandomizedSurrogate()
>>> surrogate = gen.generate(original_data)
- __init__(random_state: int | None = None)
Initialize phase randomization generator.
- Parameters:
random_state (int, optional) – Random seed
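Phase randomization can be sketched in a few lines with the real FFT. The helper name is hypothetical. The DC term, and for even lengths the Nyquist term, are kept as in the original because those bins must stay real for the inverse transform to return a real series; every Fourier amplitude is therefore preserved exactly. Unlike IAAFT, the amplitude distribution of the data is not preserved, and surrogates tend toward a Gaussian marginal.

```python
import numpy as np


def phase_randomized_surrogate(x, seed=None):
    """Randomize Fourier phases while keeping every Fourier amplitude."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    # Keep the DC term (and the Nyquist term for even n) as in the original,
    # because those bins must stay real for the inverse transform to be real.
    randomized[0] = spectrum[0]
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]
    return np.fft.irfft(randomized, n)
```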
ARSurrogate
- class lrdbenchmark.generation.ARSurrogate(order: int = 10, random_state: int | None = None)
Autoregressive (AR) surrogate generator.
Fits an AR model to the original data and generates surrogates from the fitted model. Provides a linear null hypothesis.
Example
>>> gen = ARSurrogate(order=10)
>>> surrogate = gen.generate(original_data)
- __init__(order: int = 10, random_state: int | None = None)
Initialize AR surrogate generator.
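A linear null model of this kind can be sketched as an ordinary least-squares AR(p) fit followed by simulation from the fitted recursion. The fitting method, the Gaussian innovations scaled to the residual standard deviation, the burn-in length, and the helper name are all assumptions rather than the library's documented behavior.

```python
import numpy as np


def ar_surrogate(x, order=10, seed=None):
    """Fit AR(order) by least squares, then simulate from the fitted model."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    y = x - mu
    n = len(y)
    # Design matrix: row for time t holds y[t-1], ..., y[t-order] to predict y[t].
    X = np.column_stack([y[order - k - 1:n - k - 1] for k in range(order)])
    target = y[order:]
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid_std = np.std(target - X @ coefs)
    # Simulate with Gaussian innovations; a burn-in lets the start-up zeros wash out.
    burn = 10 * order
    sim = np.zeros(n + burn)
    noise = rng.standard_normal(n + burn) * resid_std
    for t in range(order, n + burn):
        sim[t] = coefs @ sim[t - order:t][::-1] + noise[t]
    return sim[burn:] + mu, coefs
```

Resampling the fitted residuals instead of drawing Gaussian innovations is a common alternative when the marginal distribution of the data is far from Gaussian.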
Factory Functions
- lrdbenchmark.generation.create_nonstationary_process(process_type: str, **kwargs) → NonstationaryProcessBase
Factory function to create nonstationary processes.
- Parameters:
process_type (str) – Type of process: ‘regime_switching’, ‘continuous_drift’, ‘structural_break’, ‘ensemble_time_average’
**kwargs – Process-specific parameters
- Returns:
Configured process instance
- Return type:
NonstationaryProcessBase