Generation Module API

This module provides data generators for testing long-range dependence (LRD) estimators under various conditions.

Time Series Generator

class lrdbenchmark.generation.TimeSeriesGenerator(random_state: int | None = None)[source]

Bases: object

Unified Time Series Generator for LRD Benchmark.

This class handles the end-to-end generation process:

1. Base signal generation (FBM, FGN, ARFIMA, MRW)
2. Contamination application (Noise, Trends, Artifacts)
3. Preprocessing (Detrending, Winsorizing, Normalization) – “Baked In”

__init__(random_state: int | None = None)[source]

Initialize the generator.

Parameters:

random_state (int, optional) – Global random seed.

generate(model: str, length: int, params: Dict[str, Any], contamination: List[Dict[str, Any]] | None = None, preprocess: bool = True, preprocess_params: Dict[str, Any] | None = None, seed: int | None = None) → Dict[str, Any][source]

Generate a processed time series.

Parameters:

model (str) – Model name (‘fbm’, ‘fgn’, ‘arfima’, ‘mrw’). Case-insensitive.
length (int) – Length of the time series.
params (dict) – Parameters for the model (e.g., {‘H’: 0.7}).
contamination (list of dicts, optional) –
List of contamination specs to apply sequentially. Each dict should have:
- ‘scenario’: ConfoundingScenario enum or str name
- ‘intensity’: float (0.0 to 1.0)
- ‘params’: dict (scenario-specific parameters)
preprocess (bool, default=True) – Whether to apply the baked-in preprocessing pipeline.
preprocess_params (dict, optional) – Overrides for preprocessing configuration (e.g. {‘enable_detrend’: False}).
seed (int, optional) – Specific seed for this generation.

Returns:

Result dictionary containing:
- ‘signal’: The final processed numpy array
- ‘clean_signal’: The clean signal before contamination
- ‘contaminated_signal’: Signal after contamination but before preprocessing
- ‘metadata’: Full generation metadata (true params, contamination info, preprocessing info)

Return type:

dict
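The contamination argument is a list of plain dicts, so a malformed spec is easy to write. The sketch below is not part of lrdbenchmark: `check_contamination` is a hypothetical helper, and the scenario names ‘noise’ and ‘trend’ are placeholders rather than confirmed library values. Only the three keys and the 0.0 to 1.0 intensity range are taken from the parameter description above.

```python
from typing import Any, Dict, List

# Keys that each contamination spec should carry, per the parameter description.
REQUIRED_KEYS = ("scenario", "intensity", "params")


def check_contamination(specs: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the specs look well-formed."""
    problems: List[str] = []
    for i, spec in enumerate(specs):
        for key in REQUIRED_KEYS:
            if key not in spec:
                problems.append(f"spec {i}: missing key '{key}'")
        intensity = spec.get("intensity")
        if isinstance(intensity, (int, float)) and not 0.0 <= intensity <= 1.0:
            problems.append(f"spec {i}: intensity {intensity} outside [0.0, 1.0]")
        if "params" in spec and not isinstance(spec["params"], dict):
            problems.append(f"spec {i}: 'params' must be a dict")
    return problems


# Scenario names are hypothetical placeholders for illustration only.
specs = [
    {"scenario": "noise", "intensity": 0.2, "params": {}},
    {"scenario": "trend", "intensity": 1.5, "params": {}},
]
problems = check_contamination(specs)
```

Running a check like this before calling generate() keeps spec errors separate from errors raised inside the generation pipeline.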

Nonstationarity Generators

Generators for time-varying Hurst parameter testing.

RegimeSwitchingProcess

class lrdbenchmark.generation.RegimeSwitchingProcess(h_regimes: List[float], change_points: List[float] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Generate time series with abrupt H transitions at specified change points.

Useful for testing estimator behavior under broken stationarity conditions.

Example

>>> gen = RegimeSwitchingProcess(h_regimes=[0.3, 0.8], change_points=[0.5])
>>> result = gen.generate(1000)
>>> # First half has H=0.3, second half has H=0.8

__init__(h_regimes: List[float], change_points: List[float] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize regime switching process.

Parameters:

h_regimes (list of float) – H values for each regime (must be in (0, 1))
change_points (list of float, optional) – Relative positions of change points in (0, 1). If None, creates equal-length regimes.
sigma (float) – Standard deviation scaling
random_state (int, optional) – Random seed

_get_h_trajectory(length: int) → ndarray[source]: Get step-function H trajectory.

_get_metadata() → Dict[str, Any][source]: Get regime switching metadata.
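To make the step-function trajectory concrete, here is a minimal standalone sketch of how relative change points could map to a per-sample H sequence. It is an illustration under stated assumptions, not the library's `_get_h_trajectory`: the function name `step_h_trajectory` and the rounding of change points to sample indices are choices made here.

```python
from typing import List, Optional


def step_h_trajectory(
    length: int,
    h_regimes: List[float],
    change_points: Optional[List[float]] = None,
) -> List[float]:
    """Piecewise-constant H(t): one constant value per regime."""
    n_regimes = len(h_regimes)
    if change_points is None:
        # Equal-length regimes, e.g. three regimes switch at 1/3 and 2/3.
        change_points = [k / n_regimes for k in range(1, n_regimes)]
    if len(change_points) != n_regimes - 1:
        raise ValueError("need exactly len(h_regimes) - 1 change points")
    # Relative positions in (0, 1) become sample indices (rounding is an assumption).
    boundaries = [0] + [int(round(cp * length)) for cp in change_points] + [length]
    trajectory: List[float] = []
    for h, start, stop in zip(h_regimes, boundaries[:-1], boundaries[1:]):
        trajectory.extend([h] * (stop - start))
    return trajectory


traj = step_h_trajectory(10, [0.3, 0.8], change_points=[0.5])
```

With two regimes and a change point at 0.5, the first half of `traj` holds 0.3 and the second half 0.8, matching the class example above.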

ContinuousDriftProcess

class lrdbenchmark.generation.ContinuousDriftProcess(h_start: float = 0.3, h_end: float = 0.8, drift_type: str | DriftType = DriftType.LINEAR, drift_params: Dict[str, Any] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Generate time series with smoothly varying H(t).

Supports linear, sinusoidal, logistic, and exponential drift patterns. Useful for testing estimator behavior under gradual nonstationarity.

Example

>>> gen = ContinuousDriftProcess(h_start=0.3, h_end=0.8, drift_type='linear')
>>> result = gen.generate(1000)
>>> # H increases linearly from 0.3 to 0.8

__init__(h_start: float = 0.3, h_end: float = 0.8, drift_type: str | DriftType = DriftType.LINEAR, drift_params: Dict[str, Any] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize continuous drift process.

Parameters:

h_start (float) – Initial H value
h_end (float) – Final H value
drift_type (str or DriftType) – Type of drift: ‘linear’, ‘sinusoidal’, ‘logistic’, ‘exponential’
drift_params (dict, optional) –
Additional parameters for drift function:
- sinusoidal: {‘frequency’: float, ‘phase’: float}
- logistic: {‘steepness’: float, ‘midpoint’: float}
- exponential: {‘rate’: float}
sigma (float) – Standard deviation scaling
random_state (int, optional) – Random seed

_get_h_trajectory(length: int) → ndarray[source]: Get smooth H trajectory based on drift type.

_get_metadata() → Dict[str, Any][source]: Get continuous drift metadata.
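The exact drift formulas are not given in this reference, so the following is only a plausible sketch of two of the four drift types. The function names, the normalised time grid t in [0, 1], and the default `steepness` and `midpoint` values are assumptions made for illustration.

```python
import math
from typing import List


def linear_drift(length: int, h_start: float, h_end: float) -> List[float]:
    """H(t) moves in a straight line from h_start to h_end."""
    if length == 1:
        return [h_start]
    return [h_start + (h_end - h_start) * i / (length - 1) for i in range(length)]


def logistic_drift(
    length: int,
    h_start: float,
    h_end: float,
    steepness: float = 10.0,  # assumed default, not taken from the library
    midpoint: float = 0.5,    # assumed default, not taken from the library
) -> List[float]:
    """H(t) follows an S-curve centred at `midpoint` in normalised time."""
    trajectory = []
    for i in range(length):
        t = i / (length - 1) if length > 1 else 0.0
        weight = 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))
        trajectory.append(h_start + (h_end - h_start) * weight)
    return trajectory


linear = linear_drift(5, 0.3, 0.8)
logistic = logistic_drift(11, 0.3, 0.8)
```

Note that a logistic curve only approaches `h_start` and `h_end` asymptotically, so with a small `steepness` the endpoints of the trajectory sit visibly inside the requested range.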

StructuralBreakProcess

class lrdbenchmark.generation.StructuralBreakProcess(h_before: float = 0.7, h_after: float = 0.4, break_position: float = 0.5, break_severity: float = 0.0, variance_change: float = 1.0, n_breaks: int = 1, sigma: float = 1.0, random_state: int | None = None)[source]

Generate time series with structural breaks (abrupt level/variance shifts).

Combines H regime switching with additional level shifts and variance changes to create more realistic nonstationary scenarios.

Example

>>> gen = StructuralBreakProcess(
...     h_before=0.7, h_after=0.4,
...     break_position=0.5, break_severity=0.3
... )
>>> result = gen.generate(1000)

__init__(h_before: float = 0.7, h_after: float = 0.4, break_position: float = 0.5, break_severity: float = 0.0, variance_change: float = 1.0, n_breaks: int = 1, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize structural break process.

Parameters:

h_before (float) – H value before break(s)
h_after (float) – H value after break(s)
break_position (float) – Relative position of break in (0, 1). For multiple breaks, positions are evenly distributed.
break_severity (float) – Magnitude of level shift at break (0 = no shift)
variance_change (float) – Multiplicative factor for variance after break (1 = no change)
n_breaks (int) – Number of structural breaks
sigma (float) – Standard deviation scaling
random_state (int, optional) – Random seed

_get_h_trajectory(length: int) → ndarray[source]: Get H trajectory with alternating regimes.

generate(length: int, seed: int | None = None, segment_length: int = 256) → Dict[str, Any][source]: Generate signal with structural breaks including level shifts.

_get_metadata() → Dict[str, Any][source]: Get structural break metadata.

EnsembleTimeAverageProcess

class lrdbenchmark.generation.EnsembleTimeAverageProcess(H: float = 0.7, aging_exponent: float = 0.5, aging_type: str = 'power_law', sigma: float = 1.0, random_state: int | None = None)[source]

Generate data for testing ergodicity violation.

In equilibrium systems, ensemble averages equal time averages. This process generates data where this equivalence is broken, which is characteristic of aging/nonequilibrium systems where classical estimators fail.

Example

>>> gen = EnsembleTimeAverageProcess(H=0.7, aging_exponent=0.5)
>>> result = gen.generate(1000)
>>> # Signal exhibits aging behavior

__init__(H: float = 0.7, aging_exponent: float = 0.5, aging_type: str = 'power_law', sigma: float = 1.0, random_state: int | None = None)[source]

Initialize ensemble-time average process.

Parameters:

H (float) – Base Hurst parameter
aging_exponent (float) – Exponent controlling aging rate (0 = no aging, 1 = strong aging)
aging_type (str) – Type of aging: ‘power_law’, ‘logarithmic’, ‘exponential’
sigma (float) – Standard deviation scaling
random_state (int, optional) – Random seed

_get_h_trajectory(length: int) → ndarray[source]: Get H trajectory with aging-induced drift.

generate_ensemble(n_realizations: int, length: int, seed: int | None = None) → Dict[str, Any][source]

Generate ensemble of realizations for ergodicity testing.

Parameters:

n_realizations (int) – Number of realizations
length (int) – Length of each realization
seed (int, optional) – Random seed

Returns:

Dictionary containing:
- ‘ensemble’: Array of shape (n_realizations, length)
- ‘ensemble_mean’: Mean across realizations at each time
- ‘time_mean’: Mean across time for each realization
- ‘h_trajectory’: True H trajectory

Return type:

dict

_get_metadata() → Dict[str, Any][source]: Get ergodicity testing metadata.
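The difference between ‘ensemble_mean’ and ‘time_mean’ is the axis being averaged. The sketch below uses toy data rather than the library: each realization carries a random offset that never averages out in time, which is one simple way to break ergodicity. The helper name and the toy model are assumptions for illustration.

```python
import random
from statistics import mean
from typing import List, Tuple


def ensemble_and_time_means(
    ensemble: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Average an (n_realizations x length) table along each axis."""
    length = len(ensemble[0])
    # One value per time step: average over realizations.
    ensemble_mean = [mean(row[t] for row in ensemble) for t in range(length)]
    # One value per realization: average over time.
    time_mean = [mean(row) for row in ensemble]
    return ensemble_mean, time_mean


rng = random.Random(0)
ensemble = []
for _ in range(50):
    offset = rng.gauss(0.0, 1.0)  # frozen per realization, so time averaging cannot remove it
    ensemble.append([offset + rng.gauss(0.0, 0.1) for _ in range(200)])

ensemble_mean, time_mean = ensemble_and_time_means(ensemble)
spread_of_time_means = max(time_mean) - min(time_mean)
spread_of_ensemble_mean = max(ensemble_mean) - min(ensemble_mean)
```

In an ergodic process the time means would cluster around the ensemble mean; here they stay spread out by roughly the width of the offset distribution.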

Critical Regime Models

Physics-motivated generators for critical and nonequilibrium regimes.

OrnsteinUhlenbeckProcess

class lrdbenchmark.generation.OrnsteinUhlenbeckProcess(theta_start: float = 0.1, theta_end: float = 1.0, sigma: float = 1.0, transition_type: str = 'linear', dt: float = 0.01, random_state: int | None = None)[source]

Ornstein-Uhlenbeck process with time-varying friction coefficient.

Models transient criticality where the system transitions between different relaxation regimes.

The SDE is:

dX_t = -θ(t) * X_t * dt + σ * dW_t

where θ(t) is the time-varying friction coefficient.

Example

>>> gen = OrnsteinUhlenbeckProcess(theta_start=0.1, theta_end=1.0)
>>> result = gen.generate(1000)

__init__(theta_start: float = 0.1, theta_end: float = 1.0, sigma: float = 1.0, transition_type: str = 'linear', dt: float = 0.01, random_state: int | None = None)[source]

Initialize OU process with time-varying friction.

Parameters:

theta_start (float) – Initial friction coefficient (low = critical-like)
theta_end (float) – Final friction coefficient
sigma (float) – Noise intensity
transition_type (str) – How θ transitions: ‘linear’, ‘exponential’, ‘step’
dt (float) – Time step for simulation
random_state (int, optional) – Random seed

_get_theta_trajectory(length: int) → ndarray[source]: Get time-varying friction coefficient.

generate(length: int, seed: int | None = None) → Dict[str, Any][source]

Generate OU process with time-varying friction.

Returns dict with ‘signal’, ‘theta_trajectory’, ‘metadata’.
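The SDE above can be discretised with an Euler-Maruyama step, X[i] = X[i-1] - θ[i-1] * X[i-1] * dt + σ * sqrt(dt) * ξ, with ξ standard normal. The sketch below assumes that scheme, a linear θ ramp, and a start at X = 0. None of these details are confirmed by this reference, and `simulate_ou` is a hypothetical name.

```python
import math
import random
from typing import List, Optional, Tuple


def simulate_ou(
    length: int,
    theta_start: float = 0.1,
    theta_end: float = 1.0,
    sigma: float = 1.0,
    dt: float = 0.01,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Euler-Maruyama integration of dX = -theta(t) X dt + sigma dW."""
    rng = random.Random(seed)
    # Linear transition of the friction coefficient over the whole series.
    steps = max(length - 1, 1)
    theta = [theta_start + (theta_end - theta_start) * i / steps for i in range(length)]
    x = [0.0] * length  # starting at zero is an assumption
    for i in range(1, length):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x[i] = x[i - 1] - theta[i - 1] * x[i - 1] * dt + sigma * dw
    return x, theta


signal, theta_trajectory = simulate_ou(1000, seed=42)
```

The scheme is stable as long as θ * dt stays well below 1, which holds comfortably for the default values.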

SubordinatedProcess

class lrdbenchmark.generation.SubordinatedProcess(alpha: float = 0.7, sigma: float = 1.0, random_state: int | None = None)[source]

Subordinated Brownian motion for modeling nonequilibrium phenomena.

The process X(S(t)) where X is Brownian motion and S(t) is an inverse stable subordinator, producing subdiffusive behavior.

This models systems with trapping events and anomalous diffusion where classical ergodicity breaks down.

Example

>>> gen = SubordinatedProcess(alpha=0.7)
>>> result = gen.generate(1000)

__init__(alpha: float = 0.7, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize subordinated process.

Parameters:

alpha (float) – Subordinator index (0 < alpha < 1). Lower = more trapping.
sigma (float) – Diffusion coefficient of parent Brownian motion
random_state (int, optional) – Random seed

_generate_stable_subordinator(length: int, rng: Generator) → ndarray[source]: Generate one-sided stable Lévy process (subordinator).

generate(length: int, seed: int | None = None) → Dict[str, Any][source]

Generate subordinated Brownian motion.

Returns dict with ‘signal’, ‘operational_time’, ‘metadata’.

FractionalLevyMotion

class lrdbenchmark.generation.FractionalLevyMotion(H: float = 0.7, alpha: float = 1.5, beta: float = 0.0, scale: float = 1.0, use_hpfracc: bool = True, random_state: int | None = None)[source]

Linear Fractional Stable Motion (LFSM) via FFT-based spectral method.

Generates heavy-tailed, non-Gaussian processes with long-range dependence by applying fractional integration to symmetric α-stable noise in the frequency domain.

The algorithm:

1. Generate symmetric α-stable noise Z
2. FFT to frequency domain: Z̃ = FFT(Z)
3. Apply spectral kernel: X̃ = Z̃ * |ω|^{-d} where d = H - 1/α
4. IFFT back to time domain: X = IFFT(X̃)

When α = 2 (Gaussian case), d = H - 0.5, recovering fractional Brownian motion.

Parameters:

H (float) – Hurst (self-similarity) parameter, 0 < H < 1
alpha (float) – Stability index, 0 < alpha <= 2. α=2 is Gaussian (fBm).
beta (float) – Skewness parameter, -1 <= beta <= 1. Use 0 for symmetric.
scale (float) – Scale parameter for the stable distribution
use_hpfracc (bool) – If True, attempt to use hpfracc library for optimized operations. Falls back to NumPy if hpfracc is not available.
random_state (int, optional) – Random seed for reproducibility

Example

>>> gen = FractionalLevyMotion(H=0.7, alpha=1.5)
>>> result = gen.generate(1000)
>>> signal = result['signal']

Notes

The relationship between H (Hurst parameter), α (stability index), and d (fractional integration order) is: d = H - 1/α

For LFSM, long-range dependence requires 1/α < H < 1, so that d > 0 (fractional integration, not differentiation). Because H < 1, this condition can only be met when α > 1.
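The relation d = H - 1/α is simple enough to check by hand, but a small helper makes the admissible region explicit. This is an illustrative sketch; `fractional_order` and `is_long_memory` are hypothetical names, not library functions.

```python
def fractional_order(H: float, alpha: float) -> float:
    """Return d = H - 1/alpha after checking the documented parameter ranges."""
    if not 0.0 < H < 1.0:
        raise ValueError("H must satisfy 0 < H < 1")
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 2")
    return H - 1.0 / alpha


def is_long_memory(H: float, alpha: float) -> bool:
    """True when d > 0, i.e. the kernel integrates rather than differentiates."""
    return fractional_order(H, alpha) > 0.0


d_gaussian = fractional_order(0.7, 2.0)     # Gaussian case: d = H - 0.5
d_default = fractional_order(0.7, 1.5)      # class defaults: small positive d
d_negative = fractional_order(0.5, 1.5)     # H below 1/alpha: d < 0
```

With the class defaults (H = 0.7, α = 1.5) the order is positive but small, since 1/α is about 0.667.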

References

Samorodnitsky, G. & Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall.

classmethod _check_hpfracc() → bool[source]: Check if hpfracc is available (cached).

__init__(H: float = 0.7, alpha: float = 1.5, beta: float = 0.0, scale: float = 1.0, use_hpfracc: bool = True, random_state: int | None = None)[source]

Initialize Linear Fractional Stable Motion generator.

Parameters:

H (float) – Hurst-like self-similarity parameter (0 < H < 1)
alpha (float) – Stability index (0 < alpha <= 2). α=2 is Gaussian.
beta (float) – Skewness parameter (-1 <= beta <= 1)
scale (float) – Scale parameter
use_hpfracc (bool) – Whether to use hpfracc library if available
random_state (int, optional) – Random seed

_generate_stable_rv(size: int, rng: Generator) → ndarray[source]

Generate stable random variables using Chambers-Mallows-Stuck algorithm.

For symmetric α-stable (beta=0), this produces the Lévy driver noise.
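For the symmetric case (beta = 0), the Chambers-Mallows-Stuck construction reduces to a closed-form transform of one uniform and one exponential variate. The sketch below implements that standard symmetric formula with unit scale using only the standard library. It is not the library's `_generate_stable_rv`, and `symmetric_stable` is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def symmetric_stable(alpha: float, size: int, seed: Optional[int] = None) -> List[float]:
    """Draw symmetric alpha-stable variates (beta = 0, unit scale) via Chambers-Mallows-Stuck."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 2")
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
        w = rng.expovariate(1.0)                    # unit-mean exponential
        if alpha == 1.0:
            samples.append(math.tan(v))             # Cauchy special case
        else:
            x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)) * (
                math.cos((1.0 - alpha) * v) / w
            ) ** ((1.0 - alpha) / alpha)
            samples.append(x)
    return samples


heavy_tailed = symmetric_stable(1.5, 1000, seed=0)
gaussian_like = symmetric_stable(2.0, 20000, seed=1)
```

At α = 2 the formula collapses to 2 * sin(V) * sqrt(W), a Gaussian with variance 2 rather than 1. That is the usual unit-scale convention for stable laws, and it is worth remembering when comparing against a standard normal generator.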

_apply_spectral_kernel(noise: ndarray) → ndarray[source]

Apply fractional integration kernel |ω|^{-d} in frequency domain.

This is the core of the spectral method for LFSM generation.

_apply_spectral_kernel_hpfracc(noise: ndarray) → ndarray[source]

Apply fractional integration using hpfracc’s optimized methods.

Uses Riemann-Liouville fractional integral when available.

generate(length: int, seed: int | None = None) → Dict[str, Any][source]

Generate Linear Fractional Stable Motion.

Uses FFT-based spectral method with kernel |ω|^{-d} where d = H - 1/α.

Parameters:

length (int) – Length of the time series to generate
seed (int, optional) – Random seed for this generation (overrides constructor seed)

Returns:

Dictionary containing:
- ‘signal’: The generated LFSM time series
- ‘metadata’: Process parameters and properties

Return type:

dict

SOCAvalancheModel

class lrdbenchmark.generation.SOCAvalancheModel(grid_size: int = 32, threshold: int = 4, random_state: int | None = None)[source]

Self-Organized Criticality avalanche model (Bak-Tang-Wiesenfeld).

Simulates a sandpile model producing scale-free avalanche dynamics. The resulting time series of avalanche sizes exhibits power-law correlations characteristic of critical systems.

Example

>>> gen = SOCAvalancheModel(grid_size=64)
>>> result = gen.generate(1000)

__init__(grid_size: int = 32, threshold: int = 4, random_state: int | None = None)[source]

Initialize SOC sandpile model.

Parameters:

grid_size (int) – Size of square lattice
threshold (int) – Toppling threshold (typically 4 for 2D)
random_state (int, optional) – Random seed

_run_sandpile(n_avalanches: int, rng: Generator) → ndarray[source]: Run sandpile simulation and record avalanche sizes.

generate(length: int, seed: int | None = None, warmup: int = 1000) → Dict[str, Any][source]

Generate time series of avalanche sizes from SOC sandpile.

Parameters:

length (int) – Number of avalanche events to generate
seed (int, optional) – Random seed
warmup (int) – Number of initial events to discard (reach critical state)

Returns dict with ‘signal’, ‘metadata’.
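For readers unfamiliar with the Bak-Tang-Wiesenfeld rules, the following compact sketch shows one common variant: grains are dropped at random sites, any site at or above the threshold topples by sending one grain to each of its four neighbours, and grains that cross the edge are lost. Measuring avalanche size as the number of topplings per dropped grain is an assumption, as are the function names. The library's `_run_sandpile` may differ in these details.

```python
import random
from typing import List, Optional


def run_sandpile(
    n_events: int, grid_size: int = 8, threshold: int = 4, seed: Optional[int] = None
) -> List[int]:
    """Drop n_events grains one at a time and record the topplings each one triggers."""
    if threshold < 4:
        raise ValueError("this sketch sheds 4 grains per toppling, so threshold must be >= 4")
    rng = random.Random(seed)
    grid = [[0] * grid_size for _ in range(grid_size)]
    sizes = []
    for _ in range(n_events):
        r, c = rng.randrange(grid_size), rng.randrange(grid_size)
        grid[r][c] += 1
        topplings = 0
        unstable = [(r, c)]
        while unstable:
            i, j = unstable.pop()
            if grid[i][j] < threshold:
                continue  # already relaxed by an earlier toppling
            grid[i][j] -= 4
            topplings += 1
            if grid[i][j] >= threshold:
                unstable.append((i, j))
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                # Open boundary: grains pushed off the lattice simply vanish.
                if 0 <= ni < grid_size and 0 <= nj < grid_size:
                    grid[ni][nj] += 1
                    if grid[ni][nj] >= threshold:
                        unstable.append((ni, nj))
        sizes.append(topplings)
    return sizes


def avalanche_series(length: int, warmup: int = 1000, **kwargs) -> List[int]:
    """Discard the first `warmup` events so the pile can reach its critical state."""
    return run_sandpile(length + warmup, **kwargs)[warmup:]


sizes = avalanche_series(200, warmup=500, grid_size=8, seed=0)
```

Many entries in such a series are zero, because most dropped grains land on a site that stays below the threshold. The heavy tail comes from the occasional large cascade.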

Surrogate Data Generators

Generators for hypothesis testing of LRD and nonlinearity.

IAFFTSurrogate

class lrdbenchmark.generation.IAFFTSurrogate(max_iterations: int = 100, convergence_tol: float = 1e-06, random_state: int | None = None)[source]

Iterative Amplitude Adjusted Fourier Transform (IAAFT) surrogate generator.

Generates surrogates that preserve both the power spectrum and the amplitude distribution of the original time series, while destroying any nonlinear temporal structure.

Reference: Schreiber & Schmitz (1996)

Example

>>> gen = IAFFTSurrogate()
>>> surrogate = gen.generate(original_data)

__init__(max_iterations: int = 100, convergence_tol: float = 1e-06, random_state: int | None = None)[source]

Initialize IAAFT surrogate generator.

Parameters:

max_iterations (int) – Maximum number of iterations
convergence_tol (float) – Convergence tolerance for power spectrum matching
random_state (int, optional) – Random seed

generate(data: ndarray, n_surrogates: int = 1, seed: int | None = None) → Dict[str, Any][source]

Generate IAAFT surrogates.

Parameters:

data (np.ndarray) – Original time series
n_surrogates (int) – Number of surrogates to generate
seed (int, optional) – Random seed

Returns:

Dictionary with ‘surrogates’ array and ‘metadata’

Return type:

dict

PhaseRandomizedSurrogate

class lrdbenchmark.generation.PhaseRandomizedSurrogate(random_state: int | None = None)[source]

Phase randomization surrogate generator.

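Generates surrogates by randomizing the Fourier phases while preserving the Fourier amplitudes, and hence the power spectrum and linear autocorrelation. Any temporal structure not captured by the power spectrum is destroyed. Unlike IAAFT, the amplitude distribution of the original series is not preserved.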

Example

>>> gen = PhaseRandomizedSurrogate()
>>> surrogate = gen.generate(original_data)

__init__(random_state: int | None = None)[source]

Initialize phase randomization generator.

Parameters:

random_state (int, optional) – Random seed

generate(data: ndarray, n_surrogates: int = 1, seed: int | None = None) → Dict[str, Any][source]

Generate phase-randomized surrogates.

Parameters:

data (np.ndarray) – Original time series
n_surrogates (int) – Number of surrogates
seed (int, optional) – Random seed

Returns:

Dictionary with ‘surrogates’ and ‘metadata’

Return type:

dict
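Phase randomization is short enough to show in full. The sketch below uses a naive O(n²) discrete Fourier transform from the standard library so that it runs without NumPy. A practical implementation would use an FFT, and nothing here is taken from the library's source. The key detail is that phases must be assigned in conjugate pairs so the inverse transform stays real.

```python
import cmath
import math
import random
from typing import List, Optional


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (fine for short illustrative series)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def phase_randomized(data: List[float], seed: Optional[int] = None) -> List[float]:
    """Return one surrogate with the same Fourier amplitudes and random phases."""
    rng = random.Random(seed)
    n = len(data)
    spectrum = dft(data)
    new = [0j] * n
    new[0] = spectrum[0]  # zero frequency untouched, so the mean is preserved
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        new[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        new[n - k] = new[k].conjugate()  # Hermitian symmetry keeps the output real
    if n % 2 == 0:
        new[n // 2] = spectrum[n // 2]  # Nyquist term has no partner; keep it as is
    # Inverse DFT; the imaginary part is rounding noise and is dropped.
    return [
        sum(new[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


original = [math.sin(0.3 * t) + 0.1 * t for t in range(32)]
surrogate = phase_randomized(original, seed=7)
```

Comparing `abs()` of the two spectra bin by bin is a quick sanity check that only the phases changed.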

ARSurrogate

class lrdbenchmark.generation.ARSurrogate(order: int = 10, random_state: int | None = None)[source]

Autoregressive (AR) surrogate generator.

Fits an AR model to the original data and generates surrogates from the fitted model. Provides a linear null hypothesis.

Example

>>> gen = ARSurrogate(order=10)
>>> surrogate = gen.generate(original_data)

__init__(order: int = 10, random_state: int | None = None)[source]

Initialize AR surrogate generator.

Parameters:

order (int) – AR model order
random_state (int, optional) – Random seed

_fit_ar(data: ndarray) → tuple[source]: Fit AR model using Yule-Walker equations.
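The Yule-Walker equations relate the AR coefficients to the sample autocovariances and can be solved with the Levinson-Durbin recursion. The sketch below is a standard-library illustration of that approach, with a simulation step for drawing a surrogate from the fitted model. The function names and the choice of the biased autocovariance estimator are assumptions, and the library's `_fit_ar` may be implemented differently.

```python
import random
from typing import List, Optional, Tuple


def autocovariance(x: List[float], max_lag: int) -> List[float]:
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    c = [v - m for v in x]
    return [sum(c[t] * c[t + lag] for t in range(n - lag)) / n for lag in range(max_lag + 1)]


def yule_walker(x: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations by Levinson-Durbin; return (coefficients, noise variance)."""
    r = autocovariance(x, order)
    phi: List[float] = []
    err = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / err
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)] + [reflection]
        err *= 1.0 - reflection ** 2
    return phi, err


def simulate_ar(
    phi: List[float], noise_var: float, length: int, seed: Optional[int] = None
) -> List[float]:
    """Generate x[t] = sum_j phi[j] * x[t-1-j] + e[t] with Gaussian innovations."""
    rng = random.Random(seed)
    p = len(phi)
    burn_in = 10 * max(p, 1)  # discard start-up transient
    x = [0.0] * p
    for _ in range(length + burn_in):
        past = x[-p:][::-1] if p else []
        x.append(sum(a * b for a, b in zip(phi, past)) + rng.gauss(0.0, noise_var ** 0.5))
    return x[-length:]


data = simulate_ar([0.6], 1.0, 5000, seed=0)
coeffs, noise_var = yule_walker(data, order=2)
surrogate = simulate_ar(coeffs, noise_var, len(data), seed=1)
```

Because an AR model of finite order has exponentially decaying correlations, surrogates built this way represent a short-memory linear null hypothesis.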

generate(data: ndarray, n_surrogates: int = 1, seed: int | None = None) → Dict[str, Any][source]

Generate AR surrogates.

Parameters:

data (np.ndarray) – Original time series
n_surrogates (int) – Number of surrogates
seed (int, optional) – Random seed

Returns:

Dictionary with ‘surrogates’ and ‘metadata’

Return type:

dict

Factory Functions

lrdbenchmark.generation.create_nonstationary_process(process_type: str, **kwargs) → NonstationaryProcessBase[source]

Factory function to create nonstationary processes.

Parameters:

process_type (str) – Type of process: ‘regime_switching’, ‘continuous_drift’, ‘structural_break’, ‘ensemble_time_average’
**kwargs – Process-specific parameters

Returns:

Configured process instance

Return type:

NonstationaryProcessBase

lrdbenchmark.generation.create_critical_regime_process(process_type: str, **kwargs) → Any[source]

Factory function for critical regime processes.

Parameters:

process_type (str) – ‘ornstein_uhlenbeck’, ‘subordinated’, ‘fractional_levy’, ‘soc_avalanche’
**kwargs – Process-specific parameters

lrdbenchmark.generation.create_surrogate_generator(method: str, **kwargs) → Any[source]

Factory function for surrogate generators.

Parameters:

method (str) – ‘iaaft’, ‘phase_randomization’, or ‘ar’
**kwargs – Method-specific parameters