Generation Module API

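This module provides data generators for testing LRD (long-range dependence) estimators: a unified time series generator, nonstationary and critical-regime processes, and surrogate data generators.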

Time Series Generator

class lrdbenchmark.generation.TimeSeriesGenerator(random_state: int | None = None)[source]

Bases: object

Unified Time Series Generator for LRD Benchmark.

This class handles the end-to-end generation process:
  1. Base signal generation (FBM, FGN, ARFIMA, MRW)

  2. Contamination application (Noise, Trends, Artifacts)

  3. Preprocessing (Detrending, Winsorizing, Normalization) – “Baked In”

__init__(random_state: int | None = None)[source]

Initialize the generator.

Parameters:

random_state (int, optional) – Global random seed.

generate(model: str, length: int, params: Dict[str, Any], contamination: List[Dict[str, Any]] | None = None, preprocess: bool = True, preprocess_params: Dict[str, Any] | None = None, seed: int | None = None) Dict[str, Any][source]

Generate a processed time series.

Parameters:
  • model (str) – Model name (‘fbm’, ‘fgn’, ‘arfima’, ‘mrw’). Case-insensitive.

  • length (int) – Length of the time series.

  • params (dict) – Parameters for the model (e.g., {‘H’: 0.7}).

  • contamination (list of dicts, optional) –

    List of contamination specs to apply sequentially. Each dict should have:

    • ’scenario’: ConfoundingScenario enum or str name

    • ’intensity’: float (0.0 to 1.0)

    • ’params’: dict (scenario-specific parameters)

  • preprocess (bool, default=True) – Whether to apply the baked-in preprocessing pipeline.

  • preprocess_params (dict, optional) – Overrides for preprocessing configuration (e.g. {‘enable_detrend’: False}).

  • seed (int, optional) – Specific seed for this generation.

Returns:

Result dictionary containing:
  • ‘signal’: The final processed numpy array

  • ‘clean_signal’: The clean signal before contamination

  • ‘contaminated_signal’: Signal after contamination but before preprocessing

  • ‘metadata’: Full generation metadata (true params, contamination info, preprocessing info)

Return type:

dict
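A contamination list is easier to debug if each spec is checked against the documented shape before it is passed to generate. The sketch below is illustrative only: check_contamination_specs is a hypothetical helper that is not part of lrdbenchmark, 'example_scenario' is a placeholder rather than a real ConfoundingScenario name, and treating all three keys as required is an assumption.

```python
from typing import Any, Dict, List


def check_contamination_specs(specs: List[Dict[str, Any]]) -> None:
    """Hypothetical helper: validate specs against the documented keys."""
    for i, spec in enumerate(specs):
        for key in ("scenario", "intensity", "params"):
            if key not in spec:
                raise ValueError(f"spec {i}: missing {key!r}")
        if not 0.0 <= float(spec["intensity"]) <= 1.0:
            raise ValueError(f"spec {i}: 'intensity' must lie in [0.0, 1.0]")
        if not isinstance(spec["params"], dict):
            raise ValueError(f"spec {i}: 'params' must be a dict")


# Placeholder scenario name; substitute a real ConfoundingScenario member.
specs = [{"scenario": "example_scenario", "intensity": 0.3, "params": {}}]
check_contamination_specs(specs)

# Keyword arguments laid out as documented for TimeSeriesGenerator.generate.
call_kwargs = dict(
    model="fgn",
    length=1024,
    params={"H": 0.7},
    contamination=specs,
    preprocess=True,
    preprocess_params={"enable_detrend": False},
    seed=42,
)
```

With the package installed, the call would presumably be TimeSeriesGenerator(random_state=0).generate(**call_kwargs), with the final series available under the ‘signal’ key of the result.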

Nonstationarity Generators

Generators for time-varying Hurst parameter testing.

RegimeSwitchingProcess

class lrdbenchmark.generation.RegimeSwitchingProcess(h_regimes: List[float], change_points: List[float] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Generate time series with abrupt H transitions at specified change points.

Useful for testing how estimators behave when the stationarity assumption is violated.

Example

>>> gen = RegimeSwitchingProcess(h_regimes=[0.3, 0.8], change_points=[0.5])
>>> result = gen.generate(1000)
>>> # First half has H=0.3, second half has H=0.8

__init__(h_regimes: List[float], change_points: List[float] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize regime switching process.

Parameters:
  • h_regimes (list of float) – H values for each regime (must be in (0, 1))

  • change_points (list of float, optional) – Relative positions of change points in (0, 1). If None, creates equal-length regimes.

  • sigma (float) – Standard deviation scaling

  • random_state (int, optional) – Random seed

_get_h_trajectory(length: int) ndarray[source]

Get step-function H trajectory.
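A step-function trajectory of this kind can be sketched in a few lines. This is a standalone illustration, not the library's implementation: step_h_trajectory is a hypothetical name, and rounding relative change points to sample indices is an assumption.

```python
from typing import List, Optional


def step_h_trajectory(
    length: int,
    h_regimes: List[float],
    change_points: Optional[List[float]] = None,
) -> List[float]:
    """Piecewise-constant H(t); change points are relative positions in (0, 1)."""
    n_regimes = len(h_regimes)
    if change_points is None:
        # Equal-length regimes, matching the documented default.
        change_points = [k / n_regimes for k in range(1, n_regimes)]
    if len(change_points) != n_regimes - 1:
        raise ValueError("need exactly len(h_regimes) - 1 change points")
    # Convert relative positions to sample indices (rounding is an assumption).
    boundaries = [round(cp * length) for cp in sorted(change_points)]
    trajectory = []
    for t in range(length):
        regime = sum(t >= b for b in boundaries)  # number of boundaries passed
        trajectory.append(h_regimes[regime])
    return trajectory


h_t = step_h_trajectory(10, [0.3, 0.8], change_points=[0.5])
```

Here h_t holds five samples at H=0.3 followed by five at H=0.8.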

_get_metadata() Dict[str, Any][source]

Get regime switching metadata.

ContinuousDriftProcess

class lrdbenchmark.generation.ContinuousDriftProcess(h_start: float = 0.3, h_end: float = 0.8, drift_type: str | DriftType = DriftType.LINEAR, drift_params: Dict[str, Any] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Generate time series with smoothly varying H(t).

Supports linear, sinusoidal, logistic, and exponential drift patterns. Useful for testing estimator behavior under gradual nonstationarity.

Example

>>> gen = ContinuousDriftProcess(h_start=0.3, h_end=0.8, drift_type='linear')
>>> result = gen.generate(1000)
>>> # H increases linearly from 0.3 to 0.8

__init__(h_start: float = 0.3, h_end: float = 0.8, drift_type: str | DriftType = DriftType.LINEAR, drift_params: Dict[str, Any] | None = None, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize continuous drift process.

Parameters:
  • h_start (float) – Initial H value

  • h_end (float) – Final H value

  • drift_type (str or DriftType) – Type of drift: ‘linear’, ‘sinusoidal’, ‘logistic’, ‘exponential’

  • drift_params (dict, optional) –

    Additional parameters for the drift function:

    • sinusoidal: {‘frequency’: float, ‘phase’: float}

    • logistic: {‘steepness’: float, ‘midpoint’: float}

    • exponential: {‘rate’: float}

  • sigma (float) – Standard deviation scaling

  • random_state (int, optional) – Random seed

_get_h_trajectory(length: int) ndarray[source]

Get smooth H trajectory based on drift type.
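The exact drift formulas are not given here, so the following is only one plausible parameterization: H(t) interpolates between h_start and h_end with a weight that depends on normalized time. The function name, the logistic defaults (steepness=10, midpoint=0.5), and the restriction to two drift types are assumptions made for illustration.

```python
import math
from typing import List


def drift_h_trajectory(length: int, h_start: float = 0.3, h_end: float = 0.8,
                       drift_type: str = "linear", steepness: float = 10.0,
                       midpoint: float = 0.5) -> List[float]:
    """Assumed form: H(u) = h_start + (h_end - h_start) * w(u), u in [0, 1]."""
    trajectory = []
    for t in range(length):
        u = t / (length - 1) if length > 1 else 0.0  # normalized time
        if drift_type == "linear":
            w = u
        elif drift_type == "logistic":
            w = 1.0 / (1.0 + math.exp(-steepness * (u - midpoint)))
        else:
            raise ValueError(f"drift_type not covered by this sketch: {drift_type}")
        trajectory.append(h_start + (h_end - h_start) * w)
    return trajectory


linear = drift_h_trajectory(5, 0.3, 0.8, "linear")
logistic = drift_h_trajectory(5, 0.3, 0.8, "logistic")
```

Both trajectories pass through the midpoint value 0.55 at the center sample; the logistic one stays close to h_start and h_end near the ends and changes fastest in the middle.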

_get_metadata() Dict[str, Any][source]

Get continuous drift metadata.

StructuralBreakProcess

class lrdbenchmark.generation.StructuralBreakProcess(h_before: float = 0.7, h_after: float = 0.4, break_position: float = 0.5, break_severity: float = 0.0, variance_change: float = 1.0, n_breaks: int = 1, sigma: float = 1.0, random_state: int | None = None)[source]

Generate time series with structural breaks (abrupt level/variance shifts).

Combines H regime switching with additional level shifts and variance changes to create more realistic nonstationary scenarios.

Example

>>> gen = StructuralBreakProcess(
...     h_before=0.7, h_after=0.4,
...     break_position=0.5, break_severity=0.3
... )
>>> result = gen.generate(1000)

__init__(h_before: float = 0.7, h_after: float = 0.4, break_position: float = 0.5, break_severity: float = 0.0, variance_change: float = 1.0, n_breaks: int = 1, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize structural break process.

Parameters:
  • h_before (float) – H value before break(s)

  • h_after (float) – H value after break(s)

  • break_position (float) – Relative position of break in (0, 1). For multiple breaks, positions are evenly distributed.

  • break_severity (float) – Magnitude of level shift at break (0 = no shift)

  • variance_change (float) – Multiplicative factor for variance after break (1 = no change)

  • n_breaks (int) – Number of structural breaks

  • sigma (float) – Standard deviation scaling

  • random_state (int, optional) – Random seed

_get_h_trajectory(length: int) ndarray[source]

Get H trajectory with alternating regimes.

generate(length: int, seed: int | None = None, segment_length: int = 256) Dict[str, Any][source]

Generate signal with structural breaks including level shifts.
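The level-shift and variance-change part of a single break can be illustrated on its own, leaving the H switch aside. This sketch assumes that break_severity is added directly to the post-break samples and that variance_change scales the variance (so amplitudes scale by its square root); the library may define either differently, and apply_structural_break is a hypothetical name.

```python
import math
from typing import List


def apply_structural_break(signal: List[float], break_position: float = 0.5,
                           break_severity: float = 0.0,
                           variance_change: float = 1.0) -> List[float]:
    """Shift the level and rescale the variance of samples after one break."""
    break_index = int(break_position * len(signal))
    scale = math.sqrt(variance_change)  # variance factor -> amplitude factor
    out = list(signal[:break_index])    # pre-break samples are left untouched
    for x in signal[break_index:]:
        out.append(scale * x + break_severity)
    return out


broken = apply_structural_break([1.0, -1.0, 1.0, -1.0], break_position=0.5,
                                break_severity=0.3, variance_change=4.0)
```

In this toy input the last two samples are doubled in amplitude and shifted up by 0.3, while the first two are unchanged.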

_get_metadata() Dict[str, Any][source]

Get structural break metadata.

EnsembleTimeAverageProcess

class lrdbenchmark.generation.EnsembleTimeAverageProcess(H: float = 0.7, aging_exponent: float = 0.5, aging_type: str = 'power_law', sigma: float = 1.0, random_state: int | None = None)[source]

Generate data for testing ergodicity violation.

In equilibrium systems, ensemble averages equal time averages. This process generates data where this equivalence is broken, which is characteristic of aging/nonequilibrium systems where classical estimators fail.

Example

>>> gen = EnsembleTimeAverageProcess(H=0.7, aging_exponent=0.5)
>>> result = gen.generate(1000)
>>> # Signal exhibits aging behavior

__init__(H: float = 0.7, aging_exponent: float = 0.5, aging_type: str = 'power_law', sigma: float = 1.0, random_state: int | None = None)[source]

Initialize ensemble-time average process.

Parameters:
  • H (float) – Base Hurst parameter

  • aging_exponent (float) – Exponent controlling aging rate (0 = no aging, 1 = strong aging)

  • aging_type (str) – Type of aging: ‘power_law’, ‘logarithmic’, ‘exponential’

  • sigma (float) – Standard deviation scaling

  • random_state (int, optional) – Random seed

_get_h_trajectory(length: int) ndarray[source]

Get H trajectory with aging-induced drift.

generate_ensemble(n_realizations: int, length: int, seed: int | None = None) Dict[str, Any][source]

Generate ensemble of realizations for ergodicity testing.

Parameters:
  • n_realizations (int) – Number of realizations

  • length (int) – Length of each realization

  • seed (int, optional) – Random seed

Returns:

Dictionary containing:
  • ‘ensemble’: Array of shape (n_realizations, length)

  • ‘ensemble_mean’: Mean across realizations at each time

  • ‘time_mean’: Mean across time for each realization

  • ‘h_trajectory’: True H trajectory

Return type:

dict
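The two averages in the returned dictionary differ only in the axis they reduce over. The following dependency-free sketch shows that distinction on a toy array; ensemble_and_time_means is a hypothetical helper, and the library itself presumably computes these with NumPy.

```python
from statistics import fmean
from typing import Dict, List


def ensemble_and_time_means(ensemble: List[List[float]]) -> Dict[str, List[float]]:
    """ensemble has shape (n_realizations, length), as in the documented output."""
    ensemble_mean = [fmean(column) for column in zip(*ensemble)]  # one per time step
    time_mean = [fmean(row) for row in ensemble]                  # one per realization
    return {"ensemble_mean": ensemble_mean, "time_mean": time_mean}


toy = [[1.0, 2.0, 3.0],
       [3.0, 4.0, 5.0]]
means = ensemble_and_time_means(toy)
```

For an ergodic process the entries of time_mean cluster around the ensemble mean as length grows; a persistent spread between realizations is the signature this generator is meant to produce.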

_get_metadata() Dict[str, Any][source]

Get ergodicity testing metadata.

Critical Regime Models

Physics-motivated generators for critical and nonequilibrium regimes.

OrnsteinUhlenbeckProcess

class lrdbenchmark.generation.OrnsteinUhlenbeckProcess(theta_start: float = 0.1, theta_end: float = 1.0, sigma: float = 1.0, transition_type: str = 'linear', dt: float = 0.01, random_state: int | None = None)[source]

Ornstein-Uhlenbeck process with time-varying friction coefficient.

Models transient criticality where the system transitions between different relaxation regimes.

The SDE is:

dX_t = -θ(t) * X_t * dt + σ * dW_t

where θ(t) is the time-varying friction coefficient.
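One standard way to simulate this SDE is the Euler-Maruyama scheme, X[n+1] = X[n] - θ[n] X[n] dt + σ sqrt(dt) ξ[n] with ξ[n] standard normal. Whether the class uses this scheme is not stated, so the sketch below is an assumption; simulate_ou is a hypothetical name.

```python
import math
import random
from typing import List


def simulate_ou(theta: List[float], sigma: float = 1.0, dt: float = 0.01,
                x0: float = 0.0, seed: int = 0) -> List[float]:
    """Euler-Maruyama path with one friction value theta[n] per time step."""
    rng = random.Random(seed)
    x = [x0]
    for theta_n in theta[:-1]:
        xi = rng.gauss(0.0, 1.0)  # standard normal increment
        x.append(x[-1] - theta_n * x[-1] * dt + sigma * math.sqrt(dt) * xi)
    return x


n = 1000
# Linear transition from theta_start=0.1 to theta_end=1.0.
theta = [0.1 + (1.0 - 0.1) * k / (n - 1) for k in range(n)]
path = simulate_ou(theta, sigma=1.0, dt=0.01)
```

With sigma=0 the recursion reduces to pure relaxation, X[n+1] = X[n] (1 - θ[n] dt), which is a quick sanity check on the sign of the drift term.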

Example

>>> gen = OrnsteinUhlenbeckProcess(theta_start=0.1, theta_end=1.0)
>>> result = gen.generate(1000)

__init__(theta_start: float = 0.1, theta_end: float = 1.0, sigma: float = 1.0, transition_type: str = 'linear', dt: float = 0.01, random_state: int | None = None)[source]

Initialize OU process with time-varying friction.

Parameters:
  • theta_start (float) – Initial friction coefficient (low = critical-like)

  • theta_end (float) – Final friction coefficient

  • sigma (float) – Noise intensity

  • transition_type (str) – How θ transitions: ‘linear’, ‘exponential’, ‘step’

  • dt (float) – Time step for simulation

  • random_state (int, optional) – Random seed

_get_theta_trajectory(length: int) ndarray[source]

Get time-varying friction coefficient.

generate(length: int, seed: int | None = None) Dict[str, Any][source]

Generate OU process with time-varying friction.

Returns dict with ‘signal’, ‘theta_trajectory’, ‘metadata’.

SubordinatedProcess

class lrdbenchmark.generation.SubordinatedProcess(alpha: float = 0.7, sigma: float = 1.0, random_state: int | None = None)[source]

Subordinated Brownian motion for modeling nonequilibrium phenomena.

The process is X(S(t)), where X is Brownian motion and S(t) is an inverse stable subordinator; the random time change produces subdiffusive behavior.

This models systems with trapping events and anomalous diffusion where classical ergodicity breaks down.

Example

>>> gen = SubordinatedProcess(alpha=0.7)
>>> result = gen.generate(1000)

__init__(alpha: float = 0.7, sigma: float = 1.0, random_state: int | None = None)[source]

Initialize subordinated process.

Parameters:
  • alpha (float) – Subordinator index (0 < alpha < 1). Lower = more trapping.

  • sigma (float) – Diffusion coefficient of parent Brownian motion

  • random_state (int, optional) – Random seed

_generate_stable_subordinator(length: int, rng: Generator) ndarray[source]

Generate one-sided stable Lévy process (subordinator).
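The method's algorithm is not described, so as an illustration only, one-sided stable increments for 0 < alpha < 1 can be drawn with Kanter's representation (the totally skewed case of the Chambers-Mallows-Stuck method). The function names and the dt**(1/alpha) increment scaling below are assumptions of this sketch.

```python
import math
import random
from typing import List


def positive_stable(alpha: float, rng: random.Random) -> float:
    """One draw S > 0 with Laplace transform E[exp(-s*S)] = exp(-s**alpha)."""
    u, e = 0.0, 0.0
    while u == 0.0 or e == 0.0:  # guard the measure-zero endpoints
        u = rng.uniform(0.0, math.pi)
        e = rng.expovariate(1.0)
    a = math.sin(alpha * u) / math.sin(u) ** (1.0 / alpha)
    b = (math.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return a * b


def subordinator_path(alpha: float, length: int, dt: float = 1.0,
                      seed: int = 0) -> List[float]:
    """Cumulative sum of positive stable increments (a nondecreasing path)."""
    rng = random.Random(seed)
    path, total = [], 0.0
    for _ in range(length):
        # Self-similarity: an increment over dt scales like dt**(1/alpha).
        total += dt ** (1.0 / alpha) * positive_stable(alpha, rng)
        path.append(total)
    return path


times = subordinator_path(alpha=0.7, length=200, seed=1)
```

The inverse subordinator used as operational time is then the first-passage time of such a path; occasional very large increments correspond to the trapping events mentioned above.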

generate(length: int, seed: int | None = None) Dict[str, Any][source]

Generate subordinated Brownian motion.

Returns dict with ‘signal’, ‘operational_time’, ‘metadata’.

FractionalLevyMotion

class lrdbenchmark.generation.FractionalLevyMotion(H: float = 0.7, alpha: float = 1.5, beta: float = 0.0, scale: float = 1.0, use_hpfracc: bool = True, random_state: int | None = None)[source]

Linear Fractional Stable Motion (LFSM) via FFT-based spectral method.

Generates heavy-tailed, non-Gaussian processes with long-range dependence by applying fractional integration to symmetric α-stable noise in the frequency domain.

The algorithm:
  1. Generate symmetric α-stable noise Z

  2. FFT to frequency domain: Z̃ = FFT(Z)

  3. Apply spectral kernel: X̃ = Z̃ * |ω|^{-d} where d = H - 1/α

  4. IFFT back to time domain: X = IFFT(X̃)

When α = 2 (Gaussian case), d = H - 0.5, recovering fractional Brownian motion.
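The four steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the class's code: the noise is drawn with the Chambers-Mallows-Stuck formula for the symmetric case (valid for alpha != 1), the zero-frequency component is dropped to avoid dividing by zero, and the function names are hypothetical.

```python
import numpy as np


def symmetric_stable(alpha: float, size: int, rng: np.random.Generator) -> np.ndarray:
    """Chambers-Mallows-Stuck draw of symmetric alpha-stable noise (alpha != 1)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def spectral_lfsm(length: int, H: float = 0.7, alpha: float = 1.5,
                  seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    d = H - 1.0 / alpha                              # fractional integration order
    noise = symmetric_stable(alpha, length, rng)     # step 1
    spectrum = np.fft.rfft(noise)                    # step 2
    omega = 2.0 * np.pi * np.fft.rfftfreq(length)    # angular frequencies
    kernel = np.zeros_like(omega)
    kernel[1:] = omega[1:] ** (-d)                   # step 3, zero frequency dropped
    return np.fft.irfft(spectrum * kernel, n=length)  # step 4


x = spectral_lfsm(256, H=0.7, alpha=1.5, seed=0)
```

Because the kernel is zero at ω = 0, the output has zero sample mean; a different treatment of that component would only shift the series by a constant.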

Parameters:
  • H (float) – Hurst (self-similarity) parameter, 0 < H < 1

  • alpha (float) – Stability index, 0 < alpha <= 2. α=2 is Gaussian (fBm).

  • beta (float) – Skewness parameter, -1 <= beta <= 1. Use 0 for symmetric.

  • scale (float) – Scale parameter for the stable distribution

  • use_hpfracc (bool) – If True, attempt to use hpfracc library for optimized operations. Falls back to NumPy if hpfracc is not available.

  • random_state (int, optional) – Random seed for reproducibility

Example

>>> gen = FractionalLevyMotion(H=0.7, alpha=1.5)
>>> result = gen.generate(1000)
>>> signal = result['signal']

Notes

The relationship between H (Hurst parameter), α (stability index), and d (fractional integration order) is: d = H - 1/α

For LFSM, the valid parameter range is 1/α < H < 1, which ensures d > 0 (fractional integration, not differentiation). Since H < 1, this range is non-empty only when α > 1.
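A short worked check of d = H - 1/α for a few parameter pairs makes the constraint concrete. The helper name fractional_order is hypothetical; it only encodes the documented ranges.

```python
def fractional_order(H: float, alpha: float) -> float:
    """Return d = H - 1/alpha after checking the documented ranges."""
    if not 0.0 < H < 1.0:
        raise ValueError("H must satisfy 0 < H < 1")
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 2")
    return H - 1.0 / alpha


d_default = fractional_order(0.7, 1.5)   # 0.7 - 0.667 is about 0.033, so d > 0
d_gaussian = fractional_order(0.7, 2.0)  # 0.7 - 0.5 = 0.2
d_negative = fractional_order(0.5, 1.5)  # H < 1/alpha, so d < 0
```

The class defaults (H=0.7, alpha=1.5) sit just inside the valid region, so lowering either value slightly pushes d below zero.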

References

Samorodnitsky, G. & Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall.

classmethod _check_hpfracc() bool[source]

Check if hpfracc is available (cached).

__init__(H: float = 0.7, alpha: float = 1.5, beta: float = 0.0, scale: float = 1.0, use_hpfracc: bool = True, random_state: int | None = None)[source]

Initialize Linear Fractional Stable Motion generator.

Parameters:
  • H (float) – Hurst-like self-similarity parameter (0 < H < 1)

  • alpha (float) – Stability index (0 < alpha <= 2). α=2 is Gaussian.

  • beta (float) – Skewness parameter (-1 <= beta <= 1)

  • scale (float) – Scale parameter

  • use_hpfracc (bool) – Whether to use hpfracc library if available

  • random_state (int, optional) – Random seed

_generate_stable_rv(size: int, rng: Generator) ndarray[source]

Generate stable random variables using Chambers-Mallows-Stuck algorithm.

For symmetric α-stable (beta=0), this produces the Lévy driver noise.

_apply_spectral_kernel(noise: ndarray) ndarray[source]

Apply fractional integration kernel |ω|^{-d} in frequency domain.

This is the core of the spectral method for LFSM generation.

_apply_spectral_kernel_hpfracc(noise: ndarray) ndarray[source]

Apply fractional integration using hpfracc’s optimized methods.

Uses Riemann-Liouville fractional integral when available.

generate(length: int, seed: int | None = None) Dict[str, Any][source]

Generate Linear Fractional Stable Motion.

Uses FFT-based spectral method with kernel |ω|^{-d} where d = H - 1/α.

Parameters:
  • length (int) – Length of the time series to generate

  • seed (int, optional) – Random seed for this generation (overrides constructor seed)

Returns:

Dictionary containing:
  • ‘signal’: The generated LFSM time series

  • ‘metadata’: Process parameters and properties

Return type:

dict

SOCAvalancheModel

class lrdbenchmark.generation.SOCAvalancheModel(grid_size: int = 32, threshold: int = 4, random_state: int | None = None)[source]

Self-Organized Criticality avalanche model (Bak-Tang-Wiesenfeld).

Simulates a sandpile model producing scale-free avalanche dynamics. The resulting time series of avalanche sizes exhibits power-law correlations characteristic of critical systems.

Example

>>> gen = SOCAvalancheModel(grid_size=64)
>>> result = gen.generate(1000)

__init__(grid_size: int = 32, threshold: int = 4, random_state: int | None = None)[source]

Initialize SOC sandpile model.

Parameters:
  • grid_size (int) – Size of square lattice

  • threshold (int) – Toppling threshold (typically 4 for 2D)

  • random_state (int, optional) – Random seed

_run_sandpile(n_avalanches: int, rng: Generator) ndarray[source]

Run sandpile simulation and record avalanche sizes.
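For orientation, a minimal Bak-Tang-Wiesenfeld sandpile can be written as below. It is a sketch under assumptions, not the library's routine: grains are dropped at uniformly random sites, avalanche size is counted as the number of topplings, drops that cause no toppling are recorded as size 0, and grains pushed past the boundary are lost. The function name is hypothetical.

```python
import random
from typing import List


def run_sandpile(grid_size: int, n_avalanches: int, threshold: int = 4,
                 seed: int = 0) -> List[int]:
    """Drop grains one at a time; record the number of topplings per drop."""
    rng = random.Random(seed)
    grid = [[0] * grid_size for _ in range(grid_size)]
    sizes = []
    for _ in range(n_avalanches):
        i, j = rng.randrange(grid_size), rng.randrange(grid_size)
        grid[i][j] += 1
        unstable = [(i, j)]
        topplings = 0
        while unstable:
            r, c = unstable.pop()
            if grid[r][c] < threshold:
                continue  # already relaxed by an earlier toppling
            grid[r][c] -= threshold
            topplings += 1
            if grid[r][c] >= threshold:
                unstable.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                # Grains pushed past the boundary leave the system.
                if 0 <= nr < grid_size and 0 <= nc < grid_size:
                    grid[nr][nc] += 1
                    if grid[nr][nc] >= threshold:
                        unstable.append((nr, nc))
        sizes.append(topplings)
    return sizes


sizes = run_sandpile(grid_size=8, n_avalanches=500, seed=1)
```

With threshold=4 each toppling sends exactly one grain to each of the four neighbors, so grains are conserved in the bulk and only dissipate at the edges.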

generate(length: int, seed: int | None = None, warmup: int = 1000) Dict[str, Any][source]

Generate time series of avalanche sizes from SOC sandpile.

Parameters:
  • length (int) – Number of avalanche events to generate

  • seed (int, optional) – Random seed

  • warmup (int) – Number of initial events to discard (reach critical state)

Returns dict with ‘signal’, ‘metadata’.

Surrogate Data Generators

Generators for hypothesis testing of LRD and nonlinearity.

IAFFTSurrogate

class lrdbenchmark.generation.IAFFTSurrogate(max_iterations: int = 100, convergence_tol: float = 1e-06, random_state: int | None = None)[source]

Iterative Amplitude Adjusted Fourier Transform (IAAFT) surrogate generator.

Generates surrogates that preserve both the power spectrum and the amplitude distribution of the original time series, while destroying any nonlinear temporal structure.

Reference: Schreiber & Schmitz (1996)
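The IAAFT iteration alternates between two projections: impose the original Fourier amplitudes, then impose the original value distribution by rank ordering. The NumPy sketch below illustrates that loop; it stops when the rank ordering no longer changes, which is a simplification and differs from the spectrum-based convergence_tol of this class. The function name iaaft is hypothetical.

```python
import numpy as np


def iaaft(data: np.ndarray, max_iterations: int = 100, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n = len(data)
    target_amplitudes = np.abs(np.fft.rfft(data))  # spectrum to preserve
    sorted_values = np.sort(data)                  # distribution to preserve
    surrogate = rng.permutation(data)              # random starting point
    previous_ranks = None
    for _ in range(max_iterations):
        # Step A: impose the target Fourier amplitudes, keep current phases.
        phases = np.angle(np.fft.rfft(surrogate))
        adjusted = np.fft.irfft(target_amplitudes * np.exp(1j * phases), n=n)
        # Step B: impose the original values by rank ordering.
        ranks = np.argsort(np.argsort(adjusted))
        surrogate = sorted_values[ranks]
        if previous_ranks is not None and np.array_equal(ranks, previous_ranks):
            break  # ordering stopped changing
        previous_ranks = ranks
    return surrogate


original = np.cumsum(np.random.default_rng(1).standard_normal(256))
surrogate = iaaft(original)
```

Because the loop ends on step B, the surrogate has exactly the same set of values as the input, while its power spectrum matches only approximately.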

Example

>>> gen = IAFFTSurrogate()
>>> result = gen.generate(original_data)
>>> surrogates = result['surrogates']

__init__(max_iterations: int = 100, convergence_tol: float = 1e-06, random_state: int | None = None)[source]

Initialize IAAFT surrogate generator.

Parameters:
  • max_iterations (int) – Maximum number of iterations

  • convergence_tol (float) – Convergence tolerance for power spectrum matching

  • random_state (int, optional) – Random seed

generate(data: ndarray, n_surrogates: int = 1, seed: int | None = None) Dict[str, Any][source]

Generate IAAFT surrogates.

Parameters:
  • data (np.ndarray) – Original time series

  • n_surrogates (int) – Number of surrogates to generate

  • seed (int, optional) – Random seed

Returns:

Dictionary with ‘surrogates’ array and ‘metadata’

Return type:

dict

PhaseRandomizedSurrogate

class lrdbenchmark.generation.PhaseRandomizedSurrogate(random_state: int | None = None)[source]

Phase randomization surrogate generator.

Generates surrogates by randomizing the Fourier phases while preserving the Fourier amplitudes, and hence the power spectrum. This destroys all temporal structure except the linear correlations captured by the power spectrum.
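Phase randomization for a real-valued series can be sketched with NumPy's real FFT. This is an illustration under assumptions, not the class's code: phase_randomize is a hypothetical name, and leaving the zero-frequency and Nyquist terms untouched is one common convention that keeps the output real and the mean unchanged.

```python
import numpy as np


def phase_randomize(data: np.ndarray, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n = len(data)
    spectrum = np.fft.rfft(data)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]          # keep the mean unchanged
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]    # Nyquist term must stay real
    return np.fft.irfft(randomized, n=n)


original = np.random.default_rng(2).standard_normal(128)
surrogate = phase_randomize(original)
```

The surrogate shares the amplitude spectrum of the input but not its amplitude distribution, which is the gap the IAAFT variant is designed to close.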

Example

>>> gen = PhaseRandomizedSurrogate()
>>> result = gen.generate(original_data)
>>> surrogates = result['surrogates']

__init__(random_state: int | None = None)[source]

Initialize phase randomization generator.

Parameters:

random_state (int, optional) – Random seed

generate(data: ndarray, n_surrogates: int = 1, seed: int | None = None) Dict[str, Any][source]

Generate phase-randomized surrogates.

Parameters:
  • data (np.ndarray) – Original time series

  • n_surrogates (int) – Number of surrogates

  • seed (int, optional) – Random seed

Returns:

Dictionary with ‘surrogates’ and ‘metadata’

Return type:

dict

ARSurrogate

class lrdbenchmark.generation.ARSurrogate(order: int = 10, random_state: int | None = None)[source]

Autoregressive (AR) surrogate generator.

Fits an AR model to the original data and generates surrogates from the fitted model. Provides a linear null hypothesis.

Example

>>> gen = ARSurrogate(order=10)
>>> result = gen.generate(original_data)
>>> surrogates = result['surrogates']

__init__(order: int = 10, random_state: int | None = None)[source]

Initialize AR surrogate generator.

Parameters:
  • order (int) – AR model order

  • random_state (int, optional) – Random seed

_fit_ar(data: ndarray) tuple[source]

Fit AR model using Yule-Walker equations.
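The Yule-Walker fit solves a Toeplitz system built from sample autocovariances. The sketch below shows the idea with NumPy; the function name, the use of biased autocovariances, and the (coefficients, innovation variance) return layout are assumptions, since the contents of the returned tuple are not documented here.

```python
import numpy as np


def fit_ar_yule_walker(data: np.ndarray, order: int):
    """Return (coefficients, innovation variance) from the Yule-Walker equations."""
    x = np.asarray(data, dtype=float) - np.mean(data)
    n = len(x)
    # Biased sample autocovariances r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R @ phi = r[1:], with R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    phi = np.linalg.solve(r[lags], r[1:])
    noise_var = r[0] - np.dot(phi, r[1:])
    return phi, noise_var


# Simulate an AR(1) series with coefficient 0.6 and unit-variance innovations.
rng = np.random.default_rng(0)
eps = rng.standard_normal(20000)
series = np.zeros_like(eps)
for t in range(1, len(eps)):
    series[t] = 0.6 * series[t - 1] + eps[t]

phi, noise_var = fit_ar_yule_walker(series, order=1)
```

Surrogates can then be generated by driving the fitted recursion with fresh Gaussian noise of variance noise_var, which yields the linear null model described above.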

generate(data: ndarray, n_surrogates: int = 1, seed: int | None = None) Dict[str, Any][source]

Generate AR surrogates.

Parameters:
  • data (np.ndarray) – Original time series

  • n_surrogates (int) – Number of surrogates

  • seed (int, optional) – Random seed

Returns:

Dictionary with ‘surrogates’ and ‘metadata’

Return type:

dict

Factory Functions

lrdbenchmark.generation.create_nonstationary_process(process_type: str, **kwargs) NonstationaryProcessBase[source]

Factory function to create nonstationary processes.

Parameters:
  • process_type (str) – Type of process: ‘regime_switching’, ‘continuous_drift’, ‘structural_break’, ‘ensemble_time_average’

  • **kwargs – Process-specific parameters

Returns:

Configured process instance

Return type:

NonstationaryProcessBase

lrdbenchmark.generation.create_critical_regime_process(process_type: str, **kwargs) Any[source]

Factory function for critical regime processes.

Parameters:
  • process_type (str) – ‘ornstein_uhlenbeck’, ‘subordinated’, ‘fractional_levy’, ‘soc_avalanche’

  • **kwargs – Process-specific parameters

lrdbenchmark.generation.create_surrogate_generator(method: str, **kwargs) Any[source]

Factory function for surrogate generators.

Parameters:
  • method (str) – ‘iaaft’, ‘phase_randomization’, or ‘ar’

  • **kwargs – Method-specific parameters