Contamination Factory API

The Contamination Factory generates realistic confounding profiles and applies them to pure signals, so that estimator robustness can be tested under conditions that mimic real-world data.

Contamination Factory

class lrdbenchmark.models.contamination.contamination_factory.ContaminationFactory(random_seed: int | None = None)

Bases: object

Advanced contamination factory that creates realistic confounding profiles.

This factory generates complex, realistic data artifacts that mimic real-world scenarios encountered in time series analysis across various domains.

__init__(random_seed: int | None = None)

Initialize the contamination factory.

Parameters:

random_seed (int, optional) – Random seed for reproducible results
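
The effect of a fixed seed can be illustrated without the library. The following is a minimal sketch, assuming a stand-in class: `SeededNoise` and its `add_noise` method are hypothetical and are not part of lrdbenchmark. It only shows the general pattern of giving each object its own seeded generator so that repeated runs produce identical contamination.

```python
import random
from typing import List, Optional


class SeededNoise:
    """Hypothetical stand-in showing how a seed makes contamination repeatable."""

    def __init__(self, random_seed: Optional[int] = None) -> None:
        # A private generator avoids touching the global random state.
        self._rng = random.Random(random_seed)

    def add_noise(self, data: List[float], scale: float = 0.1) -> List[float]:
        """Add independent Gaussian noise with standard deviation `scale`."""
        return [x + self._rng.gauss(0.0, scale) for x in data]


signal = [0.0, 1.0, 2.0, 3.0]
run_a = SeededNoise(random_seed=42).add_noise(signal)
run_b = SeededNoise(random_seed=42).add_noise(signal)
print(run_a == run_b)  # True: same seed, same draws
```

With `random_seed=None` the generator is seeded from system entropy, so two runs would normally differ.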

_initialize_profiles() → Dict[ConfoundingScenario, ConfoundingProfile]

Initialize predefined confounding profiles.

create_confounding_profile(scenario: ConfoundingScenario, intensity: float | None = None, custom_parameters: Dict[str, float] | None = None) → ConfoundingProfile

Create a confounding profile for a specific scenario.

Parameters:
  • scenario (ConfoundingScenario) – The confounding scenario to create

  • intensity (float, optional) – Intensity of the confounding (0.0 to 1.0). If None, uses default.

  • custom_parameters (dict, optional) – Custom parameters to override defaults

Returns:

The created confounding profile

Return type:

ConfoundingProfile
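
One plausible reading of the `intensity` and `custom_parameters` arguments is a defaults-then-overrides merge. The sketch below illustrates that reading with stand-in types. `Profile`, `DEFAULTS`, `make_profile`, and the parameter keys `drop_fraction` and `recovery_rate` are all hypothetical, and the library's actual merge behavior may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Profile:
    """Hypothetical stand-in mirroring the documented ConfoundingProfile fields."""
    scenario: str
    intensity: float
    parameters: Dict[str, float]
    description: str


# Hypothetical defaults for one scenario; the real parameter keys may differ.
DEFAULTS = Profile(
    scenario="financial_crash",
    intensity=0.3,
    parameters={"drop_fraction": 0.2, "recovery_rate": 0.05},
    description="Sudden drop followed by slow recovery",
)


def make_profile(
    intensity: Optional[float] = None,
    custom_parameters: Optional[Dict[str, float]] = None,
) -> Profile:
    """Start from the defaults, then apply any caller-supplied overrides."""
    merged = dict(DEFAULTS.parameters)      # copy so the defaults stay untouched
    merged.update(custom_parameters or {})  # caller values win on key collisions
    return Profile(
        scenario=DEFAULTS.scenario,
        intensity=DEFAULTS.intensity if intensity is None else intensity,
        parameters=merged,
        description=DEFAULTS.description,
    )


profile = make_profile(intensity=0.5, custom_parameters={"drop_fraction": 0.4})
print(profile.intensity, profile.parameters)
```

Copying the default dictionary before updating it keeps one caller's overrides from leaking into later profiles.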

apply_confounding(data: ndarray, scenario: ConfoundingScenario, intensity: float | None = None) → Tuple[ndarray, str]

Apply confounding to data based on a specific scenario.

Parameters:
  • data (np.ndarray) – Input time series data

  • scenario (ConfoundingScenario) – Confounding scenario to apply

  • intensity (float, optional) – Intensity of confounding (0.0 to 1.0). If None, uses default.

Returns:

(contaminated_data, description): the contaminated series and a text description of the applied confounding

Return type:

tuple

_apply_financial_crash(data: ndarray, params: Dict[str, float]) → ndarray

Apply financial crash confounding.

_apply_volatility_clustering(data: ndarray, params: Dict[str, float]) → ndarray

Apply volatility clustering confounding.

_apply_regime_change(data: ndarray, params: Dict[str, float]) → ndarray

Apply regime change confounding.

_apply_sensor_drift(data: ndarray, params: Dict[str, float]) → ndarray

Apply sensor drift confounding.

_apply_motion_artifacts(data: ndarray, params: Dict[str, float]) → ndarray

Apply motion artifacts confounding.

_apply_equipment_failure(data: ndarray, params: Dict[str, float]) → ndarray

Apply equipment failure confounding.

_apply_seasonal_effects(data: ndarray, params: Dict[str, float]) → ndarray

Apply seasonal effects confounding.

_apply_extreme_events(data: ndarray, params: Dict[str, float]) → ndarray

Apply extreme events confounding.

_apply_measurement_drift(data: ndarray, params: Dict[str, float]) → ndarray

Apply measurement drift confounding.

_apply_network_bursts(data: ndarray, params: Dict[str, float]) → ndarray

Apply network bursts confounding.

_apply_network_congestion(data: ndarray, params: Dict[str, float]) → ndarray

Apply network congestion confounding.

_apply_network_equipment_failure(data: ndarray, params: Dict[str, float]) → ndarray

Apply network equipment failure confounding.

_apply_calibration_drift(data: ndarray, params: Dict[str, float]) → ndarray

Apply calibration drift confounding.

_apply_sensor_aging(data: ndarray, params: Dict[str, float]) → ndarray

Apply sensor aging confounding.

_apply_environmental_interference(data: ndarray, params: Dict[str, float]) → ndarray

Apply environmental interference confounding.

_apply_mixed_realistic_light(data: ndarray, params: Dict[str, float]) → ndarray

Apply light realistic confounding.

_apply_mixed_realistic_moderate(data: ndarray, params: Dict[str, float]) → ndarray

Apply moderate realistic confounding.

_apply_mixed_realistic_severe(data: ndarray, params: Dict[str, float]) → ndarray

Apply severe realistic confounding.

_apply_eeg_ocular_artifacts(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG ocular artifacts (blinks, saccades).

_apply_eeg_muscle_artifacts(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG muscle artifacts (EMG).

_apply_eeg_cardiac_artifacts(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG cardiac artifacts (ECG).

_apply_eeg_electrode_popping(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG electrode popping artifacts.

_apply_eeg_electrode_drift(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG electrode drift and impedance changes.

_apply_eeg_60hz_noise(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG 60 Hz power line noise and harmonics.

_apply_eeg_sweat_artifacts(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG sweat artifacts and impedance changes.

_apply_eeg_movement_artifacts(data: ndarray, params: Dict[str, float]) → ndarray

Apply EEG head/body movement artifacts.

get_available_scenarios() → List[ConfoundingScenario]

Get list of available confounding scenarios.

get_scenario_info(scenario: ConfoundingScenario) → Dict[str, str]

Get information about a specific scenario.

Confounding Scenarios

The contamination factory supports various domain-specific confounding scenarios:

class lrdbenchmark.models.contamination.contamination_factory.ConfoundingScenario(value)

Bases: Enum

Real-world confounding scenarios.

FINANCIAL_CRASH = 'financial_crash'
FINANCIAL_VOLATILITY_CLUSTERING = 'financial_volatility_clustering'
FINANCIAL_REGIME_CHANGE = 'financial_regime_change'
PHYSIOLOGICAL_SENSOR_DRIFT = 'physiological_sensor_drift'
PHYSIOLOGICAL_MOTION_ARTIFACTS = 'physiological_motion_artifacts'
PHYSIOLOGICAL_EQUIPMENT_FAILURE = 'physiological_equipment_failure'
ENVIRONMENTAL_SEASONAL = 'environmental_seasonal'
ENVIRONMENTAL_EXTREME_EVENTS = 'environmental_extreme_events'
ENVIRONMENTAL_MEASUREMENT_DRIFT = 'environmental_measurement_drift'
NETWORK_BURSTS = 'network_bursts'
NETWORK_CONGESTION = 'network_congestion'
NETWORK_EQUIPMENT_FAILURE = 'network_equipment_failure'
INDUSTRIAL_CALIBRATION_DRIFT = 'industrial_calibration_drift'
INDUSTRIAL_SENSOR_AGING = 'industrial_sensor_aging'
INDUSTRIAL_ENVIRONMENTAL_INTERFERENCE = 'industrial_environmental_interference'
EEG_OCULAR_ARTIFACTS = 'eeg_ocular_artifacts'
EEG_MUSCLE_ARTIFACTS = 'eeg_muscle_artifacts'
EEG_CARDIAC_ARTIFACTS = 'eeg_cardiac_artifacts'
EEG_ELECTRODE_POPPING = 'eeg_electrode_popping'
EEG_ELECTRODE_DRIFT = 'eeg_electrode_drift'
EEG_60HZ_NOISE = 'eeg_60hz_noise'
EEG_SWEAT_ARTIFACTS = 'eeg_sweat_artifacts'
EEG_MOVEMENT_ARTIFACTS = 'eeg_movement_artifacts'
MIXED_REALISTIC_LIGHT = 'mixed_realistic_light'
MIXED_REALISTIC_MODERATE = 'mixed_realistic_moderate'
MIXED_REALISTIC_SEVERE = 'mixed_realistic_severe'
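
Because every member value carries its domain as a prefix, scenarios can be grouped by domain with a simple string test. The following is a minimal sketch, assuming a stand-in enum: `Scenario` copies only a subset of the members listed above, and `scenarios_for_domain` is a hypothetical helper, not a library function.

```python
from enum import Enum
from typing import List


class Scenario(Enum):
    """Stand-in holding a subset of the ConfoundingScenario members listed above."""
    FINANCIAL_CRASH = "financial_crash"
    FINANCIAL_REGIME_CHANGE = "financial_regime_change"
    NETWORK_BURSTS = "network_bursts"
    EEG_OCULAR_ARTIFACTS = "eeg_ocular_artifacts"
    EEG_60HZ_NOISE = "eeg_60hz_noise"
    MIXED_REALISTIC_LIGHT = "mixed_realistic_light"


def scenarios_for_domain(domain: str) -> List[Scenario]:
    """Return the members whose value starts with '<domain>_', in definition order."""
    prefix = domain.lower() + "_"
    return [s for s in Scenario if s.value.startswith(prefix)]


eeg = scenarios_for_domain("eeg")
print([s.name for s in eeg])  # ['EEG_OCULAR_ARTIFACTS', 'EEG_60HZ_NOISE']
```

The same comprehension should work over the real `ConfoundingScenario`, since iterating an `Enum` class yields its members in definition order.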

Confounding Profiles

class lrdbenchmark.models.contamination.contamination_factory.ConfoundingProfile(scenario: ConfoundingScenario, intensity: float, parameters: Dict[str, float], description: str)

Bases: object

Configuration for a specific confounding profile.

scenario: ConfoundingScenario
intensity: float
parameters: Dict[str, float]
description: str
__init__(scenario: ConfoundingScenario, intensity: float, parameters: Dict[str, float], description: str) → None

Usage Examples

Basic Contamination Application

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# Apply contamination scenarios from several domains
scenarios = [
    ConfoundingScenario.FINANCIAL_CRASH,
    ConfoundingScenario.PHYSIOLOGICAL_SENSOR_DRIFT,
    ConfoundingScenario.ENVIRONMENTAL_SEASONAL,
    ConfoundingScenario.NETWORK_BURSTS,
]

for scenario in scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.2
    )
    print(f"{scenario.value}: {description}")

EEG Contamination Scenarios

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# EEG-specific contamination scenarios
eeg_scenarios = [
    ConfoundingScenario.EEG_OCULAR_ARTIFACTS,
    ConfoundingScenario.EEG_MUSCLE_ARTIFACTS,
    ConfoundingScenario.EEG_CARDIAC_ARTIFACTS,
    ConfoundingScenario.EEG_ELECTRODE_POPPING,
    ConfoundingScenario.EEG_ELECTRODE_DRIFT,
    ConfoundingScenario.EEG_60HZ_NOISE,
    ConfoundingScenario.EEG_SWEAT_ARTIFACTS,
    ConfoundingScenario.EEG_MOVEMENT_ARTIFACTS
]

print("EEG Contamination Testing:")
for scenario in eeg_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.3
    )
    print(f"{scenario.value}: {description}")

Financial Contamination Scenarios

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# Financial-specific contamination scenarios
financial_scenarios = [
    ConfoundingScenario.FINANCIAL_CRASH,
    ConfoundingScenario.FINANCIAL_VOLATILITY_CLUSTERING,
    ConfoundingScenario.FINANCIAL_REGIME_CHANGE,
]

print("Financial Contamination Testing:")
for scenario in financial_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.4
    )
    print(f"{scenario.value}: {description}")

Physiological Contamination Scenarios

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# Physiological-specific contamination scenarios
physiological_scenarios = [
    ConfoundingScenario.PHYSIOLOGICAL_SENSOR_DRIFT,
    ConfoundingScenario.PHYSIOLOGICAL_MOTION_ARTIFACTS,
    ConfoundingScenario.PHYSIOLOGICAL_EQUIPMENT_FAILURE,
]

print("Physiological Contamination Testing:")
for scenario in physiological_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.25
    )
    print(f"{scenario.value}: {description}")

Environmental Contamination Scenarios

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# Environmental-specific contamination scenarios
environmental_scenarios = [
    ConfoundingScenario.ENVIRONMENTAL_SEASONAL,
    ConfoundingScenario.ENVIRONMENTAL_EXTREME_EVENTS,
    ConfoundingScenario.ENVIRONMENTAL_MEASUREMENT_DRIFT,
]

print("Environmental Contamination Testing:")
for scenario in environmental_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.3
    )
    print(f"{scenario.value}: {description}")

Custom Contamination Profiles

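from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

scenario = ConfoundingScenario.PHYSIOLOGICAL_SENSOR_DRIFT

# Inspect the default profile to see which parameter keys the scenario accepts
default_profile = factory.create_confounding_profile(scenario)
print(f"Default parameters: {default_profile.parameters}")

# Override selected defaults. 'drift_rate' is a placeholder key;
# substitute one of the keys printed above.
custom_profile = factory.create_confounding_profile(
    scenario,
    intensity=0.5,
    custom_parameters={'drift_rate': 0.3},
)
print(f"Custom parameters: {custom_profile.parameters}")

# Per its documented signature, apply_confounding takes a scenario and an
# optional intensity, not a profile object
contaminated_data, description = factory.apply_confounding(
    pure_data, custom_profile.scenario, intensity=custom_profile.intensity
)
print(f"Custom contamination: {description}")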

Intensity Variation Testing

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# Test different intensity levels
intensities = [0.1, 0.2, 0.3, 0.4, 0.5]
scenario = ConfoundingScenario.EEG_OCULAR_ARTIFACTS

print("Intensity Variation Testing:")
for intensity in intensities:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=intensity
    )
    print(f"Intensity {intensity}: {description}")

Contamination Analysis

from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np

# Create contamination factory
factory = ContaminationFactory()

# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)

# Analyze contamination effects
scenarios = [
    ConfoundingScenario.MIXED_REALISTIC_LIGHT,
    ConfoundingScenario.MIXED_REALISTIC_SEVERE,
    ConfoundingScenario.EEG_OCULAR_ARTIFACTS,
    ConfoundingScenario.FINANCIAL_CRASH,
]

print("Contamination Analysis:")
for scenario in scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.3
    )

    # Calculate contamination metrics
    contamination_level = np.std(contaminated_data - pure_data) / np.std(pure_data)
    correlation = np.corrcoef(pure_data, contaminated_data)[0, 1]

    print(f"{scenario.value}:")
    print(f"  Description: {description}")
    print(f"  Contamination level: {contamination_level:.3f}")
    print(f"  Correlation with pure: {correlation:.3f}")
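
The two metrics computed above can also be wrapped in a reusable helper. The following is a minimal sketch using only the standard library. `contamination_metrics` is a hypothetical name, not a library function, and the guard against constant series is an added assumption (with NumPy, a constant input would give a division by zero or a NaN correlation).

```python
import math
from typing import Sequence, Tuple


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def _std(xs: Sequence[float]) -> float:
    """Population standard deviation (the same convention as np.std's default)."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def contamination_metrics(
    pure: Sequence[float], contaminated: Sequence[float]
) -> Tuple[float, float]:
    """Return (relative contamination level, Pearson correlation)."""
    if len(pure) != len(contaminated) or len(pure) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    sd_pure, sd_cont = _std(pure), _std(contaminated)
    if sd_pure == 0.0 or sd_cont == 0.0:
        raise ValueError("metrics are undefined for a constant series")

    # Contamination level: spread of the added component relative to the signal.
    residual = [c - p for p, c in zip(pure, contaminated)]
    level = _std(residual) / sd_pure

    # Pearson correlation from the population covariance.
    m_pure, m_cont = _mean(pure), _mean(contaminated)
    cov = sum((p - m_pure) * (c - m_cont) for p, c in zip(pure, contaminated)) / len(pure)
    return level, cov / (sd_pure * sd_cont)


pure = [0.0, 1.0, 2.0, 3.0, 4.0]
doubled = [2.0 * x for x in pure]  # residual equals the pure signal itself
level, corr = contamination_metrics(pure, doubled)
print(f"level={level:.3f}, correlation={corr:.3f}")  # level=1.000, correlation=1.000
```

The doubled signal shows why both numbers are worth reporting: the correlation stays at 1 even though the added component is as large as the signal itself.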

Best Practices

  1. Intensity Selection: Keep intensity between 0.1 and 0.5 for realistic testing

  2. Scenario Selection: Choose scenarios relevant to your application domain

  3. Multiple Scenarios: Test robustness across several contamination types, not just one

  4. Intensity Variation: Repeat each test at several intensity levels

  5. Analysis: Quantify how much each contamination changes the data, for example with the residual standard deviation and the correlation with the pure signal

  6. Validation: Compare estimates on contaminated data against a pure-data baseline

Note

The contamination factory provides confounding profiles modeled on real-world scenarios, which makes it well suited to testing estimator robustness in practical applications.

Warning

High intensity contamination (> 0.5) may significantly distort the underlying signal characteristics and should be used with caution.
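
A small guard can enforce the documented 0.0 to 1.0 range and flag values above 0.5 before they reach the factory. This is a minimal sketch: `check_intensity` is a hypothetical helper, and whether the library performs its own validation is not stated here.

```python
import warnings


def check_intensity(intensity: float, caution_above: float = 0.5) -> float:
    """Validate an intensity in [0.0, 1.0]; warn when it exceeds `caution_above`."""
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not 0.0 <= intensity <= 1.0:
        raise ValueError(f"intensity must be in [0.0, 1.0], got {intensity}")
    if intensity > caution_above:
        warnings.warn(
            f"intensity {intensity} > {caution_above}: the underlying signal "
            "characteristics may be significantly distorted",
            UserWarning,
            stacklevel=2,
        )
    return intensity


safe = check_intensity(0.3)  # within the recommended range, passes silently
```

Returning the validated value lets the check be used inline, for example `intensity=check_intensity(0.3)`.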