Contamination Factory API
The Contamination Factory generates realistic confounding profiles and applies them to pure signals, so that estimator robustness can be tested under real-world conditions.
Contamination Factory
- class lrdbenchmark.models.contamination.contamination_factory.ContaminationFactory(random_seed: int | None = None)
Bases: object
Advanced contamination factory that creates realistic confounding profiles.
This factory generates complex, realistic data artifacts that mimic real-world scenarios encountered in time series analysis across various domains.
- __init__(random_seed: int | None = None)
Initialize the contamination factory.
- Parameters:
random_seed (int, optional) – Random seed for reproducible results
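A fixed seed matters because most contamination steps are stochastic. The sketch below is not the library's code: it uses a hypothetical `add_spikes` helper built on `numpy.random.default_rng` to illustrate the property a seeded factory is expected to have, namely that the same seed and input reproduce the same contaminated series.

```python
import numpy as np

def add_spikes(data, seed=None, n_spikes=5, scale=3.0):
    """Hypothetical stand-in for a seeded contamination step (not library code)."""
    rng = np.random.default_rng(seed)            # seed=None draws fresh entropy on each call
    out = np.asarray(data, dtype=float).copy()   # work on a copy, never on the caller's array
    idx = rng.choice(out.size, size=n_spikes, replace=False)
    out[idx] += scale * out.std() * rng.standard_normal(n_spikes)
    return out

signal = np.sin(np.linspace(0.0, 10.0, 200))
a = add_spikes(signal, seed=42)
b = add_spikes(signal, seed=42)
c = add_spikes(signal, seed=7)
print(np.array_equal(a, b))   # same seed: identical output
print(np.array_equal(a, c))   # different seed: different spikes
```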
- _initialize_profiles() → Dict[ConfoundingScenario, ConfoundingProfile]
Initialize predefined confounding profiles.
- create_confounding_profile(scenario: ConfoundingScenario, intensity: float | None = None, custom_parameters: Dict[str, float] | None = None) → ConfoundingProfile
Create a confounding profile for a specific scenario.
- Parameters:
scenario (ConfoundingScenario) – The confounding scenario to create
intensity (float, optional) – Intensity of the confounding (0.0 to 1.0). If None, uses default.
custom_parameters (dict, optional) – Custom parameters to override defaults
- Returns:
The created confounding profile
- Return type:
ConfoundingProfile
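How defaults and overrides might combine can be sketched as follows. Everything here is an assumption made for illustration: `ProfileSketch`, `DEFAULTS`, `create_profile`, and the `drift_rate` key are hypothetical stand-ins, and the real `ConfoundingProfile` fields and default values are not shown in this reference.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ProfileSketch:
    """Hypothetical stand-in for ConfoundingProfile; the real fields may differ."""
    scenario: str
    intensity: float
    parameters: Dict[str, float] = field(default_factory=dict)

# Hypothetical defaults for a single scenario.
DEFAULTS = {"sensor_drift": ProfileSketch("sensor_drift", 0.3, {"drift_rate": 0.01})}

def create_profile(scenario: str,
                   intensity: Optional[float] = None,
                   custom_parameters: Optional[Dict[str, float]] = None) -> ProfileSketch:
    base = DEFAULTS[scenario]
    if intensity is not None and not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0.0, 1.0]")
    params = dict(base.parameters)             # copy so the stored defaults stay untouched
    params.update(custom_parameters or {})     # caller-supplied values take precedence
    chosen = base.intensity if intensity is None else intensity
    return ProfileSketch(scenario, chosen, params)

profile = create_profile("sensor_drift", custom_parameters={"drift_rate": 0.05})
print(profile.intensity, profile.parameters)
```

Copying the default parameters before updating them keeps one caller's overrides from leaking into later profiles.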
- apply_confounding(data: ndarray, scenario: ConfoundingScenario, intensity: float | None = None) → Tuple[ndarray, str]
Apply confounding to data based on a specific scenario.
- Parameters:
data (np.ndarray) – Input time series data
scenario (ConfoundingScenario) – Confounding scenario to apply
intensity (float, optional) – Intensity of confounding (0.0 to 1.0). If None, uses default.
- Returns:
(contaminated_data, description)
- Return type:
Tuple[ndarray, str]
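One plausible way to route a scenario to its handler and return the `(contaminated_data, description)` pair is a dispatch table. The sketch below assumes that design; `Scenario`, `HANDLERS`, `apply_sketch`, and both handlers are hypothetical and are not taken from the library source.

```python
from enum import Enum
import numpy as np

class Scenario(Enum):
    """Two-member stand-in for ConfoundingScenario."""
    SENSOR_DRIFT = "sensor_drift"
    SEASONAL = "seasonal"

def _drift(data, intensity):
    # Linear ramp whose final offset equals `intensity` standard deviations.
    return data + intensity * data.std() * np.linspace(0.0, 1.0, data.size)

def _seasonal(data, intensity):
    # One sinusoidal cycle every 50 samples.
    t = np.arange(data.size)
    return data + intensity * data.std() * np.sin(2.0 * np.pi * t / 50.0)

HANDLERS = {Scenario.SENSOR_DRIFT: _drift, Scenario.SEASONAL: _seasonal}

def apply_sketch(data, scenario, intensity=0.3):
    """Return (contaminated_data, description); the input array is left unchanged."""
    data = np.asarray(data, dtype=float)
    contaminated = HANDLERS[scenario](data, intensity)
    return contaminated, f"{scenario.value} at intensity {intensity:.2f}"

x = np.random.default_rng(0).standard_normal(500)
y, description = apply_sketch(x, Scenario.SENSOR_DRIFT, intensity=0.2)
print(description, y.shape)
```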
- _apply_financial_crash(data: ndarray, params: Dict[str, float]) → ndarray
Apply financial crash confounding.
- _apply_volatility_clustering(data: ndarray, params: Dict[str, float]) → ndarray
Apply volatility clustering confounding.
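Volatility clustering means that large fluctuations tend to follow large fluctuations. A common way to simulate it is a GARCH(1,1) recursion; the sketch below uses that model as an assumption and does not reproduce what `_apply_volatility_clustering` does internally. The function names and the `omega`, `alpha`, `beta` values are illustrative.

```python
import numpy as np

def garch_noise(n, omega=0.05, alpha=0.1, beta=0.85, seed=None):
    """GARCH(1,1) innovations: sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1]."""
    rng = np.random.default_rng(seed)
    e = np.empty(n)
    sigma2 = omega / (1.0 - alpha - beta)    # start at the unconditional variance
    for t in range(n):
        e[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * e[t] ** 2 + beta * sigma2
    return e

def add_volatility_clustering(data, intensity=0.3, seed=None):
    data = np.asarray(data, dtype=float)
    noise = garch_noise(data.size, seed=seed)
    # Rescale so the added noise has `intensity` times the signal's standard deviation.
    return data + intensity * data.std() * noise / noise.std()

clean = np.random.default_rng(1).standard_normal(2000)
noisy = add_volatility_clustering(clean, intensity=0.3, seed=2)
print(float((noisy - clean).std() / clean.std()))
```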
- _apply_regime_change(data: ndarray, params: Dict[str, float]) → ndarray
Apply regime change confounding.
- _apply_sensor_drift(data: ndarray, params: Dict[str, float]) → ndarray
Apply sensor drift confounding.
- _apply_motion_artifacts(data: ndarray, params: Dict[str, float]) → ndarray
Apply motion artifacts confounding.
- _apply_equipment_failure(data: ndarray, params: Dict[str, float]) → ndarray
Apply equipment failure confounding.
- _apply_seasonal_effects(data: ndarray, params: Dict[str, float]) → ndarray
Apply seasonal effects confounding.
- _apply_extreme_events(data: ndarray, params: Dict[str, float]) → ndarray
Apply extreme events confounding.
- _apply_measurement_drift(data: ndarray, params: Dict[str, float]) → ndarray
Apply measurement drift confounding.
- _apply_network_bursts(data: ndarray, params: Dict[str, float]) → ndarray
Apply network bursts confounding.
- _apply_network_congestion(data: ndarray, params: Dict[str, float]) → ndarray
Apply network congestion confounding.
- _apply_network_equipment_failure(data: ndarray, params: Dict[str, float]) → ndarray
Apply network equipment failure confounding.
- _apply_calibration_drift(data: ndarray, params: Dict[str, float]) → ndarray
Apply calibration drift confounding.
- _apply_sensor_aging(data: ndarray, params: Dict[str, float]) → ndarray
Apply sensor aging confounding.
- _apply_environmental_interference(data: ndarray, params: Dict[str, float]) → ndarray
Apply environmental interference confounding.
- _apply_mixed_realistic_light(data: ndarray, params: Dict[str, float]) → ndarray
Apply light realistic confounding.
- _apply_mixed_realistic_moderate(data: ndarray, params: Dict[str, float]) → ndarray
Apply moderate realistic confounding.
- _apply_mixed_realistic_severe(data: ndarray, params: Dict[str, float]) → ndarray
Apply severe realistic confounding.
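The three mixed tiers suggest several artifact types combined at increasing severity. The sketch below shows one way such a composition could work, assuming each tier scales the same sequence of steps; the step functions, the `PRESETS` values, and `apply_mixed` are hypothetical and do not describe the library's actual mix.

```python
import numpy as np

def add_noise(data, level, rng):
    return data + level * data.std() * rng.standard_normal(data.size)

def add_trend(data, level, rng):
    # `rng` is unused here; it is accepted so that all steps share one signature.
    return data + level * data.std() * np.linspace(0.0, 1.0, data.size)

def add_outliers(data, level, rng):
    out = data.copy()
    n_out = max(1, int(0.01 * data.size))    # contaminate about 1% of the samples
    idx = rng.choice(data.size, size=n_out, replace=False)
    out[idx] += 10.0 * level * data.std() * rng.choice([-1.0, 1.0], size=n_out)
    return out

# Hypothetical severity presets: each tier scales the same three steps.
PRESETS = {"light": 0.1, "moderate": 0.3, "severe": 0.6}

def apply_mixed(data, severity, seed=None):
    rng = np.random.default_rng(seed)
    out = np.asarray(data, dtype=float).copy()
    for step in (add_noise, add_trend, add_outliers):
        out = step(out, PRESETS[severity], rng)
    return out

clean = np.random.default_rng(4).standard_normal(1000)
for name in PRESETS:
    rms = np.sqrt(np.mean((apply_mixed(clean, name, seed=5) - clean) ** 2))
    print(f"{name}: RMS distortion {rms:.3f}")
```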
- _apply_eeg_ocular_artifacts(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG ocular artifacts (blinks, saccades).
- _apply_eeg_muscle_artifacts(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG muscle artifacts (EMG).
- _apply_eeg_cardiac_artifacts(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG cardiac artifacts (ECG).
- _apply_eeg_electrode_popping(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG electrode popping artifacts.
- _apply_eeg_electrode_drift(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG electrode drift and impedance changes.
- _apply_eeg_60hz_noise(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG 60 Hz power line noise and harmonics.
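Power line interference is usually modelled as a sinusoid at the mains frequency plus weaker harmonics. The sketch below follows that textbook model; the sampling rate `fs`, the halving of amplitude per harmonic, and the function name `add_line_noise` are assumptions for illustration, not the library's parameters.

```python
import numpy as np

def add_line_noise(data, fs, intensity=0.3, line_freq=60.0, n_harmonics=2):
    """Add a power-line sinusoid plus harmonics; each harmonic has half the previous amplitude."""
    data = np.asarray(data, dtype=float)
    t = np.arange(data.size) / fs
    noise = np.zeros_like(data)
    for k in range(1, n_harmonics + 1):
        f = k * line_freq
        if f < fs / 2.0:                       # skip harmonics above Nyquist to avoid aliasing
            noise += 0.5 ** (k - 1) * np.sin(2.0 * np.pi * f * t)
    return data + intensity * data.std() * noise

fs = 500.0                                     # assumed sampling rate in Hz
eeg_like = np.random.default_rng(3).standard_normal(5000)
contaminated = add_line_noise(eeg_like, fs)

# Locate the strongest added component in the spectrum of the difference.
spectrum = np.abs(np.fft.rfft(contaminated - eeg_like))
freqs = np.fft.rfftfreq(eeg_like.size, d=1.0 / fs)
peak_freq = float(freqs[np.argmax(spectrum)])
print(peak_freq)
```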
- _apply_eeg_sweat_artifacts(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG sweat artifacts and impedance changes.
- _apply_eeg_movement_artifacts(data: ndarray, params: Dict[str, float]) → ndarray
Apply EEG head/body movement artifacts.
- get_available_scenarios() → List[ConfoundingScenario]
Get list of available confounding scenarios.
Confounding Scenarios
The contamination factory supports various domain-specific confounding scenarios:
- class lrdbenchmark.models.contamination.contamination_factory.ConfoundingScenario(value)
Bases: Enum
Real-world confounding scenarios.
- FINANCIAL_CRASH = 'financial_crash'
- FINANCIAL_VOLATILITY_CLUSTERING = 'financial_volatility_clustering'
- FINANCIAL_REGIME_CHANGE = 'financial_regime_change'
- PHYSIOLOGICAL_SENSOR_DRIFT = 'physiological_sensor_drift'
- PHYSIOLOGICAL_MOTION_ARTIFACTS = 'physiological_motion_artifacts'
- PHYSIOLOGICAL_EQUIPMENT_FAILURE = 'physiological_equipment_failure'
- ENVIRONMENTAL_SEASONAL = 'environmental_seasonal'
- ENVIRONMENTAL_EXTREME_EVENTS = 'environmental_extreme_events'
- ENVIRONMENTAL_MEASUREMENT_DRIFT = 'environmental_measurement_drift'
- NETWORK_BURSTS = 'network_bursts'
- NETWORK_CONGESTION = 'network_congestion'
- NETWORK_EQUIPMENT_FAILURE = 'network_equipment_failure'
- INDUSTRIAL_CALIBRATION_DRIFT = 'industrial_calibration_drift'
- INDUSTRIAL_SENSOR_AGING = 'industrial_sensor_aging'
- INDUSTRIAL_ENVIRONMENTAL_INTERFERENCE = 'industrial_environmental_interference'
- EEG_OCULAR_ARTIFACTS = 'eeg_ocular_artifacts'
- EEG_MUSCLE_ARTIFACTS = 'eeg_muscle_artifacts'
- EEG_CARDIAC_ARTIFACTS = 'eeg_cardiac_artifacts'
- EEG_ELECTRODE_POPPING = 'eeg_electrode_popping'
- EEG_ELECTRODE_DRIFT = 'eeg_electrode_drift'
- EEG_60HZ_NOISE = 'eeg_60hz_noise'
- EEG_SWEAT_ARTIFACTS = 'eeg_sweat_artifacts'
- EEG_MOVEMENT_ARTIFACTS = 'eeg_movement_artifacts'
- MIXED_REALISTIC_LIGHT = 'mixed_realistic_light'
- MIXED_REALISTIC_MODERATE = 'mixed_realistic_moderate'
- MIXED_REALISTIC_SEVERE = 'mixed_realistic_severe'
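Because each member's value starts with its domain, scenarios can be looked up by string and grouped by prefix using standard `Enum` behaviour. The sketch below copies a few members into a hypothetical `ScenarioSketch` class so that it runs without the library installed; the same calls should apply to `ConfoundingScenario` itself.

```python
from collections import defaultdict
from enum import Enum

class ScenarioSketch(Enum):
    """Small subset of the members listed above, copied so the example runs standalone."""
    FINANCIAL_CRASH = "financial_crash"
    FINANCIAL_REGIME_CHANGE = "financial_regime_change"
    EEG_60HZ_NOISE = "eeg_60hz_noise"
    EEG_MUSCLE_ARTIFACTS = "eeg_muscle_artifacts"
    MIXED_REALISTIC_LIGHT = "mixed_realistic_light"

# Look up a member from its string value, for example when reading a config file.
scenario = ScenarioSketch("eeg_60hz_noise")

# Group member names by domain prefix (the text before the first underscore).
by_domain = defaultdict(list)
for member in ScenarioSketch:
    by_domain[member.value.split("_", 1)[0]].append(member.name)

print(scenario.name)
print(dict(by_domain))
```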
Confounding Profiles
Usage Examples
Basic Contamination Application
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Apply different contamination scenarios
scenarios = [
    ConfoundingScenario.FINANCIAL_CRASH,
    ConfoundingScenario.PHYSIOLOGICAL_SENSOR_DRIFT,
    ConfoundingScenario.ENVIRONMENTAL_SEASONAL,
    ConfoundingScenario.NETWORK_BURSTS
]
for scenario in scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.2
    )
    print(f"{scenario.value}: {description}")
EEG Contamination Scenarios
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# EEG-specific contamination scenarios
eeg_scenarios = [
    ConfoundingScenario.EEG_OCULAR_ARTIFACTS,
    ConfoundingScenario.EEG_MUSCLE_ARTIFACTS,
    ConfoundingScenario.EEG_CARDIAC_ARTIFACTS,
    ConfoundingScenario.EEG_ELECTRODE_POPPING,
    ConfoundingScenario.EEG_ELECTRODE_DRIFT,
    ConfoundingScenario.EEG_60HZ_NOISE,
    ConfoundingScenario.EEG_SWEAT_ARTIFACTS,
    ConfoundingScenario.EEG_MOVEMENT_ARTIFACTS
]
print("EEG Contamination Testing:")
for scenario in eeg_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.3
    )
    print(f"{scenario.value}: {description}")
Financial Contamination Scenarios
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Financial-specific contamination scenarios
financial_scenarios = [
    ConfoundingScenario.FINANCIAL_CRASH,
    ConfoundingScenario.FINANCIAL_VOLATILITY_CLUSTERING,
    ConfoundingScenario.FINANCIAL_REGIME_CHANGE
]
print("Financial Contamination Testing:")
for scenario in financial_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.4
    )
    print(f"{scenario.value}: {description}")
Physiological Contamination Scenarios
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Physiological-specific contamination scenarios
physiological_scenarios = [
    ConfoundingScenario.PHYSIOLOGICAL_SENSOR_DRIFT,
    ConfoundingScenario.PHYSIOLOGICAL_MOTION_ARTIFACTS,
    ConfoundingScenario.PHYSIOLOGICAL_EQUIPMENT_FAILURE
]
print("Physiological Contamination Testing:")
for scenario in physiological_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.25
    )
    print(f"{scenario.value}: {description}")
Environmental Contamination Scenarios
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Environmental-specific contamination scenarios
environmental_scenarios = [
    ConfoundingScenario.ENVIRONMENTAL_SEASONAL,
    ConfoundingScenario.ENVIRONMENTAL_EXTREME_EVENTS,
    ConfoundingScenario.ENVIRONMENTAL_MEASUREMENT_DRIFT
]
print("Environmental Contamination Testing:")
for scenario in environmental_scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.3
    )
    print(f"{scenario.value}: {description}")
Custom Contamination Profiles
from lrdbenchmark import ContaminationFactory, ConfoundingScenario, ConfoundingProfile
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Create a custom profile by overriding a predefined scenario's defaults.
# The parameter keys below are illustrative; valid keys depend on the scenario.
custom_profile = factory.create_confounding_profile(
    ConfoundingScenario.EEG_60HZ_NOISE,
    intensity=0.5,
    custom_parameters={
        'noise_std': 0.3,
        'noise_correlation': 0.1
    }
)
# apply_confounding takes a scenario rather than a profile,
# so pass the same scenario and intensity
contaminated_data, description = factory.apply_confounding(
    pure_data, ConfoundingScenario.EEG_60HZ_NOISE, intensity=0.5
)
print(f"Custom contamination: {description}")
Intensity Variation Testing
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Test different intensity levels
intensities = [0.1, 0.2, 0.3, 0.4, 0.5]
scenario = ConfoundingScenario.EEG_OCULAR_ARTIFACTS
print("Intensity Variation Testing:")
for intensity in intensities:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=intensity
    )
    print(f"Intensity {intensity}: {description}")
Contamination Analysis
from lrdbenchmark import ContaminationFactory, ConfoundingScenario
from lrdbenchmark import FBMModel
import numpy as np
# Create contamination factory
factory = ContaminationFactory()
# Generate pure data
model = FBMModel(H=0.7, sigma=1.0)
pure_data = model.generate(1000, seed=42)
# Analyze contamination effects
scenarios = [
    ConfoundingScenario.MIXED_REALISTIC_LIGHT,
    ConfoundingScenario.MIXED_REALISTIC_SEVERE,
    ConfoundingScenario.EEG_OCULAR_ARTIFACTS,
    ConfoundingScenario.FINANCIAL_CRASH
]
print("Contamination Analysis:")
for scenario in scenarios:
    contaminated_data, description = factory.apply_confounding(
        pure_data, scenario, intensity=0.3
    )
    # Calculate contamination metrics
    contamination_level = np.std(contaminated_data - pure_data) / np.std(pure_data)
    correlation = np.corrcoef(pure_data, contaminated_data)[0, 1]
    print(f"{scenario.value}:")
    print(f"  Description: {description}")
    print(f"  Contamination level: {contamination_level:.3f}")
    print(f"  Correlation with pure: {correlation:.3f}")
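The two metrics above can be wrapped in a reusable helper. The sketch below is one possible version that also reports a signal-to-noise ratio in decibels and handles the uncontaminated case; `contamination_metrics` is a hypothetical name, and a random walk with additive noise stands in for library output so that the block runs on its own.

```python
import numpy as np

def contamination_metrics(pure, contaminated):
    """Return relative distortion, Pearson correlation, and SNR in dB."""
    pure = np.asarray(pure, dtype=float)
    contaminated = np.asarray(contaminated, dtype=float)
    if pure.shape != contaminated.shape:
        raise ValueError("pure and contaminated must have the same shape")
    diff = contaminated - pure
    noise_power = np.mean(diff ** 2)
    # An identical series has zero noise power, so report infinite SNR instead of dividing by zero.
    snr_db = np.inf if noise_power == 0 else 10.0 * np.log10(np.mean(pure ** 2) / noise_power)
    return {
        "relative_distortion": float(diff.std() / pure.std()),
        "correlation": float(np.corrcoef(pure, contaminated)[0, 1]),
        "snr_db": float(snr_db),
    }

rng = np.random.default_rng(6)
pure = np.cumsum(rng.standard_normal(1000))          # random walk as a stand-in signal
noisy = pure + 0.3 * pure.std() * rng.standard_normal(1000)
print(contamination_metrics(pure, noisy))
print(contamination_metrics(pure, pure)["snr_db"])
```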
Best Practices
- Intensity selection: use intensities between 0.1 and 0.5 for realistic testing.
- Scenario selection: choose scenarios relevant to your application domain.
- Multiple scenarios: test robustness across several contamination types.
- Intensity variation: test each scenario at several intensity levels.
- Analysis: always quantify the effect of contamination on your data.
- Validation: compare results on contaminated data against pure-data baselines.
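The intensity sweep and baseline comparison recommended above can be sketched as a small harness. To stay runnable without the library, the sketch substitutes a lag-1 autocorrelation for the estimator under test and additive white noise for the factory call; both `lag1_autocorr` and `add_white_noise` are hypothetical placeholders to be swapped for real components.

```python
import numpy as np

def lag1_autocorr(x):
    """Stand-in statistic; replace with the estimator under test."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def add_white_noise(x, intensity, rng):
    """Stand-in contamination; replace with a call to the factory."""
    return x + intensity * np.std(x) * rng.standard_normal(x.size)

rng = np.random.default_rng(7)
# AR(1) series with coefficient 0.8 serves as the pure signal.
pure = np.zeros(5000)
for t in range(1, pure.size):
    pure[t] = 0.8 * pure[t - 1] + rng.standard_normal()

baseline = lag1_autocorr(pure)
results = {}
for intensity in (0.1, 0.2, 0.3, 0.4, 0.5):
    contaminated = add_white_noise(pure, intensity, rng)
    results[intensity] = lag1_autocorr(contaminated) - baseline   # bias relative to the baseline

for intensity, bias in results.items():
    print(f"intensity {intensity:.1f}: bias {bias:+.3f}")
```

Reporting the bias relative to the pure-data baseline, rather than the raw estimate, separates the effect of contamination from the estimator's own sampling error.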
Note
The contamination factory provides realistic confounding profiles based on real-world scenarios. This makes it ideal for testing estimator robustness in practical applications.
Warning
High intensity contamination (> 0.5) may significantly distort the underlying signal characteristics and should be used with caution.