Validation Techniques and Statistical Tests
This document provides comprehensive coverage of the validation techniques, statistical tests, and quality assurance methods used in LRDBench.
Monte Carlo Validation
Overview
Monte Carlo validation is the primary method for assessing estimator performance on synthetic data with known parameters. This approach allows for controlled evaluation of estimator bias, variance, and overall accuracy.
Methodology
Data Generation Process:
Parameter Space Definition: Define ranges for Hurst parameters (typically H ∈ [0.3, 0.9])
Model Selection: Choose data models (FBM, FGN, ARFIMA, MRW)
Sample Size Selection: Use multiple sample sizes (e.g., 500, 1000, 2000, 5000)
Realization Count: Generate N realizations for each parameter combination
Mathematical Framework:
For estimator \(\hat{H}\) and true value \(H_0\):
Implementation:
import numpy as np
from lrdbenchmark import FBMModel, ComprehensiveBenchmark
def monte_carlo_validation(H_values, sample_sizes, n_realizations=100):
"""Perform Monte Carlo validation."""
results = {}
for H in H_values:
for n in sample_sizes:
estimates = []
for i in range(n_realizations):
# Generate synthetic data
model = FBMModel(H=H, sigma=1.0)
data = model.generate(n, seed=i)
# Apply estimator
benchmark = ComprehensiveBenchmark()
result = benchmark.run_classical_benchmark(
data_length=n,
estimators=['dfa', 'gph', 'rs']
)
# Collect estimates
for estimator_name, estimator_result in result.estimators.items():
if estimator_name not in results:
results[estimator_name] = {}
if (H, n) not in results[estimator_name]:
results[estimator_name][(H, n)] = []
results[estimator_name][(H, n)].append(
estimator_result.mean_estimate
)
return results
Bootstrap Methods
Overview
Bootstrap methods provide non-parametric approaches to estimate confidence intervals, standard errors, and bias correction for estimators.
Types of Bootstrap
1. Non-Parametric Bootstrap:
Resample with replacement from the original data:
2. Parametric Bootstrap:
Generate bootstrap samples from a fitted parametric model:
3. Moving Block Bootstrap:
Preserve temporal dependence by resampling blocks:
Implementation:
import numpy as np
from scipy import stats
def bootstrap_confidence_interval(data, estimator_func, n_bootstrap=1000,
confidence_level=0.95):
"""Calculate bootstrap confidence interval."""
bootstrap_estimates = []
for _ in range(n_bootstrap):
# Resample with replacement
bootstrap_sample = np.random.choice(data, size=len(data), replace=True)
# Apply estimator
estimate = estimator_func(bootstrap_sample)
bootstrap_estimates.append(estimate)
# Calculate confidence interval
alpha = 1 - confidence_level
lower_percentile = (alpha / 2) * 100
upper_percentile = (1 - alpha / 2) * 100
ci_lower = np.percentile(bootstrap_estimates, lower_percentile)
ci_upper = np.percentile(bootstrap_estimates, upper_percentile)
return ci_lower, ci_upper, bootstrap_estimates
Cross-Validation
Overview
Cross-validation is essential for machine learning estimators to prevent overfitting and assess generalization performance.
K-Fold Cross-Validation
Algorithm:
Data Partitioning: Divide data into K folds
Training/Validation: For each fold k: - Train on K-1 folds - Validate on fold k
Performance Aggregation: Average performance across all folds
Mathematical Formulation:
where \(f^{-k}\) is the estimator trained on all folds except fold k.
Implementation:
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
def cross_validate_estimator(data, labels, estimator_class, k_folds=5):
"""Perform k-fold cross-validation."""
kf = KFold(n_splits=k_folds, shuffle=True, random_state=42)
cv_scores = []
for train_idx, val_idx in kf.split(data):
# Split data
X_train, X_val = data[train_idx], data[val_idx]
y_train, y_val = labels[train_idx], labels[val_idx]
# Train estimator
estimator = estimator_class()
estimator.fit(X_train, y_train)
# Predict and evaluate
y_pred = estimator.estimate(X_val)
mse = mean_squared_error(y_val, y_pred)
cv_scores.append(mse)
return np.mean(cv_scores), np.std(cv_scores)
Time Series Cross-Validation
For time series data, standard k-fold CV can lead to data leakage. Time series CV uses expanding windows:
def time_series_cv(data, labels, estimator_class, min_train_size=100):
"""Time series cross-validation with expanding windows."""
cv_scores = []
for i in range(min_train_size, len(data)):
# Training set: all data up to index i
X_train = data[:i]
y_train = labels[:i]
# Validation set: next observation
X_val = data[i:i+1]
y_val = labels[i:i+1]
# Train and predict
estimator = estimator_class()
estimator.fit(X_train, y_train)
y_pred = estimator.estimate(X_val)
# Calculate error
mse = mean_squared_error(y_val, y_pred)
cv_scores.append(mse)
return np.mean(cv_scores), np.std(cv_scores)
Robustness Analysis
Overview
Robustness analysis assesses estimator performance under various data conditions, including contamination, noise, and model misspecification.
Contamination Models
1. Additive Noise:
2. Outlier Contamination:
3. Trend Contamination:
Implementation:
def robustness_analysis(data, estimator_func, contamination_levels):
"""Analyze estimator robustness to contamination."""
results = {}
for epsilon in contamination_levels:
contaminated_data = data.copy()
# Add contamination
n_contaminated = int(epsilon * len(data))
contamination_indices = np.random.choice(
len(data), size=n_contaminated, replace=False
)
# Add outliers
contaminated_data[contamination_indices] += np.random.normal(
0, 5, size=n_contaminated
)
# Apply estimator
estimate = estimator_func(contaminated_data)
results[epsilon] = estimate
return results
Breakdown Point Analysis
The breakdown point is the smallest fraction of contaminated data that can cause the estimator to produce arbitrarily bad results.
Implementation:
def breakdown_point_analysis(data, estimator_func, max_contamination=0.5):
"""Estimate breakdown point of an estimator."""
original_estimate = estimator_func(data)
breakdown_point = None
for epsilon in np.arange(0, max_contamination, 0.01):
# Create contaminated data
n_contaminated = int(epsilon * len(data))
contaminated_data = data.copy()
# Add extreme outliers
contamination_indices = np.random.choice(
len(data), size=n_contaminated, replace=False
)
contaminated_data[contamination_indices] = 1e6
# Check if estimator breaks down
try:
contaminated_estimate = estimator_func(contaminated_data)
# Check if estimate is reasonable
if abs(contaminated_estimate - original_estimate) > 0.5:
breakdown_point = epsilon
break
except:
breakdown_point = epsilon
break
return breakdown_point
Validation Statistical Tests
Hypothesis Testing
1. Test for Long-Range Dependence:
Null hypothesis: \(H_0: H = 0.5\) (no LRD) Alternative hypothesis: \(H_1: H \neq 0.5\) (LRD present)
Test statistic: .. math:
T = \frac{\hat{H} - 0.5}{\text{SE}(\hat{H})}
2. Test for Parameter Equality:
Null hypothesis: \(H_0: H_1 = H_2\) Alternative hypothesis: \(H_1: H_1 \neq H_2\)
Test statistic: .. math:
T = \frac{\hat{H}_1 - \hat{H}_2}{\sqrt{\text{SE}(\hat{H}_1)^2 + \text{SE}(\hat{H}_2)^2}}
Implementation:
from scipy import stats
def test_lrd_presence(data, estimator_func, alpha=0.05):
"""Test for presence of long-range dependence."""
# Estimate H
H_estimate = estimator_func(data)
# Bootstrap standard error
bootstrap_estimates = []
for _ in range(1000):
bootstrap_sample = np.random.choice(data, size=len(data), replace=True)
bootstrap_estimates.append(estimator_func(bootstrap_sample))
se_H = np.std(bootstrap_estimates)
# Test statistic
test_statistic = (H_estimate - 0.5) / se_H
# Critical value
critical_value = stats.norm.ppf(1 - alpha/2)
# Decision
reject_null = abs(test_statistic) > critical_value
p_value = 2 * (1 - stats.norm.cdf(abs(test_statistic)))
return {
'test_statistic': test_statistic,
'p_value': p_value,
'reject_null': reject_null,
'H_estimate': H_estimate,
'standard_error': se_H
}
Goodness-of-Fit Tests
1. Kolmogorov-Smirnov Test:
Tests whether empirical distribution matches theoretical distribution.
2. Anderson-Darling Test:
Weighted version of KS test, more sensitive to tails.
3. Chi-Square Test:
Tests fit of observed frequencies to expected frequencies.
Implementation:
def goodness_of_fit_tests(data, theoretical_distribution):
"""Perform goodness-of-fit tests."""
results = {}
# Kolmogorov-Smirnov test
ks_statistic, ks_pvalue = stats.kstest(data, theoretical_distribution)
results['ks_test'] = {
'statistic': ks_statistic,
'p_value': ks_pvalue
}
# Anderson-Darling test
ad_statistic, ad_critical_values, ad_significance_levels = stats.anderson(
data, dist=theoretical_distribution
)
results['anderson_darling'] = {
'statistic': ad_statistic,
'critical_values': ad_critical_values,
'significance_levels': ad_significance_levels
}
# Chi-square test
observed, bins = np.histogram(data, bins='auto')
expected = theoretical_distribution.pdf(bins[:-1]) * len(data) * np.diff(bins)
chi2_statistic, chi2_pvalue = stats.chisquare(observed, expected)
results['chi_square'] = {
'statistic': chi2_statistic,
'p_value': chi2_pvalue
}
return results
Model Selection
Information Criteria
1. Akaike Information Criterion (AIC):
where k is the number of parameters and L is the likelihood.
2. Bayesian Information Criterion (BIC):
where n is the sample size.
3. Corrected AIC (AICc):
Implementation:
def model_selection_criteria(models, data):
"""Calculate model selection criteria."""
results = {}
for model_name, model in models.items():
# Fit model and get likelihood
model.fit(data)
log_likelihood = model.log_likelihood(data)
n_params = model.n_parameters
n_samples = len(data)
# Calculate criteria
aic = 2 * n_params - 2 * log_likelihood
bic = np.log(n_samples) * n_params - 2 * log_likelihood
aicc = aic + (2 * n_params * (n_params + 1)) / (n_samples - n_params - 1)
results[model_name] = {
'AIC': aic,
'BIC': bic,
'AICc': aicc,
'log_likelihood': log_likelihood,
'n_parameters': n_params
}
return results
Validation Performance Metrics
Accuracy Metrics
1. Mean Absolute Error (MAE):
2. Root Mean Square Error (RMSE):
3. Mean Absolute Percentage Error (MAPE):
4. Symmetric Mean Absolute Percentage Error (SMAPE):
Precision Metrics
1. Standard Error:
2. Coefficient of Variation:
3. Confidence Interval Width:
Implementation:
def calculate_performance_metrics(true_values, estimated_values):
"""Calculate comprehensive performance metrics."""
metrics = {}
# Convert to numpy arrays
true_vals = np.array(true_values)
est_vals = np.array(estimated_values)
# Accuracy metrics
errors = est_vals - true_vals
abs_errors = np.abs(errors)
metrics['MAE'] = np.mean(abs_errors)
metrics['RMSE'] = np.sqrt(np.mean(errors**2))
metrics['MAPE'] = 100 * np.mean(np.abs(errors / true_vals))
metrics['SMAPE'] = 100 * np.mean(2 * abs_errors / (np.abs(est_vals) + np.abs(true_vals)))
# Precision metrics
metrics['Standard_Error'] = np.std(est_vals)
metrics['Coefficient_of_Variation'] = (np.std(est_vals) / np.mean(est_vals)) * 100
# Bias
metrics['Bias'] = np.mean(errors)
# Correlation
metrics['Correlation'] = np.corrcoef(true_vals, est_vals)[0, 1]
return metrics
Efficiency Metrics
1. Computational Complexity: Big-O notation for time and space complexity of estimators.
2. Convergence Rate: Rate at which estimator approaches true value with increasing sample size.
3. Asymptotic Efficiency: Ratio of estimator variance to Cramér-Rao lower bound.
Implementation:
import time
import numpy as np
from lrdbenchmark import FBMModel, ComprehensiveBenchmark
import matplotlib.pyplot as plt
def efficiency_analysis_example():
"""Demonstrate efficiency analysis of estimators."""
# Test different sample sizes
sample_sizes = [500, 1000, 2000, 4000, 8000]
n_runs = 10
# Initialize results
results = {
'dfa': {'times': [], 'estimates': []},
'gph': {'times': [], 'estimates': []},
'rs': {'times': [], 'estimates': []}
}
print("Running efficiency analysis...")
for n in sample_sizes:
print(f"Testing sample size: {n}")
for estimator_name in results.keys():
times = []
estimates = []
for i in range(n_runs):
# Generate data
model = FBMModel(H=0.7, sigma=1.0)
data = model.generate(n, seed=i)
# Time estimation
start_time = time.time()
benchmark = ComprehensiveBenchmark()
result = benchmark.run_classical_benchmark(
data_length=n,
estimators=[estimator_name]
)
end_time = time.time()
if estimator_name in result.estimators:
times.append(end_time - start_time)
estimates.append(
result.estimators[estimator_name].mean_estimate
)
if times:
results[estimator_name]['times'].append(np.mean(times))
results[estimator_name]['estimates'].append(np.mean(estimates))
# Plot computational complexity
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
for estimator_name in results.keys():
if results[estimator_name]['times']:
plt.loglog(sample_sizes[:len(results[estimator_name]['times'])],
results[estimator_name]['times'],
label=estimator_name.upper(), marker='o')
plt.xlabel('Sample Size')
plt.ylabel('Execution Time (seconds)')
plt.title('Computational Complexity')
plt.legend()
plt.grid(True)
# Plot convergence
plt.subplot(1, 2, 2)
true_H = 0.7
for estimator_name in results.keys():
if results[estimator_name]['estimates']:
errors = np.abs(np.array(results[estimator_name]['estimates']) - true_H)
plt.loglog(sample_sizes[:len(errors)], errors,
label=estimator_name.upper(), marker='o')
plt.xlabel('Sample Size')
plt.ylabel('Absolute Error')
plt.title('Convergence Rate')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
return results
# Run the example
if __name__ == "__main__":
results = efficiency_analysis_example()
print("Efficiency analysis completed!")
Quality Assurance
Data Quality Checks
1. Stationarity Tests:
from statsmodels.tsa.stattools import adfuller, kpss
from scipy import stats
import numpy as np
from lrdbenchmark import FBMModel, FGNModel
def data_quality_checks_example():
"""Demonstrate data quality checks for time series."""
# Generate different types of data
models = {
'FBM (H=0.7)': FBMModel(H=0.7, sigma=1.0),
'FGN (H=0.8)': FGNModel(H=0.8, sigma=1.0),
'Non-stationary': lambda: np.cumsum(np.random.normal(0, 1, 1000))
}
print("=== DATA QUALITY CHECKS ===")
for model_name, model in models.items():
print(f"\n--- {model_name} ---")
# Generate data
if callable(model):
data = model()
else:
data = model.generate(1000, seed=42)
# ADF Test (Augmented Dickey-Fuller)
adf_stat, adf_pvalue, adf_critical = adfuller(data)[:3]
print(f"ADF Test:")
print(f" Statistic: {adf_stat:.4f}")
print(f" p-value: {adf_pvalue:.4f}")
print(f" Stationary: {'Yes' if adf_pvalue < 0.05 else 'No'}")
# KPSS Test
kpss_stat, kpss_pvalue, kpss_critical = kpss(data)[:3]
print(f"KPSS Test:")
print(f" Statistic: {kpss_stat:.4f}")
print(f" p-value: {kpss_pvalue:.4f}")
print(f" Stationary: {'Yes' if kpss_pvalue > 0.05 else 'No'}")
# Normality Test (Shapiro-Wilk)
shapiro_stat, shapiro_pvalue = stats.shapiro(data)
print(f"Shapiro-Wilk Test:")
print(f" Statistic: {shapiro_stat:.4f}")
print(f" p-value: {shapiro_pvalue:.4f}")
print(f" Normal: {'Yes' if shapiro_pvalue > 0.05 else 'No'}")
# Basic statistics
print(f"Basic Statistics:")
print(f" Mean: {np.mean(data):.4f}")
print(f" Std: {np.std(data):.4f}")
print(f" Skewness: {stats.skew(data):.4f}")
print(f" Kurtosis: {stats.kurtosis(data):.4f}")
# Run the example
if __name__ == "__main__":
data_quality_checks_example()
Estimator Validation
1. Consistency Checks:
import numpy as np
from lrdbenchmark import FBMModel, ComprehensiveBenchmark
import matplotlib.pyplot as plt
def estimator_validation_example():
"""Demonstrate estimator validation procedures."""
# Test consistency with increasing sample size
sample_sizes = [500, 1000, 2000, 4000, 8000]
true_H = 0.7
n_runs = 20
results = {
'dfa': {'estimates': [], 'std': []},
'gph': {'estimates': [], 'std': []},
'rs': {'estimates': [], 'std': []}
}
print("Running estimator validation...")
for n in sample_sizes:
print(f"Sample size: {n}")
for estimator_name in results.keys():
estimates = []
for i in range(n_runs):
# Generate data
model = FBMModel(H=true_H, sigma=1.0)
data = model.generate(n, seed=i)
# Apply estimator
benchmark = ComprehensiveBenchmark()
result = benchmark.run_classical_benchmark(
data_length=n,
estimators=[estimator_name]
)
if estimator_name in result.estimators:
estimates.append(
result.estimators[estimator_name].mean_estimate
)
if estimates:
results[estimator_name]['estimates'].append(np.mean(estimates))
results[estimator_name]['std'].append(np.std(estimates))
# Plot consistency
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
for estimator_name in results.keys():
if results[estimator_name]['estimates']:
plt.semilogx(sample_sizes[:len(results[estimator_name]['estimates'])],
results[estimator_name]['estimates'],
label=estimator_name.upper(), marker='o')
plt.axhline(y=true_H, color='red', linestyle='--', label='True H')
plt.xlabel('Sample Size')
plt.ylabel('Estimated H')
plt.title('Consistency Check')
plt.legend()
plt.grid(True)
# Plot standard deviation
plt.subplot(1, 2, 2)
for estimator_name in results.keys():
if results[estimator_name]['std']:
plt.loglog(sample_sizes[:len(results[estimator_name]['std'])],
results[estimator_name]['std'],
label=estimator_name.upper(), marker='o')
plt.xlabel('Sample Size')
plt.ylabel('Standard Deviation')
plt.title('Precision vs Sample Size')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
return results
# Run the example
if __name__ == "__main__":
results = estimator_validation_example()
print("Estimator validation completed!")
Comprehensive Validation Workflow
from lrdbenchmark import FBMModel, FGNModel, ComprehensiveBenchmark
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def comprehensive_validation_workflow():
"""Complete validation workflow for LRDBench estimators."""
print("=== COMPREHENSIVE VALIDATION WORKFLOW ===")
# 1. Define validation parameters
H_values = np.linspace(0.3, 0.9, 13)
sample_sizes = [500, 1000, 2000]
n_realizations = 50
# 2. Initialize results storage
validation_results = []
# 3. Run comprehensive validation
for H in H_values:
print(f"Testing H = {H:.2f}")
for n in sample_sizes:
for i in range(n_realizations):
# Generate data
model = FBMModel(H=H, sigma=1.0)
data = model.generate(n, seed=int(H*1000 + i))
# Run benchmark
benchmark = ComprehensiveBenchmark()
result = benchmark.run_classical_benchmark(
data_length=n,
estimators=['dfa', 'gph', 'rs', 'higuchi']
)
# Store results
for estimator_name, estimator_result in result.estimators.items():
validation_results.append({
'true_H': H,
'sample_size': n,
'realization': i,
'estimator': estimator_name,
'estimated_H': estimator_result.mean_estimate,
'error': abs(estimator_result.mean_estimate - H)
})
# 4. Analyze results
df = pd.DataFrame(validation_results)
print(f"\n=== VALIDATION SUMMARY ===")
print(f"Total tests: {len(df)}")
print(f"H range: {df['true_H'].min():.2f} to {df['true_H'].max():.2f}")
print(f"Sample sizes: {sorted(df['sample_size'].unique())}")
print(f"Estimators: {sorted(df['estimator'].unique())}")
# 5. Calculate performance metrics
performance = df.groupby('estimator').agg({
'error': ['mean', 'std', 'min', 'max'],
'estimated_H': ['mean', 'std']
}).round(4)
print(f"\n=== PERFORMANCE METRICS ===")
print(performance)
# 6. Create visualization
plt.figure(figsize=(15, 10))
# Error distribution
plt.subplot(2, 3, 1)
for estimator in df['estimator'].unique():
subset = df[df['estimator'] == estimator]
plt.hist(subset['error'], alpha=0.7, label=estimator.upper(), bins=20)
plt.xlabel('Absolute Error')
plt.ylabel('Frequency')
plt.title('Error Distribution')
plt.legend()
# Error vs True H
plt.subplot(2, 3, 2)
for estimator in df['estimator'].unique():
subset = df[df['estimator'] == estimator]
plt.scatter(subset['true_H'], subset['error'], alpha=0.6, label=estimator.upper())
plt.xlabel('True H')
plt.ylabel('Absolute Error')
plt.title('Error vs True H')
plt.legend()
# Error vs Sample Size
plt.subplot(2, 3, 3)
for estimator in df['estimator'].unique():
subset = df[df['estimator'] == estimator]
plt.scatter(subset['sample_size'], subset['error'], alpha=0.6, label=estimator.upper())
plt.xlabel('Sample Size')
plt.ylabel('Absolute Error')
plt.title('Error vs Sample Size')
plt.legend()
# Estimated vs True H
plt.subplot(2, 3, 4)
for estimator in df['estimator'].unique():
subset = df[df['estimator'] == estimator]
plt.scatter(subset['true_H'], subset['estimated_H'], alpha=0.6, label=estimator.upper())
plt.plot([0.3, 0.9], [0.3, 0.9], 'r--', label='Perfect')
plt.xlabel('True H')
plt.ylabel('Estimated H')
plt.title('Estimated vs True H')
plt.legend()
# Box plot by estimator
plt.subplot(2, 3, 5)
df.boxplot(column='error', by='estimator', ax=plt.gca())
plt.title('Error Distribution by Estimator')
plt.suptitle('')
# Box plot by sample size
plt.subplot(2, 3, 6)
df.boxplot(column='error', by='sample_size', ax=plt.gca())
plt.title('Error Distribution by Sample Size')
plt.suptitle('')
plt.tight_layout()
plt.show()
return df, performance
# Run the comprehensive workflow
if __name__ == "__main__":
df, performance = comprehensive_validation_workflow()
print("Comprehensive validation workflow completed!")
1. Computational Complexity: Big-O notation for time and space complexity 2. Convergence Rate: Rate at which estimator approaches true value 3. Asymptotic Efficiency: Ratio of estimator variance to Cramér-Rao lower bound
Implementation:
import time
import psutil
def efficiency_analysis(estimator_func, data_sizes):
"""Analyze computational efficiency."""
results = {}
for n in data_sizes:
# Generate test data
test_data = np.random.randn(n)
# Measure execution time
start_time = time.time()
start_memory = psutil.Process().memory_info().rss
estimator_func(test_data)
end_time = time.time()
end_memory = psutil.Process().memory_info().rss
execution_time = end_time - start_time
memory_usage = end_memory - start_memory
results[n] = {
'execution_time': execution_time,
'memory_usage': memory_usage,
'time_per_sample': execution_time / n,
'memory_per_sample': memory_usage / n
}
return results
Validation References
Beran, J. (1994). Statistics for Long-Memory Processes. Chapman & Hall.
Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap. CRC Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis. Wiley.
Shumway, R. H., & Stoffer, D. S. (2017). Time Series Analysis and Its Applications. Springer.
Nonstationarity and Time-Varying H
Overview
Classical LRD estimators assume stationarity—specifically that the Hurst parameter H is constant throughout the time series. When this assumption is violated, estimators can produce biased or misleading results.
Key stationarity assumptions violated by time-varying H:
Constant autocovariance structure: Classical estimators assume \(\gamma(k)\) is time-invariant
Scale invariance: Power-law decay of correlations requires constant scaling properties
Ergodicity: Time averages should equal ensemble averages
Why Classical Estimators Fail
Regime Switching:
When H switches between regimes (e.g., H=0.3 → H=0.8 at midpoint), classical estimators produce weighted averages of the regime-specific H values, with weights depending on:
Segment lengths
Estimator type (spectral vs. time domain)
Sample size
Continuous Drift:
Linear or smooth H(t) trajectories cause:
Systematic bias toward the time-averaged H̄ = ∫H(t)dt/T
Inflated variance estimates
Poor confidence interval coverage
Structural Breaks:
Level shifts and variance changes create spurious long-range correlations:
Generating Time-Varying H Signals
LRDBench provides nonstationary generators in the generation module:
from lrdbenchmark.generation import (
RegimeSwitchingProcess,
ContinuousDriftProcess,
StructuralBreakProcess,
EnsembleTimeAverageProcess
)
# Regime switching: H=0.3 → H=0.8 at midpoint
gen = RegimeSwitchingProcess(h_regimes=[0.3, 0.8], change_points=[0.5])
result = gen.generate(1000)
# Linear drift: H increases from 0.3 to 0.8
gen = ContinuousDriftProcess(h_start=0.3, h_end=0.8, drift_type='linear')
result = gen.generate(1000)
# Returns dict with 'signal', 'h_trajectory', 'metadata'
Structural Break Detection
Overview
The StructuralBreakDetector class provides multiple tests for detecting
change points that would invalidate stationarity assumptions.
Available Tests
1. CUSUM Test:
Cumulative sum test for mean shifts. The test statistic is:
The maximum absolute deviation is compared against Brownian bridge critical values.
2. Recursive CUSUM:
Sequential/online detection suitable for real-time monitoring. Uses control limits to detect when cumulative deviations exceed threshold.
3. Chow Test:
Tests whether regression coefficients differ before and after a hypothesized break:
where \(RSS_R\) and \(RSS_U\) are restricted and unrestricted residual sums.
4. ICSS Algorithm:
Iterative Cumulative Sum of Squares (Inclán & Tiao, 1994) for detecting variance change points.
Usage
from lrdbenchmark.analysis.diagnostics import StructuralBreakDetector
detector = StructuralBreakDetector(significance_level=0.05)
# Run all tests
result = detector.detect_all(data)
if result['any_break_detected']:
print("⚠️ Stationarity violated!")
print(result['warnings'])
# Individual tests
cusum_result = detector.cusum_test(data)
chow_result = detector.chow_test(data, break_index=500)
Nonequilibrium Physics Considerations
Ergodicity Breaking
In nonequilibrium systems, ensemble averages may differ from time averages:
This occurs in aging systems (e.g., glassy dynamics) and subdiffusive processes. Classical estimators assume ergodicity, leading to systematic errors when violated.
Testing for Ergodicity:
from lrdbenchmark.generation import EnsembleTimeAverageProcess
gen = EnsembleTimeAverageProcess(H=0.7, aging_exponent=0.5)
result = gen.generate_ensemble(n_realizations=100, length=1000)
# Compare ensemble vs time averages
ensemble_mean = result['ensemble_mean'] # Mean across realizations
time_mean = result['time_mean'] # Mean across time for each realization
Aging Effects
Aging manifests as:
Time-dependent diffusion coefficients
Non-stationary waiting time distributions
Power-law decay of relaxation functions
LRDBench’s EnsembleTimeAverageProcess models aging via:
Power-law aging: \(H(t) \propto t^{-\alpha}\)
Logarithmic aging: \(H(t) \propto \log(t)\)
Exponential aging: \(H(t) \to H_{\text{boundary}}\)
Failure Mode Interpretation
Catalog of Classical Estimator Failures
Failure Mode |
Affected Estimators |
Physics Regime |
Detection |
|---|---|---|---|
Bias explosion |
R/S, DFA |
H → 0 or H → 1 |
|Bias| > 0.15 |
Scale sensitivity |
Spectral methods |
Nonstationarity |
Sensitivity > 0.1 |
False positives |
GPH |
Short records (n < 512) |
Type I > 10% |
Heavy-tail breakdown |
All |
α < 2 stable |
Variance overflow |
Ergodicity breaking |
All |
Aging/nonequilibrium |
Ensemble ≠ time |
Interpreting Benchmark Results
When interpreting failure benchmark results:
Compare stationary vs. nonstationary MAE: Relative degradation indicates sensitivity
Check structural break detection: If breaks detected, results are unreliable
Examine H-dependent bias: Estimators often fail more at H boundaries
Consider sample size effects: Short series exacerbate all failure modes
Reporting Guidelines:
Report effect sizes (Cohen’s d) for bias significance
Apply Bonferroni/FDR correction for multiple comparisons
Include 95% CI from bootstrap (≥500 resamples)
Visualize with heatmaps: Estimator × H × Scenario