Time Series Differs from Standard ML: Order Defines Everything
Unlike regular ML, where rows are independent and shuffling does no harm, time series observations depend on their predecessors: yesterday's temperature shapes today's. Shuffling destroys that meaning. In an electricity-consumption example, the ordered data reveals a rising trend plus annual and weekly seasonality; the randomized version looks like noise and hides them. Never shuffle or randomly split time series; always use chronological train/test splits.
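A chronological split is easy to sketch with pandas; the dates and values below are illustrative, not from the electricity example:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series (values are arbitrary placeholders).
rng = pd.date_range("2023-01-01", periods=100, freq="D")
series = pd.Series(np.arange(100, dtype=float), index=rng)

# Chronological split: train on the first 80%, test on the last 20%.
split_point = int(len(series) * 0.8)
train, test = series.iloc[:split_point], series.iloc[split_point:]

# Sanity check: nothing in train comes after anything in test (no leakage).
assert train.index.max() < test.index.min()
```

The key design choice is slicing by position in time order rather than calling a random splitter such as `train_test_split` with shuffling enabled.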
Classify data types to guide prep: univariate (e.g., stock prices, rainfall) tracks one variable; multivariate (e.g., temp/humidity/wind) captures interactions. Regular series have fixed intervals (hourly/daily); irregular have uneven timestamps (transactions). Most data science work uses discrete series at specific points, not continuous streams.
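Converting an irregular series to a regular one is a common prep step; a minimal pandas sketch, with made-up transaction timestamps:

```python
import pandas as pd

# Irregular timestamps (e.g., transactions); amounts are illustrative.
ts = pd.Series(
    [5.0, 3.0, 8.0],
    index=pd.to_datetime(
        ["2023-01-01 09:15", "2023-01-01 09:40", "2023-01-01 11:05"]
    ),
)

# Resample onto a regular hourly grid; empty hours become 0 under sum().
hourly = ts.resample("h").sum()
```

The aggregation (`sum`, `mean`, `count`, etc.) depends on what the values represent; summing suits transaction amounts.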
Core components drive behavior: trend (long-term up/down/flat); seasonality (fixed-period repeats like December sales spikes); cyclicality (repeating without fixed period, e.g., economic booms); noise (unpredictable residuals); lags (past values as predictors, e.g., lag-1 = yesterday, lag-7 = last week).
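Lag features like lag-1 and lag-7 can be built with pandas `shift`; the toy values below are arbitrary:

```python
import pandas as pd

# Toy daily series to illustrate lag features.
s = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0, 17.0])

df = pd.DataFrame({"y": s})
df["lag_1"] = df["y"].shift(1)  # yesterday's value
df["lag_7"] = df["y"].shift(7)  # value one week ago
```

The first rows of each lag column are NaN because no earlier value exists; those rows are typically dropped before model fitting.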
Stationarity Unlocks Reliable Modeling
Stationarity—constant mean, variance, autocovariance over time—is assumed by ARIMA/VAR/SARIMA. Non-stationarity from trends (e.g., inflation), seasonality (summer peaks), breaks (pandemics), or variance shifts (financial crises) yields misleading forecasts.
Test with Augmented Dickey-Fuller (ADF): null = non-stationary (unit root); reject if p<0.05.
Stabilize by cause:
- Differencing: First-order y'(t)=y(t)-y(t-1) removes linear trends; second-order for quadratics; seasonal y'(t)=y(t)-y(t-period) for cycles.
- Log transform: Handles exponential growth/variance increase, converting multiplicative to additive (e.g., log returns = % changes in finance).
- Detrending: Subtract fitted trend (regression for linear, HP/STL for complex).
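The first two fixes can be sketched on synthetic data: first differencing flattens a linear trend, and a log transform turns exponential growth into a linear (additive) trend:

```python
import numpy as np
import pandas as pd

t = np.arange(50, dtype=float)

# Linear trend y(t) = 3 + 2t; first differencing leaves a constant 2.
trended = pd.Series(3.0 + 2.0 * t)
diff1 = trended.diff().dropna()  # y'(t) = y(t) - y(t-1)

# Exponential growth; the log makes it linear in t (slope 0.1).
expo = pd.Series(np.exp(0.1 * t))
logged = np.log(expo)
```

In practice you would apply the transform, re-run the ADF test, and difference again only if the series is still non-stationary.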
These yield stationary residuals ready for modeling, preventing garbage-in-garbage-out.
Smooth, Autoregress, and Diagnose for Insights
Rolling averages smooth noise to expose patterns: window size trades detail for clarity—7-day catches weekly wiggles, 90-day reveals annual trends. Use as features (rolling mean/std/max over 7/30 days boosts predictions).
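Rolling features of this kind are one-liners in pandas; a toy 30-value series with a 7-step window:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1.0, 31.0))  # toy "daily" series

# Rolling mean/std/max over a 7-observation window as model features.
features = pd.DataFrame({
    "roll_mean_7": s.rolling(7).mean(),
    "roll_std_7": s.rolling(7).std(),
    "roll_max_7": s.rolling(7).max(),
})
```

The first six rows are NaN because a full 7-observation window is not yet available; `min_periods` can relax that if partial windows are acceptable.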
Smoothing variants weight data: SMA equal-weights all in window; WMA prioritizes recent; Exponential (EMA/EWM) decays weights via alpha (high=responsive, low=smooth). Holt's adds trend equation (alpha level, beta trend); Holt-Winters includes seasonality.
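A minimal pandas sketch of SMA versus EMA responsiveness on a rising series (the alpha values are chosen arbitrarily for contrast):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

sma = s.rolling(3).mean()                         # equal weights in window
ema_fast = s.ewm(alpha=0.8, adjust=False).mean()  # high alpha: responsive
ema_slow = s.ewm(alpha=0.2, adjust=False).mean()  # low alpha: smooth
```

On a rising series the high-alpha EMA hugs the latest values while the low-alpha EMA lags behind, which is exactly the responsiveness/smoothness trade-off. For Holt's method and Holt-Winters, statsmodels provides `Holt` and `ExponentialSmoothing` in `statsmodels.tsa.holtwinters`.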
Autoregression (AR(p)) predicts y(t) from its p most recent values: y(t) = c + φ1·y(t-1) + ... + φp·y(t-p) + ε(t). Correlations typically decay with lag and are strongest at lag-1.
ACF plots the raw correlation at each lag (high values at lags 1 and 7 signal trend and weekly seasonality); PACF isolates the direct relationship at each lag, removing effects carried through intermediate lags. Reading the pair: an ACF that tails off gradually with a PACF that cuts off sharply suggests AR; the opposite pattern suggests MA. Bars beyond the blue confidence bands are significant; bars inside are indistinguishable from noise. This guides model order (e.g., AR(2): PACF significant through lag-2, then drops).