●Forecasting is one of the most important and frequently used applications of predictive analytics
●It supports both long-range and short-range planning for the organization
●Forecasted demand for products and services is an important input for both types of planning
●Manpower planning, machine capacity, warehouse capacity, and materials requirements planning (MRP) all depend on the forecasted demand for the product/service
Forecasting: Introduction
●Trend (T_t) → consistent long-term upward or downward movement of the data over a period of time
●Seasonality (S_t) → repetitive upward/downward movement from the trend that occurs within a year (seasons, quarters, months, etc.)
●Cyclical component (C_t) → fluctuation around the trend line due to changes such as recession, unemployment, etc.
●Irregular component (I_t) → white noise, i.e., random uncorrelated changes that follow a normal distribution with mean value of 0 and constant variance
Components of Time-Series Data
Additive time-series:        Y_t = T_t + S_t + C_t + I_t
Multiplicative time-series:  Y_t = T_t × S_t × C_t × I_t
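As an illustration of these components, here is a minimal sketch (the synthetic monthly series and the additive model are assumptions) of decomposing a series into trend, seasonal and irregular parts with statsmodels:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2015-01-01", periods=60, freq="MS")
y = pd.Series(100 + 0.5 * np.arange(60)                        # trend T_t
              + 10 * np.sin(2 * np.pi * np.arange(60) / 12)    # seasonality S_t
              + np.random.normal(0, 2, 60),                    # irregular I_t
              index=idx)

result = seasonal_decompose(y, model="additive")   # use model="multiplicative" for Y_t = T_t × S_t × ...
print(result.trend.dropna().head())
print(result.seasonal.head())
print(result.resid.dropna().head())
```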
●Mean Absolute Error
●Mean Absolute Percentage Error
●Mean Square Error
●Root Mean Square Error
Errors in Forecasting
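A minimal sketch of these four error measures in Python (the actual and forecasted values are made-up numbers for illustration):

```python
import numpy as np

actual   = np.array([112, 118, 132, 129, 121])   # observed values (illustrative)
forecast = np.array([110, 120, 128, 131, 119])   # forecasted values (illustrative)

error = actual - forecast
mae  = np.mean(np.abs(error))                    # Mean Absolute Error
mape = np.mean(np.abs(error / actual)) * 100     # Mean Absolute Percentage Error (%)
mse  = np.mean(error ** 2)                       # Mean Square Error
rmse = np.sqrt(mse)                              # Root Mean Square Error
print(mae, mape, mse, rmse)
```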
Moving Average
●Simple Moving Average
○One of the simplest forecasting techniques; forecasts the future value of a time-series as the average of the past 'N' observations
●Weighted Moving Average
○W_k → weight given to the value of Y at time k (Y_k); the weights sum to 1
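A minimal sketch of both variants (the data, N = 3 and the weights are illustrative assumptions):

```python
import numpy as np
import pandas as pd

y = pd.Series([20, 22, 25, 24, 27, 29, 28])        # illustrative demand data
N = 3

# Simple moving average: forecast for the next period = mean of the last N observations
sma_forecast = y.iloc[-N:].mean()

# Weighted moving average: more recent observations get larger weights (weights sum to 1)
weights = np.array([0.2, 0.3, 0.5])
wma_forecast = float(np.dot(weights, y.iloc[-N:]))

print(sma_forecast, wma_forecast)
```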
Exponential Smoothing
●Assigns differential weights to past observations
●SES (Simple Exponential Smoothing) → weights assigned to past data decline exponentially; the most recent observations are assigned the highest weights

F_{t+1} = α Y_t + (1 − α) F_t

Substituting F_t recursively:

F_{t+1} = α Y_t + α(1−α) Y_{t−1} + α(1−α)² Y_{t−2} + ... + α(1−α)^(t−1) Y_1 + (1−α)^t F_1
Exponential Smoothing
Advantages:
1. Uses all the historic data, unlike MA, to predict the future value
2. Assigns progressively decreasing weights to older data
Limitations:
1. Increasing 'n' makes the forecast less sensitive to changes in the data
2. Always lags behind the trend, as it is based on past observations
3. Forecast bias & systematic errors occur when observations exhibit strong trend or seasonal patterns
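A minimal sketch of SES with a fixed smoothing constant, using statsmodels (the series and α = 0.3 are assumptions for illustration):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

y = pd.Series([20, 22, 25, 24, 27, 29, 28])                 # illustrative data
ses = SimpleExpSmoothing(y).fit(smoothing_level=0.3,         # α fixed by the analyst
                                optimized=False)
print(ses.fittedvalues)    # one-step-ahead fitted values F_t
print(ses.forecast(1))     # F_{t+1}
```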
●If the data is smooth, we may choose a higher value of α
●If the data is fluctuating, a lower value of α is preferred
●Optimal value of α: solve a nonlinear optimization problem (e.g., minimize the sum of squared forecast errors)
Optimal α in Exponential Smoothing
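A minimal sketch of letting statsmodels search for α numerically (it minimizes the in-sample squared errors; the series is the same illustrative one as above):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

y = pd.Series([20, 22, 25, 24, 27, 29, 28])            # illustrative data
fit_opt = SimpleExpSmoothing(y).fit(optimized=True)    # α chosen by nonlinear optimization
print(fit_opt.params["smoothing_level"])               # the estimated optimal α
print(fit_opt.forecast(1))
```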
●SES does not do well in the presence of trend
●Introduce an additional equation for capturing the trend in the time-series data
●2 equations for forecasting:
○Level (short-term average)
○Trend
Double ES - Holt’s method
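A minimal sketch of Holt's method with statsmodels (the trending series is illustrative; the level and trend smoothing parameters are estimated by the optimizer):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import Holt

y = pd.Series([20, 23, 27, 30, 34, 38, 41, 45])   # illustrative data with an upward trend
holt = Holt(y).fit()                               # level & trend smoothing parameters optimized
print(holt.forecast(3))                            # forecasts extend the estimated level along the trend
```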
●MA, SES, and DES do not handle the seasonality component
●Fitted errors (residuals) show systematic error patterns when seasonality is present
●TES → used when the data has trend as well as seasonality
●3 equations for forecasting:
○Level
○Trend
○Seasonal
Triple ES - Holt-Winter method
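A minimal sketch of the Holt-Winters method with additive trend and seasonality (the synthetic monthly series and seasonal_periods=12 are assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

idx = pd.date_range("2018-01-01", periods=48, freq="MS")
y = pd.Series(100 + 0.8 * np.arange(48)
              + 12 * np.sin(2 * np.pi * np.arange(48) / 12)
              + np.random.normal(0, 2, 48), index=idx)

hw = ExponentialSmoothing(y, trend="add", seasonal="add",
                          seasonal_periods=12).fit()
print(hw.forecast(12))   # level + trend + seasonal forecasts for the next year
```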
●More appropriate in presence of predictor variables
Here F_t is the forecasted value of Y_t, and X_1t, X_2t, etc. are the predictor variables measured at time t
Regression
Forecasting in presence of seasonality
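A minimal sketch of regression-based forecasting with seasonality handled through quarterly dummy variables (the data, the time-index trend term and the dummy coding are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

n = 20
df = pd.DataFrame({
    "t": np.arange(1, n + 1),                                  # time index (trend)
    "quarter": [1, 2, 3, 4] * 5,                               # season of each observation
    "y": 50 + 2 * np.arange(1, n + 1)
         + np.tile([5, -3, 8, -10], 5)
         + np.random.normal(0, 2, n),
})

X = pd.get_dummies(df["quarter"], prefix="Q", drop_first=True, dtype=float)
X["t"] = df["t"]
X = sm.add_constant(X)                                         # intercept

model = sm.OLS(df["y"], X).fit()
print(model.params)             # intercept, seasonal effects, trend coefficient
print(model.predict(X).tail())  # fitted/forecasted values F_t
```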
●The initial ARMA & ARIMA models ⇒ Box & Jenkins in 1970
●Auto-regression ⇒ regression of a variable on itself measured at different time periods
●AR model assumption: the time-series is a stationary process
○The mean values of Y_t at different values of t are constant
○The variances of Y_t at different time periods are constant
○The covariances of Y_t and Y_{t−k} for different lags depend only on k
●Non-stationary data ⇒ must be made stationary before applying AR
AR, MA and ARMA
●Auto-regressive model with lag 1, AR(1), is given by:
  Y_t = β_0 + β_1 Y_{t−1} + ε_t
●The β's can be estimated using OLS
AR models
●Auto-regressive model with p lags, AR(p):
  Y_t = β_0 + β_1 Y_{t−1} + β_2 Y_{t−2} + ... + β_p Y_{t−p} + ε_t
●Forecast: F_{t+1} = β_0 + β_1 Y_t + β_2 Y_{t−1} + ... + β_p Y_{t−p+1}
AR models (contd)
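A minimal sketch of fitting an AR(p) model by OLS with statsmodels' AutoReg (the simulated stationary series and the choice lags=2 are assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(42)
y = pd.Series(ArmaProcess(ar=[1, -0.6], ma=[1]).generate_sample(nsample=200))  # simulated AR-type data

ar_fit = AutoReg(y, lags=2).fit()
print(ar_fit.params)                                   # β_0, β_1, β_2
print(ar_fit.predict(start=len(y), end=len(y) + 2))    # forecasts for the next 3 periods
```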
●Q: How to identify the value of 'p' (number of lags)?
●Ans: Auto-correlation function (ACF) & Partial ACF
●Auto-correlation ⇒ the memory of a process
●Auto-correlation of k lags, ρ_k ⇒ correlation between Y_t and Y_{t−k}
●A plot of the auto-correlation for different values of k ⇒ ACF
●Partial auto-correlation of lag k, ρ_pk ⇒ correlation between Y_t and Y_{t−k} without the influence of all intermediate values (Y_{t−1}, Y_{t−2}, ..., Y_{t−k+1})
●A plot of the partial auto-correlation for different values of k → PACF
AR model identification: ACF & PACF
The null hypothesis is rejected when |ρ_k| > 1.96/sqrt(n) and |ρ_pk| > 1.96/sqrt(n)
Thumb-rule: the number of lags is 'p' when:
●The partial auto-correlation, ρ_pk, exceeds 1.96/sqrt(n) for the first p values and then cuts off to 0
●The auto-correlation function (ACF), ρ_k, decreases exponentially
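A minimal sketch of computing the ACF and PACF and applying the 1.96/sqrt(n) thumb rule (the simulated series is an assumption; plot_acf/plot_pacf from statsmodels can be used for the visual check):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)
y = pd.Series(ArmaProcess(ar=[1, -0.6], ma=[1]).generate_sample(nsample=200))  # simulated stationary series

threshold = 1.96 / np.sqrt(len(y))
acf_vals  = acf(y, nlags=10)
pacf_vals = pacf(y, nlags=10)

print("threshold:", round(threshold, 3))
print("ACF :", np.round(acf_vals[1:], 2))
print("PACF:", np.round(pacf_vals[1:], 2))
print("significant PACF lags:", np.where(np.abs(pacf_vals[1:]) > threshold)[0] + 1)
```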
●Past residuals (errors) are used for forecasting future values of the time-series data
●The MA process is different from the moving average technique
●MA process of lag 1, MA(1), is given by:
  Y_t = μ + ε_t + θ_1 ε_{t−1}
●MA process with q lags, MA(q), is given by:
  Y_t = μ + ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ... + θ_q ε_{t−q}
MA Process MA(q)
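A minimal sketch of fitting an MA(1) process, expressed as ARIMA(0, 0, 1) in statsmodels (the simulated series is an assumption):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(7)
y = pd.Series(ArmaProcess(ar=[1], ma=[1, 0.5]).generate_sample(nsample=200))  # simulated MA(1)-type data

ma1 = ARIMA(y, order=(0, 0, 1)).fit()   # p = 0, d = 0, q = 1
print(ma1.params)                        # constant and θ_1
print(ma1.forecast(steps=3))
```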
ARMA(p, q) process
●Can be used when the time-series data is non-stationary
●ARIMA has the following three components:
○Auto-regressive component with p lags, AR(p)
○Integration component, I(d)
○Moving average with q lags, MA(q)
●In addition to the ACF plot, the Dickey−Fuller or augmented Dickey−Fuller test can be used to check for stationarity
ARIMA process
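A minimal sketch of an ARIMA(p, d, q) fit with statsmodels (the random-walk series and the order (1, 1, 1) are assumptions; in practice the order comes from the identification steps below):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(1)
y_ns = pd.Series(np.cumsum(np.random.normal(0.5, 1, 120)))   # non-stationary: random walk with drift

arima = ARIMA(y_ns, order=(1, 1, 1)).fit()   # AR(1) + first difference + MA(1)
print(arima.summary())
print(arima.forecast(steps=4))
```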
●Consider the AR(1) process:
  Y_t = β_1 Y_{t−1} + ε_t
●The AR(1) process can become very large when |β_1| > 1 and is non-stationary when |β_1| = 1 (a unit root)
●DF is a hypothesis test with H_0 and H_A as below:
  H_0: β_1 = 1 (the series is non-stationary)
  H_A: β_1 < 1 (the series is stationary)
●AR(1) ⇒ ΔY_t = Y_t − Y_{t−1} = (β_1 − 1) Y_{t−1} + ε_t, so the test checks whether the coefficient of Y_{t−1} is zero
Tests: Dickey Fuller Test
●The DF test is valid only when the residual ε_{t+1} follows a white noise process
●When ε_{t+1} is not white noise ⇒ the series may not be AR(1)
●To address this, augment the test regression with p lags of the dependent variable Y (the augmented Dickey−Fuller test)
Tests: Augmented DF Test
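A minimal sketch of the (augmented) Dickey-Fuller test with statsmodels, applied to a random-walk series and to its first difference (the series is an assumption):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

np.random.seed(1)
y_ns = pd.Series(np.cumsum(np.random.normal(0.5, 1, 120)))    # illustrative non-stationary random walk

for name, series in [("levels", y_ns), ("first difference", y_ns.diff().dropna())]:
    stat, pvalue, *_ = adfuller(series)                       # lag length chosen automatically (AIC)
    print(name, "| ADF statistic:", round(stat, 3), "| p-value:", round(pvalue, 4))
# A small p-value (< 0.05) rejects H_0 (unit root), i.e., the series is stationary.
```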
●1st step in ARIMA → identify the order of differencing (d)
●Factors causing non-stationarity: trend & seasonality
●Trend stationarity: fit a trend line and subtract it from the time series
●Difference stationarity: difference the original time-series
○1st difference (d = 1): ∇Y_t = Y_t − Y_{t−1}
○2nd difference (d = 2): ∇²Y_t = ∇(∇Y_t) = Y_t − 2Y_{t−1} + Y_{t−2}
Non-stationary ⇒ Stationary process
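A minimal sketch of first and second differencing with pandas (the non-stationary series is illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
y_ns = pd.Series(np.cumsum(np.random.normal(0.5, 1, 120)))   # illustrative non-stationary series

d1 = y_ns.diff().dropna()            # ∇Y_t  = Y_t − Y_{t−1}
d2 = y_ns.diff().diff().dropna()     # ∇²Y_t = Y_t − 2Y_{t−1} + Y_{t−2}
print(d1.head())
print(d2.head())
```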
ARIMA(p,d,q) model building
●Stage 1: Model Identification
○Refer to the flowchart
●Stage 2: Parameter Estimation & Model Selection
○Estimate the coefficients in the AR & MA components using OLS
○Model selection criteria: RMSE, MAPE, AIC, BIC
○AIC & BIC ⇒ distance measures between the actual & forecasted values
  AIC = −2LL + 2K        BIC = −2LL + K ln(n)
  (LL = log-likelihood, K = number of estimated parameters, n = number of observations)
●Stage 3: Model Validation
○Should satisfy all the assumptions of regression
○The residuals should be white noise
ARIMA(p,d,q) model building
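A minimal sketch of Stage 2: fit a few candidate orders and compare their AIC/BIC (the candidate orders and the series are assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(1)
y_ns = pd.Series(np.cumsum(np.random.normal(0.5, 1, 120)))   # illustrative non-stationary series

for order in [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 1)]:
    res = ARIMA(y_ns, order=order).fit()
    print(order, "AIC:", round(res.aic, 2), "BIC:", round(res.bic, 2))
# Prefer the order with the lowest AIC/BIC, then check that the residuals are white noise.
```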
●Comparison between naïve forecasting & the developed model
●Naïve forecasting model: F_{t+1} = Y_t
●Theil's coefficient (U-statistic) is given by:
  U = sqrt( Σ (F_{t+1} − Y_{t+1})² ) / sqrt( Σ (Y_t − Y_{t+1})² )
●U < 1 ⇒ the forecasting model is better than the naïve model
●U > 1 ⇒ the forecasting model is not better than the naïve model
Power of Forecasting Model: Theil’s coeff
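A minimal sketch of Theil's U computed from a model's one-step-ahead forecasts versus the naïve forecast F_{t+1} = Y_t (the numbers are illustrative):

```python
import numpy as np

actual   = np.array([112, 118, 132, 129, 121, 135])    # observed values Y_t
forecast = np.array([111, 115, 121, 130, 128, 124])    # model forecast for each period

model_err = forecast[1:] - actual[1:]    # F_{t+1} − Y_{t+1}
naive_err = actual[:-1] - actual[1:]     # Y_t − Y_{t+1}

U = np.sqrt(np.sum(model_err ** 2)) / np.sqrt(np.sum(naive_err ** 2))
print("Theil's U:", round(U, 3))         # U < 1 ⇒ better than the naïve model
```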
Recap
●Introduction
●Components
●Errors
●Moving Average
●Exponential Smoothing
●Regression
●ARIMA
●Tests
●Power of Forecasting model: Theil's coefficient