Forecasting time series powerful and simple

ivoandreev 454 views 32 slides Jan 21, 2022

About This Presentation

A time series is a sequence of data points ordered in time. Time series forecasting has two main purposes: to understand the mechanisms that lead to a rise or fall, and to predict future values. Very often it analyses trends, cyclical events, seasonality and has unique importance in Econom...


Slide Content

January 15th
GLOBAL AI BOOTCAMP IS POWERED BY:
Powerful yet Simple
(or not that much)
Forecasting Time Series
AI and IoT Bulgaria Summit, 2022

Speaker Bio
•Software Architect @
o19+ years professional experience
•Microsoft Azure MVP
•External Expert Horizon 2020, Eurostars-Eureka
•External Expert InnoFund Denmark, RIF Cyprus
•Business Interests
oWeb Development, SOA, Integration
oIoT, Machine Learning, Computer Intelligence
oSecurity & Performance Optimization
•Contact
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev

Thanks to our Sponsors

Upcoming Events
Global Azure Bulgaria, 2022
May 14, 2022
Tickets (Eventbrite)
Sessions (Sessionize)

Agenda
•Time Series?
•Forecasting?
•ML.NET
•Azure ML Service
•ARIMA/AutoARIMA
•Regression
•FB Prophet
•Demo

Takeaways
Time Series
oIntroduction to Hierarchical Time Series
oOverview of Time Series Forecasting Models
oTime Series Analysis with Python
ARIMA
oTime Series Forecasting with ARIMA models
oARIMA, Auto ARIMA, Prophet, Regression (YouTube)
SSA
oA Brief Introduction to SSA
oForecast Service Demand with Time Series Analysis and ML.NET
FB Prophet
oFB Prophet Quickstart (FB GitHub)
oTime Series Analysis using FB Prophet
oGenerate Accurate Forecasts with FB Prophet in Python

Time Series – a sequence of
observations taken over time
Forecasting – the process of
predicting for new data

Describing or Forecasting
•Data are Temporal
oUnlike other data, the fact that a point is close to another is important
•Sample Data Look like…
•Time Series Analysis
oUnderstanding time series and their underlying causes
oCreate a mathematical model that describes the data
oDetermine seasonal patterns, trends, relations to external factors
oNote: assumptions are often in place (e.g. about the form of the data)
•Forecasting
oScientific predictions based on historical time-stamped data
oUnivariate / Multivariate TS Forecasting
oNote: Explanatory power is often low
Time Value
2021-11-01T00:00:00+02:00 66
2021-11-01T01:00:00+02:00 29
2021-11-01T02:00:00+02:00 6
2021-11-01T03:00:00+02:00 8
2021-11-01T04:00:00+02:00 91
2021-11-01T05:00:00+02:00 145
2021-11-01T06:00:00+02:00 14
2021-11-01T07:00:00+02:00 19
2021-11-01T08:00:00+02:00 64
2021-11-01T09:00:00+02:00 4
2021-11-01T10:00:00+02:00 22
2021-11-01T11:00:00+02:00 65
2021-11-01T12:00:00+02:00 30
2021-11-01T13:00:00+02:00 152
2021-11-01T14:00:00+02:00 30
2021-11-01T15:00:00+02:00 17
2021-11-01T16:00:00+02:00 9
2021-11-01T17:00:00+02:00 11
2021-11-01T18:00:00+02:00 19
2021-11-01T19:00:00+02:00 76
2021-11-01T20:00:00+02:00 117
2021-11-01T21:00:00+02:00 152
2021-11-01T22:00:00+02:00 53
2021-11-01T23:00:00+02:00 3
2021-11-02T00:00:00+02:00 13
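
A minimal sketch of how rows like the ones above could be loaded, assuming they arrive as whitespace-separated text. The helper name `parse_series` and the four-row excerpt are illustrative only. It parses the ISO 8601 timestamps and checks that the series is sampled at a regular 1-hour interval, which many forecasting models expect.

```python
from datetime import datetime, timedelta

# First rows of the sample table, copied as plain text.
raw = """\
2021-11-01T00:00:00+02:00 66
2021-11-01T01:00:00+02:00 29
2021-11-01T02:00:00+02:00 6
2021-11-01T03:00:00+02:00 8
"""

def parse_series(text):
    """Hypothetical helper: turn 'timestamp value' lines into (datetime, int) pairs."""
    points = []
    for line in text.splitlines():
        stamp, value = line.split()
        points.append((datetime.fromisoformat(stamp), int(value)))
    return points

points = parse_series(raw)

# Collect the distinct gaps between consecutive timestamps.
steps = {later[0] - earlier[0] for earlier, later in zip(points, points[1:])}
is_regular = steps == {timedelta(hours=1)}
```

If `is_regular` were false, the series would probably need resampling before modelling.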

Practical Use Cases
•Sample Data Sources
oSensor readings (environmental data, temperature, pressure, humidity)
oFinancial market data
oMedical data (body parameters, heartbeat, pulse rate, blood pressure)
•Sample Scenarios
oUnit sales for each day in a store
oNumber of passengers at a station
oNumber of users of a web site
oLiters of hot water usage in a household
oStock price for a day
oDiesel price for the next week
oWater level of a dam during the year
oBody weight over the year ☺

Time Series come in
various flavours (types)

Hierarchical Time Series Forecasting
•Hierarchical TS
oEvident hierarchical structure
oLower levels are nested (e.g. geographical split)
•Grouped TS
oMultiple non-nested levels of detail (e.g. category, retailer, colour)
•Hierarchical Forecasting
oA collection of techniques rather than another methodology
oGenerates forecasts that are consistent across the whole hierarchy
oForecasts shall add up
•Approaches
oBottom-up, Top-down
oMiddle-out (Mixed) – Bottom-up (above middle), Top-down (below middle)
oReconciliation – forecast each level independently, then determine combination coefficients with linear regression
Example hierarchy: Bulgaria → East (Varna, Burgas), West (Sofia)
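
The bottom-up approach can be sketched in a few lines, assuming base forecasts already exist for the lowest level. The city-level numbers below are made up and the function name `bottom_up` is hypothetical. Higher levels are obtained purely by summing their children, so the forecasts add up by construction.

```python
# Made-up two-step forecasts for the lowest level of the hierarchy (cities).
city_forecasts = {
    "Varna": [120.0, 130.0],
    "Burgas": [80.0, 85.0],
    "Sofia": [300.0, 310.0],
}

# Hierarchy from the slide: Bulgaria -> East (Varna, Burgas), West (Sofia).
hierarchy = {
    "Bulgaria": ["East", "West"],
    "East": ["Varna", "Burgas"],
    "West": ["Sofia"],
}

def bottom_up(node, hierarchy, leaf_forecasts):
    """Forecast for `node` = element-wise sum of the forecasts of its children."""
    if node in leaf_forecasts:
        return list(leaf_forecasts[node])
    children = [bottom_up(c, hierarchy, leaf_forecasts) for c in hierarchy[node]]
    return [sum(step) for step in zip(*children)]

east = bottom_up("East", hierarchy, city_forecasts)
west = bottom_up("West", hierarchy, city_forecasts)
total = bottom_up("Bulgaria", hierarchy, city_forecasts)
```

Top-down would go the other way (forecast the country, then split by proportions), and reconciliation would forecast every level independently and then adjust them to be consistent.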

Quacks like Time Series, Moves like …
•Do you have enough data?
oMore data = more options for aggregation, model tuning, model testing
•Time horizon for prediction?
oShorter time horizon can be predicted with higher confidence
•Are forecasts updateable or static?
oRetrain after new data are available for more accurate results
•Frequency of forecasts?
oDownsampling and upsampling of data affect accuracy (in both directions)
•Is time series stationary?
oTime series properties do not depend on observation time?

Time Series Stationarity
•Stationarity
oStatistical properties of TS (mean, variance) do not depend on time of observation
oRule: Non-stationary data are unpredictable and cannot be forecast reliably
oConclusion: Non-stationary TS data need to be converted to stationary
•Differencing
oMethod to transform time series and remove time-dependent attributes (trend, seasonality)
oLag difference could be calculated on a larger time window (i.e. window size)
Note: Some TS forecasting methods do not require stationary input (e.g. ARIMA), as
preliminary differencing is performed (ARMA does, though).
difference(t) = observation(t) - observation(t-1)
Example: 1 2 3 4 5 6 7 8 9 10
Differencing: 1 1 1 1 1 1 1 1 1
inverted(t) = differenced(t) + observation(t-1)
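
The two formulas above translate almost directly into code. This is a small sketch with hypothetical function names; the `lag` argument covers the "larger time window" case, and inversion needs the first `lag` original observations as a starting point.

```python
def difference(series, lag=1):
    """difference(t) = observation(t) - observation(t - lag)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def invert_difference(first_values, diffs, lag=1):
    """Rebuild the series from its first `lag` observations and the differences."""
    restored = list(first_values)
    for d in diffs:
        # inverted(t) = differenced(t) + observation(t - lag)
        restored.append(d + restored[-lag])
    return restored

series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
diffs = difference(series)                       # the trend is removed
restored = invert_difference(series[:1], diffs)  # back to the original scale
```

Forecasts made on the differenced series have to be inverted the same way before they are reported.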

Time Series Analysis
Observations close in time are
often correlated

Time Series Analysis
TS Analysis provides techniques to understand data and break into components:
•Trend (Tt)
oSmooth general long-term tendency to increase, decrease or both
•Seasonality (St)
oRhythmic forces that operate on smaller intervals (e.g. 1h, 1d, 1w, 1m)
•Cyclic (Ct)
oCyclic behaviour that repeats over a long period (e.g. 4y, 1y)
•Random Noise (Rt)
oRandom irregular observations that cannot be explained (unpredictable)
Additive Model: Yt = Tt + St + Ct + Rt
Multiplicative Model: Yt = Tt * St * Ct * Rt
Mixed Model: Yt = Tt * Ct + St * Rt; Yt = Tt + St * Ct * Rt
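
To make the additive model concrete, here is a rough sketch of a classical decomposition into trend, seasonality and remainder (the cyclic component is ignored). It is a simplified illustration rather than a production method: it assumes an odd season length and a series covering several seasons, and all function names are hypothetical.

```python
def centered_moving_average(values, period):
    """Trend estimate; None where the window does not fit. Assumes an odd period."""
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        trend[t] = sum(values[t - half : t + half + 1]) / period
    return trend

def additive_decompose(values, period):
    """Split values into trend, seasonal and residual parts (Yt = Tt + St + Rt)."""
    trend = centered_moving_average(values, period)
    # Detrend, then average the detrended values per position in the season.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Centre the seasonal component so it sums to zero over one season.
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[t % period] - offset for t in range(len(values))]
    residual = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual

# Synthetic series: linear trend plus a repeating 3-step seasonal pattern, no noise.
pattern = [5.0, -2.0, -3.0]
series = [10.0 + 2.0 * t + pattern[t % 3] for t in range(12)]
trend, seasonal, residual = additive_decompose(series, period=3)
```

On this noise-free series the decomposition recovers the linear trend and the seasonal pattern, leaving a residual of zero wherever the trend is defined.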

Advanced
Observation: Time series tend to display significant autocorrelation
•Autocorrelation
oMeasures the relationship between TS and a lagged version of itself (T, T-k)
oMeaning: ±1 - perfect correlation; 0 – no correlation
•Measured with Pearson Correlation
oPreconditions: normal distribution, no significant outliers, continuous variables
oCross-correlation - correlation between two series, observed across different lags
•Augmented Dickey-Fuller Test (Python adfuller function in statsmodels)
oNull hypothesis (H0) – the TS has a unit root (non-stationary)
oAlternate hypothesis (HA) – the TS has no unit root (stationary)
•ADF p-value < 0.05
•H0 rejected = TS is stationary
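
As an illustration of the lagged correlation described above, the sketch below computes the Pearson correlation between a series and a shifted copy of itself. The function names are hypothetical and the series is synthetic. The ADF test itself is not reimplemented here.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def autocorrelation(series, lag):
    """Correlation between the series (T) and its lagged version (T-k)."""
    return pearson(series[lag:], series[:-lag])

# Synthetic series whose pattern repeats every 4 steps.
series = [1, 3, 2, 0] * 6
r4 = autocorrelation(series, 4)  # lag equal to the season length
r2 = autocorrelation(series, 2)  # lag of half a season
```

A peak in autocorrelation at some lag k is a common hint that the series has a seasonal period of k observations.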

Common Data Preparation
•Imputation
oReplacing missing data with substitute values
•Frequency / Resampling
oFrequency could be too high for a model compared to the prediction horizon
oIrregular time series may require resampling at regular intervals
•Outliers
oExtreme values need to be identified and handled
oOutlier = Value ∉ [Q1 - 1.5*IQR; Q3 + 1.5*IQR]
•Missing Data (decision flow)
oDoes missing data have meaning? YES (numerical): convert missing values to a meaningful number
oNO: decide by type of data
oLarge dataset, little data missing at random: remove instances with missing data
oLarge, temporally ordered dataset: replace missing data with preceding values
oData follows a simple distribution: impute with the mean value (with outliers: impute with the median)
oData does not follow a simple distribution: impute with a simple ML model
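
Two of the preparation steps above, sketched with the standard library only: flagging outliers with the 1.5*IQR rule and filling gaps with the preceding value. The readings are made up and the function names are hypothetical. Quartile conventions differ between tools, so the exact bounds may vary slightly elsewhere.

```python
from statistics import quantiles

def iqr_bounds(values):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for the given observations."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def forward_fill(values):
    """Replace None with the preceding observed value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Made-up sensor readings with two gaps and one extreme value.
readings = [10, 12, None, 11, 13, 95, 12, None, 11, 10]
observed = [v for v in readings if v is not None]

low, high = iqr_bounds(observed)
outliers = [v for v in observed if v < low or v > high]
filled = forward_fill(readings)
```

In practice outliers are usually handled before imputation, so that an extreme value is not copied into a neighbouring gap.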

Forecasting Algorithms
Appreciate how genius was
made simple for you

Naïve Algorithms Baseline
Note: Naïve algorithms are often referred to as “benchmark models”
Naïve Model
•Forecasts for any horizon match the last observed value
SNaïve Model (Seasonal Naïve)
•Assumes a seasonal component with time window T
•Forecast repeats the values of the last T timestamps
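
Both baselines fit in a few lines. The sketch below uses hypothetical function names and the first six values of the sample table shown earlier. The season length of 3 is chosen only to keep the example short.

```python
def naive_forecast(history, horizon):
    """Every future step equals the last observed value."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, season_length):
    """Every future step repeats the value observed one season earlier."""
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

history = [66, 29, 6, 8, 91, 145]
naive = naive_forecast(history, horizon=3)
snaive = seasonal_naive_forecast(history, horizon=5, season_length=3)
```

Any more elaborate model should at least beat these two on the same test data, otherwise its complexity is hard to justify.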

ARIMA (AutoRegressive Integrated Moving Average)
•Auto Regressive - linear combination of past values of the variable
oAssumes that the future will resemble the past
oInaccurate when an unseen event happens
•Moving Average - linear combination of past forecast errors
oSmooths the impact of short-term fluctuations
oSimple MA – arithmetic mean of the previous 5, 10, 20, 100 etc. values
oExponential MA - weighted average that gives greater importance to the most recent values
•Integrated – differencing to make the time series stationary
•ARIMA Parameters
op – number of observations from the Past used to forecast the future
od – degree of Differencing (number of times raw observations are differenced for stationarity)
oq – size of the window of past forecast errors (forecast Quality)
ARIMA(p,d,q) = const + (weighted sum of last p values) + (weighted sum of last q errors) after d differencing
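
The simple and exponential moving averages mentioned above can be sketched as follows. These are smoothing averages over past values, while the MA term inside ARIMA is a weighted sum of past forecast errors. The function names and the smoothing factor `alpha` are assumptions made for illustration.

```python
def simple_moving_average(values, window):
    """Arithmetic mean of the last `window` values, for each position where it fits."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def exponential_moving_average(values, alpha):
    """Weighted average in which recent values count more (0 < alpha <= 1)."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

values = [2.0, 4.0, 6.0, 8.0]
sma = simple_moving_average(values, window=2)
ema = exponential_moving_average(values, alpha=0.5)
```

A larger `alpha` makes the exponential average react faster to recent changes; a smaller one smooths more aggressively.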

Seasonal ARIMA
•ARIMA(p,d,q) is a non-seasonal ARIMA
•SARIMA (p, d, q)(P, D, Q)m
o P - number of seasonal autoregressive terms
o D – seasonal differencing order (number of seasonal differences needed to make TS stationary)
o Q - moving-average order of the seasonal component
o m – periods in a season (e.g. 12 for monthly data)
•The parameter space becomes larger
•Grid search for optimal parameters

AutoARIMA
•Identifies the optimal parameters of ARIMA(p, d, q)
opip install pmdarima (formerly pyramid-arima; mimics R auto.arima)
o.fit() does the magic
oUtilizes AIC (Akaike Information Criterion) to pick the best model (smaller = better)
•AIC = N*ln(SSe/N) + 2K (N - number of observations, SSe - sum of squared errors, K – number of model parameters)
•Conducts differencing tests to determine the order of differencing
•Pros
oSaves time
oOne of the simplest techniques for TS forecasting
oEliminates the need for in-depth understanding of statistics
oReduces the chance of human error due to misinterpretation
from pmdarima import auto_arima  # train: 1-D sequence of observations
model = auto_arima(train)  # [42 other optional arguments]
model.fit(train)  # optional refit; auto_arima already returns a fitted model
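
To show how AIC-based selection works in principle, the sketch below scores two candidate models with the least-squares form AIC = N*ln(SSe/N) + 2K and keeps the smaller score. The candidate names, error sums and parameter counts are made up, and this is not how pmdarima computes its scores internally.

```python
import math

def aic_least_squares(sse, n_obs, n_params):
    """AIC for a least-squares fit: N * ln(SSe / N) + 2K."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params

# Hypothetical fit results on the same 100 observations.
n_obs = 100
candidates = {
    "ARIMA(1,1,0)": {"sse": 120.0, "k": 2},
    "ARIMA(2,1,2)": {"sse": 118.0, "k": 5},
}

scores = {
    name: aic_least_squares(c["sse"], n_obs, c["k"])
    for name, c in candidates.items()
}
best = min(scores, key=scores.get)  # smaller AIC = better
```

Here the larger model fits slightly better, but its extra parameters cost more than the gain in fit, so the simpler candidate is chosen.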

Singular Spectrum Analysis (SSA)
•Novel, powerful technique
•2 complementary stages
oDecomposition - extract independent components from time series
oReconstruction – reconstruct the series for forecasting, after removing noise
•Pros
oWorks with arbitrary statistical process
oNo assumptions for data (e.g. stationarity)
•ML.NET ForecastBySsa Parameters
otrainSize – number of train samples (rows) from beginning (e.g. 300)
oseriesLength – length of series in buffer (how much data to use to train on)
owindowSize – length of the window on the series (seasonality)
ohorizon – number of values to forecast (e.g. 24)
oconfidenceLevel – degree of certainty (e.g. 95% of prediction intervals contain the real value)
SSA, How it Works
•How does it work
•Checkpoint
oAvoids replay of all previous data, provide only most recent observations
oBut if this creates a drift, a clean retrain on last observations (i.e. 1 month) may be better
MLContext mlContext = new MLContext(); //All ML.NET operations are within context
IDataView dv = mlContext.Data.LoadFromTextFile(…); //Step 1: Load data from file
var pipeline = mlContext.Forecasting.ForecastBySsa([Parameters],…); //Step 2: SSA Pipeline
SsaForecastingTransformer forecaster = pipeline.Fit(dv); //Step 3: Data training
… //Step 4: Evaluate (i.e. calculate RMSE)
var forecastEngine = forecaster.CreateTimeSeriesEngine(mlContext);
ModelOutput forecast = forecastEngine.Predict(); //Step 5: Load trained model and predict
forecastEngine.CheckPoint(mlContext, outputModelPath); //Save Checkpoint
model = mlContext.Model.Load(file, out DataViewSchema schema); //Load from Checkpoint
forecastEngine = model.CreateTimeSeriesEngine<TimeSeriesData, ChangePointPrediction>(mlContext);

Regression Model
•Forecasting Recap
oData are ordered in series as {Time: Value} pairs; no external knowledge
•Regression
oPredicting a single numeric value
oTime Series Forecasting involves Regression under the hood
oCan be applied to non-ordered data
oMust be applied multiple times (once per step) to cover a multi-step horizon
•Feature Engineering & Extraction
oDate – Year, Month, Day, Hour
oLag – What has happened at T-1, T-2, T-12, T-24, T-48, T-n observations
oDelta – What is the difference from T-1, T-2, T-12, T-24, T-48, T-n observations
oMoving Average – Mean(2), Mean(12), Mean(24), Mean(48), …
oSum – Sum(2), Sum(12), Sum(24), Sum(48), …
oDomain knowledge – Weather, Distance (not GPS), Ref. Price
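
The lag, delta, sum and moving-average features listed above could be built roughly as follows. The function name `build_rows` and the feature names are hypothetical. Each row uses only values observed before time t, so the target never leaks into its own features.

```python
def build_rows(values, n_lags=2, window=2):
    """Turn a series into (features, target) rows for a regression model."""
    start = max(n_lags, window, 2)  # delta_1 needs at least two past values
    rows = []
    for t in range(start, len(values)):
        past = values[:t]  # only data known before time t
        features = {f"lag_{k}": past[-k] for k in range(1, n_lags + 1)}
        features["delta_1"] = past[-1] - past[-2]
        recent = past[-window:]
        features[f"sum_{window}"] = sum(recent)
        features[f"mean_{window}"] = sum(recent) / window
        rows.append((features, values[t]))
    return rows

values = [66, 29, 6, 8, 91]  # first values of the sample table
rows = build_rows(values)
first_features, first_target = rows[0]
```

The resulting rows can be fed to any regression algorithm; date parts and domain features would be added as further columns.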

Azure ML Service
•Azure Auto ML (Forecasting uses AutoARIMA under the hood)
oThe easiest yet still powerful way to do ML
oOptimizes the iterative, time-consuming tasks of ML
oAzure Auto ML Python SDK
oAzure ML Studio – (ML Studio Classic retires August 2024)
Workflow: Upload File → Select Task Type → Parameters → Metrics

FB Prophet
•Created by Facebook in 2017
•Pros
oTrains quickly, highly accurate
oNo background required (like AutoARIMA)
oCan also be used for multivariate TS analysis
oHandles outliers and missing data well
oStrong at series with seasonal effects and few seasons in training data
oHandles random changes due to special events (e.g. market events)
•Under the Hood
oRequires prophet Python package
oUses additive regression model
Y(t) = Trend(t) + Seasonality(t) + Holiday(t) + Error(t)

Prophet – Easy to Use, Hard to Install
•Prophet does not run on Python 3.9
•What’s the easiest way?
•Install Azure Data Science VM (< 4 cores is sluggish)
•Find the 3.8 kernel in JupyterLab
•Activate kernel
•Use Conda package manager to install
•Conda has its own C++ compiler to build the packages
•Select a channel
C:\> activate py38_default
(py38_default) C:\> conda install pystan -c conda-forge
(py38_default) C:\> conda install -c conda-forge fbprophet

Demo
Ref. Comparing Prophet and Deep Learning to ARIMA in
Forecasting Wholesale Food Prices (2021)
