AGENDA
Introduction
Causes of autocorrelation
Consequences of autocorrelation
Detecting autocorrelation
Remedial Measures
Definition: Autocorrelation is the correlation of a signal with a delayed copy of itself. It quantifies the similarity between observations of a time series as a function of the time lag between them, and so shows how a variable is correlated with its own past values, providing insight into the internal structure and patterns within the data.
Purpose: Used to identify repeating patterns, trends, or cycles in time series data. By analyzing autocorrelation, you can determine if and how past values influence future values, which is crucial for forecasting and understanding temporal dependencies.
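A minimal sketch of this definition in Python with NumPy: the sample autocorrelation at a given lag is the covariance between the series and its lagged copy, scaled by the series variance. The series and lag values below are invented purely for illustration.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1):
    covariance between the series and its lagged copy,
    divided by the variance of the series."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    num = np.sum((x[lag:] - xbar) * (x[:-lag] - xbar))
    den = np.sum((x - xbar) ** 2)
    return num / den

# Illustrative series: each value tends to follow its predecessor
y = np.array([2.1, 2.3, 2.2, 2.8, 3.0, 2.9, 3.4, 3.6, 3.5, 3.9])
for k in (1, 2, 3):
    print(f"lag {k}: {autocorr(y, k):+.3f}")
```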
Crop Yield Analysis: Imagine a farmer tracking the yield of wheat over several years. By calculating the autocorrelation of annual yield data, the farmer can determine if high or low yields in one year tend to be followed by similar yields in subsequent years. For instance, if high yields in one year are often followed by high yields in the next year, the autocorrelation value for a lag of one year would be high. This information helps in predicting future yields and making informed decisions about crop management and resource allocation.
Weather Pattern Analysis: In agriculture, temperature and precipitation significantly affect crop growth. Suppose researchers analyze monthly temperature data over several decades to study its effect on crop performance. By examining the autocorrelation of temperature data, they can identify if periods of unusually high or low temperatures tend to recur in a cyclical pattern. For example, if there is a strong positive autocorrelation for a lag of 12 months, it suggests that the temperature in one month is likely similar to the temperature in the same month of the following year. Understanding these patterns helps farmers plan for seasonal variations and adapt their farming practices accordingly.
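As a sketch of this weather example, one could check the lag-12 autocorrelation of a monthly temperature series with statsmodels' acf function. The data below are simulated with an annual cycle, not real measurements.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
months = np.arange(240)  # 20 years of monthly observations
# Simulated temperatures: an annual sinusoidal cycle plus noise
temp = 15 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 2, months.size)

rho = acf(temp, nlags=24)
print(f"lag-12 autocorrelation: {rho[12]:.2f}")  # strongly positive: same season
print(f"lag-6  autocorrelation: {rho[6]:.2f}")   # strongly negative: opposite season
```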
1. Mis-specified Model: When the model does not adequately capture the underlying data-generating process, such as omitting important variables or using an incorrect functional form, the residuals can exhibit autocorrelation. For example, if a time series has a seasonal pattern that is not accounted for in the model, the residuals might show autocorrelation (see the sketch after this list).
2. Measurement Errors: Errors in data collection or recording can introduce autocorrelation. For example, if a time series has errors that are correlated over time, these errors can cause the residuals to be autocorrelated as well.
3. Seasonality: Natural cycles or seasonal effects in the data that are not accounted for in the model can lead to autocorrelation. For instance, monthly sales data might have annual seasonal patterns that, if not modeled, result in autocorrelation.
4. Inadequate Data Transformation: If the data is not properly transformed to account for trends or other non-stationarities, the residuals can show autocorrelation. For example, if a time series has a strong trend that is not removed, the residuals might exhibit autocorrelation.
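A sketch of causes 1 and 3 on simulated data, assuming statsmodels is available: a model that omits a known seasonal component leaves that structure in the residuals, which then show strong autocorrelation; including the seasonal terms removes it.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
t = np.arange(120)
# Trend + annual seasonal cycle + random noise
y = 0.5 * t + 8 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

# Mis-specified model: trend only, seasonality omitted
resid = sm.OLS(y, sm.add_constant(t)).fit().resid
print(f"DW, seasonality omitted:  {durbin_watson(resid):.2f}")  # far below 2

# Correctly specified model: seasonal terms included
X2 = sm.add_constant(np.column_stack([t, np.sin(2*np.pi*t/12), np.cos(2*np.pi*t/12)]))
resid2 = sm.OLS(y, X2).fit().resid
print(f"DW, seasonality modelled: {durbin_watson(resid2):.2f}")  # close to 2
```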
1. Inefficient Estimates: Effect: Parameter estimates remain unbiased but are no longer efficient. Impact: Standard errors are underestimated, leading to overconfident statistical inferences.
2. Incorrect Inferences: Effect: Confidence intervals and hypothesis tests may be misleading. Impact: Increased risk of Type I and Type II errors.
3. Model Specification Issues: Effect: Autocorrelation can signal that the model is missing key variables or that the model structure is incorrect. Impact: Leads to misinterpretation of the underlying data relationships and potentially poor predictive performance.
4. Violation of Assumptions: Effect: Violates the assumption of independence of the errors in classical regression models. Impact: Impairs the validity of the regression model's results and predictions.
1. Plotting the Time Series Data: Visual inspection of the time series plot can sometimes reveal patterns or periodicity that suggest autocorrelation.
2. Autocorrelation Function (ACF): Calculate and plot the autocorrelation function, which measures the correlation of the time series with its own lagged values. Significant spikes at certain lags indicate autocorrelation.
3. Partial Autocorrelation Function (PACF): The PACF measures the correlation between the time series and its lagged values while controlling for intermediate lags. This helps identify the direct effect of each lag.
4. Correlogram: A plot of the autocorrelation coefficients (ACF) versus the lag. It visually helps in identifying the presence and extent of autocorrelation.
5. Ljung-Box Test: This statistical test assesses whether there are significant autocorrelations at lags up to a certain number. It is useful for checking whether a time series is white noise.
6. Durbin-Watson Test: Specifically used in regression analysis to test for the presence of autocorrelation in the residuals of a regression model.
7. Runs Test: A non-parametric test that assesses whether a sequence of data points is randomly distributed. It can help in detecting patterns that suggest autocorrelation.
8. Breusch-Godfrey Test: A more general test than the Durbin-Watson test that allows for testing higher-order autocorrelation.
9. Box-Pierce Test: A test similar to the Ljung-Box test, used to determine whether there are significant autocorrelations at multiple lags.
10. Visual Inspection of Residuals: In regression models, analyzing the residuals (differences between observed and predicted values) for patterns can help detect autocorrelation.
11. Spectrum Analysis: Analyzing the frequency spectrum of a time series can reveal periodic components that indicate autocorrelation.
12. Time Series Decomposition: Decomposing a time series into trend, seasonal, and residual components can help identify autocorrelation within the residual component.
Several of these checks (ACF, PACF, Ljung-Box, Box-Pierce, Durbin-Watson) are demonstrated in the sketch after this list.
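A minimal sketch, assuming statsmodels is installed; the residual series is simulated as an AR(1) process so the checks have something to detect.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
# Simulated AR(1) residuals: each value depends on the previous one (rho = 0.6)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.6 * e[t - 1] + rng.normal()

# ACF decays gradually; PACF cuts off after lag 1 for an AR(1) process
print("ACF  (lags 1-3):", np.round(acf(e, nlags=3)[1:], 2))
print("PACF (lags 1-3):", np.round(pacf(e, nlags=3)[1:], 2))

# Ljung-Box and Box-Pierce: H0 = no autocorrelation up to lag 10
lb = acorr_ljungbox(e, lags=[10], boxpierce=True)
print(lb[["lb_pvalue", "bp_pvalue"]])  # tiny p-values: reject H0

# Durbin-Watson: values near 2 indicate no first-order autocorrelation
print(f"Durbin-Watson: {durbin_watson(e):.2f}")
```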
The Durbin-Watson d-test or d-statistic (DW d-statistic)
It is the ratio of the sum of squared differences in successive residuals (numerator) to the residual sum of squares, RSS (denominator):

$$d = \frac{\sum_{t=2}^{n} (\hat{e}_t - \hat{e}_{t-1})^2}{\sum_{t=1}^{n} \hat{e}_t^2}$$

Note that in the numerator the summation runs from t = 2 to n, NOT from 1 to n, because the first observation is lost in taking successive differences (there is no $\hat{e}_{t-1}$ for the first observation). Also note that the statistic is based on the estimated residuals.
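A direct sketch of this formula in NumPy; the residuals below are invented for illustration.

```python
import numpy as np

def dw_statistic(e):
    """Durbin-Watson d: sum of squared successive differences of the
    residuals (t = 2..n) divided by the residual sum of squares (t = 1..n)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

resid = np.array([0.5, 0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.8])  # positively correlated
print(f"d = {dw_statistic(resid):.2f}")  # well below 2
```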
Assumptions underlying the d-statistic:
1. The regression model includes an intercept term; the test cannot be used if the regression is through the origin.
2. The explanatory variables are nonstochastic, or fixed in repeated sampling.
3. The regression model does not include lagged values of the dependent variable as explanatory variables. For example, in the model $Y_t = \beta_1 + \beta_2 X_t + \beta_3 Y_{t-1} + u_t$ the DW test cannot be applied, as the lagged Y term is an explanatory variable.
4. The disturbance term $u_t$ is normally distributed.
5. The disturbances are generated by the first-order scheme $u_t = \rho u_{t-1} + v_t$, i.e. the value of the disturbance term at time t depends on its value in period (t-1) and a purely random term $v_t$. The extent of dependence on the past value is measured by $\rho$, called the coefficient of autocorrelation.
DW d-test: What is the range of values the d-statistic can take?
Expanding the formula and noting that $\sum \hat{e}_t^2 \approx \sum \hat{e}_{t-1}^2$ for large n gives the approximation $d \approx 2(1 - \hat{\rho})$. Since $-1 \le \hat{\rho} \le 1$, it follows that $0 \le d \le 4$.
If d is closer to 0 (between 0 and 2): evidence of positive autocorrelation.
If d is closer to 4 (between 2 and 4): evidence of negative autocorrelation.
If d is close to 2: evidence of NO autocorrelation.
Durbin and Watson derived a lower bound d_L and an upper bound d_U corresponding to different values of n and k'. There are two sets of tables: at the 5% and 1% levels of significance. By checking whether the computed d lies outside these critical values, a decision can be made regarding the presence of positive or negative serial correlation.
Extract from the DW tables: n = number of observations; k' = number of explanatory variables excluding the intercept.
DW D-TEST: DECISION RULE TRICK
1. Test the positive-autocorrelation hypothesis if the computed d lies between 0 and 2.
2. Test the negative-autocorrelation hypothesis if the computed d lies between 2 and 4.
Note: d_L and d_U come from the tables; from them compute 4 - d_L and 4 - d_U. The resulting zones are:
- 0 < d < d_L: reject H0; evidence of positive autocorrelation.
- d_L ≤ d ≤ d_U: zone of indecision.
- d_U < d < 4 - d_U: do not reject H0; no evidence of autocorrelation.
- 4 - d_U ≤ d ≤ 4 - d_L: zone of indecision.
- 4 - d_L < d < 4: reject H0; evidence of negative autocorrelation.
If the computed d lies in an indecision zone, nothing can be concluded from this test and some other test must be used. A small function implementing these zones follows.
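The decision rule as code; d_L and d_U must be read from the Durbin-Watson tables for the given n and k'. The bounds in the usage line are placeholders for illustration, not actual table entries.

```python
def dw_decision(d, dL, dU):
    """Map a computed Durbin-Watson d onto the five decision zones."""
    if d < dL:
        return "reject H0: evidence of positive autocorrelation"
    if d <= dU:
        return "zone of indecision"
    if d < 4 - dU:
        return "do not reject H0: no evidence of autocorrelation"
    if d <= 4 - dL:
        return "zone of indecision"
    return "reject H0: evidence of negative autocorrelation"

# Placeholder bounds for illustration only; read dL and dU from the tables
print(dw_decision(1.2, dL=1.10, dU=1.54))
```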
Steps to run the test
1. Run the OLS regression and obtain the residuals.
2. Compute d.
3. For the given sample size and number of explanatory variables, find the critical d_L and d_U from the Durbin-Watson tables.
4. Compute 4 - d_L and 4 - d_U.
5. Follow the decision rule.
These steps are sketched in code below.
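A sketch of the workflow in statsmodels on a simulated regression whose disturbances follow an AR(1) process; durbin_watson covers step 2, while the table lookup in steps 3-4 still has to be done by hand.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = rng.normal(size=100)
# AR(1) disturbances with rho = 0.7, so d should land near 2*(1 - 0.7) = 0.6
u = np.zeros(100)
for t in range(1, 100):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Step 1: run the OLS regression and obtain the residuals
results = sm.OLS(y, sm.add_constant(x)).fit()

# Step 2: compute d
d = durbin_watson(results.resid)
print(f"d = {d:.2f}")

# Steps 3-5: look up d_L and d_U for n = 100, k' = 1 in the tables,
# compute 4 - d_L and 4 - d_U, and apply the decision rule above.
```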
Remedial Measures
Differencing: Subtracting the previous observation from the current observation to remove trends and seasonality.
Seasonal Differencing: If the autocorrelation is seasonal, subtract the observation from the same season in the previous cycle. For example, for a monthly series, subtract the value from 12 months ago.
Transformation: Applying transformations like the log, square root, or Box-Cox transformations can stabilize the variance and sometimes reduce autocorrelation.
A pandas sketch of these operations follows.
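A minimal sketch with pandas on a simulated monthly series that has a trend, an annual seasonal pattern, and strictly positive values (so the log transform is valid).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 120  # 10 years of monthly observations
y = pd.Series(50 + 0.5 * np.arange(n)                       # trend
              + 10 * np.sin(2 * np.pi * np.arange(n) / 12)  # annual cycle
              + rng.normal(0, 2, n))                        # noise

first_diff = y.diff()       # differencing: removes the trend
seasonal_diff = y.diff(12)  # seasonal differencing: subtracts the value 12 months back
log_y = np.log(y)           # log transformation: stabilises the variance (requires y > 0)
print(first_diff.dropna().head(3))
```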
Adding Explanatory Variables: Sometimes autocorrelation is due to omitted variables; including relevant explanatory variables that capture the underlying process can help reduce it.
Filtering: Apply filters or smoothing techniques to the time series data to remove noise and reduce autocorrelation.
Resampling: If the data has a high-frequency component, consider resampling to a lower frequency that might be more appropriate for your analysis. A short pandas sketch of filtering and resampling follows.
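A sketch of the last two measures with pandas, using a centred moving average as the filter and simple block averaging (monthly to quarterly) as the downsampling step; the series is simulated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
y = pd.Series(np.sin(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 0.5, 120))

# Filtering: a centred 5-point moving average smooths out high-frequency noise
smoothed = y.rolling(window=5, center=True).mean()

# Resampling to a lower frequency: quarterly means from monthly observations
quarterly = y.groupby(np.arange(y.size) // 3).mean()

print(smoothed.dropna().head(3))
print(quarterly.head(3))
```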