CIA – 3B: Time Series Analysis on Sunspots
By: Srishti Srivastava, Siddharth Menon
Submitted to: Dr. Ashish Sharma, Dr. Stephen Raj

Table of contents
01 About the Dataset: Description of the dataset
02 Methods Used: Methods used for making inferences
03 Analysis: Analysis of the output obtained
04 Conclusion & Suggestions: Conclusion derived with suggestions for improvement

01 About the Dataset: Description of the dataset
Sunspots
About: Sunspots, as seen in the above image, are dark areas on the surface of the sun.
Why to study?: Studying sunspots is crucial due to their potential impact on Earth’s environment and climate.
Data Source: https://data.world/hyuto/sunspots/workspace/file?filename=Sunspots.csv
Use of Time Series Analysis: Time series analysis plays a vital role in understanding sunspot cycles and their implications for Earth’s climate.

02 & 03 Methods Used & Analysis: Methods used for making inferences & analysis of the outputs obtained
Conversion to Time Series
To begin the analysis, we convert the data into a time series and plot it. The plot shows the Wolf sunspot number, a measure of solar activity, from 1750 to 1950. It has a clear cyclical pattern, with sunspot numbers rising and falling over time, and it suggests a possible long-term increase in sunspot numbers over the two centuries shown.
Multiplicative Decomposition
Multiplicative decomposition is preferred when the magnitude of the seasonal swings varies with the level of the series. The trend component shows a clear upward slope over time. The seasonal component fluctuates around a constant value throughout the year. Since the data has a trend and possibly seasonal variation, it is likely non-stationary.
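As an illustration of what a multiplicative decomposition computes, here is a minimal sketch of the classical moving-average method in plain Python. It is not the code behind the slides: the function name, the period of 4 and the toy series are assumptions made for the example, and the real analysis would use the sunspot data and its own seasonal period.

```python
def multiplicative_decompose(y, period):
    """Classical decomposition y = trend * seasonal * remainder."""
    n = len(y)
    half = period // 2
    trend = [None] * n  # undefined at the edges of the series
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 0:
            # Centered moving average: half weight on the two end points.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended ratios y / trend for each position in the cycle.
    ratios = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            ratios[t % period].append(y[t] / trend[t])
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period  # rescale so the indices average to 1
    seasonal_idx = [s / scale for s in raw]
    seasonal = [seasonal_idx[t % period] for t in range(n)]
    remainder = [
        None if trend[t] is None else y[t] / (trend[t] * seasonal[t])
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Toy series (not the sunspot data): rising level times a fixed seasonal pattern.
true_pattern = [0.8, 1.1, 1.2, 0.9]
toy = [(100 + 2 * t) * true_pattern[t % 4] for t in range(24)]
trend, seasonal, remainder = multiplicative_decompose(toy, period=4)
```

On the toy series the recovered seasonal indices stay close to the pattern used to build it, which is the behaviour described above: a seasonal component that fluctuates around a constant value while the trend carries the upward slope.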
Autocorrelation Function (ACF) Plot
ACF spikes that decay steadily towards zero as the lag increases suggest weak stationarity or a series close to being stationary; a very slow, near-linear decay would instead point to a trend. A decaying ACF means that the correlation between the current value and its lagged values fades as the lag grows, but it does not guarantee complete independence.
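For reference, the quantity drawn in an ACF plot can be computed directly. The sketch below uses plain Python and a toy sine series with an assumed period of 11 as a stand-in for the sunspot numbers; the helper name `sample_acf` is made up for this example.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # n times the sample variance
    acf = []
    for k in range(max_lag + 1):
        num = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf


# Toy cyclical series (not the real data): 10 full cycles of length 11.
toy = [50 + 40 * math.sin(2 * math.pi * t / 11) for t in range(110)]
acf = sample_acf(toy, max_lag=11)

# Approximate 95% band used in ACF plots, valid under a white-noise assumption.
band = 1.96 / math.sqrt(len(toy))
significant_lags = [k for k in range(1, 12) if abs(acf[k]) > band]
```

Lags whose autocorrelation falls outside `band` correspond to the spikes that cross the dashed lines in the plot.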
Partial Autocorrelation Function (PACF) Plot
Spikes outside the confidence interval mark lags where the partial autocorrelation may be statistically significant. Many spikes exceeding the interval suggest that the current value is influenced by past values at multiple lags, which indicates the need for a potentially higher-order AR (autoregressive) component in the ARIMA analysis.
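One common way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion. The sketch below is illustrative only: the function name is an assumption, and the input is the theoretical ACF of an AR(1) process with coefficient 0.6, not values estimated from the sunspot series.

```python
def pacf_from_acf(acf):
    """Durbin-Levinson recursion.

    acf is [r_0, r_1, ..., r_K]; returns [phi_11, phi_22, ..., phi_KK].
    """
    max_lag = len(acf) - 1
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = acf[1]
        else:
            num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.6: r_k = 0.6 ** k.
ar1_acf = [0.6 ** k for k in range(6)]
ar1_pacf = pacf_from_acf(ar1_acf)
```

For this AR(1) input the PACF has a single non-zero value at lag 1 and is zero afterwards. That cut-off is what makes the PACF useful for choosing the AR order: the number of leading significant spikes hints at how many autoregressive terms are needed.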
Augmented Dickey-Fuller (ADF) Test
The Augmented Dickey-Fuller test is used to check for stationarity. With a p-value of 0.01, which is smaller than the commonly used significance level of 0.05, there is evidence to reject the null hypothesis of non-stationarity and conclude that the series data_sun is stationary.
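To show the idea behind the test, here is a simplified sketch of the plain Dickey-Fuller regression with an intercept and no lagged differences, so it is not the augmented version used above. The simulated AR(1) series, the seed and the function name are assumptions for the example; -2.86 is the usual large-sample 5% critical value for the intercept-only case.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic of g in  dy_t = a + g * y_{t-1} + e_t  (no lagged differences)."""
    x = y[:-1]                                # y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]    # first differences
    m = len(x)
    mx = sum(x) / m
    md = sum(dy) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    g = sxy / sxx                             # OLS slope
    a = md - g * mx                           # OLS intercept
    ssr = sum((di - a - g * xi) ** 2 for xi, di in zip(x, dy))
    se_g = math.sqrt(ssr / (m - 2) / sxx)     # standard error of the slope
    return g / se_g


# Simulated stationary AR(1) series (not the sunspot data).
rng = random.Random(0)
stationary = [0.0]
for _ in range(299):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(stationary)
CRIT_5PCT = -2.86                             # asymptotic 5% critical value, intercept only
reject_unit_root = t_stat < CRIT_5PCT
```

A t-statistic below the critical value leads to rejecting the unit-root null, which is the same decision a small p-value expresses in the result reported above.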
Fitting of ARIMA Model
The fitted model is ARIMA(2,1,2): two autoregressive terms, one order of differencing, and two moving-average terms. The output reports the estimated coefficients together with their standard errors (s.e.). Goodness of fit is summarised by the model’s information criteria: AIC, AICc and BIC.
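Two pieces of the ARIMA(2,1,2) output can be illustrated without fitting anything: the differencing implied by d = 1, and how AIC, AICc and BIC follow from a log-likelihood. In the sketch below the log-likelihood of -100, the sample size of 100 and the parameter count k = 5 (two AR terms, two MA terms and the error variance, which is one common counting convention) are placeholder assumptions, not results from the sunspot model.

```python
import math


def difference(series, d=1):
    """Apply first differencing d times (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def information_criteria(log_lik, k, n):
    """AIC, AICc and BIC from a log-likelihood, k parameters and n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = -2.0 * log_lik + k * math.log(n)
    return aic, aicc, bic


once = difference([1, 4, 9, 16])        # [3, 5, 7]
twice = difference([1, 4, 9, 16], d=2)  # [2, 2]

# Placeholder inputs for illustration only.
aic, aicc, bic = information_criteria(log_lik=-100.0, k=5, n=100)
```

All three criteria penalise extra parameters, so lower values indicate a better trade-off between fit and complexity when candidate models are compared on the same data.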
Forecasting
Residual Analysis
04 Conclusion & Suggestions: Conclusion derived with suggestions for improvement

Conclusion
The ARIMA(2,1,2) model fitted to the sunspots time series data provides a reasonably good fit to the observed patterns. The model captures the autoregressive and moving average dynamics of the data, as indicated by the estimated coefficients and their significance. The information criteria suggest that the model adequately balances goodness of fit with model complexity. However, there may still be room for improvement in capturing certain nuances of the data.

Suggestions
Model Evaluation
Model Refinement
Incorporate External Factors
Long-term Trends
Further Research
Use of SARIMA Model