[20240722_LabSeminar_Huy]WaveForM: Graph Enhanced Wavelet Learning for Long Sequence Forecasting of Multivariate Time Series.pptx


About This Presentation

WaveForM: Graph Enhanced Wavelet Learning for Long Sequence Forecasting of Multivariate Time Series


Slide Content

Quang-Huy Tran, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: [email protected]. 2024-07-22. WaveForM: Graph Enhanced Wavelet Learning for Long Sequence Forecasting of Multivariate Time Series. Fuhao Yang et al., AAAI 2023: The Thirty-Seventh AAAI Conference on Artificial Intelligence.

OUTLINE: MOTIVATION, METHODOLOGY, EXPERIMENT & RESULT, CONCLUSION

MOTIVATION: Overview and Limitation. Multiple interconnected streams of data, or multivariate time series (MTS), are pervasive in real-world applications: traffic flows recorded by sensors, observations from weather stations, etc. Forecasting from historical MTS observations enables meaningful and accurate application-wide predictions. Challenges: existing work still overlooks long sequence forecasting (LSF) of MTS, i.e., using an observed MTS of a given length to predict a longer future sequence. Long-term MTS are composed of more entangled temporal patterns than short-term ones, and overlooking this leads to unreliable discovery of temporal dependencies. Transformer-based models have proven effective at modeling sequential data, but they suffer from high computational cost in LSF.

INTRODUCTION: Contribution. Propose a DWT-based end-to-end framework that transforms MTS into the wavelet domain for long sequence prediction tasks and is capable of fully exploiting the inherent features of MTS in both the frequency and time domains. Propose a global graph constructor to extract global information on the interrelationships among variables in the wavelet domain, preventing training from overfitting.

METHODOLOGY: Problem Definition. Given an MTS $X = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{N \times T}$, an N-variate time series, $x_i \in \mathbb{R}^{T}$ denotes the time series of the i-th variable, which consists of sequential recordings at T timestamps. Problem: given an observation window H for the historical time series and a forecasting window P for prediction, long sequence MTS forecasting aims to learn a mapping function $f_{\theta}$ that predicts the next P timesteps from the last H observations, where $\theta$ is the learnable parameter set.
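Since the equation objects were lost in the slide export, the following is a minimal LaTeX reconstruction of the forecasting mapping under standard notation; the exact indexing convention (a window ending at time t) is an assumption.

```latex
% Reconstruction sketch (indexing convention assumed): H observed steps of the
% N-variate series X are mapped by f_theta to the next P steps.
\[
  X = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{N \times T}, \qquad x_i \in \mathbb{R}^{T}
\]
\[
  \hat{X}_{t+1 : t+P} = f_{\theta}\left( X_{t-H+1 : t} \right)
\]
```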

METHODOLOGY: Main Architecture

METHODOLOGY: Discrete Wavelet Transform Module and Its Inverse Version. DWT module: transforms an input MTS into its corresponding multi-scale frequency representations with the DWT, which decomposes the input signals into a set of wavelets that capture the frequency and time features of the original signals. Each DWT uses a high-pass filter h and a low-pass filter g to decompose a time series signal x into different resolutions:
$x^{(l)}_{low}[n] = \sum_{k} x^{(l-1)}_{low}[k] \, g[2n-k]$, $\quad x^{(l)}_{high}[n] = \sum_{k} x^{(l-1)}_{low}[k] \, h[2n-k]$,
where $x^{(l)}$ is the l-th decomposition, $x^{(0)}_{low} = x$, the index k runs over the length of $x^{(l-1)}_{low}$ (the length of x after decomposing (l − 1) times), and s represents the scale.
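The decomposition step itself is a standard multi-level DWT; as a hedged illustration (not the authors' implementation), the PyWavelets library produces the same kind of layered coefficients. The wavelet "db2", the level 3, and the toy sizes below are arbitrary choices.

```python
import numpy as np
import pywt  # PyWavelets

# Toy MTS: N = 3 variables, H = 96 historical time steps.
x = np.random.randn(3, 96)

# Multi-level DWT along the time axis: one low-pass (approximation) band plus
# one high-pass (detail) band per decomposition level.
coeffs = pywt.wavedec(x, wavelet="db2", level=3, axis=-1)

for i, c in enumerate(coeffs):
    name = "approximation (level 3)" if i == 0 else f"detail (level {len(coeffs) - i})"
    print(name, c.shape)  # the temporal length roughly halves at every level
```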

METHODOLOGY: Discrete Wavelet Transform Module and Its Inverse Version. DWT module: after l layers of decomposition, the module outputs, for each variable $x_i$, a set of coefficients $\{x^{(l)}_{i,low}, x^{(l)}_{i,high}, \ldots, x^{(1)}_{i,high}\}$. Let W represent the layered wavelet coefficients collected over all variables, where H is the length of the input MTS and N is the number of variables. After the following graph-enhanced modules output the i-th variable's coefficients for the future P time steps, the Inverse Discrete Wavelet Transform (IDWT) is applied to reconstruct the corresponding sequence in the time domain:
$x^{(l-1)}_{low}[n] = \sum_{k} x^{(l)}_{low}[k] \, g'[n-2k] + \sum_{k} x^{(l)}_{high}[k] \, h'[n-2k]$,
where h' and g' are the synthesis versions of h and g.
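Correspondingly, the inverse transform can be sketched with pywt.waverec. In WaveForM the coefficients fed to the IDWT would be the predicted ones; this toy check simply reconstructs the input, and the sizes and wavelet choice are assumptions.

```python
import numpy as np
import pywt

x = np.random.randn(3, 96)                                 # toy MTS (N=3, H=96)
coeffs = pywt.wavedec(x, wavelet="db2", level=3, axis=-1)   # forward DWT

# Inverse DWT: the synthesis filters reassemble the time-domain signal from the
# layered coefficients. Boundary padding can make the output slightly longer.
x_rec = pywt.waverec(coeffs, wavelet="db2", axis=-1)
print(np.allclose(x, x_rec[:, : x.shape[1]]))               # True (up to padding)
```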

METHODOLOGY: Global Graph Constructor (GGC). After obtaining the wavelet coefficients at different scales, the model intends to forecast the coefficient changes over time in the wavelet domain, assuming the variables share the same basic interaction structure at different resolutions. Using a global graph rather than learning a graph in each GP module also avoids overfitting and saves memory. Apply graph structure learning: assign each node/variable an integer index and learn two embedding representations for each node. With the variable representations $E_1$ and $E_2$ obtained from two different embedding layers, the adjacency matrix is
$A = \mathrm{ReLU}(\tanh(\alpha (E_1 E_2^{T} - E_2 E_1^{T})))$,
where $\alpha$ is the hyper-parameter for the activation function.
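The description on this slide (integer node indices, two embeddings per node, a tanh/ReLU activation with hyper-parameter alpha) matches MTGNN-style graph structure learning, so a hedged PyTorch sketch could look as follows; the class and argument names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalGraphConstructor(nn.Module):
    """Learn a single global adjacency matrix shared by all GP modules."""

    def __init__(self, num_nodes: int, emb_dim: int, alpha: float = 3.0):
        super().__init__()
        # Each variable (node) is an integer index mapped to two embedding vectors.
        self.emb1 = nn.Embedding(num_nodes, emb_dim)
        self.emb2 = nn.Embedding(num_nodes, emb_dim)
        self.alpha = alpha  # hyper-parameter controlling the tanh saturation

    def forward(self, node_idx: torch.Tensor) -> torch.Tensor:
        e1, e2 = self.emb1(node_idx), self.emb2(node_idx)
        # Anti-symmetric similarity -> tanh -> ReLU yields a sparse directed graph.
        a = e1 @ e2.t() - e2 @ e1.t()
        return F.relu(torch.tanh(self.alpha * a))

adj = GlobalGraphConstructor(num_nodes=8, emb_dim=16)(torch.arange(8))
print(adj.shape)  # torch.Size([8, 8])
```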

METHODOLOGY: Graph-Enhanced Prediction Modules. Given the learnable adjacency matrix A, build Graph-enhanced Prediction (GP) modules to exploit the graphical information for predictions. Dilated Convolution Component: the input is passed through stacked 1D dilated convolutions that filter the wavelet coefficients to incorporate wavelet information,
$Z = \tanh(\Theta_1 \star W) \odot \sigma(\Theta_2 \star W)$,
where $\star$ is the dilated convolution operator and $\sigma$ is the sigmoid function. Multiple dilated convolution filters with different kernel sizes are utilized to capture the respective features of the wavelet coefficients at each level of resolution.
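A minimal sketch of a gated dilated 1D convolution of the kind described above (a tanh filter branch gated by a sigmoid branch); the channel count, kernel size, and dilation are arbitrary illustrative values rather than the paper's settings.

```python
import torch
import torch.nn as nn

class GatedDilatedConv(nn.Module):
    """One gated dilated convolution: tanh(filter) multiplied elementwise by sigmoid(gate)."""

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # keep the temporal length unchanged
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size,
                                     dilation=dilation, padding=pad)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   dilation=dilation, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) features built from the wavelet coefficients
        return torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))

z = GatedDilatedConv(channels=32)(torch.randn(4, 32, 48))
print(z.shape)  # torch.Size([4, 32, 48])
```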

METHODOLOGY: Graph-Enhanced Prediction Modules. Graph Convolution Component: aggregates each node's information with its neighbors' information to capture global dependencies among the different variables. To mitigate the over-smoothing of GCNs, a MixHop layer is utilized to capture the complex relationships of neighbors at various hops. Given the dilated convolution component's output Z, the K-layer MixHop propagation proceeds as
$H^{(k)} = \beta H_{in} + (1-\beta) \tilde{A} H^{(k-1)}, \quad k = 1, \ldots, K$,
where $H^{(k-1)}$ is the representation output from the previous layer, $\tilde{A}$ is the normalized adjacency matrix, and $\beta$ is a hyperparameter that controls the proportion of information maintained from the previous representation.
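A hedged sketch of this mix-hop idea (retain a fraction beta of the input while propagating over the learned graph for K hops), assuming a row-normalized adjacency with self-loops; the function name and the final concatenation of hop-wise outputs are illustrative choices.

```python
import torch

def mixhop_propagation(z: torch.Tensor, adj: torch.Tensor,
                       k: int = 2, beta: float = 0.05) -> torch.Tensor:
    """z: (num_nodes, channels) features from the dilated convolution component.
    adj: (num_nodes, num_nodes) learned global adjacency matrix."""
    # Row-normalize the adjacency after adding self-loops.
    a = adj + torch.eye(adj.size(0))
    a = a / a.sum(dim=1, keepdim=True)

    h, hops = z, [z]
    for _ in range(k):
        # Keep a proportion beta of the input to mitigate over-smoothing.
        h = beta * z + (1.0 - beta) * a @ h
        hops.append(h)
    # Collect hop-wise representations; a linear projection would normally follow.
    return torch.cat(hops, dim=-1)

out = mixhop_propagation(torch.randn(8, 16), torch.rand(8, 8))
print(out.shape)  # torch.Size([8, 48])
```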

METHODOLOGY: Graph-Enhanced Prediction Modules. Skip Connection and Output: improves representational capability by preserving the original information. Given the wavelet coefficients W, two factors are initialized: an in-module representation obtained with a 1×1 convolution kernel in the GP module, and a skip representation obtained with a 1×L convolution kernel for the skip connection layer. The input is then passed through K stacked GP modules; each GP module produces a skip output, which is accumulated across layers, and another output-factor representation, which is passed to the next GP module, with a hyperparameter controlling the balance between them.
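A simplified sketch of how skip outputs could accumulate across stacked GP modules; the 1×1 convolutions below merely stand in for the full GP modules and skip layers described above, so this is an assumption about the wiring rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class StackedGP(nn.Module):
    """K placeholder GP blocks whose per-layer skip outputs are summed."""

    def __init__(self, channels: int, skip_channels: int, num_layers: int = 3):
        super().__init__()
        # 1x1 convolutions stand in for full GP modules / skip-connection layers.
        self.gp_blocks = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=1) for _ in range(num_layers)])
        self.skip_convs = nn.ModuleList(
            [nn.Conv1d(channels, skip_channels, kernel_size=1) for _ in range(num_layers)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = 0.0
        for gp, sc in zip(self.gp_blocks, self.skip_convs):
            x = gp(x)            # representation handed to the next GP block
            skip = skip + sc(x)  # every layer contributes to the skip output
        return skip              # the prediction head would read from this sum

y = StackedGP(channels=32, skip_channels=64)(torch.randn(4, 32, 48))
print(y.shape)  # torch.Size([4, 64, 48])
```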

EXPERIMENT AND RESULT: EXPERIMENT SETTINGS. Datasets: Electricity, Traffic, Weather, and Solar-Energy. Baselines: deep learning models (LSTM, Transformer, Informer [1], and Autoformer [2]) and STGNNs (GraphWaveNet [3] and MTGNN [4]). Measurement: mean absolute error (MAE) and mean squared error (MSE).
[1] Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2021). Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 12, pp. 11106-11115).
[2] Wu, H., Xu, J., Wang, J., & Long, M. (2021). Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34, 22419-22430.
[3] Wu, Z., Pan, S., Long, G., Jiang, J., & Zhang, C. (2019). Graph WaveNet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
[4] Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., & Zhang, C. (2020). Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 753-763).
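For reference, the two reported metrics are simple averages over all predicted points; a minimal NumPy sketch:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over all variables and forecast steps."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over all variables and forecast steps."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])
print(mae(y_true, y_pred), mse(y_true, y_pred))  # 0.5 0.25
```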

EXPERIMENT AND RESULT: RESULT – Overall Performance

EXPERIMENT AND RESULT: RESULT – Ablation Study

CONCLUSION: Summarization. Proposed WaveForM, a novel framework for long sequence multivariate time series forecasting. It uses DWT to transform the time-domain series into wavelet-domain coefficients at multiple resolutions and applies a graph convolution module to model the relationships between the variables. The transformed coefficients in the wavelet domain are better able to describe the input series at multiple resolutions, allowing the model to learn fine-grained complex patterns.