[20240703_LabSeminar_Huy]MakeGNNGreatAgain.pptx


About This Presentation

Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction


Slide Content

Quang-Huy Tran, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: [email protected]. 2024-07-03. Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction. Yicheng Zhou et al. IJCAI-2024: The 33rd International Joint Conference on Artificial Intelligence.

OUTLINE: MOTIVATION, METHODOLOGY, EXPERIMENT & RESULT, CONCLUSION

MOTIVATION Overview and Limitation: Graph Neural Networks (GNNs) have demonstrated exceptional performance in traffic speed prediction. Their effectiveness stems from the capability to model spatial correlations and temporal dependencies through information aggregation across various graph topologies. Despite this promising performance, GNN-based methods are inherently limited to topology-regularized patterns, which restricts them from recognizing topology-free patterns (latent or indirect relationships beyond the immediate graph structure).

MOTIVATION Challenges of Combining Topology-Regularized and Topology-Free Patterns: (1) Topology-free patterns vary across scales, exhibiting different characteristics at each scale. For example, R1 lies in a new city district and benefits from more lanes and overall higher speeds, while R2 lies in an old city district with fewer lanes, leading to slower speeds; yet at the finest spatial scale, R1 and R2 are both arterial roads sharing similar functions. (2) Topology-free patterns change dynamically; e.g., the speed distributions of R1 and R2 are time-lagged and diverge over time. (3) The integration of topology-regularized and topology-free patterns lacks a unified schema. A unified schema that blends these two types of patterns would enable a more robust and complete analysis of graph-structured data, which is pivotal in complex systems analysis, where explicit graph structures and implicit, non-structural relationships both play crucial roles in shaping the overall dynamics of the system.

INTRODUCTION Contribution: Proposed a generic framework for boosting current GNN-based traffic speed prediction models by flexibly integrating cross-scale topology-free patterns. A two-stage architecture: Stage I, topology-free pattern preservation: develop a Dual Cross-Scale Transformer (DCST) that models topology-free patterns and their dynamics via hierarchical attention interactions across scales in both the spatial and temporal domains. Stage II, topology-regularized/-free pattern integration: devise a distillation-style integration paradigm that injects topology-regularized patterns into topology-free patterns, with the original GNN-based model as the teacher and DCST as the student. The proposed integration paradigm is model-agnostic and can serve as a wrapper around any GNN-based model.

METHODOLOGY Problem Definition: Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the road segment set with $N$ segments, traffic speed records serve as node features: each road segment $v_i$ is associated with a $T$-step traffic speed series $X_i^{t-T:t}$. Problem: given the historical observations over period $T$, predict the future traffic speeds over a period $H$ from both topology-regularized and topology-free patterns:

$\hat{X}^{t+1:t+H} = \mathcal{F}\big(g(\mathcal{G}, X^{t-T:t}),\, f(X^{t-T:t})\big)$,

where $g(\cdot)$ is a learnable function that captures the topology-regularized patterns, $f(\cdot)$ is a learnable function that automatically preserves the topology-free patterns without any prior geographical knowledge, and $\mathcal{F}(\cdot)$ is a learnable integration function for prediction.
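To make the composition concrete, here is a minimal sketch of how the three learnable functions fit together; all names (predict, g, f, F, X_hist, adj) are illustrative placeholders, not identifiers from the paper.

def predict(X_hist, adj, g, f, F):
    """X_hist: (N, T) historical speeds; returns (N, H) future speeds.

    g(adj, X_hist) -> topology-regularized patterns (a GNN-based model)
    f(X_hist)      -> topology-free patterns (no graph prior used)
    F(., .)        -> learnable integration function for prediction
    """
    h_reg = g(adj, X_hist)    # patterns constrained by the road-network topology
    h_free = f(X_hist)        # latent patterns learned without the graph
    return F(h_reg, h_free)   # integrated H-step forecast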

METHODOLOGY Overview Architecture. Stage I: design a Dual Cross-Scale Transformer (DCST) to capture cross-scale topology-free patterns and their dynamics. Stage II: a teacher-student learning framework to integrate topology-regularized/-free patterns, where the current GNN-based method is taken as the teacher model and DCST as the student model. The well-trained DCST then generates predictions that consider both the topology-regularized and topology-free patterns. The distillation objective combines a Soft Loss (student vs. teacher) and a Hard Loss (student vs. ground truth), $\mathcal{L} = \alpha \, \mathcal{L}_{soft} + \beta \, \mathcal{L}_{hard}$, where both terms are MAE-based, MAE denotes Mean Absolute Error, and $\alpha$ and $\beta$ are hyperparameters weighting the Soft Loss and the Hard Loss.
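A hedged PyTorch sketch of this Stage-II objective, assuming both terms are plain MAE losses as the slide states; the default weights and function name are assumptions for illustration.

import torch.nn.functional as F

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5, beta=0.5):
    # Soft Loss: the DCST student mimics the GNN teacher's predictions,
    # injecting topology-regularized patterns into the topology-free model.
    soft = F.l1_loss(student_pred, teacher_pred)
    # Hard Loss: the student also fits the observed traffic speeds directly.
    hard = F.l1_loss(student_pred, target)
    return alpha * soft + beta * hard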

METHODOLOGY Main Architecture of Stage 1

METHODOLOGY Dual Cross-Scale Transformer for Topology-Free Patterns. The DCST consists of: an embedding layer, an FC layer that transforms the input data into $D$-dimensional representations $Z_*$, where $* = s$ denotes the Spatial Transformer and $* = t$ the Temporal Transformer, with learnable weight and bias parameters; MSA(Q, K, V), a multi-head self-attention block in which Q, K, and V serve as queries, keys, and values; and an MLP, a multi-layer feedforward block. Multi-head self-attention is applied to update the node representations at the $l$-th layer, where "∗" represents s (spatial) or t (temporal) and the time subscript "$t-T:t$" is omitted for simplicity. The prediction layer is an FC layer that generates the final prediction.
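A generic PyTorch sketch of one such layer (MSA followed by an MLP block); the exact residual and normalization placement inside DCST is an assumption here, not taken from the paper.

import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer layer: multi-head self-attention + feedforward MLP."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, z):                 # z: (batch, tokens, D)
        attn, _ = self.msa(z, z, z)       # Q = K = V = z (self-attention)
        z = self.ln1(z + attn)            # residual connection + layer norm
        return self.ln2(z + self.mlp(z))  # feedforward block with residual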

METHODOLOGY Dual Cross-Scale Transformer for Topology-Free Patterns. Temporal Scale Generation: construct temporal scales by splitting the observations into segments of different unit time lengths; the larger the length, the coarser the scale. Let $l_k$ denote the unit time length of the $k$-th temporal scale, so the $T$-step series of node $i$ is split into $T / l_k$ segments, with $x_{i,j}^{(k)}$ the $j$-th segment of node $i$ on the $k$-th temporal scale. Each segment is reshaped and mapped through an FC layer with learnable weight and bias terms to obtain the representation of the $j$-th segment of node $i$ on the $k$-th temporal scale, as sketched below.
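An illustrative sketch of this multi-scale splitting and embedding; the function name and the divisibility assumption are mine, and in a real model the FC layers would be registered module parameters rather than created on the fly.

import torch
import torch.nn as nn

def temporal_scales(x, unit_lengths, dim=64):
    """x: (N, T) speed series; yields one (N, T // l_k, dim) token tensor per scale."""
    for lk in unit_lengths:               # larger unit length -> coarser scale
        assert x.size(1) % lk == 0, "sketch assumes T is divisible by each l_k"
        segments = x.unfold(1, lk, lk)    # (N, T // lk, lk) non-overlapping segments
        embed = nn.Linear(lk, dim)        # FC layer: one segment -> one D-dim token
        yield embed(segments)             # segment representations for this scale

# e.g. list(temporal_scales(torch.randn(207, 12), unit_lengths=[1, 3, 6]))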

METHODOLOGY Dual Cross-Scale Transformer for Topology-Free Patterns. Spatial Scale Generation: split the geospace into grids based on width and length, with nodes distributed among the grid cells. The nodes falling in the $m$-th grid of the $k$-th spatial scale are aggregated into a $D$-dimensional representation matrix through a learnable transformation with weight and bias terms, where LN denotes layer normalization. A grid-pooling sketch follows below.
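An illustrative sketch of grid bucketing and pooling; mean pooling and normalized coordinates are assumptions, since the slide only states that each grid cell yields a D-dimensional representation.

import torch

def grid_representations(coords, z, width, length):
    """coords: (N, 2) node coordinates scaled to [0, 1); z: (N, D) node embeddings."""
    gx = (coords[:, 0] * width).long().clamp(max=width - 1)
    gy = (coords[:, 1] * length).long().clamp(max=length - 1)
    cell = gx * length + gy                        # flat grid-cell index per node
    grids = torch.zeros(width * length, z.size(1))
    grids.index_add_(0, cell, z)                   # sum node embeddings per cell
    counts = torch.bincount(cell, minlength=width * length).clamp(min=1)
    return grids / counts.unsqueeze(1)             # mean-pooled grid representations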

EXPERIMENT AND RESULT EXPERIMENT SETTINGS
Dataset: METR-LA (Los Angeles), PEMS-BAY (San Francisco), and PEMSD7(M) (California).
Baselines: Non-STGNN: Historical Average (HA), LSTNet [1], STAEformer [2], and STID [3]. STGNN: STGCN [4], DCRNN [5], GWNet [6], MTGNN [7], AGCRN [8], GMAN [9], and ASTGCN [10].
Measurement: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).
[1] Lai, G., Chang, W. C., Yang, Y., & Liu, H. (2018, June). Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 95-104).
[2] Liu, H., Dong, Z., Jiang, R., Deng, J., Deng, J., Chen, Q., & Song, X. (2023, October). Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (pp. 4125-4129).
[3] Shao, Z., Zhang, Z., Wang, F., Wei, W., & Xu, Y. (2022, October). Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (pp. 4454-4458).
[4] Yu, B., Yin, H., & Zhu, Z. (2017). Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875.
[5] Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2017). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
[6] Wu, Z., Pan, S., Long, G., Jiang, J., & Zhang, C. (2019). Graph WaveNet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121.
[7] Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., & Zhang, C. (2020, August). Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 753-763).
[8] Bai, L., Yao, L., Li, C., Wang, X., & Wang, C. (2020). Adaptive graph convolutional recurrent network for traffic forecasting. Advances in Neural Information Processing Systems, 33, 17804-17815.
[9] Zheng, C., Fan, X., Wang, C., & Qi, J. (2020, April). GMAN: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 01, pp. 1234-1241).
[10] Guo, S., Lin, Y., Feng, N., Song, C., & Wan, H. (2019, July). Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 922-929).

EXPERIMENT AND RESULT RESULT – Overall Performance

EXPERIMENT AND RESULT RESULT – Visualization Analysis of Topology-Regularized/-Free Patterns. Consider the effectiveness of $g(\cdot)$ (topology-regularized) and $f(\cdot)$ (topology-free).

CONCLUSION Summarization: Studied the problem of traffic speed prediction. Current GNN-based methods exploit topology-regularized patterns via the graph topology while neglecting topology-free patterns beyond the graph structure. Developed a generic wrapper-style framework to boost current GNN-based methods: devised a Dual Cross-Scale Transformer architecture, with a Spatial Transformer for cross-scale topology-free pattern learning and a Temporal Transformer for capturing the dynamics. The topology-regularized patterns are integrated into the topology-free patterns through a teacher-student learning framework. The framework is flexible and can be applied to any current GNN-based method.