EXPERIMENT AND RESULT

EXPERIMENT SETTINGS

References:
[1] Venkatramanan, S., Chen, J., Gupta, S., Lewis, B., Marathe, M., Mortveit, H., & Vullikanti, A. (2017, August). Spatio-temporal optimization of seasonal vaccination using a metapopulation model of influenza. In 2017 IEEE International Conference on Healthcare Informatics (ICHI) (pp. 134-143). IEEE.
[2] Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550-1560.
[3] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
[4] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[5] Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2017). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
[6] Wu, Y., Yang, Y., Nishiura, H., & Saitoh, M. (2018, June). Deep learning for epidemiological predictions. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 1085-1088).
[7] Lai, G., Chang, W. C., Yang, Y., & Liu, H. (2018, June). Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 95-104).
[8] Yu, B., Yin, H., & Zhu, Z. (2017). Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875.
[9] Deng, S., Wang, S., Rangwala, H., Wang, L., & Ning, Y. (2020, October). Cola-GNN: Cross-location attention based graph neural networks for long-term ILI prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (pp. 245-254).
[10] Gao, J., Sharma, R., Qian, C., Glass, L. M., Spaeder, J., Romberg, J., ... & Xiao, C. (2021). STAN: spatio-temporal attention network for pandemic prediction using real-world evidence. Journal of the American Medical Informatics Association, 28(4), 733-743.

Baselines:
Mechanistic causal models: SIR and PatchSEIR [1].
Statistical models: Autoregressive (AR) and Autoregressive Moving Average (ARMA).
Deep learning models: RNN [2], Gated Recurrent Unit (GRU) [3], and LSTM [4].
State-of-the-art STGNN models: DCRNN [5], CNNRNN-Res [6], LSTNet [7], STGCN [8], Cola-GNN [9], and STAN [10].
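
To make the simplest statistical baseline listed above concrete, the following is a minimal sketch of an AR(p) forecaster fitted by ordinary least squares. The lag order, the fitting procedure, and the function names `fit_ar` and `forecast_ar` are illustrative assumptions; the source does not say how its AR baseline was configured or implemented.

```python
import numpy as np


def fit_ar(series, p):
    """Fit an AR(p) model by ordinary least squares (illustrative sketch).

    Returns (intercept, coeffs), where coeffs[k] multiplies y[t-1-k].
    """
    y = np.asarray(series, dtype=float)
    if len(y) <= p:
        raise ValueError("series must be longer than the lag order p")
    # Design-matrix row for target y[t] is [1, y[t-1], ..., y[t-p]].
    rows = [np.concatenate(([1.0], y[t - p:t][::-1])) for t in range(p, len(y))]
    X = np.vstack(rows)
    targets = y[p:]
    beta, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return beta[0], beta[1:]


def forecast_ar(series, intercept, coeffs, horizon):
    """Roll the fitted model forward, feeding each prediction back in as a lag."""
    history = list(np.asarray(series, dtype=float))
    p = len(coeffs)
    preds = []
    for _ in range(horizon):
        lags = np.array(history[-p:][::-1])  # most recent observation first
        nxt = intercept + float(coeffs @ lags)
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)
```

Under these assumptions, a per-location baseline would call `fit_ar` on that location's case-count history and then `forecast_ar` for the desired horizon. Because each location is modelled independently, this sketch ignores any spatial structure between locations.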