Realtime anomaly detection in surveillance data.pptx
KingrockPeter
18 slides
Jul 16, 2024
About This Presentation
The complexity of both natural and technological systems has reduced the ability of humans to monitor, detect and fix anomalies in real time or before they occur. In this talk, I examine types of anomalies and the machine learning methods that can be applied to detect them in time series signals. I then present results on detecting anomalies in epidemiological data with noise and uncertainties, and extend the discussion of the methods and results to performance and security anomaly detection in cloud computing environments. I conclude that, regardless of the source of the signal being analysed, the concepts of outliers, weak signals and noisy signal processing can be combined with different model ensembles to achieve timely and robust detection of abnormal signal changes just before they occur.
Size: 39.85 MB
Language: en
Slide Content
Real-time anomaly detection in disease surveillance data
Dr. Peter Eze (with Ivo Mueller, Nic Geard and Iadine Chades)
Research Fellow, AI for Decision Support
School of Computing and Information Systems
Faculty of Engineering and Engineering Technology
University of Melbourne, Australia
23rd May, 2022
Background and Problems
Over time, endemic diseases get neglected despite the surveillance data collected, which, together with other factors, increases the time to disease elimination. Hence, endemic diseases require automated anomaly detection to trigger investigations and interventions.
A unique interplay of the biological, environmental and social factors that allow malaria to flourish.
Disease   Year   Infections    Deaths
Malaria   2018   228 million   405,000
Questions
What patterns can we find in malaria surveillance data? In particular:
How can we automatically detect anomalies in reported malaria case data?
How can we provide possible epidemiological interpretations for detected anomalies?
How can we use the interpretations to stratify risk and ensure dynamic spatio-temporal intervention targeting?
Anomalies (or outliers) are observations that deviate so much from current expectation as to arouse suspicion that they were generated by a different mechanism. (Hagemann & Katsarou, 2020) (P. Bhattacharjee, A. Garg & P. Mitra, 2021)
[Figure; x-axis: Time]
Data source and features for anomaly detection
We transformed the Brazilian Amazon malaria time series data to help detect anomalies in testing and incidence rates.
Features: proportion of positives, positive cases, number of tests, negatives.
Source of dataset: https://www.synapse.org/##!Synapse:syn21555933
Time series data stratified by health regions
We chose Pará state in Brazil and stratified the data into the state's 13 health regions.
Proportion of positives = Positive cases / Number of tests
Methods
Most models represent different aspects of time series well.
Time series data are composed of trends, seasonality, holidays and error terms (irregularities). Under the additive modeling approach, a time series y(t) is given as (S.J. Taylor and B. Letham, 2017):
y(t) = g(t) + s(t) + h(t) + e(t)
g(t) = trend (changes over a long period of time)
s(t) = seasonality (periodic or short-term changes)
h(t) = effects of holidays on the forecast
e(t) = error term (the unconditional changes that are specific to a circumstance)
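The additive form can be illustrated with a small synthetic series. This is only a sketch: the component shapes, the 12-step period, the holiday time steps and all numeric values are assumptions made for the example and are not taken from the malaria data.

```python
import math

def trend(t):
    """g(t): slow linear drift (hypothetical intercept and slope)."""
    return 0.10 + 0.001 * t

def seasonality(t, period=12):
    """s(t): periodic component with an assumed 12-step period."""
    return 0.03 * math.sin(2 * math.pi * t / period)

def holiday(t, holiday_steps=(6, 18)):
    """h(t): one-off shift on assumed holiday time steps."""
    return 0.02 if t in holiday_steps else 0.0

def additive_series(n_steps, errors):
    """y(t) = g(t) + s(t) + h(t) + e(t) for t = 0 .. n_steps - 1."""
    return [trend(t) + seasonality(t) + holiday(t) + errors[t]
            for t in range(n_steps)]

# e(t) is set to zero here so that each component is easy to inspect.
errors = [0.0] * 24
y = additive_series(24, errors)
```

Replacing the zero error list with noisy values gives a series on which the detection methods discussed in the slides can be tried.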
Methods
Discovering patterns and anomalies using multiple machine learning algorithms: Facebook Prophet and LSTM.
Non-parametric models determine anomalies based on locally fitted models using weighted local data points
Local linear or non-linear regressions are fitted.
Locality is defined within a sliding window of length n.
An upper and a lower tolerance limit bound the fitted values.
The limits are defined by either a confidence level or a number of standard deviations (n_sigma).
Data points that lie outside the bounds are detected as anomalies.
Confidence level: 0-1
n_sigma (σ): 1-6
The criteria for choosing the exact values of these parameters require expert advice on the health capacity and risk tolerance of a health administrative region within the time period.
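A minimal sketch of the tolerance-limit idea follows. It assumes the local model is simply the mean of the previous n points (the slides fit local regressions instead) and that the limits are the mean plus or minus n_sigma standard deviations. The function name detect_anomalies and the sample values are hypothetical.

```python
from statistics import mean, stdev

def detect_anomalies(series, n=6, n_sigma=2.0):
    """Return indices of points outside mean +/- n_sigma * sd of the
    previous n points (a simplified stand-in for a local regression)."""
    flags = []
    for i in range(n, len(series)):
        window = series[i - n:i]          # sliding window of length n
        centre = mean(window)
        spread = stdev(window)
        lower = centre - n_sigma * spread  # lower tolerance limit
        upper = centre + n_sigma * spread  # upper tolerance limit
        if series[i] < lower or series[i] > upper:
            flags.append(i)
    return flags

# Hypothetical proportions of positives with one spike at index 7.
series = [0.10, 0.11, 0.10, 0.12, 0.11, 0.10, 0.11, 0.30, 0.11, 0.10]
flags = detect_anomalies(series, n=6, n_sigma=2.0)
```

Note that once the spike enters the window it inflates the standard deviation, so the points that follow it are judged against wider limits. This is one reason the choice of n and n_sigma needs the expert input described above.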
LOWESS (locally weighted scatterplot smoothing) is a non-parametric model that assigns higher weights to data points closer to the point being fitted in the model.
The weight w of a point x for fitting a local curve depends on d, the distance of that data point from the point on the curve being fitted.
The locality of a curve is defined by the length of the sliding window, n.
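The sketch below shows one common way such weights are computed. It assumes the tricube kernel, w = (1 - d^3)^3, with d scaled to [0, 1] by the largest distance in the window. This is the conventional LOWESS choice, but the slides' own weight formula is not reproduced here, so treat the kernel as an assumption. The function names are hypothetical.

```python
def tricube_weights(xs, x0):
    """Tricube weights for the points xs around the fitting point x0."""
    max_dist = max(abs(x - x0) for x in xs)
    if max_dist == 0:
        return [1.0] * len(xs)
    weights = []
    for x in xs:
        d = abs(x - x0) / max_dist        # scaled distance in [0, 1]
        weights.append((1 - d ** 3) ** 3)  # closer points weigh more
    return weights

def local_linear_fit(xs, ys, x0):
    """Weighted least-squares line through the window, evaluated at x0."""
    w = tricube_weights(xs, x0)
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / total
    my = sum(wi * yi for wi, yi in zip(w, ys)) / total
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# A window of n = 5 time steps centred on x0 = 2.
window_x = [0, 1, 2, 3, 4]
window_y = [1.0, 3.0, 5.0, 7.0, 9.0]
weights = tricube_weights(window_x, 2)
fitted = local_linear_fit(window_x, window_y, 2)
```

With this scaling the farthest points in the window receive zero weight, so the effective neighbourhood is slightly narrower than n.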
Results
The smaller the value of n_sigma, the higher the number of anomalies per time window. n_sigma (σ) = 1 produces more outliers than n_sigma (σ) = 2 or 3. The setting of n_sigma (σ) will be determined by the health capacity of a region. This method assumes that health capacity closely follows the proportion of positive cases. Each health region would adjust capacity at the end of each time window.
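The effect of n_sigma on the number of flagged points can be checked on a single window. This sketch assumes limits of the window mean plus or minus n_sigma standard deviations. The values in the window are hypothetical, not taken from any health region.

```python
from statistics import mean, stdev

def count_outliers(window, n_sigma):
    """Count points further than n_sigma standard deviations from the mean."""
    centre = mean(window)
    spread = stdev(window)
    return sum(1 for value in window if abs(value - centre) > n_sigma * spread)

# Hypothetical proportions of positives in one time window.
window = [0.10, 0.12, 0.09, 0.11, 0.25, 0.10,
          0.13, 0.08, 0.11, 0.10, 0.02, 0.12]

# Number of outliers for each tolerance setting.
counts = {k: count_outliers(window, k) for k in (1, 2, 3)}
```

Because the limits only widen as n_sigma grows, every point flagged at a larger n_sigma is also flagged at a smaller one, so the count can never increase.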
Results
Given the same n_sigma (tolerance) for all health regions, the regions will experience anomalies at different times. At times 35 and 131, ARAGUAIA experienced a Flareup while BAIXO and CARAJAS were experiencing a Decline. Hence, at those times, ARAGUAIA would need to be targeted, but the success in BAIXO and CARAJAS would also need to be investigated to ascertain its cause.
But a point anomaly may not be reliable enough to change policy or commission an investigation
State-wide, there is a consistent case decline for 6 months. ARAGUAIA also follows the state trend. However, CARAJAS and TOCANTINS have consistently flared up over the same 6 months. If only state-wide progress is monitored, elimination may not happen. The incidence rate in TOCANTINS is up to 40%.
Limitation of traditional non-parametric LOWESS
Small increases in incidence rate per window may sum up to large, undetected outbreaks.
Ongoing/Future Work
Solving the Drift Problem
Look back n lags (time steps) to determine the true trend while incorporating uncertainty.
Compute anomalies only within the sliding window.
Train a model that learns baseline normal data and flags everything else as anomalous.
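The drift problem and one possible reading of the baseline idea can be sketched as follows. The series is a hypothetical incidence rate that rises by half a percentage point per time step. The "baseline model" is assumed to be nothing more than the mean and standard deviation of the first n points, which is far simpler than a trained model. All function names are hypothetical.

```python
from statistics import mean, stdev

def outside(reference, value, n_sigma):
    """True if value lies outside mean +/- n_sigma * sd of reference."""
    return abs(value - mean(reference)) > n_sigma * stdev(reference)

def local_flags(series, n, n_sigma):
    """Judge each point against the n points immediately before it."""
    return [i for i in range(n, len(series))
            if outside(series[i - n:i], series[i], n_sigma)]

def baseline_flags(series, n, n_sigma):
    """Judge each point against limits learned once from the first n points."""
    baseline = series[:n]
    return [i for i in range(n, len(series))
            if outside(baseline, series[i], n_sigma)]

# Slow, steady drift: 0.10 at t = 0 up to 0.215 at t = 23.
rates = [0.10 + 0.005 * t for t in range(24)]

drift_local = local_flags(rates, n=6, n_sigma=2.0)        # limits move with the drift
drift_baseline = baseline_flags(rates, n=6, n_sigma=2.0)  # limits stay fixed
```

In this example the sliding-window limits move up with the series, so the rate more than doubles without a single flag, while the fixed baseline flags the drift from time step 7 onward. A fixed baseline has its own cost: it must be re-learned when the normal level legitimately changes.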
Summary
With the rising threats of pandemics and climate change, global attention and funding for mitigating the inequitable burden of malaria are more necessary than ever.
Because data for endemic diseases such as malaria are not analysed by humans on a daily basis, automated methods can help to provide proactive decision support.
We have developed a tool to help identify appropriate anomaly thresholds for health regions: https://github.com/KingPeter2014/Anomaly_in_malaria_surveillance_data
References
T. Hagemann and K. Katsarou. A Systematic Review on Anomaly Detection for Cloud Computing Environments. 2020. doi: https://doi.org/10.1145/3442536.3442550
Understanding LSTMs. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
B. Agrawal, T. Wiktorski and C. Rong. Adaptive Real-Time Anomaly Detection in Cloud Infrastructures. 2018 1st International Conference on Data Intelligence and Security (ICDIS).
J. Clark, Z. Liu and N. Japkowicz. Adaptive Threshold for Outlier Detection on Data Streams. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), 2018, pp. 41-49. doi: 10.1109/DSAA.2018.00014
S.J. Taylor and B. Letham. Forecasting at Scale. https://peerj.com/preprints/3190.pdf, 2017.
SIVEP-Malaria database. IntegratedDataset.csv: Derived from Brazilian epidemiological surveillance system of malaria (2020). https://www.synapse.org/##!Synapse:syn21555933