A presentation about using various algorithms to predict COVID-19 cases.
Added: May 25, 2024
Slides: 11 pages
Slide Content
COVID-19 Case Predictor with Linear Regression and Other Algorithms
By Michael Xu
Introduction
My program uses four algorithms to predict future weekly COVID-19 case averages in California counties: linear approximation, quadratic approximation, cubic approximation, and linear regression.
Predicting future cases can help people know which places most need COVID-19 relief and help.
The data
The data is on GitHub, as part of a global data repository from Johns Hopkins University compiled from a variety of sources (WHO, CDC, etc.).
It is a large CSV file detailing the number of COVID-19 cases for each county in the country on every day of the pandemic.
The program uses BufferedReader and line.split() to read in the comma-separated values.
Every entry is stored as an instance of the cdate class, and an ArrayList named cdata stores all the cdates.
Most of the case counts are converted to 7-day averages, because some counties sometimes have no day-to-day updates, which would otherwise make my algorithms inaccurate.
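The reading and averaging steps described on this slide could look roughly like the sketch below. It is not the author's code: the `CaseEntry` class is a hypothetical stand-in for `cdate`, and the three-column layout `date,county,cases` is an assumption made for illustration (the real file has a different, wider layout). A plain `split(",")` also does not handle quoted fields that contain commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CsvWeeklyAverages {
    // Hypothetical entry type; the fields are assumptions, not the author's cdate class.
    static class CaseEntry {
        final String date;
        final String county;
        final double cases;

        CaseEntry(String date, String county, double cases) {
            this.date = date;
            this.county = county;
            this.cases = cases;
        }
    }

    // Reads rows of the assumed form "date,county,cases", skipping the header line.
    static List<CaseEntry> read(Reader source) {
        List<CaseEntry> cdata = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine(); // header row, ignored
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split(",");
                cdata.add(new CaseEntry(parts[0], parts[1], Double.parseDouble(parts[2])));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cdata;
    }

    // Averages consecutive blocks of 7 daily values; a trailing partial week is dropped.
    static List<Double> weeklyAverages(List<CaseEntry> days) {
        List<Double> averages = new ArrayList<>();
        for (int start = 0; start + 7 <= days.size(); start += 7) {
            double sum = 0;
            for (int i = start; i < start + 7; i++) {
                sum += days.get(i).cases;
            }
            averages.add(sum / 7.0);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample data: 14 days for one county with 10, 20, ..., 140 cases.
        StringBuilder csv = new StringBuilder("date,county,cases\n");
        for (int day = 1; day <= 14; day++) {
            csv.append("2021-01-").append(String.format("%02d", day))
               .append(",Alameda,").append(day * 10).append('\n');
        }
        List<CaseEntry> cdata = read(new StringReader(csv.toString()));
        System.out.println(weeklyAverages(cdata)); // prints [40.0, 110.0]
    }
}
```

Reading from a `Reader` instead of a file path keeps the sketch runnable without the dataset; wrapping a `FileReader` would work the same way.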
Algorithms
Linear Approximation
Makes a straight line through two weekly case averages and uses it to calculate the third.
Assumes the difference between consecutive points stays the same (constant first difference, like a constant 1st derivative).
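As a minimal sketch of linear approximation, the next weekly average is the last one plus the most recent week-to-week change. The class and method names are illustrative, and the numbers in `main` are made up.

```java
class LinearApproximation {
    // Extends the straight line through two consecutive weekly averages by one week.
    static double predictNext(double week1, double week2) {
        double change = week2 - week1; // first difference, assumed to repeat
        return week2 + change;
    }

    public static void main(String[] args) {
        // Made-up weekly averages: 100 then 120 cases per day.
        System.out.println(predictNext(100, 120)); // prints 140.0
    }
}
```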
Algorithms
Quadratic Approximation
Makes a quadratic function through three weekly case averages and uses it to calculate a fourth.
Assumes the difference between consecutive differences stays the same (constant second difference, like a constant 2nd derivative).
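One way to see where the quadratic prediction comes from is to write out the constant-second-difference condition. The symbols $y_1, y_2, y_3$ (known weekly averages) and $y_4$ (the predicted one) are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
(y_4 - y_3) - (y_3 - y_2) &= (y_3 - y_2) - (y_2 - y_1) \\
y_4 &= 3y_3 - 3y_2 + y_1
\end{aligned}
```

For example, with made-up averages 100, 120, and 150, the differences are 20 and 30, so the next difference is taken to be 40 and the prediction is $3(150) - 3(120) + 100 = 190$.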
Algorithms
Cubic Approximation
Makes a cubic function through four weekly averages and uses it to calculate the fifth.
Assumes the difference of the difference of the difference between consecutive points stays the same (constant third difference, like a constant 3rd derivative).
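The constant-difference idea behind the linear, quadratic, and cubic approximations can be sketched as a single routine that builds a difference table and adds up the last entry of each order. This is one possible formulation under that assumption, not necessarily how the program computes it; the names are illustrative.

```java
class DifferenceExtrapolation {
    // Predicts the next value from n inputs, assuming their (n-1)-th differences stay constant.
    // Two inputs give the linear case, three the quadratic case, four the cubic case.
    static double predictNext(double[] values) {
        double[] diffs = values.clone();
        double next = 0;
        for (int len = diffs.length; len > 0; len--) {
            // Add the last entry of the current difference order.
            next += diffs[len - 1];
            // Replace the first len-1 entries with the next-order differences.
            for (int i = 0; i < len - 1; i++) {
                diffs[i] = diffs[i + 1] - diffs[i];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // The cubes 1, 8, 27, 64 lie on a cubic, so the prediction is 5^3.
        System.out.println(predictNext(new double[] {1, 8, 27, 64})); // prints 125.0
    }
}
```

For four inputs this reduces to 4·y4 − 6·y3 + 4·y2 − y1, the cubic counterpart of extending a straight line by one step.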
Algorithms
Linear Regression
Takes three to four weekly averages and uses a line of best fit to calculate the next one.
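A line of best fit is commonly computed by ordinary least squares. The sketch below assumes the weeks are numbered 0, 1, 2, ... and predicts the value at the next week number. It is a hand-written illustration, not the open-source class the program uses, and the sample numbers are made up.

```java
class WeeklyRegression {
    // Fits y = intercept + slope * x by least squares with x = 0..n-1, then evaluates at x = n.
    // Needs at least two values so the slope is defined.
    static double predictNext(double[] weeklyAverages) {
        int n = weeklyAverages.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double y : weeklyAverages) {
            meanY += y;
        }
        meanY /= n;

        double sxy = 0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0; // sum of (x - meanX)^2
        for (int x = 0; x < n; x++) {
            sxy += (x - meanX) * (weeklyAverages[x] - meanY);
            sxx += (x - meanX) * (x - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return intercept + slope * n;
    }

    public static void main(String[] args) {
        // Made-up weekly averages that are roughly, but not exactly, linear.
        System.out.println(predictNext(new double[] {100, 110, 130, 140})); // prints 155.0
    }
}
```

Unlike linear approximation, which uses only the last two points, the fitted line averages over all the points, so one noisy week has less influence on the prediction.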
Pros and cons
Pros:
The algorithms can be used to predict COVID-19 cases with relative accuracy, with less than 5% error.
They can help with distributing COVID-19 aid and relief more efficiently.
Cons:
If other variables are added that cause unexpected changes in the data, my algorithms might not be as accurate, which can lead to false conclusions and bad relief distribution.
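The slides do not say how the error figure was computed. One common choice, assumed here, is the absolute percent error between a predicted and an actual weekly average; the names and numbers below are illustrative.

```java
class PercentError {
    // Absolute percent error of a prediction relative to the actual value (actual must be nonzero).
    static double percentError(double predicted, double actual) {
        return 100.0 * Math.abs(predicted - actual) / Math.abs(actual);
    }

    public static void main(String[] args) {
        // Made-up example: predicted 104 cases against an actual average of 100.
        System.out.println(percentError(104, 100)); // prints 4.0
    }
}
```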
Challenges
I was unfamiliar with how to read in a CSV file at first, and didn't know what CSV stood for.
I had to convert the data into weekly averages because the day-to-day values were inconsistent.
Making my own algorithms was challenging because I didn't have much experience processing data.
The Java standard library has no linear regression class, so I used an open-source class I found online.
Conclusion
From the data and its analysis, I ranked the approximate accuracies of my algorithms: linear approximation and linear regression were usually the most accurate, followed by quadratic approximation, then cubic approximation.
This is probably because the COVID-19 case curve is close to a straight line over short intervals.
According to the predictions, COVID-19 cases are flattening a bit here and there, but still increasing overall.
The difference between weekly averages is going down in most counties.
However, there are still a lot of people who have COVID-19.
References
Linear Regression Algorithm: https://algs4.cs.princeton.edu/code/edu/princeton/cs/algs4/LinearRegression.java.html
Covid dataset: https://github.com/datasets/covid-19
Graphs: https://www.desmos.com/calculator