Predicting Covid-19 Cases and Deaths with Machine Learning
For my first-year intro to Machine Learning course, my team and I built several machine learning models to predict
Covid-19 cases and deaths for any given country.
Link to Project Report
To accomplish our task, we collected time-series data from the Johns Hopkins Coronavirus Resource Center (direct link to data here) and Oxford University's Covid-19 Government Response Tracker (direct link to data here), and selected Covid-19-relevant health and economic datasets from the World Bank Open Data platform. The Johns Hopkins data provided daily national-level statistics on confirmed Covid-19 cases, recoveries, and deaths, and the Oxford Government Response Tracker provided daily statistics on countries' policy measures to contain the spread of the virus. The policy measures comprise eight containment and closure measures, four economic measures, and five health system measures, each either categorical or continuous; they include school and workplace closing, stay-at-home requirements, Covid testing, and contact tracing. The World Bank datasets are national-level statistics from the latest year available and include GDP, Life Expectancy at Birth, Physicians per 1,000 individuals, Diabetes Prevalence, and Health Expenditure per capita.
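Combining these sources means attaching each country's static World Bank indicators to every one of its daily rows. The snippet below is a minimal sketch of that join in plain Python, not the project's actual code. The field names (`new_cases`, `school_closing`, `life_expectancy`, `physicians_per_1000`), the country labels, and all values are made-up placeholders, and dropping countries without indicator data is an assumption about how gaps might be handled.

```python
# Toy daily rows (one per country-day). All names and values are placeholders.
daily_rows = [
    {"country": "Country A", "date": "2020-04-01", "new_cases": 10, "school_closing": 2},
    {"country": "Country A", "date": "2020-04-02", "new_cases": 12, "school_closing": 3},
    {"country": "Country B", "date": "2020-04-01", "new_cases": 5, "school_closing": 1},
]

# Toy static indicators (one record per country, latest year available).
indicators = {
    "Country A": {"life_expectancy": 80.0, "physicians_per_1000": 3.0},
}


def attach_indicators(daily_rows, indicators):
    """Copy each country's static indicators onto all of its daily rows.

    Rows for countries with no indicator record are dropped (an assumption;
    imputing the missing values would be another option).
    """
    merged = []
    for row in daily_rows:
        static = indicators.get(row["country"])
        if static is None:
            continue
        merged.append({**row, **static})
    return merged


merged = attach_indicators(daily_rows, indicators)
```

After the join, every remaining row carries both the time-varying features (cases, policy measures) and the country-level constants, which is the shape a regression model expects.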
We used regression models to predict new cases and deaths 12 days into the future, given government policy responses and health and economic indicators. The regression models included linear, ridge, lasso, decision tree, and random forest regression. We also tested treating the problem as one of classification, fitting a Multilayer Perceptron Neural Network that classified every future country-day observation into one of a discrete set of case counts previously found in the data. Because we were working with time-series data, we used temporal holdouts to train our models and Time Series Nested Cross Validation to validate them. For more details about our methodology, see the Machine Learning and Details of Solution section in our project report.
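To make the forecasting setup concrete, here is a rough sketch of two of the ideas above: pairing each day's features with the value observed 12 days later, and splitting the data so that training observations always precede test observations. It is an illustration under assumptions, not the project's implementation. The function names `make_supervised` and `expanding_window_splits` are hypothetical, and the expanding-window scheme shown covers only the outer loop of a nested time-series cross validation (the inner loop for hyperparameter tuning is omitted).

```python
def make_supervised(series, features, horizon=12):
    """Pair the features on day t with the series value on day t + horizon."""
    pairs = []
    for t in range(len(series) - horizon):
        pairs.append((features[t], series[t + horizon]))
    return pairs


def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) in chronological order.

    Each fold trains on everything before its test window, so the training
    set grows from fold to fold and never includes future observations.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


# Toy example: 20 days of a made-up series, with the day's own value as the
# only feature.
series = list(range(20))
features = [[value] for value in series]

pairs = make_supervised(series, features, horizon=12)
splits = list(expanding_window_splits(n_samples=20, n_folds=3, test_size=4))
```

With 20 samples, 3 folds, and a test window of 4, the folds test on indices 8 to 11, 12 to 15, and 16 to 19, each time training on all earlier indices.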
Evaluation and Results: We used the Root Mean Squared Error (RMSE) metric to evaluate our models for both outputs, confirmed cases and deaths. As is clear from both charts below, Decision Tree Regression and Random Forest Regression outperformed all other models by a significant margin.
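For reference, RMSE is the square root of the mean squared difference between predictions and observed values, so it is expressed in the same units as the target (cases or deaths) and penalizes large misses more heavily than small ones. A minimal sketch in plain Python, with a hypothetical helper name `rmse` and made-up inputs:

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only: errors of 3 and 4 give
# sqrt((9 + 16) / 2) = sqrt(12.5).
example_error = rmse([0, 0], [3, 4])
```

A lower RMSE means predictions sit closer to the observed counts on average, which is the sense in which one model is said to outperform another here.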
Project code can be found here.
Project teammates: Diego Diaz and Piyush Tank
Data Viz & Extras:
Government Response Index vs Confirmed Cases
Predicting confirmed cases in Spain using Linear Regression and a Neural Network
Model predictions sometimes varied greatly within a single country, as with Austria
Example countries successfully containing the virus with varying Policy Stringency Indices