Covid & Machine Learning

Predicting Covid-19 Cases and Deaths with Machine Learning

For my first-year Introduction to Machine Learning course, my team and I built several machine learning models to predict Covid-19-related cases and deaths for any given country.
Link to Project Report

Overview:
To accomplish our task, we collected time-series data from the Johns Hopkins Coronavirus Resource Center (direct link to data here) and Oxford University's Covid-19 Government Response Tracker (direct link to data here), along with selected Covid-19-relevant health and economic datasets from the World Bank Open Data platform. The Johns Hopkins data provided daily national-level statistics on confirmed Covid-19 cases, recoveries, and deaths. The Oxford Government Response Tracker provided daily statistics on countries' policy measures to contain the spread of the virus. These comprise eight containment and closure measures, four economic measures, and five health system measures, each either categorical or continuous, and include school and workplace closing, stay-at-home requirements, Covid testing, and contact tracing. The World Bank datasets are national-level statistics from the latest year available and include GDP, Life Expectancy at Birth, Physicians per 1,000 individuals, Diabetes Prevalence, and Health Expenditure per capita.
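As a rough illustration of how the three sources fit together, the sketch below joins daily case records with daily policy records on (country, date) and attaches the static national indicators to every row. The function name `build_country_day_rows`, the field names, and all values are hypothetical placeholders, not the project's actual code or data.

```python
# Toy, made-up rows: none of these numbers come from the real datasets.
daily_cases = [
    {"country": "A", "date": "2020-04-01", "confirmed": 100, "deaths": 2},
    {"country": "A", "date": "2020-04-02", "confirmed": 130, "deaths": 3},
    {"country": "B", "date": "2020-04-01", "confirmed": 40, "deaths": 0},
]
daily_policy = {
    ("A", "2020-04-01"): {"school_closing": 2, "stay_at_home": 1},
    ("A", "2020-04-02"): {"school_closing": 3, "stay_at_home": 2},
    ("B", "2020-04-01"): {"school_closing": 0, "stay_at_home": 0},
}
static_indicators = {
    "A": {"life_expectancy": 80.0, "physicians_per_1000": 3.5},
    "B": {"life_expectancy": 72.0, "physicians_per_1000": 1.2},
}


def build_country_day_rows(cases, policy, indicators):
    """Return one merged dict per country-day that appears in all three sources."""
    rows = []
    for record in cases:
        key = (record["country"], record["date"])
        # Skip country-days that lack policy data or static indicators.
        if key not in policy or record["country"] not in indicators:
            continue
        rows.append({**record, **policy[key], **indicators[record["country"]]})
    return rows


rows = build_country_day_rows(daily_cases, daily_policy, static_indicators)
```

Each resulting row is one country-day observation: the daily counts and policy values change from day to day, while the World Bank indicators repeat unchanged across all of a country's rows.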
Methodology:
We used regression models to predict new cases and deaths 12 days into the future, given government policy responses and health and economic indicators. The regression models were linear, ridge, lasso, decision tree, and random forest regression. We also tested treating the problem as one of classification, fitting a Multilayer Perceptron Neural Network that assigns every future country-day observation to one of the discrete case counts previously found in the data. Because we are working with time-series data, we trained our models on temporal holdouts and validated them with Time Series Nested Cross Validation. For more details about our methodology, see the Machine Learning and Details of Solution section in our project report.
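To make the 12-day horizon and the temporal holdouts concrete, here is a minimal sketch using only the Python standard library. It pairs each day's value with the value observed 12 days later, then produces expanding-window splits in which training data always precedes test data. The helper names (`make_supervised`, `expanding_window_splits`), the `gap` parameter, and the toy series are assumptions for illustration. The sketch covers only the outer evaluation loop of a nested scheme, not the inner hyperparameter search, and it is not the project's actual implementation.

```python
HORIZON = 12  # days ahead, as stated in the text


def make_supervised(series, horizon=HORIZON):
    """Pair the value on day t with the value observed `horizon` days later."""
    return [(series[t], series[t + horizon]) for t in range(len(series) - horizon)]


def expanding_window_splits(n_samples, n_folds, test_size, gap=0):
    """Yield (train_indices, test_indices); every training index precedes the test block.

    `gap` drops that many samples right before each test block from training.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        start = first_test_start + fold * test_size
        yield list(range(start - gap)), list(range(start, start + test_size))


# Made-up cumulative case counts for 30 consecutive days.
cumulative_cases = list(range(0, 300, 10))
pairs = make_supervised(cumulative_cases)
splits = list(expanding_window_splits(len(pairs), n_folds=3, test_size=4))
```

The training window grows with each fold while the test block moves forward in time, so no fold is evaluated on data older than what it was trained on. Because each target lies 12 days after its features, setting `gap` to the horizon would additionally keep training targets from overlapping the test period; whether the original project did this is not stated.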
Evaluation and Results:
We used the Root Mean Squared Error (RMSE) metric to evaluate our models for both outputs, confirmed cases and deaths. As is clear from both charts below, Decision Tree Regression and Random Forest Regression outperformed all other models by a significant margin.
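For reference, RMSE is the square root of the mean squared difference between predicted and actual values. The sketch below computes it with the standard library only. The `rmse` helper and the numbers are illustrative and are not results from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: both models have the same total absolute error (30),
# but model_b concentrates it in a single large miss.
actual = [100, 150, 210]
model_a = [110, 140, 200]
model_b = [100, 150, 240]

rmse_a = rmse(actual, model_a)
rmse_b = rmse(actual, model_b)
```

Here `rmse_b` comes out larger than `rmse_a` even though the total absolute error is the same, because squaring penalizes one large miss more heavily than several small ones.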

Project code can be found here.
Project teammates: Diego Diaz and Piyush Tank


Data Viz & Extras:

Government Response Index vs Confirmed Cases

Predicting confirmed cases in Spain using Linear Regression and Neural Network

Models sometimes varied greatly within a single country, as with Austria

Example countries successfully containing the virus with varying Policy Stringency Indices