Thursday, May 21, 2020

How to plot COVID-19 epidemic curves with R

It's been a while since I have had time to update my blog. Given the current COVID-19 outbreak, I thought it would be useful to provide a step-by-step guide on how to extract and plot, with the help of R, the epidemic curves that everyone seems to be plotting. To run this code, you need to have the latest versions of R and RStudio installed.

Note that since I wrote this code for one of my current projects, not all of it is essential for creating the plots.
  1. First, we load the required packages and define functions to compute differences and logarithmic differences (these are used to compute the growth rates).
  2. Then, we extract the latest data from the GitHub repository of the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). There are three main variables of interest: the daily cumulative numbers of confirmed cases, recovered cases, and deaths.
  3. Next, we reshape the panel data from wide to long format.
  4. The next step is to compute the growth rates of these variables.
  5. I then create a nice theme for our ggplot2 plots.
  6. Now we can begin plotting. The first thing to do is to get the data for some representative countries. I include the big players that are featured heavily in the news and two developing ones (my home country and its neighbor; can you guess which one I am from?). You can modify this country list any way you like.
  7. Then we plot the daily cumulative epidemic curves for confirmed cases and deaths.
  8. And finally, we can plot different varieties of these curves using the daily changes in cases and in deaths.
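Steps 1, 3, and 4 can be sketched in base R on a small made-up table, so the snippet runs without any downloads. This is only a minimal illustration: the helper names `diff_lag` and `log_diff`, the toy data, and the column names are assumptions, not necessarily the code used in my project.

```r
# Hypothetical helpers (names are illustrative).
# First difference of a cumulative series, i.e. new cases per day.
diff_lag <- function(x) c(NA, diff(x))

# Log difference, an approximate daily growth rate. Zero counts give
# non-finite logs, so those entries are set to NA first.
log_diff <- function(x) {
  lx <- log(x)
  lx[!is.finite(lx)] <- NA
  c(NA, diff(lx))
}

# Toy data in wide layout: one row per country, one column per date.
wide <- data.frame(
  country = c("A", "B"),
  `1/22/20` = c(1, 0),
  `1/23/20` = c(2, 5),
  `1/24/20` = c(4, 10),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Wide to long: one row per country and date.
date_cols <- setdiff(names(wide), "country")
long <- data.frame(
  country = rep(wide$country, times = length(date_cols)),
  date = rep(as.Date(date_cols, format = "%m/%d/%y"), each = nrow(wide)),
  confirmed = unlist(wide[date_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[order(long$country, long$date), ]
rownames(long) <- NULL

# Daily changes and growth rates, computed within each country.
long$new_cases <- ave(long$confirmed, long$country, FUN = diff_lag)
long$growth <- ave(long$confirmed, long$country, FUN = log_diff)

print(long)
```

The first observation of each country has no previous day to compare with, so its change and growth rate are `NA`. Packages such as tidyr and dplyr can do the same reshaping and grouping more compactly; base R is used here only to keep the sketch dependency-free.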
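For step 2, a rough sketch of reading the JHU CSSE time series might look like the following. The helper name `read_jhu` is hypothetical, and summing provinces into one row per country is an assumption about how you may want the data aggregated; the file layout (four identifying columns followed by one column per date) is that of the global time series files in the repository.

```r
# Folder holding the JHU CSSE time series files (raw GitHub content).
jhu_base <- paste0(
  "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/",
  "csse_covid_19_data/csse_covid_19_time_series/"
)

# Hypothetical helper: read one JHU-style CSV (URL or local path) and
# sum over provinces so that each country has a single row.
read_jhu <- function(path) {
  raw <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  id_cols <- c("Province/State", "Country/Region", "Lat", "Long")
  date_cols <- setdiff(names(raw), id_cols)
  sums <- rowsum(as.matrix(raw[date_cols]), group = raw[["Country/Region"]])
  out <- data.frame(country = rownames(sums), sums,
                    check.names = FALSE, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Usage (needs internet access):
# confirmed <- read_jhu(paste0(jhu_base, "time_series_covid19_confirmed_global.csv"))
# deaths    <- read_jhu(paste0(jhu_base, "time_series_covid19_deaths_global.csv"))
# recovered <- read_jhu(paste0(jhu_base, "time_series_covid19_recovered_global.csv"))
```

The result is still in wide format, with one column per date, ready for the reshaping step.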
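Steps 5 to 8 can be illustrated roughly as follows. The theme `theme_epi`, the toy data, and the placeholder country names are all assumptions made for the sake of a runnable example; they are not the theme or country list from my project.

```r
library(ggplot2)

# Hypothetical theme built on top of theme_minimal().
theme_epi <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      legend.position = "bottom",
      panel.grid.minor = element_blank(),
      plot.title = element_text(face = "bold")
    )
}

# Toy long-format data standing in for the reshaped JHU series.
dates <- as.Date("2020-03-01") + 0:4
long <- data.frame(
  country = rep(c("A", "B"), each = 5),
  date = rep(dates, times = 2),
  confirmed = c(10, 20, 40, 70, 100, 5, 8, 13, 21, 34)
)

# Daily new cases within each country.
long$new_cases <- ave(long$confirmed, long$country,
                      FUN = function(x) c(NA, diff(x)))

# Placeholder country list; replace with real country names.
countries <- c("A", "B")
plot_data <- long[long$country %in% countries, ]

# Cumulative epidemic curves.
p_cum <- ggplot(plot_data, aes(x = date, y = confirmed, colour = country)) +
  geom_line() +
  labs(title = "Cumulative confirmed cases", x = NULL, y = "Cases") +
  theme_epi()

# Daily changes; the first day of each country has no change and is dropped.
p_new <- ggplot(plot_data[!is.na(plot_data$new_cases), ],
                aes(x = date, y = new_cases, colour = country)) +
  geom_line() +
  labs(title = "Daily new confirmed cases", x = NULL, y = "New cases") +
  theme_epi()

print(p_cum)
print(p_new)
```

The same two plots can be repeated for deaths by swapping the `y` variable.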
That's it for now, folks. If I find some more time, I'll discuss these plots and perhaps update them with new features. If you have any questions or comments, please leave them below or contact me via email.