A coronavirus dataset was recently posted to Kaggle containing confirmed cases over time in different regions. Being as caught up in the news as everyone else, I decided to dig in a little bit. Below, I present a comparison between the confirmed case trends in the US, China, and Italy, as of March 15th.
Before I share the trends, a bit of background. A question on the mind of everyone following the news right now is: “How will the outbreak unfold in my community?” People want to know what they can expect: how well the virus is being contained and mitigated and, ultimately, how much worse it might get before it starts to get better.
Predicting the future isn’t easy, but one way to estimate how the virus will spread in our own community is to look at how it has spread in communities that are further along in their outbreaks.
So here’s how I approached it: Let’s assume that the coronavirus spread in the US might mirror the spread of the virus in China or Italy. The confirmed cases data for China begins at a count of roughly 500, so we’ll align the US and Italy data to that starting point. In other words, we’ll make the (admittedly somewhat clumsy) assumption that “Day 1” of the outbreak is the day a community reaches 500 cases, and look at how the virus spreads beyond that point.
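To make the alignment step concrete, here is a minimal sketch in plain Python. It is not the code from my notebook: the function name `align_to_threshold`, the region labels, and the counts are all made up for illustration, and the only thing taken from the approach above is the 500-case cutoff.

```python
from typing import Dict, List

THRESHOLD = 500  # "Day 1" is the first day a region reports at least this many cases


def align_to_threshold(counts: List[int], threshold: int = THRESHOLD) -> List[int]:
    """Return the part of a daily cumulative series starting at the first
    value that reaches the threshold (an empty list if it never does)."""
    for i, count in enumerate(counts):
        if count >= threshold:
            return counts[i:]
    return []


# Illustrative numbers only, NOT real case data.
series: Dict[str, List[int]] = {
    "Region A": [120, 310, 540, 800, 1300],
    "Region B": [50, 90, 200, 450, 610],
}

aligned = {region: align_to_threshold(counts) for region, counts in series.items()}

for region, tail in aligned.items():
    for day, cases in enumerate(tail, start=1):
        print(f"{region} Day {day}: {cases}")
```

After this step every region's series starts at its own Day 1, so the curves can be drawn on a shared x-axis even though the outbreaks began on different calendar dates.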
The figure below illustrates these trends. We can make a few observations. First, in Mainland China the trend begins to flatten at around Day 25. Second, Italy is seeing a flatter growth trend than China did, but at Day ~17 its case count is still increasing. Third, the US is at Day ~7 now, and the growth trend so far is a perfect match to the trend in Italy.
Can we expect the next 10 days of coronavirus spread in the US to look like Italy’s? Maybe. It’s complicated and there are lots of other factors at play, but that’s one crude prediction we could make given the data.
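One way to turn that crude prediction into numbers is to take the day-over-day growth ratios of the region that is further along and apply them to the latest count of the region that is behind. The sketch below does this under the assumption that both series are already aligned to the same Day 1. The function name `project_from_reference` and the counts are hypothetical, and this is an illustration of the idea rather than anything from the notebook.

```python
from typing import List


def project_from_reference(follower: List[int], reference: List[int]) -> List[float]:
    """Project the follower's future counts by replaying the reference
    region's day-over-day growth ratios from the follower's current day.

    Assumes both series are cumulative, aligned to the same Day 1, and
    strictly positive (true once they start at the case threshold).
    """
    n = len(follower)
    if n == 0 or n > len(reference):
        return []  # nothing to project from, or the reference is not further along

    projected = []
    current = float(follower[-1])
    # Pair each reference day (from the follower's current day on) with the next one.
    for prev, nxt in zip(reference[n - 1:], reference[n:]):
        current *= nxt / prev
        projected.append(current)
    return projected


# Illustrative numbers only, NOT real case data.
reference = [500, 1000, 1500, 3000]  # region further along
follower = [600, 1200]               # region currently at Day 2

print(project_from_reference(follower, reference))  # [1800.0, 3600.0]
```

A projection like this inherits every weakness of the underlying assumption: it ignores differences in testing, population, and containment measures, so it is best read as a rough "what if" rather than a forecast.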
For more detail, including the Python code used to generate this figure, see my Kaggle notebook (“Tracking US cases against Mainland China trend”).