Sunday, March 29, 2020

Playing with Control Charts for COVID19

I've been playing with Excel and control charts over the last week to get an understanding of where things are headed.  I get the human difficulties with understanding exponential growth, so I wanted to look at the data in a different way.  The italic text is my commentary on the blog.  The normal faced text is what I've been tweeting, the bold text is when.

3/20
If it looks to you like the exponent on infection growth rate is increasing, you are probably right. I just looked at the 5-day LOGEST values (estimate the exponential growth based on last 5 days activity), and the rate has risen 4 out of the last 5 days. Testing just started...

So, this isn't scary to me YET. What it means is not that the real exponential growth rate of infection is increasing, but rather that the rate of our knowledge of exponential rate is increasing. But more testing is still needed to get the numbers to settle down ...

There's gonna be lots of numbers for the epidemiologists and hyper-mathy folks to study RE impact of testing volumes (see ) on estimates of real growth rate when this is over. I don't recall signing up for that clinical trial though.

I suspect there they may even be some new hyper-mathy stuff that addresses statistical process controls applied to exponential growth curves. I kinda faked a control chart with my 5-day moving estimator for the growth rate. There should be some useful signal there ...

I know enough math to know that something should work, maybe even already exists (but maybe and even possibly not yet). I also have enough math to know it's not going to be me who actually proves it.

I've seen enough in that one graph to tell me what I already knew, the real rate is worse than a lot of people thought, or the current graphs show. The good news is that we are seeing a correction now. The bad news may be that it will take a longer time than we want to adjust.

But I'm still not panicking yet.
And neither should you.

3/27 (early morning) 
I just updated my spreadsheet. The good news is that the infection rate is declining, and has been for eight days in a row. However, it's also stabilized, which means it might be getting ready to inflect the other way.


If you look at the graph above, you see eight points in a row declining.  That's enough to trigger an alert in a statistical process control environment (actually 6 or 7 depending on whose set of controls you use).

3/27 (later in the day)
OK, my chart gets better. I can compute New Cases / Total Cases, which should be a constant for an exponential function. In fact 1/(1-New/Total Cases) recovers the base of it. Here's the new control chart on the base value, I find it more helpful.



This above graph was inspired by this great video on plotting exponential data.  It is worth including in this post, just so you can see what I'm doing is valid.  I still like the orientation of my approach better because you don't have the overwhelming sense associated with a graph plotted on logarithmic axes gives you.  It's just a little number bouncing around telling us how fast cases are growing, without really making it apparent how fast an exponential works.  Since I'm looking at that growth everyday, and listening to my family's response to it, I get the impact of it.

3/28
Today's update is promising. The downward trend in growth shows we are making progress. Note that it took two weeks to get farther than 1 standard deviation in the right direction. Maybe by tomorrow we'll be at 2.



Several additional notes here:
  1. Remember that set of 8 control points going down?  How come I don't have that once I switched to this format?  That's because I'm no longer reporting the 5 day average, just the current days rate.  See that dip on 3/21?  It gets included in the 5 day average in the previous format, which hides the current day's signal.  I kept the 5 day deviation because it tells me something about how the rate is settling in, or not.  So, now we have 6 on the same side of the average line, that's another kind of signal in statistical process controls (or 7 or 8, again, rules vary).
  2. Statistical process controls are for managing stable processes.  When the process is changed, you have to adjust the controls your are using.  I'm thinking that I should be marking the graphs above on a weekly basis with the per week mean and standard deviation lines so we can see HOW the process is changing over time.
  3. There's no API to work with the data I'm using, but I there's a way to scrape it from the page I'm recording numbers from.  I haven't bothered because it's licensed data.  What I've done above falls (as best I can tell) under fair use.  Scraping their page dynamically isn't what I'd call fair use.  If they do publish an API for this data set, well then I might take a crack at doing something with this little gem of a gist.
  4. There IS a source of publicly available data  from John's Hopkins that does work via APIs.  The API is simply: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{date}.csv





0 comments:

Post a Comment