Tuesday, May 18, 2021

SANER Scales

This week's HL7 Connectathon 27 SANER Track is all about scalability.  For this Connectathon, we brought over 2 million clinical resources worth of test data for 2600+ patients, transported from Synthea's 100K patient COVID-19 test data set from Massachusetts into Cook County, Illinois, transposed in time to January of 2021 instead of March of 2020, across 15 hospital locations.  Because these patients were transported from the past, they aren't vaccinated, don't have access to Remdesivir, and probably too many are taking HCQ, but it was a good enough set to test the scalability of SANER.

My chief concern was computing MeasureReport on a daily basis, overnight, and whether that was going to be a huge data load for hospitals.  As it turns out, I'm able to put those concerns to rest (for me at least).

We computed 465 MeasureReport resources, one for each of 15 hospitals over the 31 days of January, using realistic hospital loads drawn from current statistics reported by the Illinois Department of Health.

Each measure report communicated around 240 (on average) supplemental data elements (FHIR Resources) providing additional data to support stratification and analytics, which is about 40 times what would actually be needed if just communicating metrics.

All told, this represented about 465 MB of uncompressed, pretty-printed FHIR Resources in XML format, or about 23 MB of data compressed using GZIP.
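As a rough sanity check, the figures above can be turned into per-report numbers. The sketch below uses only values quoted in this post; the variable names are illustrative, and it assumes decimal megabytes (1 MB = 1000 KB).

```python
# Figures quoted in the post; names are illustrative only.
HOSPITALS = 15
DAYS_IN_JANUARY = 31
AVG_SUPPLEMENTAL_PER_REPORT = 240  # average supplemental resources per report
UNCOMPRESSED_MB = 465              # pretty-printed XML
GZIPPED_MB = 23                    # same data after GZIP

# One MeasureReport per hospital per day.
reports = HOSPITALS * DAYS_IN_JANUARY

# Approximate total of supplemental resources carried across all reports.
supplemental_total = reports * AVG_SUPPLEMENTAL_PER_REPORT

# Average payload per report, before and after compression
# (assumes decimal units: 1 MB = 1000 KB).
mb_per_report = UNCOMPRESSED_MB / reports
gzipped_kb_per_report = GZIPPED_MB * 1000 / reports

# Overall compression ratio achieved by GZIP on this data set.
compression_ratio = UNCOMPRESSED_MB / GZIPPED_MB

print(f"reports:                {reports}")
print(f"supplemental resources: {supplemental_total}")
print(f"uncompressed/report:    {mb_per_report:.2f} MB")
print(f"gzipped/report:         {gzipped_kb_per_report:.1f} KB")
print(f"compression ratio:      {compression_ratio:.1f}:1")
```

In other words, each hospital day works out to roughly 1 MB of XML on disk and around 50 KB on the wire once compressed.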

Best yet, I was able to collect data from the cloud, compute measures, store them locally and transmit all the data for all days for all hospitals to a server in the cloud in about 11 minutes on a pretty high-end Dell laptop (6 cores, 3.6 GHz burst, 32 GB of RAM).

I've still got some bugs to look into which might slow things down once fixed (mostly on stratification), but with 12 virtual processors running, this load barely touched my machine.  Overall, CPU utilization was at a pretty steady 20%, and network bandwidth was also nowhere near saturated.  My home office gets about 150-200Mb down and 20Mb up, and I barely touched it.

I can process the data for a single hospital day in 10-20 seconds depending on the number of patients.  It's realistic to assume that more frequent, semi-real-time* situational awareness measure evaluation and reporting is not only feasible, but also practical.
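The 10-20 second figure and the roughly 11 minute total run are consistent with each other if the work is spread evenly across the 12 virtual processors. The sketch below checks that arithmetic. The even split across independent workers is an assumption made here for illustration (the post does not describe how the work was actually scheduled), and `wall_clock_minutes` is a hypothetical helper.

```python
REPORTS = 465                        # 15 hospitals x 31 days
WORKERS = 12                         # virtual processors mentioned in the post
SECONDS_PER_HOSPITAL_DAY = (10, 20)  # quoted range for one hospital day


def wall_clock_minutes(seconds_each, reports=REPORTS, workers=WORKERS):
    """Estimate total run time in minutes.

    ASSUMPTION: hospital days are independent and are divided evenly
    across the workers, with no coordination overhead.
    """
    return reports * seconds_each / workers / 60


low, high = (wall_clock_minutes(s) for s in SECONDS_PER_HOSPITAL_DAY)
print(f"estimated run time: {low:.1f} to {high:.1f} minutes")
```

Under that assumption the estimate comes out between about 6.5 and 13 minutes, which brackets the observed 11 minutes.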

Most of the measures we have examined are written in a form that supports daily computation.  We'll probably have to experiment with measures designed for more frequent evaluation.

   Keith

* We keep hemming and hawing about near-real-time measures, and I've finally decided to call them semi-real-time, to clarify that they could be several minutes out of date, but still orders of magnitude better than daily.  With enough concentration, semi-real-time could in fact become near-real-time (so long as the data sources themselves are frequently updated).


After doing some more tweaking, I'm now:
  1. Overwhelming my server so hard that it requires a restart to come back to life.  I really need to get my server set up in a production-ready way.
  2. Running a bit slower but getting more data (so now it's taking about 28 seconds per hospital on average).
