Thursday, June 25, 2020

Zoombie Jamborie

A camp song for 2020. If you've never heard Zombie Jamboree, on which it's based, here's one of the many recordings.

Well, now, back to back, telly to telly,
well, I don't give a damn 'cause I've done three already,
back to back and telly to telly at the zoombie jamboree
(Now hear the chorus)
back to back and telly to telly at the zoombie jamboree

zoombie jamboree took place with an at home functionary (Why?)
zoombie jamboree took place to be revolutionary
zoombies from all parts of the island. (Where?)
Some of them are great calypsonians. (Some.)
Since the season was COVID time they got together in groups of nine.

Oh, what ya' doin'?
Well, now, back to back, telly to telly,
well, I don't give a damn 'cause I've done six already,
back to back and telly to telly at the zoombie jamboree
(You can feel that)
back to back and telly to telly at the zoombie jamboree

One zoombie's phone wouldn't behave
Her echo's a painful sonic wave
In the one hand she's holding a quart of wine,
  in the other she's pointing she can't hear mine
I says, "Oh, no, my turtle-dove, I think you've got a bug."
Well, believe me folks, I had to run. (Why?)
four hours of a zoombie ain't no fun! (Don't knock it!)

Oh, what you doin'?
Well, now, back to back, telly to telly, well,
I don't give a damn 'cause I've done ten already,
back to back and telly to telly at the zoombie jamboree
(Oh, what a good game)
back to back and telly to telly at the zoombie jamboree

Right then and there she chats her tweet
"I'm a-going to try again, my sweet
I'm gonna call again and then retry."
Then says "Ok, I'm back, you guys!"
"I may be lyin' but you should see (What?)
My slides on this here zoombie." (Blah!)
Well, I never seen those slides in Life, I
crashed my zoombie without WiFi? (Yes!)

Well, now, back to back, telly to telly, well,
I don't give a damn 'cause I've done scores already,
back to back, and telly to telly at the zoombie jamboree (You're all alone, you know)
back to back, and telly to telly at the zoombie jamboree

Wednesday, June 17, 2020

Interpreting a Negative, part 2

[Image: likelihood ratio nomogram]
In Interpreting a Negative, I talked about my lack of success in interpreting my negative COVID-19 test result.  I've made a bit more progress, although I haven't yet gotten a response from my provider on my two questions.  For what it's worth, I learned a lot about this in my MBI degree program, but since I don't deal with test ordering or interpretation on a routine basis, I know the math (this is a great article that can help you with the math in this post), but haven't had any need to apply it since graduate school.

You generally hear about the accuracy of laboratory tests used in the US based on reporting sensitivity and specificity.  These values help providers evaluate the likelihood of a true positive or true negative.  These values aren't generally included in the laboratory result, but you can often get to them by knowing who did the test (the lab), and what test equipment they used, or by looking for the LOINC codes (if you know where to find them), and traversing what that tells you back to the laboratory equipment.

You might also hear about the positive and negative predictive value (or agreement), abbreviated PPV/PPA and NPV/NPA respectively.  This is what the COVID-19 test used on me reports to the FDA.  It compares the results from the Aptima test to those of another test (Panther Fusion) made by the same manufacturer (which could introduce another source of error, but according to the manufacturer's testing, that test is perfect).

That's based on the manufacturer's testing results, and doesn't necessarily account for real-world implementation.  Variations in workflow, quality, et cetera, and assumptions under which the test is performed can have an impact on "accuracy".  In the case of COVID-19 laboratory tests, you can find the results of others' evaluations (e.g., one done by Northwell Health Laboratories).  For the Aptima test, there's one of those in the second row of the table found at the link.  FWIW: That same lab also analyzed the reference test (Hologic Panther Fusion) used in the Hologic report on the Aptima.

As a patient, the first question I have from a test result is "How should this result affect my behavior?"

  • For a positive, do I seek treatment, further testing, et cetera.
  • For a negative, does that mean I don't have a problem, or should I seek further testing later (and if so, when)?

I won't go into the first issue for positives in detail.  I will say that both my wife and I actually decline certain diagnostics because false positive rates are high enough, and the therapeutic value of a true positive result is of limited value at our ages.

There are four different kinds of results that a laboratory test can produce:
True Positive: A positive result when, in fact, you actually have the disease.
False Positive: A positive result when, in fact, you do not have the disease.
True Negative: A negative result when, in fact, you do not have the disease.
False Negative: A negative result when, in fact, you actually have the disease.

You can generally find these values in the tables I referenced.  For those two tables, the values I have are:

Result Type      Hologic Aptima Reported   Northwell Health Laboratories
True Positive    50                        71
False Positive   1                         0
True Negative    54                        75
False Negative   0                         4

I can use these numbers to compute two other numbers, called the positive likelihood ratio (LR+), and negative likelihood ratio (LR-) using the following formulae:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
LR+ = Sensitivity / (1 - Specificity) = (TP / (TP + FN)) / (FP / (TN + FP)) 
LR- = (1 - Sensitivity) / Specificity = (FN / (TP + FN)) / (TN / (TN + FP)) 

Metric           Hologic Aptima Reported   Northwell Health Laboratories   Both
Sensitivity      50 / 50 = 100%            71 / 75 = 94.7%                 121 / 125 = 96.8%
Specificity      54 / 55 = 98.2%           75 / 75 = 100%                  129 / 130 = 99.2%
LR+              100% / 1.8% = 55.6        94.7% / 0 = ∞                   96.8% / 0.8% = 121
LR-              0% / 98.2% = 0            5.3% / 100% = 0.053             3.2% / 99.2% = 0.032

As you can see, I also combined both evaluations into a composite result.
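The arithmetic above is mechanical enough to script. Here's a small sketch (my own, not from any lab's tooling) that reproduces the Northwell column from the raw counts:

```python
def accuracy_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and likelihood ratios from raw counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ is undefined (effectively infinite) when there are no false positives
    lr_pos = sensitivity / (1 - specificity) if fp > 0 else float('inf')
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg

# Northwell Health Laboratories counts from the table above
sens, spec, lr_pos, lr_neg = accuracy_metrics(tp=71, fp=0, tn=75, fn=4)
print(f"Sensitivity={sens:.1%}, Specificity={spec:.1%}, LR-={lr_neg:.3f}")
# prints Sensitivity=94.7%, Specificity=100.0%, LR-=0.053
```

Plugging in the Hologic counts (50, 1, 54, 0) instead gives the first column.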

With the LR- value, I can now estimate the probability that my negative result is correct, but I need ONE more number.  That's the pre-test probability I had COVID-19.  There are a lot of different ways that I could come up with that number.  The most obvious one is to assess it based on the prevalence of disease in my area.

OK, so now let's think about this: Do I consider my country?  My state?  My county?  My town?  My region?  Where would I even find this data?  I might start with my state's dashboard.  But that doesn't really say anything about disease prevalence, just a report of increasing cases per 100,000 (and that data lags the actual prevalence, b/c COVID has an incubation period of about 5-14 days).

So back searching I go, and I find this paper on estimating prevalence, and it references a few others.  Since I live within Massachusetts, but shop in Rhode Island (b/c it has the closest grocery stores), I might want to consider both regions.   I can read off results that tell me I need to look at values for prevalence somewhere between 2 and 4%.  Because this paper reflects a novel method (i.e., largely untested), I should go look elsewhere too. An alternative model suggests multiplying the reported case rate by 10.  That would give me 14.5% (100158 * 10 / 6.893M) for my state, or about the same for my county.

Now, let's plug those numbers in and math some more using these equations:
Pretest Odds = Pretest Probability / (1 - Pretest Probability)
Post-test Odds = Pretest Odds * LR-
Post-test Probability = Post-test Odds / (1 + Post-test Odds)

Pretest Probability   Pretest Odds   Post-test Odds           Post-test Probability
2%                    0.020          0.053 * 0.020 = 0.0010   0.1%
4%                    0.042          0.053 * 0.042 = 0.0022   0.2%
14.5%                 0.170          0.053 * 0.170 = 0.0091   0.9%

You'll note I didn't bother computing the results based on the Hologic reporting, because according to the manufacturer, it doesn't produce false negatives, and so I'd just get 0% anyway.  I also didn't bother computing the results based on both, because the Northwell Health Laboratories results give me an upper bound.
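The chain of formulae above fits in a few lines of Python (my own sketch, using the Northwell LR- of 0.053):

```python
def post_test_probability(pretest_prob, lr):
    """Apply a likelihood ratio: probability -> odds, scale by LR, -> probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

LR_NEG = 0.053  # Northwell LR- from the table above
for prevalence in (0.02, 0.04, 0.145):
    p = post_test_probability(prevalence, LR_NEG)
    print(f"pretest {prevalence:.1%} -> post-test {p:.2%}")
```

The three printed rows match the table above (0.1%, 0.2%, and 0.9% to one decimal place).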

What this tells me is, based on whatever prevalence data I believe in (more likely the higher number), I have less than 1 chance in 100 of it being wrong.  That's what I wanted to know in the first place.

Without the pretest probability, the lab cannot possibly report the likelihood of the result being incorrect.  Other variations in testing might affect this particular lab's "accuracy" in performing the test, and of course, I don't have any way of knowing that information.  But using this math, I can say that even if the test's performance had twice the false negative rate of the Northwell reported results, the chance that my result was a false negative would still be less than 1 in 50.

I'm pretty good with that.  Applying what else I know (including other influenza-like but not COVID-like symptoms), I can fairly reliably assume that I'm not COVID-19 positive.

Why spend all of this time figuring out what others might just assume?
There are three reasons:

1. Because it's me, and I want to know.  Is there a reason I shouldn't?
2. Because I'm curious.
3. Because I understand that these tests have been released for use without the same level of testing that happens under normal circumstances, and because some of the tests (e.g., the Abbott rapid test) have come under fire because of their accuracy.  As a result, I want to understand what risks I'm taking, not just on behalf of myself, but also my family, based on how I behave after getting this result.


P.S. There's a thing called a nomogram depicted above that can help you do all this math geometrically.  All you need to know is the pretest probability and the likelihood ratio, and a ruler will compute the post-test probability for you.  I had one somewhere (it came with one of my informatics textbooks), but you can print one out.  First, though, you need to know how to get the key value, the likelihood ratio.  Using that, and the likelihood ratio for a positive result (55-121), if that had been my result, my post-test probability would be somewhere between 70 and 100%.

Monday, June 15, 2020

Interpreting a Negative

No, this is not a recap about reporting negatives in CDA (or in FHIR for that matter); instead, this is about how to interpret (or how one might fail to interpret or explain) a negative test result.

If you haven't been on a call with me recently, it might be because I've had a flu.  What type of flu remains to be determined, though I'm fairly sure it's NOT COVID-19.  How do I know?  Well, I got tested.  Why am I not sure?  Because a negative test result doesn't necessarily mean I don't have COVID-19; it just means that what I have is not detectable as SARS-CoV-2.

Wednesday morning I woke up fuzzy, feeling feverish (but no temp), and generally out of sorts, after having trouble sleeping.  I contacted my doctor (to apologize for missing a work related meeting), and he suggested I get tested (even though my symptoms were not specific to COVID-19).  So, I went and got tested at a drive up testing site at a nearby mall. Here's what it looks like.

It takes all of five minutes to take the sample.  It's not pleasant to have a swab that far back in your nasal cavity, but it's not really that painful either, just uncomfortable.  I wouldn't do it for fun.  I also got about 10 printed (full color) pages of stuff about the test, COVID-19, what to do if positive, et cetera, in three languages (English plus what looked like Spanish and Portuguese to me), reproducing two different information packets from CDC with overlapping information.  Stuff which I've already seen a dozen times or more.

I got a very nice recorded phone call the next morning after my test, telling me the result came back negative, and how I should treat that information.  But it was the usual, extremely digested baby-food level of information that is normally given to patients.

What I basically wanted to understand was, given the test reported that I was negative, what was the likelihood that result was wrong (a false negative).  So I went looking for more information.

So after the phone call, I looked at my personal e-mail.  I had received an e-mail from Quest (the lab) telling me that I had a new result that morning.  The e-mail showed up at 5am, about 4 hours before the phone call.  I checked the lab result, on Quest's portal, and felt not much more educated.  
It could have been from one of two different testing systems, Aptima and Panther (both from Hologic).  There were four different links (one each for physicians and patients, for each testing system) to data about the testing systems.  It was a typical reproduction of what is reported to the FDA, so NOT that useful.  And the reported test result in the portal was, again, standard pap (as in food for children).

Of course, being done by Quest, and having signed up, it was also available in my iPhone's Apple Health app, in FHIR format. The sad thing there is that the only value given for code was a string (SARS COV 2 RNA(COVID 19), QUALITATIVE NAAT), no LOINC code, nothing telling me much about the test itself.

Eventually, I determined from Quest (by zip code and lab) what the right LOINC code might be (I think the test was from Aptima), but am still uncertain, because it's reported based on where the test was sent, and I honestly am not certain (it could have been performed in MA or NY), because nothing in the lab report tells me that.  

There's data also available in MyChart, with actual links.  Though I cannot copy and paste links from MyChart, nor can I click through them (it's a stupid s**t safety restriction that makes sense ONLY when you don't know how to implement a whitelist for websites).

So, next up I start looking at studies around false negatives for COVID-19, and actual sensitivity/specificity values for the test equipment in use based on real-world testing.  And honestly, I'm still feeling uninformed.  What I really want to know is the NPV: given that I have a negative result, what's the likelihood it's a true negative?  You cannot actually compute the NPV from sensitivity and specificity alone; you have to have the raw data (or an estimate of disease prevalence).  Here's a good diagram showing all the details.
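To make the dependence on prevalence concrete: if you trust the published sensitivity and specificity, Bayes' rule gives the NPV directly. A sketch (my own, with made-up numbers, not the Aptima's actual performance):

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: P(no disease | negative result)."""
    true_neg = specificity * (1 - prevalence)   # P(negative result & healthy)
    false_neg = (1 - sensitivity) * prevalence  # P(negative result & diseased)
    return true_neg / (true_neg + false_neg)

# Illustrative values only: 95% sensitive, 98% specific, 5% prevalence
print(f"NPV = {npv(0.95, 0.98, 0.05):.2%}")  # prints NPV = 99.73%
```

Note how the answer moves with prevalence, which is exactly why the lab can't report it for you.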

Unfortunately, none of the studies I dig into can really give me these details at a level that makes me feel comfortable interpreting an answer.

So, I asked my doctor the following two questions:
A) What's the NPV for this particular test (Hologic Aptima)?
B) What's your assessment of my pre-test probability, and how did you come up with that value?

We'll see how he answers and where I go from here.  Basically, my assumption based on symptoms is that I had a gastrointestinal sort of flu infection rather than COVID-19.  Oh yeah, I'm still feeling icky, so trying to figure this stuff out while unwell is not my favorite pastime, nor the best time for me to figure it out.


Tuesday, June 2, 2020

Towards a Common Lexicon for Requirements

Interoperability requirements are notoriously unbalanced, and the explanation for that can be readily found in the oft-quoted Postel's law.

Senders (clients) should take care; receivers (servers) should be forgiving.  This creates a non-uniformity in specifying requirements within an interchange.

Those of us who work in this space are quite commonly familiar with the SHALL/SHOULD/MAY requirements language of RFC-2119.  Some attempts have been made to get that to address the reality of interoperability (see RFC-6919), but humor aside, there's a real challenge here for interoperability experts.

We want to be able to simply express for an exchange the following notions:

  1. The server must support this parameter/field, the client is free to use it as it wishes.
  2. The client must/is recommended/may send this field, the server is free to use it if need be.
  3. The server requires this of the client, it must comply.
  4. The server must support this to address deficient clients, but clients claiming conformance must do it differently if they expect their claims to be taken seriously.
  5. If the client has this data, it must be sent in this field to the server.  If the client doesn't have it, it can be omitted (or marked as not available, null, or in some other way).
Various attempts have been made to address this:
HL7 Version 3 uses the terms mandatory, required, and optional.  IHE has defined R2 (required if known), but this terminology is neither ubiquitous nor readily understood.  FHIR includes "must support", but requires an implementation guide to define its meaning in practice.

Data sent in these transmissions have different permanence.  In some cases, a sent field is simply acted upon (or not) by the server, depending on its function, and is subsequently discarded; in others, it is forever indelibly scarred into the ether as an anomaly in the space-time continuum that a forward observer can reliably detect (i.e., written to storage).  And there are in-between cases as well.

Cardinality constraints have often been looked at as a way to address this issue.  When the minimum cardinality is 0, an item is optional; when greater than 0, required.  But that fails the case of "sometimes I must send it (because I know it), but when I don't know it, I don't have to send it."  The data has utility, and use of it can affect outcomes significantly.

The value of defining such a common lexicon for describing interchange requirements is that it would let the requirements for a data element be readily determined from the perspective of either the client or the server.

I don't have the words for these concepts, but to explain the value for them, I have about 300 lines of CapabilityStatement generated for the SANER client, and another 300 lines for the server.  There are 30 lines of differences between the two, and these are all about SHALL/SHOULD/MAY sort of requirements, addressing this disparity in viewpoints.

Which leads me to an exercise to be performed: take the 5x5 grid of SHALL/SHOULD/MAY/SHOULD NOT/SHALL NOT requirements for a client against those of a server (and the "if known" variants), determine how many of these are meaningful, and from there, determine what those meanings actually say.
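As a starting point for that exercise (purely my own sketch, not a worked-out algebra), one can enumerate the 5x5 grid and discard the pairings that are contradictory on their face; deciding which of the remainder are meaningful is the real work:

```python
from itertools import product

KEYWORDS = ["SHALL", "SHOULD", "MAY", "SHOULD NOT", "SHALL NOT"]

def contradictory(client, server):
    # e.g., the client SHALL send what the server SHALL NOT accept
    return {client, server} == {"SHALL", "SHALL NOT"}

pairs = [(c, s) for c, s in product(KEYWORDS, repeat=2) if not contradictory(c, s)]
print(f"{len(pairs)} of 25 client/server pairings survive this first cut")  # 23
```

Adding the "if known" variants grows the grid, but the same filtering approach applies.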

It may well be that, as for time, there's an algebra to these client-server relationships, and a way of describing them meaningfully.  And as in all relationships, I'm sure I'll find some cases that are simply dysfunctional.


Friday, May 8, 2020

Am I crazy, or just in SANER?

What does it mean to be in SANER?

  1. Are you overwhelmed by calls and crises and yet have a burning desire to help?
  2. Do you believe that something could be done differently even in the midst of a crisis?
  3. Do you think we could actually roll something out before the end of the year?  Before the next flu season?
  4. Do you think technology might have something to offer against COVID-19?
  5. Do you want to set digital paper on FHIR?
  6. Do you have public health portalitis (inflammation and irritation caused by too many portals calling themselves connectivity solutions)?

Then it may well be that you are in SANER too.

Tuesday, May 5, 2020

It's not always about the EHR

This Fortune article was an interesting read.  But it doesn't tell the whole story, and honestly, I don't think everyone knows the whole story, and most don't even know half of it.

Before SARS coronavirus 2 (SARS-CoV-2) and COVID-19:

  1. There were no codes for COVID-19 (the disease) or SARS-CoV-2 (the virus),
  2. There were no device or test codes for the now more than a dozen tests that detect the disease or the virus,
  3. Nor were there result codes to encode the virus,
  4. Nor did value sets exist to identify symptoms of the disease.

Once there were codes, there were also hospital systems under a lot of stress, so even if an EHR vendor had a vocabulary or system update, hospital operations managers weren't going to authorize a configuration change to the hospital's critical systems in a time of emergency.  And the health information management and technology professionals?  They were inundated with critical priorities to do things like enable tele-medicine applications, and develop processes for manual communications with public health. It wasn't about the EHR, it was about lack of preparation.

Collecting data for a research protocol is an involved process.  It's not just "give me the data over an API", it's a much more formal Extract/Transform/Load (ETL).  Batch APIs for FHIR exist, and are designed in fact to facilitate such efforts, but are honestly less than a year old, and haven't been widely deployed in existing systems. Where they are available in newer versions, see above notes on "not going to authorize a change in critical Health IT systems DURING a crisis".

Some systems already have the capacity to normalize and collect data for research.  The "All of Us Research" program from NIH has been going on for more than a year; numerous health systems are sending patient data to a centralized, NATIONAL research repository housed at Vanderbilt University (or more accurately, in a cloud it controls).  But this is a research protocol; there's a consent process, and not everyone joins.  To date, about 1 in 1000 people in the US have agreed to participate, and about 3/4 of them have completed the initial program steps (some of which require an in-person visit and have been suspended until the crisis abates).  That's a lot of data, but today 4 times as many people have tested positive for COVID-19 as are participating in that research program.  Research is important, and treatment for COVID-19 is important, but one doesn't just throw down a research protocol and tell patients they have to participate without consent, and one doesn't start planning for a crisis in the middle of it. The time to plan for a crisis is before it happens.  It's not that the EHR cannot send data to research; it's that we didn't plan for rolling out research programs in response to a crisis.

We all do crisis planning for our information systems, some better than others.  But we think about crisis planning most often in terms of how to maintain stability within a crisis, NOT how we refocus our efforts on system XYZ to help abate the crisis.  About the only people who do think this way are those who are thoroughly engaged in disaster preparedness and response scenarios.  And for many of those, it's 95% waiting for something to happen, and then 5% run like hell to respond.  They do simulations, and drills, and play "war games".  When's the last time your program even considered that?  When's the last time your IT department ran a generator test, and did they consider testing for water leakage (remember that from Hurricane Sandy)?  Does your disaster response plan for your EHR system include "updating to latest version and patches" to support novel issues in a disaster?  How about updating to the latest vocabulary?  Can you add vocabulary terms in near real time to your EHR and have everyone take advantage of them?  Do you have plans for rolling out new workflows in your facility to address how to code differently than you have been to address a disaster?

If you answered yes to all of these questions, I really want to talk to you and find out where you got the support to do this.  If you aren't thinking about how to do this now, you should be.  Yes, we are building the plane as we are flying it, and we don't know where yet we are going because we DON'T have all the answers, but at least we know which direction is North, and we have some idea about our current heading, and where we want to go.  And we'll know more as time progresses.

This isn't a failing of existing programs, because in all the work that's been done with EHR systems, public health was barely an afterthought, and the agencies supporting these efforts have limited mandates with respect to public health.  They can only work within that mandate.  Public health has long needed funding and support to address a disaster of the magnitude we are now experiencing, and while experience is the best teacher, it's also the hardest.  And basically, we got what we paid for.  But now we are learning.  I'm sure getting those mandates will be a bit easier now (and in fact, some of them have already been issued).  It wasn't a failure of the EHR program or its mandate; it was a failure to supply an appropriate mandate for public health.

Things aren't going to get better overnight.  But because we took action to mitigate the impacts (not fast enough, but enough to get by if we continue them), we have some time to prepare for the next wave.  I'm determined to make sure that some of that infrastructure that we should have had will be in place before that hits.  It's not about what the EHR can do. It's about what I can do, and what others can do.


Wednesday, April 29, 2020

Local First .. A SANER Approach

As I think about various models for communicating data to public health agencies, I keep thinking about a federated model, where hospitals push to their local public health network, and the local public health authorities then push data upwards to state and federal agencies.  There's a good reason for this, based on my own experience.  I live fairly close to Boston, and lived even closer in 2013, the year of the Boston Marathon Bombing.

Boston emergency management officials immediately knew when the bombs first struck what the state of the EDs were in the area, and were able to mostly route patients appropriately, and coordinate efforts.  While that same article notes that the number of available operating rooms and ICUs was not known, it also mentions practice and drill which very likely made it possible for hospitals to quickly clear and prepare operating rooms to treat incoming patients.

I think also about what's happening in the City of Chicago right now, with Rush Medical coordinating efforts to capture data for the City's public health department, and then local public health passing that same data on to federal agencies on the hospital's behalf, and it just makes sense.  It certainly makes a lot more sense than what I've heard elsewhere, where hospital staff are having to collect data, log into different portals and send data to local or state public health, and then also to two different federal agencies, all the while a slightly different data feed containing similar data is silently being sent to the state department of health from a past program intended to meet the very same need.

I can't and won't argue the point that FEMA and CDC both need the data that is being requested.  But I will say that there should be a local public health network that supports this sort of communication without placing additional burdens on hospital staff.  Let the locals push to the state, and the state to the federal government as needed, and when needed (e.g., in cases of a declared emergency).  Don't make 6000+ hospitals do the same thing twice or thrice (even if with different data sets), when 50-odd state agencies could do it more efficiently and in bulk with better quality control.  Oh, and maybe fund that (or use existing funds that have already been allocated for that very kind of thing).

And when the emergency is over, the state or local public health agencies should still keep getting what they need to address local disaster response, much like what Boston had during the Marathon bombing.  It's too late after the disaster happens to "turn it on", and in fact, the switch might not even be accessible if you wait that long.

Compare the Boston stories to Dirk Stanley's story about being at the epicenter of 9/11, and you'll see that we've come a long way in handling local disasters, but still we can do better.  Even with Boston's amazing response, there are notes in some of my reading about it regarding the lack of information about operating rooms and ICUs.

For me, The SANER Project might have been inspired by COVID-19, and one nurse informaticist's complaint to me about the craziness she was experiencing in trying to get data where it needed to go, but I've spent the last decade and then some looking at the challenges public health has been facing since AHIC first offered ANSI/HITSP what some of us still call "The Bird Flu Use Case", which was preceded by the "Hurricane Katrina" use case, and before that, the "Anthrax Use Case".  All of these were about public health and emergency response.  The standards we wanted weren't ready then, but they are now.  And so am I.  Let's get it right this time.

Monday, April 27, 2020

A SANER $convert operation for Converting a FHIR Resource to CSV format

One of the points of SANER is to make it easier for organizations to integrate with a FHIR Server to support measure reporting.  To address this, I introduced the concept of Coping Mechanisms, but need to take it a step further.  We also need to address some of the other missing interfaces to query for Measure and MeasureReport resources in a few different ways.

A MeasureReport is defined by a Measure, and carries with it a very small set of important data for the receiver in public health context.  Take for example, the Measure based on the CDC/NHSN COVID-19 Patient Impact and Hospital Capacity Module.  In CSV form, this needs only a few bits of information to be communicated:

  1. The facility identifier of the reporting facility.
  2. A unique identifier for the report.
  3. The date associated with the reported data.
  4.-17. The set of reported values.

How would one define an API to convert the MeasureReport to CSV form that would:
  1. Fit with the FHIR OperationDefinition concept, and
  2. Yet be fairly simple for someone NOT understanding FHIR to use?
The $convert Operation seems to be a good starting point for this.

Obviously, _format (thinking in FHIR) or the Accept header (thinking in REST) should be text/csv for this operation to be acted upon as a CSV converter.

CSV output is well defined, but the conversion process isn't so well defined.  
One could arguably just dump the JSON path as the column header names, and the leaf node values (e.g., @value or @url).  In fact, there are several JSON to CSV converters that do just that.  While that would work, it misses requirement #2 by a long shot, and doesn't allow the user to control the column names.
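A minimal sketch of what those generic converters do (my own illustration; the resource content is made up, and I've kept the post's measureValue naming):

```python
import json

def flatten(node, prefix=""):
    """Naively dump a JSON tree as (dotted-path, leaf-value) pairs."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{i}]")
    else:
        yield prefix, node

report = json.loads('{"resourceType": "MeasureReport",'
                    ' "group": [{"measureValue": {"value": 42}}]}')
rows = dict(flatten(report))
# Column names come out as raw paths, e.g. "group[0].measureValue.value"
print(rows)
```

It works, but the headers are exactly the unfriendly, uncontrollable paths described above.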

This gets back to allowing the user control over naming things in the CSV header, and mapping each header to values in the report.  There are a number of values in the report that were mentioned above.  I could easily add a composite parameter like header=name$value to the report where name is a string giving the header name, and value is an expression giving the "path to the value" in the MeasureReport.
  1. For CDC/NHSN, this is the NHSN identifier.  We have an idea what this looks like, but no real examples.  It's fairly obviously either reporter.identifier or subject.identifier (and more likely the latter for reasons discussed later).  If I were to use FHIR Path, I could say something like: subject.identifier.where(system = 'NHSN Facility Identifier System URL').value and that would give me the right thing.
  2. This is either or perhaps better, MeasureReport.identifier.value (but for which identifier).  Since these are NHSN generated values (probably OIDs but we don't really know yet), the FHIR Path is probably something like MeasureReport.identifier.where(system='NHSN Report Identifier System URL').value.
  3. Easy: MeasureReport.date, or better yet, just date.
  4. Each of these is a group.measureValue, or a group.population.count (and note, I've started to drop MeasureReport from the FHIR Path, which is still valid, since MeasureReport is the context for the conversion).  But I have to identify which of these ... so group.where(code.where(coding.where(system = 'ick' and code='blech'))).measureValue or group.population.where(code.where(coding.where(system = 'ick' and code='blech'))).count.  Ick and blech are simply syntactic meta-variables which I named that way because as I'm writing these expressions, my stomach is churning for the poor non-FHIR-aware end-user.
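To see how the header=name$value idea might hang together, here's a sketch (entirely hypothetical: the column names, the facility data, and a simple dotted-path resolver standing in for full FHIRPath):

```python
import csv
import io

def convert_to_csv(measure_report, headers):
    """Map 'name$path' parameters onto a MeasureReport dict, emitting one CSV row."""
    def resolve(node, path):
        for part in path.split("."):
            if isinstance(node, list):  # implicitly take the first element
                node = node[0]
            if not isinstance(node, dict):
                return None
            node = node.get(part)
        return node

    columns = [h.split("$", 1) for h in headers]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(name for name, _ in columns)
    writer.writerow(resolve(measure_report, path) for _, path in columns)
    return out.getvalue()

report = {"date": "2020-04-27",
          "subject": {"identifier": {"value": "FAC-123"}}}  # made-up data
print(convert_to_csv(report, ["collectionDate$date",
                              "facilityId$subject.identifier.value"]))
```

The caller controls the column names, and the path side of each parameter is where something FHIRPath-like would plug in.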
So, I like where this is heading, but FHIRPath is still too FHIRy for some.  How can we simplify this?

If MeasureReport is the context, can we relax FHIRPath a little bit so that it would be more friendly?

To start with, I really like Sushi's # notation for codes, so I could say something like group.where(code=ick#blech).measureValue.  And since I honestly don't care about system (since MeasureReport references Measure, and code systems are pretty well defined), I could further simplify to group.where(code=#blech).measureValue, which is getting better.

Could this be further simplified?  The measureValue field only appears in group; could there be another syntax to identify it?  In XPath, I might say //f:measureValue[../f:group/code ...], or *[code=...]/f:measureValue.  FHIRPath doesn't have a ..-like traversal.  However, I might define * to mean any field, and thus use *.where(code=#blech) (where * is simply an alternate name for descendants()).

So, now we have *.where(code=#blech).measureValue, but why stop there?  How about making *S mean *.where(S) instead?  Now we have *code=#blech.measureValue.  This is getting better, but still not as good as it can get.  Code is an obvious index in FHIR, as in fact is system, and a few other "slicers".  In fact, the slicing pattern is a fairly common one.

What if * were a "slice" operator, where slices were attempted starting from near and heading farther away, so that *#blech implicitly meant a thing whose slice is identified uniquely by an identifier or code whose value or code is blech?  And *name# could mean that thing whose slice is identified with an identifier or code system with a system URL of name.

There might be many such possible slices.  Each possible slice would be ordered in size from smallest to largest (from inner to outer slices).  And we could provide a selection operator that worked on picking the right one using the . operator.  So *#blech.measureValue simply means that measureValue whose slicer is a code or identifier with the code or identifier of blech.
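A rough sketch of what resolving *#blech might do, in Python (the descendant search and the code/identifier matching are my reading of the proposal, not any defined FHIRPath extension; measureValue and the sample data follow the post's examples, not the normative resource definitions):

```python
def find_slice(node, code):
    """Search all descendants for an element whose 'code' or 'identifier'
    matches the given token -- mimicking the proposed *#code operator."""
    if isinstance(node, dict):
        slicer = node.get("code") or node.get("identifier")
        if slicer and code in _codes_of(slicer):
            return node
        node = list(node.values())
    if isinstance(node, list):
        for child in node:
            found = find_slice(child, code)
            if found is not None:
                return found
    return None

def _codes_of(slicer):
    """Collect the code/value tokens from a CodeableConcept or Identifier."""
    if not isinstance(slicer, dict):
        return []
    tokens = [c.get("code") for c in slicer.get("coding", [])]
    tokens.append(slicer.get("value"))
    return [t for t in tokens if t]

# A toy report shaped like the examples above.
report = {"group": [{"code": {"coding": [{"system": "ick", "code": "blech"}]},
                     "measureValue": 17}]}
print(find_slice(report, "blech")["measureValue"])  # -> 17
```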

Now, I might say something like:

Can I go JUST one tiny step further and say that if you are slicing, take the simplest content you can for the slice, so that if the slicer is a code in a field X (e.g., found in X.code, X.category, et cetera), then the sliced value is very likely the next simple value you could report?

This might even be prioritized: 
  1. (.*V|v)alue if there is one, 
  2. (.*C|c)ode if there is one
  3. (.*N|n)ame if there is one
  4. the first primitive type if none of the above match.
And furthermore, since *S always has # in it, I might further simplify to say that if S has #, then * is not needed.
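That priority list might be sketched like this (the regex rendering of the (.*V|v)alue patterns and the first-primitive fallback are my own interpretation of the rules above):

```python
import re

# Field-name patterns, in priority order, per the list above.
PRIORITY = [r"[vV]alue$", r"[cC]ode$", r"[nN]ame$"]

def pick_sliced_value(element):
    """Given a sliced element, pick its payload field by the prioritized
    name patterns, falling back to the first primitive value."""
    for pattern in PRIORITY:
        for key, value in element.items():
            if re.search(pattern, key):
                return value
    for value in element.values():
        if isinstance(value, (str, int, float, bool)):
            return value
    return None

print(pick_sliced_value({"code": "xyz", "measureValue": 12}))  # measureValue wins over code
```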

Now I might say something like:

And finally, if X$X, then I need say only X, so:

But # means something special and needs escaping, so just use | like FHIR does in query parameters.


AND NOW that's an API that has a well-defined meaning, maps back to FHIRPath with some additional rules, and makes some sense to the common user who hasn't seen FHIR yet.

Testing this against a few other resources: If the slicer is a LOINC code, this would get Observation.value.  If the observation is a panel (e.g., a urine test), I could get a CSV of the component values quite cleanly.

If the slicer is a string (for item.linkId in Questionnaire), it would get item.text, which is NOT quite what we want.  Let's add answer[X] to the prioritized list, but what happens when there can be more than one answer?  Hmm, have to think about that one.  Perhaps the CSV answer is a comma-separated list of values in the cell?  That's not clear.  It's probably good enough for now, though.

I'll have to work up some grammar for this.

Sunday, April 12, 2020

Configuring Eclipse to run SUSHI over your Fish Tank to generate FHIR Resources and Profiles

As I write more and more FHIR Shorthand, I just want it to work better
in MY development environment.  I principally use Eclipse based tools (Eclipse and Oxygen XML Editor).

  1. I want a quick launcher.
  2. I want easy navigation to errors.
  3. I want better error messages.
  4. I want syntax highlighting.
Like I sometimes tell my kids: now you know how it feels to want.

I did manage to create a launch configuration, and you can see the content below.  So, now I've got #1, and you can have it as well.

Save this in a text file with the .launch extension

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<launchConfiguration type="org.eclipse.ui.externaltools.ProgramLaunchConfigurationType">
    <stringAttribute key="org.eclipse.debug.ui.ATTR_CONSOLE_ENCODING" value="UTF-8"/>
    <stringAttribute key="org.eclipse.ui.externaltools.ATTR_LOCATION"
       value="C:\Program Files\nodejs\node.exe"/>
    <stringAttribute key="org.eclipse.ui.externaltools.ATTR_TOOL_ARGUMENTS"
      value="&quot;C:\Users\{YOURUSERNAME}\AppData\Roaming\npm\node_modules\fsh-sushi\dist\app.js&quot; fsh -o ."/>
    <stringAttribute key="org.eclipse.ui.externaltools.ATTR_WORKING_DIRECTORY" value="${workspace_loc:/fhir-saner}"/>
</launchConfiguration>

Edit this file to fix the locations where Node and SUSHI are installed for you, and which fish tank you want it to run on, and then import it into your Eclipse environment using "File | Import | Launch Configuration".

Now, if you want to run SUSHI, you can easily do it.  I'm sure there's more that could be done to parameterize things like where node.exe is, and who the current user is, and how to run it on the default fish tank folder of the current project.  When I have time to mess with it, I may get around to it.  

I'm sharing what I've learned though, in the hopes that somebody else who knows how to make 'clipse stuff work might take up the task.

Saturday, April 11, 2020

Measure Participations for The SANER Project

There are so many people and organizations involved in the creation of a computable measure that it's hard to keep it all straight.  That's because there's the Measure Resource, and the measure it represents, and these are separate things; getting from the latter to the former is a process (executed by the publisher, so let's start there).


For The SANER IG, the publisher is going to be HL7.  So we'll follow the same protocols for the Measure resource as we would for the IG content at this time.  Other attributes determined by the publisher include: contact, copyright, status and experimental, because the attributes are about the Measure Resource, not the originally authored measure.  The status of the measure will certainly impact the status of the Measure Resource, but these describe the Measure Resource artifact, not the original measure content.  If you want to get into metadata about the author's original measure, I suggest you look into DocumentReference, which is intended to do just that.  We aren't going to use that right now due to complexity.


Author is who wrote the content describing the measure, not necessarily who put it into Measure Resource format.  For the Covid 19 Patient and Hospital Impact Module, that would be CDC, who wrote the form and instructions.  Having established that, the effectivePeriod for the measure becomes whatever the author says it is, which could be before the publication date as a Measure Resource.  That's legit.  The effective time captures the start and end of the period over which the measure is approved to be in use by the author, at least to start.


The editor is who prepared the Measure resource for publication.  That would be The Saner Project.


For measures going through something like a ballot process, the reviewers are the balloters, or the organization (HL7) running that process.  There are other review processes, and I expect we'll eventually evolve those in The Saner Project.


Endorsers are organizations that "approve" the measure as fit for a particular purpose -- certifying or regulatory, or similar bodies.  There can obviously be multiple endorsers.  A measure can be published without endorsement, and endorsements of a measure are asynchronous from publication.

Fighting COVID-19 with FHIR®

Friday, April 10, 2020

SDOH in COVID19 Measures and The SANER Project

While talking to some healthcare providers in my local region, and a few others, I've heard statements about the apparent impacts of poverty on COVID-19 risk, mostly based on anecdotal evidence.  I honestly don't doubt it exists, and although I don't have the data available to prove it ... others do.

That led to a creation of a measure request based on social determinants of health in The SANER Project.

Not long after we added measure requests for staffing and supplies, CDC added two new COVID-19 modules into their reporting for similar items.  We had already agreed we were not going to spend much time deciding on "experimental measures" for our Connectathon release of the guide.  But I did reference the recently released CDC guidelines, because they have the categories CDC thinks are important, and frankly, I don't need to second-guess them.

Having thus concluded that for the purposes of COVID measurement, we'd try to use the CDC as an authority where possible, it occurred to me to look into how CDC was evaluating social determinants of health.  The National Center for Health Statistics publishes an annual report, titled Health, United States, and in it, you will find rather detailed descriptions of how they classify certain categories that impact Social Determinants of Health.

Age, Gender, Race, and Ethnicity would likely be covered in an existing measure request, and the singular "Gender" is likely to get into a discussion around gender, sexual preference, and sex at birth.  Given that these are a given in EHR systems certified by ONC, I think we can take it for granted that the data should be available, though perhaps not always readily accessible.  Age gets interesting: current reporting (available to the public) is in 20-year chunks, though I think I've seen some data in 10-year chunks, one report that pulled out 0-2 for special attention, and other reporting that looks at 0-18, 19-44, 45-64, and 65+.  We're rapidly reaching a point where 65 is no longer the age break for social security or medicare benefits, and the justification for 18 as the age of adulthood is perhaps questionable.  I'd stick with what people are using for COVID reporting right now, though (e.g., 10- or 20-year brackets up to 80), because it's relatively simple.

The regional classification associated with the patient (urban, rural, et cetera) is likely a readily available datum for stratification if you can get to the demographics for the patients counted by the measure.  But, as you can see if you clicked the link, there are at least 3 different classification systems that might be used.  Geodata can get to a census tract from an address, hospital counties are readily accessible, and for -ish sorts of things, that might be good enough (though some note that the Grand Canyon is classified as metropolitan, which, if you've ever been, it surely isn't).

That leaves a few other factors to address:


Functional Status

This one doesn't seem that hard.  In the Health, United States report, it's a simple three-tiered classification: no difficulty; some difficulty; a lot of difficulty or cannot do (where the last two are clustered into one bucket).  The determination is based on the report of one or more categories related to ability to function (see the report for details on how they classify).  If we want to make SDOH data useful, it should be aligned with where existing research has already gone.  I'd stick with the three-tiered classification.

Education Level

It gets as granular as years of schooling, but the key categories are: no high school diploma, diploma or GED, some college, bachelor's or higher.  Some include an AA degree as a stratum, but it's not much different from "some college" according to this chart.


Insurance

Do you have it?  Yes or no.  If yes, is it private, or is it Medicaid?  These are the important strata used in the report.

Food Security

I'm not really clear on where to go here.  There's not really anything I spotted in the Health, United States report, and, well, it's late as I write this.  Honestly, I think income and housing are probably as indicative of food security.  But, I also learned, there's a Z-code for food insecurity in ICD-10.

Income Level

This one gets tricky, because it's different based on the size of the family unit, and it changes annually.  The Health, United States report generally covers it in multiples of 100% above/below the HHS poverty guidelines, rather than as the census covers it with Poverty Thresholds.  If you want to understand the difference, Google it.  What's interesting here is that other research on poverty and health in the CDC uses two breakpoints: 130% and 350%.  Part of the reason for that is that many Federal guidelines use 130% as a qualification point for certain types of federal assistance, and 350% splits the remaining population into generally equal-sized chunks.  I'd go with the latter, because fewer strata mean less work, and the 130% mark would seem to address some confounding challenges around food security.  But then there's a Z-code, and it breaks at 100% and 200%, and that aligns with the Gravity work in HL7, and the PRAPARE tool in use by Kaiser Permanente and others.  I think we go with that, because it's already accessible to some.
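As a quick illustration of the 100%/200% breakpoints (the thresholds match the discussion above, but the function name and labels are mine, purely for illustration):

```python
def income_stratum(poverty_ratio):
    """Classify household income, expressed as a ratio of the HHS poverty
    guideline, into the <100% / 100-199% / >=200% strata discussed above."""
    if poverty_ratio < 1.0:
        return "<100% of poverty guideline"
    if poverty_ratio < 2.0:
        return "100-199% of poverty guideline"
    return ">=200% of poverty guideline"

print(income_stratum(0.8), "|", income_stratum(1.5), "|", income_stratum(3.0))
```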

Housing Security

Again, not much to go on, but I'd guess there might be three big strata: Homeless, rent, or own. For homelessness, there's a Z-code in ICD-10 (see previous link under Food Security).

Employment Status

I'm not sure how good an indicator this is given the current rapid rise in unemployment.  I'm sure it's a factor though.  There's a Z-code for this in ICD-10.

My Proposed SDOH Strata for COVID

A lot of the data above is not readily obtainable without additional effort on the hospital side, which is likely something to avoid.  What's likely already known: Insurance, Patient Address (a proxy for homeless vs. rent/own), functional/disability status (though not completely), and possibly employment status.  The ICD-10 Z-codes are also somewhat in alignment with the PRAPARE tool that others have examined, including the Gravity project in HL7.  Z-codes have the benefit of having been around long enough to already be in EHR systems.

So, what I'd go with is the minimum set of Insurance class (none, Private, Medicaid), the six Z codes covering employment, education, homelessness, food insecurity, and income level, as individual strata for COVID+/All patients.

This isn't a perfect stratification, and I'm sure we could debate the merits of other formulations.  It's going to be marked experimental (like anything other than the CDC/NHSN or FEMA measures for connectathon), and I think it's good enough to see what others can do with it.


P.S.  It's amazing how analysis paralysis disappears when you needed it yesterday, and you have to work with what's available now, not next month or after a little more work and research; "available now" becomes a key attribute in your decision-making criteria.

Thursday, April 9, 2020

How will SANER Cope with Existing HealthIT Infrastructure?

One of the principles of The Saner Project is that it is both a BIG change to the way we do things, but also a SMALL one.  Which is to say that the software components (which we are now starting to parcel out) will be able to work with existing and available interfaces, and can support better automation.  FHIR is the interlingua in which we might do the heavy compute and automation, but the transmission is much simpler.

The measures inside the MeasureReport resources that we've created thus far have a simple relationship to the data elements as they flow over the wire.  We define the measure precisely so that we can simply extract and transmit the essentials, and effectively reconstruct the report at the other end.

Those small pieces of software which connect those existing interfaces to a FHIR endpoint are "Coping Mechanisms", but that's a mouthful, so I'm calling them Copes.  What's a Cope?  It's something that allows two things to work or fit together.  In carpentry, it's a way to join things together at the bendy bits so that they make for a tight seam that looks appealing.  There are a number of ways to cope a joint; some are more difficult than others, and the way you do it might depend on a number of other factors.  When both ends are somewhat mobile and fungible, a mitre box and a cut at 45° will remove the least waste and provide a nice fit.  When one end is stuck where it is, the other end has to do or take all the work to make it fit.  We will probably need both.

I have thoughts about a dozen or so Copes:
  1. CSV-to-MeasureReport
    • I'm already using one of these to convert data from The COVID Tracking Project to a Saner MeasureReport (using the FEMA Measure).  It could have been put together in about an uninterrupted day's work; not that I have such days at the moment, so it was a night's work.
  2. MeasureReport-to-CSV
    • This is even simpler in some ways.  The critical data in a measure is well identified; just yank it out and put it into an orderly set of rows following a header.
  3. CSV-to-XLSX
  4. XLSX-to-CSV
    • This is really just a hack to deal with the FEMA Spreadsheet.  FWIW: One should never send native Word or Excel documents around if one doesn't want one's name and institution to be known; it's buried in the metadata of the spreadsheet.  Realistically, sending FEMA data should be done via CSV and not XLSX.  But I know who to talk to about it, because they didn't clean the metadata before it was published.
  5. CustomJSON-to-MeasureReport
  6. MeasureReport-to-CustomJSON
  7. CustomXML-to-MeasureReport
  8. MeasureReport-to-CustomXML
  9. ER7-to-MeasureReport
  10. MeasureReport-to-ER7
    • The six above are simply recapitulations from structured to differently structured, where the critical bit is mapping from MeasureReport to data fields in a custom thing, or vice versa.  In the last two, the ER7 acronym refers to the HL7 pipes-and-hats format, and these are really about extracting specific values out of a V2 message to populate a MeasureReport.
  11. Aggregator
    • Collect a bunch of MeasureReport values, add them up, and spit the result back out as an aggregate report.  How does one collect them?  By time?  Geographic region?  Hierarchical structure of some sort (e.g., city/town, county, state, region, nation)?
  12. Push-Me-Pull-You
    • If A wants to push, and B wants to pull, they cannot talk to each other.  The Push-Me-Pull-You Cope sits between the two, and acts as sort of a store and forward channel.  This BTW, is simply a classic FHIR Server, although we will see customizations on the Pull side to support different kinds of search.
  13. Pull-Me-Push-You
    • Similarly, if A is expecting pull, and B is expecting to be pushed to, we need to put the Pull-Me-Push-You in the middle to periodically collect and transmit data from A to B.
  14. V2-to-FHIR
    • I just happen to have one of these laying around, but so do others.  HL7 O&O has been working on this project in HL7 for the past year and more.  Yes, there are some useful feeds that contain observations that are exactly about Situational Awareness of groups of things.  Mine used to be configured using a ConceptMap.  No human should ever have to write so many angle brackets ... it's crazy making.  Thanks to Sushi, we don't, and can remain saner.
  15. FHIRtoYAML
  16. YAMLtoFHIR
    • It's about time we did these two.  I don't have any excuse other than that it seemed like a good idea at the time (then again, it is 1:13am as I write this).
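The Aggregator cope, for instance, is conceptually just a roll-up.  A minimal sketch (the region/counts shape and the measure names are invented for illustration, not a real MeasureReport structure):

```python
from collections import defaultdict

def aggregate(reports):
    """Sum measure counts across facility reports, keyed by region."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in reports:
        for measure, count in r["counts"].items():
            totals[r["region"]][measure] += count
    return {region: dict(counts) for region, counts in totals.items()}

reports = [
    {"region": "county-1", "counts": {"numVent": 10, "numBeds": 200}},
    {"region": "county-1", "counts": {"numVent": 4,  "numBeds": 80}},
    {"region": "county-2", "counts": {"numVent": 7,  "numBeds": 150}},
]
print(aggregate(reports))
```

The hard part isn't the summing; it's deciding the grouping key (time, geography, hierarchy), as noted above.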
16 copes sounds like a lot to write.  I'm hoping that entrants into The Resiliency Challenge can help with these.

Oh, and one coping mechanism that I don't think we'll spend much time on with Saner?  Fingers on keyboards; that's not really an effective coping mechanism after all.  No, I'm hoping the keyboarding is all about writing code to get data from where it is to where it needs to be, without unnecessary human intervention (which doesn't mean it lacks human oversight).

The point about Saner isn't to replace existing infrastructure immediately.  We need to cope with what we have, and that's what makes Saner, well .. Saner.

Friday, April 3, 2020

What is 6000 * 5 * 4 * 7 * 52 / 60 / 24?

Too damn much, if you ask me.

6000 hospitals
x    5 minutes / form reported
x    4 forms reported / day
x    7 days / week
x   52 weeks / year
= 43,680,000 hospital minutes / year
/ 60 / 24 ≈ 30,333 hospital days / year

How fast could a computer report that if we but had the right infrastructure?  Surely 0.333 minutes would be enough.
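For the record, the arithmetic works out like this (the units end up as hospital-days per year):

```python
minutes = 6000 * 5 * 4 * 7 * 52   # hospitals x min/form x forms/day x days/week x weeks/year
days = minutes / 60 / 24
print(f"{minutes:,} hospital minutes / year")   # 43,680,000
print(f"{round(days):,} hospital days / year")  # 30,333
```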

I do NOT want to hear someone's brilliant idea about how to use the web to solve the COVID surveillance problem one more time if all they can do is implement <FORM>.  Sure, it's FORM on top of a Bootstrap/React UI with a Node.js and NoSQL backend, but the front-end input is still a human at a keyboard, the same as it was when Tim Berners-Lee invented HTML in the early '90s.

I do NOT want to hear about one more front-line medical professional typing stuff into a form.


Use a camera, a sensor + arduino, an ultrasonic tag, or spare printer parts.  Do whatever it takes to enable people in a crisis to operate at the top of their license, not the bottom.

If "all you need is a phone", understand that in that phone is a supercomputer that is 100,000 times more powerful than what was necessary to land on the moon.  Use the damn thing.


Tuesday, March 31, 2020

Classifying Strata for Beds and Ventilators for The SANER Project

I'm presently working on terminology to help classify what is being counted in various measures for Beds, Ventilators and other equipment for The SANER Project.

As I look at the classifications that others are using:

Physical Capacity: Total number of beds (or other things being counted)

This breaks down in two different ways, by licensure and staffing.
Licensed Capacity: Number licensed (interesting but not important in emergency cases)
Surge Capacity: Number of additional that can be added in overflow situations.

Staffed Capacity: What has staff to support treatment.
Unstaffed Capacity: What does not have staff to support treatment.

Type of location:
A Bed is located in a part of a hospital (or similar facility) and is intended to support:
Inpatient care - Beds meant for patients with acute disease, but not needing emergency treatment.  Inpatient care includes patients admitted for "Observation".  They aren't well enough to return home, but they aren't sick enough to require a higher degree of attention.  The distinction between Inpatient and Observation is generally a billing distinction, not one that truly addresses other characteristics of location.

Acute care: A subtype of inpatient care that provides for treatment of patients needing acute, but not intensive, care.
Intensive care: A subtype of inpatient care, providing a higher level of care, treatment and staffing than acute care.
Critical care: Some institutions have a level of care between Acute and Intensive, which has higher staffing levels and treatment needs than normal acute care, but lower than "Intensive".  Cardiac care units and critical care units fit into this category.  For the current situation, we think that CCU (whether cardiac or critical care) should fit into the Intensive care category when counting.

Burn Units: A burn unit is a specialization of an ICU that supports those needing treatment for burns (heat or chemical).  These have additional equipment needed to treat patients who have significant loss of skin due to burns (e.g., cooling baths, higher temperature controls, and additional treatment resources).  Such units might be used to treat patients who need ICU for which a normal ICU is not available, but this is not ideal use of resources.  There's some question about whether these should be counted the same as ICU beds or differently from ICU beds.  For the current crisis, this question may not be able to be answered.  There will be crises in the future where this distinction is critically important.

Emergency care - Emergency beds are those meant to treat patients who have urgent or emergent care needs which must be addressed before admission (or discharge to home or another location for treatment).

Post-Acute care - Other facilities also provide spaces for treatment of non-acute disease, which I would describe as care needed to support rehabilitation and recovery, or long-term care.

Outpatient care - Outpatient care beds include those meant for patients who are recovering from a procedure, or in other similar situations.  See rooms below.

Other hospital facility space includes:
Operating rooms - facilities for performing surgical procedures.
Procedure rooms - facilities for performing other procedures (usually diagnostic).
Recovery rooms - facilities for treating patients recovering from surgery or other procedures that do not need post-procedure acute care.

These spaces may be reconfigured in emergency to support other uses.

Beds may be designated to treat patients within a certain age group:
Neonatal care beds (Nursery, NICU) are designed to support newborns and infants.  They cannot be readily used to treat older children or adults, simply due to size limitations.

Pediatric care beds are designed to address the needs of children.  They might be used to treat adults in cases of emergency.

Adult care beds are designated to address the needs of adults.  They can be used to treat children in cases of emergency.

Many of the reasons for the distinction between adult and pediatric beds have to do with addressing the different needs of these two populations when staffing, resourcing, and planning the facility, rather than physical capacity restrictions (unlike the situation for neonatal care).

In Use/Available/Unavailable
Beds and other assets are either available, in use, or otherwise not available (broken, missing, on loan, et cetera).  When being counted according to current guidelines, nobody seems to have addressed the "not available yet not in use" state.

Supplies (test kits, masks) are either on hand (available for use), consumed (used for their intended purpose), or disposed of (e.g., due to contamination, transfer to another location, or otherwise).  For test kits, we need to distinguish between "test kits", which do the actual diagnostic test, and "specimen collection kits", which are used to collect and safely transport the specimen from the collection location to the testing facility.  A shortage of either can cause problems.  We also need to clarify the type of specimen used (blood vs. nasal or other swab), because sending a blood sample to a site that can only test via swab isn't going to help.

For equipment, we see discussion around ventilators and ventilator slots.  We should be counting both in some way if both are in use.  Adding slots to a ventilator (treating two or four patients with one ventilator) is being done today.  It's a modification of a medical device that should only be done under specialized scrutiny and only in emergencies, but it is being done in New York at this stage.

There should be some clarity around how these are counted, because if people start thinking about how to redeploy ventilators to support patients, and one facility is counting slots and another counting ventilators, that's a problem.  I'd suggest to "count ventilators" when talking about equipment, and count slots when talking about patients.

There's also the distinction between invasive ventilators (those requiring a tracheal tube) and non-invasive ventilation (CPAP and BiPAP), and also between those that are designed for long-term use and solutions coming out of the maker/innovator community which are specifically designed as "surge capacity devices."

Sunday, March 29, 2020

Printing a FHIR IG

For folks who are used to working in Microsoft Word or with PDF documents, trying to review a FHIR Implementation Guide on the web can be a little bit daunting for the first time.  A colleague of mine tediously copied all the pages for a guide that we had produced for the Mobile Health workgroup so that others could do corrections in line.

It's really not that hard to do, just a tedious way to spend 15 minutes to an hour (depending on page count).

I've turned other highly linked web formats into single page HTML specs in the past ... it really isn't all that hard to do.  You need a scraping tool, HTML Tidy (jtidy won't cut it any more, it just hasn't gotten the love), and a very little bit of XSLT.  If the specification already has a table of contents page, it's even easier because then you don't have to worry about how to order the material.

Just about every FHIR IG now produces a ZIP file with the entire specification.  So, download the ZIP file, unzip it to a folder, and run this XSLT against it (after you clean up the HTML).  I threw this together in like 10 minutes between five other things I'm trying to get done, but it worked.

If you are using a spec I worked on, there's a "download this specification" link somewhere on the home page.  Otherwise, if the person who wrote the spec is using the FHIR IG Builder or older tools, just replace index.html in the URL with the name of the specification ZIP file.  That will get you the full data.

  1. for %f in (*.html) do c:\util\tidy -utf8 -asxhtml -n -m %f
  2. java -cp c:\saxon9h2\saxon9he.jar net.sf.saxon.Transform -s:toc.html -xsl:gen_single.xsl  -o:single.html
  3. Open single.html in your browser.

After you've launched the output, you can:

  1. Print to PDF
  2. Copy and paste everything on the single page to a Word document and distribute it.
  3. Turn on your printer and spend a ton of money printing something out.
There are a lot of places this could go.  I would love to see this little gist turned into something that would turn a FHIR IG not just into a PDF, but into an e-book.  It's not that difficult to do.  But maybe I'll get back to it when the craziness settles.

Meanwhile, stay sane, stay safe, and stay home.


Playing with Control Charts for COVID19

I've been playing with Excel and control charts over the last week to get an understanding of where things are headed.  I get the human difficulties with understanding exponential growth, so I wanted to look at the data in a different way.  The italic text is my commentary for the blog.  The normal-faced text is what I've been tweeting; the bold text is when I tweeted it.

If it looks to you like the exponent on infection growth rate is increasing, you are probably right. I just looked at the 5-day LOGEST values (estimate the exponential growth based on last 5 days activity), and the rate has risen 4 out of the last 5 days. Testing just started...

So, this isn't scary to me YET. What it means is not that the real exponential growth rate of infection is increasing, but rather that the rate of our knowledge of exponential rate is increasing. But more testing is still needed to get the numbers to settle down ...

There's gonna be lots of numbers for the epidemiologists and hyper-mathy folks to study RE the impact of testing volumes on estimates of real growth rate when this is over.  I don't recall signing up for that clinical trial though.

I suspect there may even be some new hyper-mathy stuff that addresses statistical process controls applied to exponential growth curves.  I kinda faked a control chart with my 5-day moving estimator for the growth rate.  There should be some useful signal there ...

I know enough math to know that something should work, maybe even already exists (but maybe and even possibly not yet). I also have enough math to know it's not going to be me who actually proves it.

I've seen enough in that one graph to tell me what I already knew, the real rate is worse than a lot of people thought, or the current graphs show. The good news is that we are seeing a correction now. The bad news may be that it will take a longer time than we want to adjust.

But I'm still not panicking yet.
And neither should you.

3/27 (early morning) 
I just updated my spreadsheet. The good news is that the infection rate is declining, and has been for eight days in a row. However, it's also stabilized, which means it might be getting ready to inflect the other way.

If you look at the graph above, you see eight points in a row declining.  That's enough to trigger an alert in a statistical process control environment (actually 6 or 7 depending on whose set of controls you use).
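That "N points in a row" alert is one of the standard run rules for control charts. A minimal sketch of the trend rule (plain Python; the function name and the 8-point threshold are mine, since the rule count varies by rule set):

```python
def runs_down(values, run_length=8):
    """Flag a 'trend' signal: run_length consecutive strictly
    decreasing points (6, 7, or 8 depending on whose rules you use)."""
    count = 1
    for prev, cur in zip(values, values[1:]):
        count = count + 1 if cur < prev else 1
        if count >= run_length:
            return True
    return False

# Eight declining growth-rate estimates in a row triggers the alert.
rates = [1.35, 1.33, 1.30, 1.28, 1.27, 1.25, 1.22, 1.20]
print(runs_down(rates))  # → True
```

The same scan with `cur > prev` would catch a run in the other direction.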

3/27 (later in the day)
OK, my chart gets better. I can compute New Cases / Total Cases, which should be a constant for an exponential function. In fact, 1/(1 - New/Total Cases) recovers the base of the exponential. Here's the new control chart on the base value; I find it more helpful.
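Why that works: if total cases grow as bᵗ, then the new cases on day t are bᵗ − bᵗ⁻¹ = bᵗ(1 − 1/b), so New/Total = 1 − 1/b, and inverting gives b = 1/(1 − New/Total). A one-liner sketch (function name mine):

```python
def growth_base(new_cases, total_cases):
    """For exponential growth total_t = b**t, new/total = 1 - 1/b,
    so the base is b = 1 / (1 - new/total)."""
    return 1.0 / (1.0 - new_cases / total_cases)

# Cases tripling daily: 27 total today, 9 yesterday, so 18 are new.
print(round(growth_base(18, 27), 6))  # → 3.0
```

Plotting that recovered base day by day is what the control chart above shows.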

The graph above was inspired by this great video on plotting exponential data.  It is worth including in this post, just so you can see that what I'm doing is valid.  I still like the orientation of my approach better because you don't get the overwhelming sense that a graph plotted on logarithmic axes gives you.  It's just a little number bouncing around telling us how fast cases are growing, without really making it apparent how fast an exponential works.  Since I'm looking at that growth every day, and listening to my family's response to it, I get the impact of it.

Today's update is promising. The downward trend in growth shows we are making progress. Note that it took two weeks to get farther than 1 standard deviation in the right direction. Maybe by tomorrow we'll be at 2.

Several additional notes here:
  1. Remember that set of 8 control points going down?  How come I don't have that once I switched to this format?  That's because I'm no longer reporting the 5-day average, just the current day's rate.  See that dip on 3/21?  It gets included in the 5-day average in the previous format, which hides the current day's signal.  I kept the 5-day deviation because it tells me something about how the rate is settling in, or not.  So now we have 6 points on the same side of the average line; that's another kind of signal in statistical process controls (or 7 or 8; again, rules vary).
  2. Statistical process controls are for managing stable processes.  When the process is changed, you have to adjust the controls you are using.  I'm thinking that I should be marking the graphs above on a weekly basis with the per-week mean and standard deviation lines so we can see HOW the process is changing over time.
  3. There's no API to work with the data I'm using, but there's a way to scrape it from the page I'm recording numbers from.  I haven't bothered because it's licensed data.  What I've done above falls (as best I can tell) under fair use.  Scraping their page dynamically isn't what I'd call fair use.  If they do publish an API for this data set, well then I might take a crack at doing something with this little gem of a gist.
  4. There IS a source of publicly available data from Johns Hopkins that does work via APIs.  The API is simply:{date}.csv
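The per-week mean and standard deviation lines from note 2 are easy enough to sketch. Assuming the daily base values sit in a plain list (function name mine, nothing Excel-specific here):

```python
import statistics

def weekly_control_limits(daily_rates):
    """Split a daily series into 7-day chunks and return each week's
    (mean, standard deviation), for drawing per-week control lines
    that show how the process itself is shifting over time."""
    weeks = [daily_rates[i:i + 7] for i in range(0, len(daily_rates), 7)]
    return [(statistics.mean(w), statistics.stdev(w))
            for w in weeks if len(w) > 1]

# Two weeks of (made-up) growth bases → two (mean, sd) pairs to plot.
rates = [1.30, 1.28, 1.29, 1.27, 1.26, 1.25, 1.24,
         1.22, 1.21, 1.20, 1.19, 1.19, 1.18, 1.17]
for mean, sd in weekly_control_limits(rates):
    print(round(mean, 4), round(sd, 4))
```

A falling weekly mean with a shrinking deviation would be exactly the "process change" signal worth marking on the charts.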

Saturday, March 28, 2020

Technical Environment for The SANER Project

This started out as a page describing the technical environment for the SANER Project, but I'm busy reintegrating some of my development tool set back into the build, and so have not been able to push a new build.

Resource utilization data can be provided from a number of sources:

  • Systems Having Clinical Data
    •   Electronic Health Record (EHR) systems
    •   Emergency Department systems
    •   Labor and Delivery systems
    •   ICU and/or Nursing Central Monitoring systems (or stations)
    •   Laboratory Information Systems (LIS)
    •   Clinical Data Repository (CDR)
  • Systems not having Clinical Data
    •   Bed Management (a.k.a., Housekeeping) systems
    •   Asset Management systems
    •   Surgery/Operating Room Scheduling systems
    •   Staff Scheduling systems
    •   Inventory Control systems

Systems Having Clinical Data

Systems in this category have access to some or all of the health records for a patient,
and so can often provide information indicating COVID-19 positive or suspected patients, as
well as associated problems, patient demographics, and patient acuity (severity of
illness) data.

These systems are often used to place or discontinue orders that involve medical
equipment (ventilators, viral tests), and so may be used to determine in use (in
the case of ventilators) or consumed (in the case of test kits) equipment or
supplies.  Some orders might also indicate use of special equipment (e.g., isolation
rooms).

EHR Systems

Comprehensive Hospital EHR solutions may include the capabilities of the other systems
listed above, or may be integrated with other systems but not have direct access
to all data available to the other systems. Even when those capabilities are available
in the comprehensive EHR solution, other solutions may still be chosen by the facility
for a variety of reasons (features, cost, legacy, et cetera).

An EHR may have access to beds in use (because it has access to the active patient
census), but may not be able to report status of beds as known by the bed management
or housekeeping system (e.g., beds available for use, beds needing cleaning, beds
taken out of or added into service, et cetera).

An EHR may also be able to make an educated guess about number of ventilators in use
based on the number of orders for ventilation on the patients it knows about, and the
current status of the order.

Emergency Department Systems

Emergency department systems are simply specialized EHR systems that facilitate patient
care in an emergency room setting.  They may also support or be integrated with central
monitoring solutions enabling ED staff to monitor the status of patients on monitoring
equipment.

Labor and Delivery Systems

Labor and Delivery systems are another form of specialized EHR system that facilitate
treatment of mothers about to give birth in the hospital.  They generally support the
ongoing monitoring of the pregnant mother, and integrate with specialized equipment
used to support newborn delivery (e.g., fetal heart rate monitors, infusion pumps used
for anesthesia, et cetera), as well as routine charting while a mother is still in labor
but not yet ready to deliver.

ICU/Central Monitoring Systems

These systems bring real-time data from the EHR together with a variety of monitoring
and treatment equipment, often to provide clinical decision support for patients
needing intensive ongoing treatment and monitoring.  As a result, these systems have
awareness of the use of medical equipment, patient acuity, disease progression, as well
as the in-use status of ICU beds, and perhaps the total bed capacity of an ICU (but
not necessarily the availability).

Laboratory Information Systems (LIS)

These systems are used to track and control incoming laboratory orders, to manage
laboratory automation equipment, and to manage outgoing reports on orders.  They have
some access to patient clinical and demographic data, usually enough to facilitate the
interpretation of the laboratory test, but may not have access to more data.  Some
data available in an LIS might be used to assess patient acuity, but the EHR would
be a better source of this assessment.

An LIS may also be connected to external public health reporting systems to support
biosurveillance efforts (tracking of disease in populations).  Just the placement of
certain kinds of laboratory orders may be used as a trigger to initiate alerting to
public health (e.g., highly contagious disease such as Ebola or Zika, or a condition
which may indicate a high risk situation in the community such as food poisoning).

Past biosurveillance efforts have not generally considered the impact of disease (such
as COVID) on available beds, but the impact of COVID on hospital bed capacity has now
made this a significant consideration.  Some organizations do use data from internal
laboratory information systems to track the prevalence, type and locations associated
healthcare acquired infections (HAI) (e.g., due to antibiotic resistant strains of
bacteria), in order to provide appropriate treatment and infection mitigation precautions.

Clinical Data Repository (CDR)

Hospitals (especially those affiliated with academic medical centers) utilize CDRs
for long term storage of clinical data to support analysis, research, measurement and
quality improvement efforts.

CDRs may have information about the long term impacts of disease, treatment procedures
and other factors on hospital operations that might be used to aid research, but do not
generally have real-time data that could support utilization reporting.  Some of the
data in a CDR might show impacts of high utilization on hospital operations, which could
aid in identifying and addressing long term monitoring efforts.

Systems Not Having Clinical Data

Systems in this category do not generally have access to health records for a patient,
but may have information about the status of hospital equipment and supplies.

Bed Management (a.k.a., Housekeeping) systems

Bed Management systems are designed specifically to keep track of the status of beds
as it impacts the operations of the housekeeping in a facility.  After a patient is
discharged, the bed and room in which they resided need to be cleaned, special precautions
may be needed when rooms have been contaminated (e.g., exposed to blood or an infectious organism),
et cetera. These systems may also have operational data about ongoing bed turnaround
time (e.g., from unoccupied to available for use) which can also impact availability.

In smaller hospitals, the classic method of bed management is a bed board, which can
be as simple as a whiteboard with a table drawn on it, with room numbers; more complex
systems might use a magnetic board with pretty colored magnets. Modern bed boards get
really fancy, with ward layouts and color codes, and all sorts of bells and whistles
and reports and graphs.

Asset Management systems

Asset management systems usually involve solutions that enable a facility to manage
equipment inventory, tags that can be attached to equipment for tracking, and sensors
that can detect nearby tags deployed in the hospital environment.  Sensors typically
need to be connected in some way to the hospital network, and the tags need to be able
to operate in a radio-frequency and sound and barrier rich environment.  This is combined
with mapping software which can plot the location of a device in a facility (in 3
dimensions).

A typical small hospital might have 10 ICU beds.  With an average ventilator utilization
ranging somewhere between 15-45% (see,
a facility could reasonably get by with a half dozen or so ventilators for the ICU.
Under normal circumstances, such a small number would be reasonably managed with manual
processes, and for that reason, many facilities may not use asset management systems
to track ventilator locations.

Surgery/Operating Room Scheduling systems

In certain crises, the availability of an open OR for a patient needing immediate emergency
surgery would be a critical piece of facility resource utilization data.  Operating
rooms use a variety of complex, expensive medical equipment and resources.  Hospitals
that want to optimize use of these spaces and equipment will use information systems designed to
ensure greater utilization of available capacity.  These systems will be aware of the
availability of operating room schedules and equipment needs.

Staff Scheduling systems

Staff scheduling in a hospital is an ongoing effort of looking at the current patient
load, forecast patient load, existing staff schedules and available surge staffing resources.
Again, in smaller facilities, much of the essential management might be done through
human effort, and on a whiteboard. In larger facilities, software might be used to address
optimization of schedules, improving staff utilization.

Inventory Control systems

Inventory control systems are used to track expendable supplies, and manage replenishment
and distribution across the facility.  These systems may have information about available
supplies such as surgical and N95 masks and personal protective equipment (PPE) that is
frequently replaced.  But these systems don't always automate the counting process that is
often needed to track current inventory levels for this kind of equipment.