Tuesday, June 26, 2012

Computing in HQMF and Query Health

This post begins to address continuous variable measures and similar kinds of computations in Query Health and HQMF.

Most of what HQMF does is allow you to specify how things are being counted, and as I mentioned yesterday, what you are counting is a function of your implementation model.  Change your implementation model (or enter it at a different point), and what you count can go from being patients to encounters, or even something else.

An HQMF counting diabetic patients who have an A1C result greater than 9% is similar to the SQL COUNT() function in a select statement with a complex join and criteria.

SELECT COUNT(DISTINCT P.pid) FROM Patients P
JOIN Conditions C ON C.pid = P.pid
JOIN Results R ON R.pid = P.pid
WHERE C.Condition = 'Diabetes'
AND R.type = 'HgA1C'
AND R.value > 9
AND R.unit = '%'

Other measures need more than counting.  Suppose you wanted to compute a quality measure for the average number of days of stay for delivery of a newborn.  In order to compute that, you would need to find all inpatient encounters for delivery of a newborn, and then take the average over the lengthOfStayQuantity attribute of each of these encounters.  In SQL, this would look something like this:

SELECT AVG(E.lengthOfStayQuantity) FROM Patients P
JOIN Encounters E ON E.pid = P.pid
JOIN ReasonForVisit R ON E.eid = R.eid
WHERE R.reason = 'Delivery'
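To make that query concrete, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table and column names come from the query above; the column types, the helper name average_length_of_stay and the sample rows are my own assumptions for illustration, not part of HQMF or Query Health.

```python
import sqlite3

# The query from the post, with the reason passed as a parameter.
QUERY = """
SELECT AVG(E.lengthOfStayQuantity) FROM Patients P
JOIN Encounters E ON E.pid = P.pid
JOIN ReasonForVisit R ON E.eid = R.eid
WHERE R.reason = ?
"""

def average_length_of_stay(patients, encounters, reasons, reason="Delivery"):
    """Hypothetical helper: load sample rows and return the average stay.

    patients   -- iterable of patient ids
    encounters -- iterable of (eid, pid, lengthOfStayQuantity) tuples
    reasons    -- iterable of (eid, reason) tuples
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: only the columns the query actually touches.
        conn.executescript("""
            CREATE TABLE Patients (pid INTEGER PRIMARY KEY);
            CREATE TABLE Encounters (
                eid INTEGER PRIMARY KEY,
                pid INTEGER,
                lengthOfStayQuantity REAL
            );
            CREATE TABLE ReasonForVisit (eid INTEGER, reason TEXT);
        """)
        conn.executemany("INSERT INTO Patients (pid) VALUES (?)",
                         [(pid,) for pid in patients])
        conn.executemany(
            "INSERT INTO Encounters (eid, pid, lengthOfStayQuantity) "
            "VALUES (?, ?, ?)", encounters)
        conn.executemany(
            "INSERT INTO ReasonForVisit (eid, reason) VALUES (?, ?)", reasons)
        # AVG over zero rows yields NULL, which sqlite3 returns as None.
        (average,) = conn.execute(QUERY, (reason,)).fetchone()
        return average
    finally:
        conn.close()
```

Note that an encounter with no matching reason simply drops out of the average, and a query that matches nothing returns None rather than zero.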

SQL provides the AVG, SUM, MIN, MAX and COUNT aggregate functions.  In HQMF, to get the equivalent of SUM(), you create a <measureObservation> element containing a <derivationExpr> element that describes what should be accumulated, along with references to the criteria elements from which the observation is computed.  I've used the criteria reference elements in the example below.  In the original HQMF this would just have been an encounter element.

<observation classCode="OBS" moodCode="DEF">
  <id root="b421c8a3-7949-11de-8a39-0800200c9a66"/>
<sourceOf typeCode="DRIV">
      <id root="b421c8a9-7949-11de-8a39-0800200c9a66"/>

In the example above, for Query Health and HQMF, we can completely drop the sourceOf element.  That's because the HQMF document supports definitions of local variables for all data criteria elements.  I'd also change the name from observation to something more meaningful.  Perhaps accumulatedValue, with classCode fixed to OBS, and moodCode fixed to DEF.

  <id root='b421c8a3-7479-11de-8a39-0900200c9a66'/>

This is greatly simplified.  Now, if you want to do more than count things, you can do so.  One of the challenges that I tried to address in Simple Math is what kind of expression should go into derivationExpr.  HQMF doesn't actually address the language syntax.  The HL7 preferred language for this kind of stuff is GELLO, but frankly, I have a hard time with GELLO.  I have to translate this expression into something that is executable in Java, C#, SQL or XQuery.  I don't have the luxury of being able to install a GELLO interpreter for that purpose (can you imagine trying to convince a data center team that it is OK to install a GELLO interpreter in a data center that has to have five-nines availability?).  The same is true for any other complex language.  What we really need is Simple Math.

The next issue that comes up is what if you want to sum things only if certain conditions are met, just as you want to count things where certain conditions are met.  To do that, you need to add a <precondition> element to the <accumulatedValue> element, and that needs to support the same kinds of preconditions that we count with. This lets me attach the precondition to the encounter that says "it must be an encounter for a newborn delivery."
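As a rough illustration of the idea, and not of any actual HQMF processing model, here is a small Python sketch of a conditional sum. The function accumulated_value, its derivation and precondition parameters, and the dictionary keys are all hypothetical names chosen to echo the elements discussed above.

```python
def accumulated_value(records, derivation, precondition=None):
    """Sum derivation(record) over the records that satisfy precondition.

    derivation   -- plays the role of the derivation expression
    precondition -- optional predicate; when omitted, every record counts
    """
    total = 0
    for record in records:
        if precondition is None or precondition(record):
            total += derivation(record)
    return total

# Made-up encounters, purely for illustration.
encounters = [
    {"reason": "Delivery", "lengthOfStayQuantity": 2},
    {"reason": "Delivery", "lengthOfStayQuantity": 3},
    {"reason": "Fracture", "lengthOfStayQuantity": 5},
]

# "It must be an encounter for a newborn delivery."
delivery_days = accumulated_value(
    encounters,
    derivation=lambda e: e["lengthOfStayQuantity"],
    precondition=lambda e: e["reason"] == "Delivery",
)
```

The point of the sketch is that the precondition used for summing has the same shape as the one used for counting: counting is just the case where the derivation always returns 1.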

So now I could also compute the number of central line days and the number of central line infections for a time interval, by summing the central line days for each patient and counting the separate central line-associated infections for each patient over a one-month period.  The results of the HQMF would give the necessary data to compute the CLABSI rate.
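Here is a minimal sketch of that last step, taking the two totals and turning them into a rate. CLABSI rates are commonly reported per 1,000 central line days, which is what the default below assumes; the function names, the tuple layout and the sample figures are hypothetical and only there for illustration.

```python
def clabsi_rate(infections, central_line_days, per=1000):
    """Infections per `per` central line days (1,000 by common convention)."""
    if central_line_days <= 0:
        raise ValueError("central_line_days must be positive")
    return infections * per / central_line_days

def totals(per_patient):
    """Sum (line_days, infections) pairs, one pair per patient for the month."""
    total_days = sum(days for days, _ in per_patient)
    total_infections = sum(count for _, count in per_patient)
    return total_days, total_infections

# Made-up figures: (central line days, infections) for three patients.
month = [(100, 0), (90, 1), (60, 0)]
days, infections = totals(month)
rate = clabsi_rate(infections, days)
```

The two sums are exactly the kind of accumulated values described above; the division happens outside the measure, on the aggregated results.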

It seems obvious how to compute when there is one variable.  Even the case above is simply two different summations over a single patient related variable.  Moving up the scale, you might even want to compute standard deviation and variance.  That's also straightforward.  If you have COUNT(x), SUM(x) and SUM(x^2), you can compute standard deviation and variance by simply expanding the formula:  VAR(x) = SUM((x - AVG(x))^2)/(n-1) = (SUM(x^2) - SUM(x)^2/n)/(n-1), where n = COUNT(x).  The standard deviation is the square root of that.
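To show that the three accumulators really are enough, here is a short Python sketch that computes the sample variance (the n-1 form) and standard deviation from COUNT(x), SUM(x) and the sum of squares alone. The function names are my own, and the sample values are made up.

```python
import math

def accumulate(values):
    """Return (count, sum of x, sum of x squared) in a single pass."""
    n, sum_x, sum_x2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        sum_x += x
        sum_x2 += x * x
    return n, sum_x, sum_x2

def variance_from_sums(n, sum_x, sum_x2):
    """Sample variance: (SUM(x^2) - SUM(x)^2 / n) / (n - 1)."""
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)

def stddev_from_sums(n, sum_x, sum_x2):
    """Sample standard deviation, the square root of the variance."""
    return math.sqrt(variance_from_sums(n, sum_x, sum_x2))

# Made-up values, e.g. lengths of stay in days.
n, sum_x, sum_x2 = accumulate([2, 4, 4, 4, 5, 5, 7, 9])
variance = variance_from_sums(n, sum_x, sum_x2)
```

One caveat: this expanded form can lose precision in floating point when the mean is large compared to the spread of the values, so it is a convenience for aggregated results rather than the most numerically careful way to compute a variance.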

But as soon as you move into a case where you have multiple variables (necessary for dealing with regression statistics) it gets more challenging.  That's because the variables aren't necessarily on the same scale.  What would happen if you had two variables in the expression?

The short answer is that I haven't figured that out yet.  But I will, and when I do, I'll report on that as well.

