Wednesday, November 30, 2011

Loading I2B2 from CDA Documents

As part of my evaluation of models for I2B2, hQuery and CIM, I decided to map from CDA Release 2.0 to the I2B2 Star Schema, using the business rules applied in C83 and the CDA Consolidation guide.  The point of this exercise is to show how an I2B2 data repository could be populated from a collection of CDA documents, and as a result, build the mapping between the I2B2 model and C32 (which also leads to hQuery, since its model is based on the C32).  While I've based this work on C32 and the CDA Consolidation project, the rules are general enough that they can be applied to a variety of different CDA documents, which need not conform to the templates in those guides.  The I2B2 Data Repository design documentation (pdf) was essential to this work, and I wish I'd had it when I started on my SQL proof of concept.  Oh well, I'll have to go back and rework that one later, and it's my fault for not catching up on the summer concert listening.

Here's a table showing my initial mappings.  For each I2B2 fact or dimension table, it gives an XPath expression establishing the context for the table, and for each field within that table, the XPath expression (relative to the table's context element) locating the data element that populates it.  XPath expressions using the cda: namespace prefix refer to elements found in the CDA schema.  Those with the rim: namespace prefix represent extensions defined by HL7 SDWG on behalf of HITSP to represent the field.  The descriptions of each table and field are based on the I2B2 documentation.

To load a CDA document, one would iterate over each document, stopping at the table context points, and create a row of data using the field specifications.  Then each fact or dimension table would be loaded from the unique rows produced.  This is an overly simplified description of the algorithm (table load order is important for referential integrity), and there are a lot of other details I'll get into later.  First, let's look at the (somewhat simplified) mapping:

Observation Fact
  Context: cda:act | cda:observation | cda:substanceAdministration
  I2B2 Definition: In healthcare, a logical fact is an observation on a patient. It is important to note that an observation may not represent the onset or date of the condition or event being described, but instead is simply a recording or a notation of something. For example, the observation of ‘diabetes’ recorded in the database as a ‘fact’ at a particular time does not mean that the condition of diabetes began exactly at that time, only that a diagnosis was recorded at that time (there may be many diagnoses of diabetes for this patient over time).
  Fields (name; CDA XPath; I2B2 definition):
  * Encounter ID; ancestor-or-self::cda:*[@classCode='ENC']; patient visit number
  * Patient ID; //cda:patientRole/cda:id; patient number
  * Concept Code; @classCode or cda:code; code for the observation of interest (i.e., diagnoses, procedures, medications, lab tests)
  * Provider ID; ancestor-or-self::cda:*[@typeCode='AUT' or @typeCode='PRF'][1]/cda:*/cda:id; practitioner id or provider id
  * Start/End Date Range; cda:effectiveTime; starting and ending date-time of the observation
  * Modifier; (computed); code for the modifier of interest (e.g., "ROUTE", "DOSE"); note that value columns are often used to hold the amounts, such as "100" (mg) or "PO"
  * Instance ID; cda:id; encoded instance number that allows more than one modifier to be provided for each concept_cd. Each row will have a different modifier_cd but a similar instance_num.
  * Value Type; cda:value/@xsi:type; format of the concept: N = Numeric, T = Text (enums/short messages), B = Raw Text (notes/reports), NLP = NLP result text
  * Value; cda:value
  * Location Code; ancestor-or-self::cda:*[@typeCode='LOC']/cda:*[@classCode='SDLOC']/cda:id; a location code, such as for a clinic

Patient Dimension
  Context: //cda:patientRole
  I2B2 Definition: Each record in the patient_dimension table represents a patient in the database. The table includes demographics fields such as gender, age, race, etc. Most attributes of the patient dimension table are discrete (i.e., Male/Female, Zip code, etc.).
  Fields (name; CDA XPath; I2B2 definition):
  * Patient ID; cda:id
  * Vital Status; (computed); contains a code that represents the vital status (alive or dead) of the patient and the precision of the vital status data
  * Birth Date; cda:patient/cda:birthTime
  * Death Date; cda:patient/rim:deceasedTime
  * Gender; cda:patient/
  * Age; (computed)
  * Language; cda:patient/
  * Race; cda:patient/cda:raceCode
  * Marital Status; cda:patient/
  * Religion; cda:patient/
  * Zip Code; cda:addr/cda:zip
  * StateCityZipCode; cda:addr/(cda:state|cda:city|cda:zip)

Provider Dimension
  Context: //(cda:author|cda:performer)
  I2B2 Definition: Each record in the provider_dimension table represents a physician or provider at an institution. The provider_path is the path that describes how the provider fits into the institutional hierarchy. Institution, department, provider name and a code may be included in the path.
  Fields (name; CDA XPath):
  * Provider ID; cda:id
  * Provider Name; cda:name

Encounter (Visit Dimension)
  Context: //cda:*[@classCode='ENC']
  I2B2 Definition: The visit_dimension table represents sessions where observations were made. Each row represents one session (also called a visit, event or encounter). This session can involve a patient directly, such as a visit to a doctor’s office, or it can involve the patient indirectly, as when several tests are run on a tube of the patient’s blood. More than one observation can be made during a visit. All visits must have a start date/time associated with them, but they may or may not have an end date. The visit record also contains specifics about the location of the session, such as the hospital or clinic where the session occurred, and whether the patient was an inpatient or outpatient at the time of the visit.
  Fields (name; CDA XPath):
  * Encounter ID; cda:id
  * Patient ID; ancestor-or-self::cda:*[@typeCode='SBJ' or @typeCode='RCT']/cda:*/(cda:id|rim:id)[1]
  * Active Status; cda:statusCode
  * Start/End Date; cda:effectiveTime
  * Encounter Type Code; cda:code
  * Location Code; ancestor-or-self::cda:*[@typeCode='LOC']/cda:*[@classCode='SDLOC']/cda:id

Now for some comments on it...

Concept Codes
You'll need to look at both the "act" classCode attribute, and maybe the code element within the act, and map that to the I2B2 ontology to figure out how to populate the concept code.

Modifier Codes
In I2B2, a single fact can have multiple parts.  Each part of the fact is identified by the instance identifier, and the part being represented (e.g., medication, dose, route or frequency for a medication) can be separately represented.  In CDA, the "fact" is represented by one of the basic "act" classes, and the properties of that class represent each of the fields.  So, some acts will need to be represented as several facts (e.g., medications), while others (e.g., a lab result) will just be represented as a single fact.  This shouldn't be too hard to understand.
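As a sketch of that fan-out, the hypothetical Python below turns one parsed medication act into several fact rows that share an instance number. The input dict is made up, and the '@' modifier for the base fact and the DOSE/ROUTE modifier codes are my reading of the I2B2 conventions, not a normative mapping.

```python
# One CDA act -> several I2B2 fact rows with the same instance_num
# but different modifier_cd values.
med_act = {
    'concept_cd': 'RXNORM:197361',  # the medication itself (invented example)
    'dose': ('100', 'mg'),
    'route': 'PO',
}

def explode(act, instance_num):
    """Explode one medication act into its base fact plus modifier facts."""
    rows = [{'concept_cd': act['concept_cd'], 'modifier_cd': '@',
             'instance_num': instance_num, 'value': None, 'units': None}]
    if 'dose' in act:
        value, units = act['dose']
        rows.append({'concept_cd': act['concept_cd'], 'modifier_cd': 'DOSE',
                     'instance_num': instance_num, 'value': value, 'units': units})
    if 'route' in act:
        rows.append({'concept_cd': act['concept_cd'], 'modifier_cd': 'ROUTE',
                     'instance_num': instance_num, 'value': act['route'], 'units': None})
    return rows

rows = explode(med_act, 1)
print([r['modifier_cd'] for r in rows])
```

A lab result, by contrast, would come out of this step as just the single base row.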

Value and Value Type
I2B2 has four different basic value types.  CDA has a few more that need to be mapped into the SQL tables.  Also, I2B2 has different columns in which each value type is placed.
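A sketch of that routing, with an invented mapping function: numeric xsi:type values land in nval_num (with tval_char holding the comparison operator, 'E' for equals, as I read the I2B2 column conventions), and everything else here falls through to text in tval_char. The type list is deliberately incomplete.

```python
def map_value(xsi_type, raw):
    # Numeric CDA types map to I2B2 valtype_cd 'N' and the nval_num column;
    # 'E' (equals) is the usual operator stored in tval_char for plain numbers.
    if xsi_type in ('PQ', 'INT', 'REAL'):
        return {'valtype_cd': 'N', 'nval_num': float(raw), 'tval_char': 'E'}
    # Everything else is treated here as short text ('T') in tval_char;
    # a fuller mapping would also distinguish B and NLP values.
    return {'valtype_cd': 'T', 'nval_num': None, 'tval_char': raw}

print(map_value('PQ', '7.2'))    # a numeric lab value
print(map_value('ST', 'trace'))  # a short text value
```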

Location Codes [sic]
In the I2B2 schema, location codes really identify specific locations, and so are identifiers, not codes.  Thus my mapping to cda:id for a specific location.  Locations are set in the document context for each observation, and apply unless overridden later in the document (a rare occurrence).

A CDA document is "documentation of" an "encompassing encounter".  Usually, what is recorded in the document with respect to the encounter and its location applies to everything in the document (it's part of the context of the document).  That could be overridden subsequently in the document, indicating that the fact was a component of a different encounter that had a different location participant, but again, that is usually not the case.


Usually, the "author" of the document is also the performing provider, but again, that can be overridden with a performer participant in the encounter (there are several types of performers as well).

So, if you wanted to load a CCD document into an I2B2 data repository, this is enough to get you started.

My next task is to look at the NQF HQMF documents created by the Measure Authoring Tool, see what I interpreted incorrectly, see how well my transforms work against them, and comment on their structure.  While HQMF may be the right standard to represent queries, we will need implementation guidance on how to represent queries in the Query Health environment.  The IHE Quality Measure Definition (ftp to Word document) profile might be one source for that guidance, and I've been drafted to help on that profile.  I'll certainly be taking what I learn from this project into that one.

Tuesday, November 29, 2011

Query Health Face to Face

I'm spending today at the Query Health Technical Committee face to face meeting.  We spent the morning deciding on how to "build" the code that will be used to pilot the technologies.  I saw this slide for the first time this morning:

It means that the work I've been doing demonstrating implementation of HQMF for the last few weeks has been largely successful.  There are now four major coding tasks.  One involves the PopMedNet policy framework, another involves the hQuery back-end, and a third the I2B2 back-end.  The fourth piece, which Sean Nolan labeled the "Keith Stream" of work (and which I heard as "Keith's Dream", which is a pretty good interpretation), is transformation of HQMF queries into hQuery or I2B2 implementations which can then be run against either back end.

The intention is that HQMF be the standardized, model-based query specification, with an option for use of I2B2 or hQuery when something really convoluted might be needed.  Both hQuery and I2B2 have query builder interfaces (whose output might be transformable to HQMF), and NQF is working on some sort of HQMF editor (called the Measure Authoring Tool) as well.  I may build some front-end transformations so that users familiar with the existing tools could generate an HQMF in those interfaces that could run against either back-end.

I also have some hope of designing (and maybe even building) an XQuery-based implementation, but that would require more resources than I currently have.

After lunch, we started talking about the plugin model.  It was then that I became very thankful that I no longer code up web user interfaces.

This afternoon, we will be discussing some details about information models and standards mapping to code components.  One of the most fun conversations we had on models this morning was how we needed to look at the I2B2 star schema, the hQuery GreenC32 JavaScript object model, and the S&I Framework CIM through hazy glass to see what silhouette emerges.  That's exactly what I was talking about in Models for Query Health.  Given the way that CDA, CCD, and C32 have permeated healthcare IT in the US, I'm pretty sure what I'll see (it's gotten to the point that I can map to C32 in my sleep or sniff out a C32-based data model blindfolded).

Updated to reflect correction on the Measure Authoring Tool.

Monday, November 28, 2011

Some notes on HL7 GreenCDA

If you read John Halamka's blog discussing the November HIT Standards Committee meeting, he spent a good deal of time talking about Green CDA.  Much of the discussion I'm seeing now has to do with the use of "Green CDA" on the wire.  The HL7 Structured Documents workgroup created a position statement back in February on that topic that encouraged experimentation.

The debate of Green CDA on the wire (both within HL7 and without) is something I've discussed previously.  

If you read the HL7 project description for Green CDA, you'll see where it talks about how it is two things:

  1. A description of a process to produce an XML-based API for creating CDA documents
  2. The output of executing that process (largely through manual efforts).
The process to date is a manual one, with the intention that there could be tools that support automation of it.  This is one of the goals of the ONC sponsorship of the MDHT project, or was at least when they started.  But right now, the only published "Green CDA" implementation is the one developed to demonstrate the process in the HL7 Implementation Guide.  I don't know where MDHT stands with respect to creating a "Green CDA" specification, but I suspect it is still a release or two away.

As a process, Green CDA is not well defined enough to enable automation.  There will be a lot of discovery about what needs to be done to automate.  As an implementation, the sample Green CDA implementation is OK for representing many HITSP C83 constructs, but isn't up to date with the CDA Consolidation guide.  I'd hate to see us put more manual effort into creating another manual implementation, because it would really not move the industry forward.

Green CDA takes what HL7 learned from the CCR wars to heart, making the implementation of CDA easier for developers.  But these days, I don't think it goes far enough.  I have my own idea about what will make it easier for developers.  I think we could even combine the "greening" of the models with the development of an HTML5-based CDA implementation that would be even easier for developers to understand and create.

I think the HIT SC either went too far, or not far enough.  If we really are going to spend the next 9-12 months designing a format for the next generation of Health IT solutions that could be deployed under Meaningful Use Stage 3, why not take a significant leap forward, instead of remaining with what we have learned thus far.  After all, if you are going to introduce "breaking incompatibilities" with current Health IT solutions, why not get something really worthwhile out of it.

I can think of nothing more interesting to work on (even Query Health is a distant second).


P.S.  My own guess about the ONC response to the HIT SC Green CDA recommendation, which you can take with a grain of salt, is that the MU Stage 2 rule will NOT recommend Green CDA, but that it WILL reference the CDA Consolidation work.   Let's come back to this in 3 months and see if I was right.

P.P.S.  Now that CDA Consolidation guide is out in draft form, my next big project is a gap analysis between it, and the HITSP C32/C83/C80 set of documents.

Friday, November 25, 2011

Will Healthcare ever grow up?

My daughters are both growing up.  Yesterday morning, my eldest attended a high school football game as a guest member of the marching band.  They asked the middle school flag teams to march with the band for Thanksgiving Day.  As she grows up, she is learning more and more about how to care for herself (and others).  Afterwards, she was cooking a dish for the Thanksgiving meal we are about to share with friends.

She knows that she needs to see a doctor regularly, she knows how to ask for her records, and how to read them to some degree.  Her time horizon is also expanding.  We played Monopoly last night, and as she kept running short on funds, she kept thinking about the kinds of financial decisions she'd be making on her own in college ... hmm, pizza now, or groceries for the week?  In short, she's beginning to learn what she needs to support herself as an adult.

Over the last hundred years, healthcare has advanced from something that only the rich could really afford, to something that was considered to be an essential component of every person's basic needs.  Over that same time, the costs of healthcare have grown, seemingly without bound, and it is now regressing back to something that only the rich and/or well-employed (and healthy) can really afford.

I find myself wondering what kind of healthcare system my daughters will have to teach their children about.  Will it be something that they, as digital natives, find familiar and reassuring and part of their usual lives, or will it be even more frustrating, expensive and cumbersome than it is today?  I'm hoping for the former, but it is so hard to predict where we will be in twenty years.

Our political leaders often cannot seem to think beyond the next election, our financial sector cannot seem to think beyond the next quarter or maybe the next year, and our healthcare system cannot seem to think beyond the 18-month average horizon over which an insurer needs to care about a patient, or the deadline for the next regulatory hurdle affecting payment.

These are the time frames in which children think, next month, next Christmas, next birthday, next school year, when I get to the next school, et cetera.  As a parent, I had to start thinking about college for my kids when they were born.  That is the kind of time frame that adults have to think in.  What will healthcare leaders be thinking like when my daughter is an adult?  Like her, or like her children?

I'll keep pushing because our healthcare system needs adults right now to keep pushing.  And if we keep it up, eventually it might just grow up, just like children do.  I'm optimistic, but still not certain.

Wednesday, November 23, 2011

Learn about ONC-Sponsored Model-Driven Health Tools Project

I love it when the Government gets involved in something worthwhile. This is definitely one of those cases, and I've been an MDHT fan-boy for quite some time...

The Office of the National Coordinator for Health Information Technology
A government/private sector open source collaboration was recently formed between the Office of the National Coordinator for Health Information Technology (ONC) at the U.S. Department of Health and Human Services (HHS), Veterans Health Administration (VHA), IBM Research (NYSE: IBM), and Open Health Tools. This project focuses on simplifying standards-based health information exchange.  
The Model-Driven Health Tools (MDHT) project leverages models of existing standards to produce integrated and accessible guides along with actual working and tested software supporting the standard.  This approach replaces current industry practice, in which each participating company or organization interprets the standard and independently develops its own implementation.  The result lowers both the costs and the hurdles preventing organizations from participating in electronic health information exchanges.
Additionally, MDHT supports a full lifecycle standards development process providing a solution that allows standards designers and implementers to actively collaborate.  The agile process enables designers to quickly validate all aspects of the standards rules as practical and feasible before they become normative standards.

ONC's participation, which began in January 2011, was spearheaded by Dr. Doug Fridsma, Director of the Office of Standards and Interoperability within ONC. The MDHT Project 1.0 release was a response to a 2008 request for initiation by the VA and VHA. This release is the first of many sets of standards the group intends to create.
MDHT currently supports the Meaningful Use Standard, Healthcare Information Technology Standards Panel (HITSP) Patient Summary Document (C32) and the Consolidated CDA Project.

MDHT also provides support to other ONC S&I Framework Initiatives, including the Transitions Of Care and the Consolidated CDA initiatives.
For more information about MDHT or to get involved, please visit


"Greening" the HQMF

So my next project is making HQMF easier to create and read.  I started with an HQMF I wrote to support NQF Measure 59, Poor A1C Control.

The first set of steps simplifies the document in ways that ensure compliance with the HL7 tooling.  Subsequent steps will "green" it in ways that the current tooling doesn't support.

Named Sections
The first step was to name the required and optional sections in the XML.  I started with something like this:

      <section>
        <code code="34089-3" codeSystem="2.16.840.1.113883.6.1"/>
        <title>Measure Description Section</title>
        <text>This is a description of the measure.</text>
      </section>

And modified it to something like this:

      <MeasureDescriptionSection>
        <title>Measure Description Section</title>
        <text>This is a description of the measure.</text>
      </MeasureDescriptionSection>

The code attribute for the section can be fixed, and thus omitted from the XML, since it is implied by the section name.  So, I added MeasureDescriptionSection, DataCriteriaSection, PopulationCriteriaSection and MeasureObservationSection elements.
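Mechanically, the renaming is just a lookup on the fixed section code. This standard-library Python sketch maps only the one code from the example above; the names and the mini-document are from this post, but the transform code itself is mine.

```python
import xml.etree.ElementTree as ET

# Only the code from the example above is mapped here; a full table would
# also cover the data criteria, population criteria, and measure
# observation sections.
SECTION_NAMES = {'34089-3': 'MeasureDescriptionSection'}

section = ET.fromstring(
    '<section><code code="34089-3" codeSystem="2.16.840.1.113883.6.1"/>'
    '<title>Measure Description Section</title>'
    '<text>This is a description of the measure.</text></section>')

# Rename the section after its fixed code, then drop the now-implied <code>.
code_el = section.find('code')
section.tag = SECTION_NAMES[code_el.get('code')]
section.remove(code_el)

print(ET.tostring(section, encoding='unicode'))
```

The reverse transform is just as mechanical, which is what makes fixing the code safe: nothing is lost, only implied.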

Component and Definition Relationships
Within the DataCriteriaSection, there were a lot of <sourceOf typeCode='COMP'> elements.  I reduced that to <component>.  There were also a lot of <sourceOf typeCode='INST'> elements, which I reduced to <definition>.

Within each <component> there were numerous criterion, expressed as HL7 acts:
<act classCode='ACT' moodCode='EVN' isCriterionInd='true'> 
<observation classCode='OBS' moodCode='EVN' isCriterionInd='true'> 
<supply classCode='SUP' moodCode='SUP' isCriterionInd='true'>
<substanceAdministration classCode='SBADM' moodCode='EVN' isCriterionInd='true'>
<procedure classCode='PROC' moodCode='EVN' isCriterionInd='true'>
<encounter classCode='ENC' moodCode='EVN' isCriterionInd='true'>

These were simplified (using defaults for classCode and moodCode) to:

<actCriterion>
<observationCriterion>
<supplyCriterion>
<substanceAdministrationCriterion>
<procedureCriterion>
<encounterCriterion>

Measure Parameters
Some of the data criteria were actually measure parameters.  These were in event mood, but did not have isCriterionInd set to true.  These were always observations (the only HL7 act having a value).  I realized also that these aren't just components, but are specifically control variables for the measure, and so there is a better act relationship: Has Control Variable.  In the HQMF XML, this changes typeCode from COMP to CTRLV, and in these cases, component became controlVariable.

So this:

      <entry typeCode="COMP">
        <observation classCode="OBS" moodCode="EVN">
          <code code="52832-3" codeSystem="2.16.840.1.113883.6.1"/>
          <value xsi:type="TS" value="20100101"/>
        </observation>
      </entry>

became this:

      <controlVariable>
        <observationParameter>
          <code code="52832-3" codeSystem="2.16.840.1.113883.6.1"/>
          <value xsi:type="TS" value="20100101"/>
        </observationParameter>
      </controlVariable>

Next were references to criteria and definitions.  The same XML was used as for components and definitions, but they included only a single <id> element that pointed to the actual act being referenced.  These became criterion and definition reference elements (e.g., <observationCriterionRef>), each containing just that <id>.

Precondition Conjunctions
Preconditions can be joined with conjunctions, specified in the conjunctionCode element.  There are three different types:  AND, OR, and XOR.  I added ANDprecondition, ORprecondition and XORprecondition elements which fixed the value of conjunctionCode to AND, OR and XOR respectively.

Named Criteria

Several of the criteria in PopulationCriteriaSection and MeasureCriteriaSection are identified using HQMF-specified codes, including the initial patient population, the numerator, denominator, denominator exceptions, measure criteria, and the classifiers I created yesterday.  So, I created named model elements to represent those, enabling me to fix (and thus drop) the code and value elements from these observations.

Everything I've done thus far is supported by the HL7 modeling tools (as far as I know), and most of it is consistent with the HL7 methodology.  At this stage, the HQMF file is reduced by about 10% by line count, and 30% by file size, just to give some metrics supporting how much simplification has occurred.

I suspect that I'll get a little bit of grief about "fixing" element values, but from a modeling perspective, it is certainly comprehensible.  The next set of changes goes one step further, and is more in line with "Green" CDA.  In it, I start to combine act relationships and acts where that makes sense, and restructure the XML in ways that are transformable to the HL7 representation, but not consistent with the current methodology.

Local Variable Names
There are a double dozen (or more) variable names associated with the preconditions.  Since these are simple strings, I just made the localVariableName a "name" attribute on the element to which it applied.  I can now make these be of the ID type to ensure that variable names are unique.

Act Relationships merged with Acts
Sections are always components of the document.  For the named sections, I dropped the component element, as it is implied by the named section element.

The observationParameter element always appears inside a controlVariable element.  So, I simply dropped the observationParameter element and moved its children inside the controlVariable element.

The various criterion elements are always components, so I moved the component children down into the criterion elements, and dropped the component element.

The definition elements are also implied by their content, so I dropped those.  All definition and criteria reference elements just contain a single id element pointing back to the definition or criteria in the measure criteria section.  So, I shifted the id attributes (root and extension) to the reference element itself.  So this:

  <observationCriterionRef>
    <id root="0" extension="ageBetween17and64"/>
  </observationCriterionRef>

became this:

  <observationCriterionRef root="0" extension="ageBetween17and64"/>

ID/IDREF or Global Identifiers?

I debated with myself about whether I should use ID/IDREF for these references or not.  HL7 act references are done by the instance identifier for the act, and these are globally unique.  When using ID/IDREF, the identifiers are only unique within the document.  Observation criteria references clearly point back inside the measure definition document, and ID/IDREF would allow the schema to ensure that the criteria point back to the appropriate thing.  But it wasn't clear to me whether that would also be true for definitions, because one could refer to a definition that was defined outside the scope of the HQMF document.  If I shifted to IDREF for pointing, the definition reference could just be a single attribute on the precondition in which it appeared.  Since I was using local variables extensively, I realized that there was actually a lot of value to this.  Then I also realized that there is a URI representation of the II data type, so I could actually use key and keyref to support BOTH, and that decided it for me.

I had a couple cases where I had an act precondition that simply served as a grouper for other preconditions.  In that case, clearly I could drop that act precondition, so I did.

For others, what I realized was that a precondition was either a reference to an atomic criterion, or a collection of other preconditions using the AND/OR/XOR conjunctions.  All the preconditions of the same type could have their references merged into a single precondition element.  Since the criteria could be referenced using IDREFS, I could also merge them via an attribute.  But I forgot about negation on preconditions.  So, I crossed the precondition conjunction types with negationInd on the precondition and came up with the following new names: AllTrue, AtLeastOneTrue, OnlyOneTrue, AllFalse, AtLeastOneFalse, and OnlyOneFalse.

It was tempting to use IDREFS with these values as attribute names, but I needed to be able to combine preconditions [e.g., to handle (A AND B) OR (C AND NOT(D))], so I left the preconditions as elements.
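To convince myself the merged precondition elements still express arbitrary logic, here's a toy evaluator. The tuple encoding and the evaluator are mine, not part of the format, and I only cover AllTrue, AtLeastOneTrue, and AllFalse (my name for the negated AND case) since that's all the example from the text needs.

```python
def evaluate(node, criteria):
    """Evaluate a nested precondition tree against a dict of criterion results.

    Leaves are criterion names; interior nodes are (conjunction, children)."""
    op, children = node
    results = [evaluate(c, criteria) if isinstance(c, tuple) else criteria[c]
               for c in children]
    if op == 'AllTrue':          # AND
        return all(results)
    if op == 'AllFalse':         # negated AND: none of the children hold
        return not any(results)
    if op == 'AtLeastOneTrue':   # OR
        return any(results)
    raise ValueError('unknown conjunction: %s' % op)

# (A AND B) OR (C AND NOT(D)), the combination mentioned in the text:
expr = ('AtLeastOneTrue', [
    ('AllTrue', ['A', 'B']),
    ('AllTrue', ['C', ('AllFalse', ['D'])]),
])
print(evaluate(expr, {'A': False, 'B': True, 'C': True, 'D': False}))
```

Because the nesting is preserved as elements rather than flattened into IDREFS attributes, this kind of mixed expression stays representable.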

I found a number of cases where I had a single <AllTrue> element appearing with a single criteria reference inside it.  That wasn't worth maintaining, so I dropped the wrapping element.

In the Classifier criteria entries, the criteria would always show up inside an <OnlyOneTrue> element.  I could also safely drop that.

Definition references were singular in each criterion (when present), so I could drop the reference element and use a definition attribute to point to the applicable model definition defined in the measure, following the same pattern as criteria references.

In shifting to this key/keyref strategy, the definition reference element names were dropped in favor of a single definition attribute on the criterion.

The DenominatorExceptionCriteria now looks like this:

  <DenominatorExceptionCriteria>
    <AtLeastOneTrue>
      <AllTrue>
        <observationCriterionRef ref="HasPolycysticOvaries"/>
        <AllFalse>
          <observationCriterionRef ref="HasDiabetes"/>
        </AllFalse>
      </AllTrue>
      <observationCriterionRef ref="HasSteroidInducedDiabetes"/>
      <observationCriterionRef ref="HasGestationalDiabetes"/>
    </AtLeastOneTrue>
  </DenominatorExceptionCriteria>
Either the patient has PolycysticOvaries and not Diabetes, or they have Steroid Induced or Gestational Diabetes.  That's a lot simpler to read than:

<observation classCode="OBS" moodCode="EVN" isCriterionInd="true">
  <id root="c75181d0-73eb-11de-8a39-0800200c9a66"/>
  <code code="ASSERTION" codeSystem="2.16.840.1.113883.5.4"/>
  <value xsi:type="CD" code="DENEXCEP" codeSystem="2.16.840.1.113883.5.1063"
    codeSystemName="HL7 Observation Value"
    displayName="Excluded from Denominator"/>
  <sourceOf typeCode="PRCN">
    <conjunctionCode code="OR"/>
    <act classCode="ACT" moodCode="EVN" isCriterionInd="true">
      <sourceOf typeCode="PRCN">
        <conjunctionCode code="AND"/>
        <observation moodCode="EVN" classCode="OBS" isCriterionInd="true">
          <id root="0" extension="HasPolycysticOvaries"/>
        </observation>
      </sourceOf>
      <sourceOf typeCode="PRCN" negationInd="true">
        <conjunctionCode code="AND"/>
        <observation moodCode="EVN" classCode="OBS" isCriterionInd="true">
          <id root="0" extension="HasDiabetes"/>
        </observation>
      </sourceOf>
    </act>
  </sourceOf>
  <sourceOf typeCode="PRCN">
    <conjunctionCode code="OR"/>
    <observation moodCode="EVN" classCode="OBS" isCriterionInd="true">
      <id root="0" extension="HasSteroidInducedDiabetes"/>
    </observation>
  </sourceOf>
  <sourceOf typeCode="PRCN">
    <conjunctionCode code="OR"/>
    <observation moodCode="EVN" classCode="OBS" isCriterionInd="true">
      <id root="0" extension="HasGestationalDiabetes"/>
    </observation>
  </sourceOf>
</observation>


There are some common patterns for representing the medication participant in substanceAdministration and supply criteria.  The participant is either a product or consumable, and the role is therapeutic substance or manufactured material.  For the most part, what we care about is the code.  So, I've simplified both of these to <medication> and moved the attributes of the code on the entity up to that element.

Thus far, the greening reduces the line count by about 55% and file size by 65%.  You can get to the before, middle and after examples in this zip file (on Google Docs).

Additional Work
There are a couple more refinements I'd make here, but these are just thoughts that I haven't executed on yet:

<value> elements in the criteria have to specify a type.  That's error prone in implementations because type names have to be specified by namespace.  I'd prefer to see element names like valueTS, valuePQ and valueCD so that implementations don't have to check type in complex ways.  We don't need every data type as a choice for this, because CE derives from CD, et cetera.  In criteria, valueTS and valuePQ would use the IVL_TS and IVL_PQ data types respectively, because criteria specify boundaries.

IVL_PQ has a couple of different ways to specify the units.  From a best practices perspective, I hate intervals where the lower bound is specified in a different unit than the upper bound.  It makes for error prone implementations.  So, I'd drop the unit from the high/low components of the IVL_PQ type, and shift it to the parent element, forcing implementers to use a single unit attribute.
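The single-unit rule is easy to enforce mechanically. This sketch uses an invented dict encoding (not any real IVL_PQ serialization) just to show the shape of the constraint: one unit on the interval, and any comparison in a different unit is rejected rather than silently converted.

```python
def make_ivl_pq(low, high, unit):
    # One unit for the whole interval; callers can't mix mg with g.
    return {'low': low, 'high': high, 'unit': unit}

def contains(ivl, value, unit):
    """True if value (in the interval's unit) lies within the interval."""
    if unit != ivl['unit']:
        raise ValueError('unit mismatch: %s vs %s' % (unit, ivl['unit']))
    return ivl['low'] <= value <= ivl['high']

dose_range = make_ivl_pq(50, 100, 'mg')
print(contains(dose_range, 75, 'mg'))
```

A schema could enforce the same thing by putting the unit attribute on the interval element and omitting it from high/low.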

I note that expressions of time in the queries are often in relationship to the start or end date.  There's some more opportunity for simplification there.

Also, the fact that expressions have to have a nullFlavor="DER" attribute is another place where simplification can occur. And I still need to deal with expression syntax.

And then I need to show that the same implementations I've done previously will run over this XML (which they will, I just need to prove it).

A Thanksgiving Post -- Engaging with Grace

Many of you have already headed off to see family this weekend.  Some of you will be on your way shortly.  As you travel, I'd like you to think about the following five questions, and then share your answers with loved ones:

For more information, or to pass this along to a colleague, friend or loved one, see

Tuesday, November 22, 2011

Classifying Results in HQMF

As I said in yesterday's post, I'd be thinking more about how to automate classifications in a measure so that results could be classified by different groupings, e.g., by age group and gender.  It didn't take me that long to figure it out, and I have to admit I thought that problem would be much more challenging.

What I wound up doing was adding a new kind of entry to the HQMF format to identify a classifier group.  Each precondition in that entry would describe a single, exclusive group in which the patient belonged.  I've already proposed that this be added to the HQMF format.

<observation classCode="OBS" moodCode="EVN" isCriterionInd="true">
  <id root="c75181d0-73eb-11de-8a39-0800200c9a66"/>
  <code code="ASSERTION" codeSystem="2.16.840.1.113883.5.4"/>
  <value xsi:type="CD" code="CLASSIFIER" codeSystem="2.16.840.1.113883.5.1063"
    codeSystemName="HL7 Observation Value"
    displayName="Measure Group Classifier"/>
  <sourceOf typeCode="PRCN">
    <conjunctionCode code="XOR"/>
    <observation classCode="OBS" moodCode="EVN">
      <id root="0" extension="ageBetween17and21"/>
    </observation>
  </sourceOf>
  <sourceOf typeCode="PRCN">
    <conjunctionCode code="XOR"/>
    <observation classCode="OBS" moodCode="EVN">
      <id root="0" extension="ageBetween22and29"/>
    </observation>
  </sourceOf>
</observation>

This looks just like the definition of numerator or denominator criteria, which means that classifiers can be as complex as selectors for members of a population, numerator, denominator, et cetera. You can have as many CLASSIFIER entries as you need to group your results by.  In the case I worked with, I just added age and gender to the original HQMF I built for the HbA1C measure (NQF Measure 59).

The hQuery implementation was pretty easy to modify.  I just used the same code I already had to create a variable that would include each patient in the classified group.  That looks something like this:
  var ageBetween17and21= patient.Demographics().match(
    new CD("424144002","2.16.840.1.113883.6.96"), null, 
    new IVL(new PQ("17","a"),new PQ("21","a")));
  var ageBetween22and29= patient.Demographics().match(
    new CD("424144002","2.16.840.1.113883.6.96"), null, 
    new IVL(new PQ("22","a"),new PQ("29","a")));
  var ageBetween60and64= patient.Demographics().match(
    new CD("424144002","2.16.840.1.113883.6.96"), null, 
    new IVL(new PQ("60","a"),new PQ("64","a")));
  var genderMale= patient.Demographics().match(
    new CD("263495000","2.16.840.1.113883.6.96"), null, 
    new CD("M","2.16.840.1.113883.5.1"));
  var genderFemale= patient.Demographics().match(
    new CD("263495000","2.16.840.1.113883.6.96"), null, 
    new CD("F","2.16.840.1.113883.5.1"));
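To make the snippet above concrete, here is a minimal sketch (not the actual hQuery library; the CD/PQ/IVL shapes and the demographics entry format are my assumptions) of what the helpers behind those classifier variables might look like:

```javascript
// Simple value objects mirroring the names used in the generated script.
function CD(code, system) { this.code = code; this.system = system; }
function PQ(value, unit) { this.value = Number(value); this.unit = unit; }
function IVL(low, high) { this.low = low; this.high = high; }

// A demographics collection: each entry has a code, code system, and value.
function Demographics(entries) { this.entries = entries; }

// match(code, valueSet, valueOrRange): select entries whose code matches and
// whose value falls in the IVL range, or equals the CD code (valueSet unused
// in this sketch).
Demographics.prototype.match = function (code, valueSet, value) {
  return this.entries.filter(function (e) {
    if (e.code !== code.code || e.system !== code.system) return false;
    if (value instanceof IVL) {
      return e.value >= value.low.value && e.value <= value.high.value;
    }
    return e.value === value.code;   // CD comparison, e.g. gender M/F
  });
};
```

In the generated script, patient.Demographics() would return an object like this, and contains(patient) would then test membership in the matched set.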

The next step was to enable classification of the results, by adding new emit statements for each classifier group:
  if (population(patient)) {
    // emit the classifier group this patient falls into
    if (ageBetween17and21.contains(patient)) {
      emit("ageBetween17and21", 1);
    } else if (ageBetween22and29.contains(patient)) {
      emit("ageBetween22and29", 1);
    } else if (ageBetween60and64.contains(patient)) {
      emit("ageBetween60and64", 1);
    }
    if (genderMale.contains(patient)) {
      emit("genderMale", 1);
    } else if (genderFemale.contains(patient)) {
      emit("genderFemale", 1);
    }
    emit("population", 1);
    if (denominator(patient)) {
      if (numerator(patient)) {
        emit("denominator", 1);
        emit("numerator", 1);
      } else if (exclusion(patient)) {
        emit("exclusion", 1);
      } else {
        emit("denominator", 1);
      }
    }
  }

Essentially what happens is that for each patient in the population, the hQuery script will output a value indicating the age and gender classification.  The reduce function can then handle consolidating the correlated results for population, numerator, denominator and exclusions.
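As a sketch of that consolidation step (the real hQuery reduce runs inside a map-reduce framework; the key names and pair format here are illustrative assumptions), the grouping might look like:

```javascript
// Each element of emitsPerPatient is one patient's emit() output: an array
// of [key, value] pairs such as [["ageBetween17and21",1], ["genderFemale",1],
// ["population",1], ["denominator",1]].
function reduceGrouped(emitsPerPatient) {
  var counts = {};
  emitsPerPatient.forEach(function (emits) {
    // Pull out this patient's age and gender classifier keys.
    var age = "", gender = "";
    emits.forEach(function (kv) {
      if (/^ageBetween/.test(kv[0])) age = kv[0];
      if (/^gender/.test(kv[0])) gender = kv[0];
    });
    // Tally each measure result under its (age, gender) group.
    emits.forEach(function (kv) {
      if (/^ageBetween|^gender/.test(kv[0])) return; // classifiers, not measures
      var key = age + "/" + gender + "/" + kv[0];
      counts[key] = (counts[key] || 0) + kv[1];
    });
  });
  return counts;
}
```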

In the XQuery implementation, I create classifier functions just like I do for other numerator and denominator criteria.  Here is an example of one of those for gender:

 declare function local:genderMale($ccds, $ids) {
   for $e in local:Demographics($ccds, $ids)
   where local:matches($e, '',
     <observation xmlns="urn:hl7-org:v3" classCode="OBS" moodCode="EVN" isCriterionInd="true">
       <id root="0" extension="genderMale"/>
       <code code="263495000" codeSystem="2.16.840.1.113883.6.96" displayName="Gender"/>
       <value xsi:type="CD" codeSystem="2.16.840.1.113883.5.1" code="M"/>
       <sourceOf typeCode="INST">
          <observation classCode="OBS" moodCode="DEF">
             <id root="0" extension="Demographics"/>
          </observation>
       </sourceOf>
     </observation>)
   return $e//cda:recordTarget/cda:id
 };

Then, when I go to output the results, instead of just generating the complete list of identifiers for population, numerator, et cetera, I apply the classifiers to segment the lists, using the XQuery intersect operator.  Here is an example of what gets generated:

declare function local:denominator($ccds, $ids) {
  let $result0 := ((local:HasDiabetes($ccds, $ids))
      intersect (local:EDorInpatientEncounter($ccds, $ids) union local:AmbulatoryEncounter($ccds, $ids))
      union local:DiabetesMedAdministered($ccds, $ids)
      union local:DiabetesMedIntended($ccds, $ids)
      union local:DiabetesMedSupplied($ccds, $ids)
      union local:DiabetesMedOrdered($ccds, $ids))
  let $resultageBetween17and21 := local:ageBetween17and21($ccds, $ids)
  let $resultageBetween22and29 := local:ageBetween22and29($ccds, $ids)
  let $resultageBetween30and39 := local:ageBetween30and39($ccds, $ids)
  let $resultageBetween40and49 := local:ageBetween40and49($ccds, $ids)
  let $resultageBetween50and59 := local:ageBetween50and59($ccds, $ids)
  let $resultageBetween60and64 := local:ageBetween60and64($ccds, $ids)
  let $resultgenderMale := local:genderMale($ccds, $ids)
  let $resultgenderFemale := local:genderFemale($ccds, $ids)
  return (
  <group name="ageBetween17and21">
    <group name="genderMale">
     { $result0 intersect $resultageBetween17and21 intersect $resultgenderMale }
    </group>
    <group name="genderFemale">
     { $result0 intersect $resultageBetween17and21 intersect $resultgenderFemale }
    </group>
  </group>,
  (: ... the intermediate age bands follow the same pattern ... :)
  <group name="ageBetween60and64">
    <group name="genderMale">
     { $result0 intersect $resultageBetween60and64 intersect $resultgenderMale }
    </group>
    <group name="genderFemale">
     { $result0 intersect $resultageBetween60and64 intersect $resultgenderFemale }
    </group>
  </group>)
};

Finally, the SQL.  For that, I created a view for each classifier with two columns: the patient identifier, and the result of the classification.  Originally I was going to fill the second column using a SQL CASE statement, but I realized that each classification could be complex, and I wasn't sure that CASE would be a general enough solution.  So, I made the view a UNION of SELECTs, one matching each classifier in a classification, where the second column of each SELECT is just a fixed value.  The SQL for the Gender VIEW looks like this:

 /* genderMale */
  SELECT DISTINCT PATIENTID , 'genderMale' CLASS FROM Demographics
    WHERE (
        CODE = 'M' AND CODESYSTEM = '2.16.840.1.113883.5.1'    )
 UNION /* genderFemale */
  SELECT DISTINCT PATIENTID , 'genderFemale' CLASS FROM Demographics
    WHERE (
        CODE = 'F' AND CODESYSTEM = '2.16.840.1.113883.5.1'    )
You can see from this example how CASE would simplify the view above, but then there's no easy way to combine multiple criteria within a single classifier when the criteria need to access two different tables.  Here's an example where the classification criteria put men in group 1, and women without gestational diabetes in group 2:

 /* group1 */
  SELECT DISTINCT PATIENTID , 'group1' CLASS FROM Demographics
    WHERE (
        CODE = 'M' AND CODESYSTEM = '2.16.840.1.113883.5.1'    )
 UNION
 /* group2 */
  SELECT DISTINCT PATIENTID , 'group2' CLASS FROM Demographics
    WHERE (
      /* genderFemale */
        CODE = 'F' AND CODESYSTEM = '2.16.840.1.113883.5.1'    )
    AND PATIENTID NOT IN (
      /* !HasGestationalDiabetes */
      SELECT DISTINCT PATIENTID FROM Problems
        WHERE (
          CODE IN (SELECT CODE FROM VALUESETS WHERE ID = '2.16.840.1.113883.3.464.1.67')  
          AND EFFECTIVETIME > @StartDate AND EFFECTIVETIME < @EndDate    )
    )

Having created these views, I alter the generation query so that it joins with each classification view and outputs the classification column (which I rename).  Then I group and order the results by the classifications, which gets the counts grouped and ordered as well.  Here's the relevant portion of the output SQL:

SELECT ...,
    Age.CLASS AS Age, Gender.CLASS AS Gender, 
    ...
GROUP BY Age, Gender
ORDER BY Age, Gender

So, there you have it.  For now, this completes the different features that I'm going to try to implement using HQMF to support Query Health.  I think I've successfully shown that HQMF is powerful enough to represent a variety of different queries that are of interest.  My next project is to see what can be done to make HQMF a more friendly specification for us to work with, and I'll provide that as input into the HL7 update of the DSTU.