Thursday, November 17, 2011

SQL Implementation for QueryHealth

This is the third in a series of implementations showing that the HL7 EQMF Standard can support multiple back end query implementations.  This implementation uses SQL to produce the query results.  I'm glad I took on this one last because:
  • My SQL chops are rusty, and this was the hardest to implement yet
  • I wouldn't have been able to get here without having determined that I could use a simple database model after the first two implementations.
  • I HATE AM NOT TERRIBLE FOND OF SQL
A couple of caveats.  Like the XQuery implementation, this one hasn't been tested against a SQL Database.  Also, given the many variations in transactional SQL, I won't guarantee this to be totally valid for your SQL database server.

First off is value sets.  Rather than bore you with 20 pages of output, I'll just point you to my post on Value Sets and Query Health which provides a not quite correct stylesheet to insert values into a SQL Table from an SVS XML result.  Can you spot the error in the XSLT? 

Next is parameter declarations.  This was pretty simple, but like most of the SQL transform, LONG:

  <xsl:template name="generateParameterList">
    <xsl:text>&#xA;# Generate named variables&#xA;</xsl:text>
    <xsl:for-each
      select="$DataDeclarationSection//emf:entry[emf:localVariableName != '']
             /emf:observation[@moodCode='EVN' and not(@isCriterionInd='true')]">
      <xsl:variable name="name" select="../emf:localVariableName"/>
      <xsl:text>DECLARE @</xsl:text>
      <xsl:value-of select="$name"/>
      <xsl:choose>
        <xsl:when test="emf:value/@xsi:type='TS'"> TIMESTAMP;&#xA;</xsl:when>
        <xsl:when test="emf:value/@xsi:type='ST'"> TEXT;&#xA;</xsl:when>
        <xsl:when test="emf:value/@xsi:type='PQ'">
          <xsl:text> REAL;&#xA;</xsl:text>
          <xsl:text>DECLARE @</xsl:text>
          <xsl:value-of select="$name"/>
          <xsl:text>_unit TEXT;&#xA;</xsl:text>
        </xsl:when>
        <xsl:when test="emf:value/@xsi:type='CD'">
          <xsl:text> TEXT;&#xA;</xsl:text>
          <xsl:text>DECLARE @</xsl:text>
          <xsl:value-of select="$name"/>
          <xsl:text>_cs TEXT;&#xA;</xsl:text>
        </xsl:when>
        <xsl:otherwise>;&#xA;</xsl:otherwise>
      </xsl:choose>
      <xsl:text>SET @</xsl:text>
      <xsl:value-of select="$name"/>
      <xsl:text> = </xsl:text>
      <xsl:choose>
        <xsl:when test="emf:value/@xsi:type='TS'">
          <xsl:text> '</xsl:text>
          <xsl:value-of select="emf:value/@value"/>
          <xsl:text>';&#xA;</xsl:text>
        </xsl:when>
        <xsl:when test="emf:value/@xsi:type='ST'">
          <xsl:text> '</xsl:text>
          <xsl:value-of select="emf:value"/>
          <xsl:text>';&#xA;</xsl:text>
        </xsl:when>
        <xsl:when test="emf:value/@xsi:type='PQ'">
          <xsl:text> </xsl:text>
          <xsl:value-of select="emf:value/@value"/>
          <xsl:text>;&#xA;</xsl:text>
          <xsl:text>SET @</xsl:text>
          <xsl:value-of select="$name"/>
          <xsl:text>_unit = </xsl:text>
          <xsl:text> '</xsl:text>
          <xsl:value-of select="emf:value/@unit"/>
          <xsl:text>';&#xA;</xsl:text>
        </xsl:when>
        <xsl:when test="emf:value/@xsi:type='CD'">
          <xsl:text> '</xsl:text>
          <xsl:value-of select="emf:value/@code"/>
          <xsl:text>';&#xA;</xsl:text>
          <xsl:text>SET @</xsl:text>
          <xsl:value-of select="$name"/>
          <xsl:text>_cs = </xsl:text>
          <xsl:text> '</xsl:text>
          <xsl:value-of select="emf:value/@codeSystem"/>
          <xsl:text>';&#xA;</xsl:text>
        </xsl:when>
        <xsl:otherwise> = '';&#xA;</xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

The output looks like this:
DECLARE @StartDate TIMESTAMP;
SET @StartDate =  '20100101';
DECLARE @EndDate TIMESTAMP;
SET @EndDate =  '20101231';

OK, so that was a lot of XSLT for four lines of output code.  Next up was generating the predicates for the data criteria.  This required a two pass execution in the XSLT.  The first pass produces some XML that I use later in another template to produce output.  Unlike JavaScript and XQuery, there's no simple (and standard) way to create "functions" or non-scalar variables in SQL.  So, I created some XML to store my predicates that looked like this.

<variables>
  <variable name="ageBetween17and64" table="Demographics">
    CODE = '424144002' AND CODESYSTEM =
      '2.16.840.1.113883.6.96' 
    AND VALUE &gt; 17 AND VALUE &lt; 64 AND UNITS = 'a' 
  </variable>
  <variable name="EDorInpatientEncounter" table="Encounter">
    CODE IN (SELECT CODE FROM VALUESETS WHERE ID =
      '2.16.840.1.113883.3.464.1.42') AND EFFECTIVETIME &lt; DATEADD(yyyy, -2,
    @EndDate)
  </variable>
  <variable name="AmbulatoryEncounter" table="Encounter">
    CODE IN (SELECT CODE FROM VALUESETS WHERE ID =
      '2.16.840.1.113883.3.464.1.1142') 
    AND EFFECTIVETIME &lt; DATEADD(yyyy, -2, @EndDate)
  </variable>
  <variable name="HasDiabetes" table="Problem">
    CODE IN (SELECT CODE FROM VALUESETS WHERE ID = 
      '2.16.840.1.113883.3.464.1.37')
  </variable>
  <variable name="HbA1C" table="LabResults"> 
    CODE IN (SELECT CODE FROM VALUESETS WHERE ID =
      '2.16.840.1.113883.3.464.1.72') 
    AND VALUE &gt; 9 AND UNITS = '%' 
  </variable>
</variables>

The stylesheet to create that XML is pretty short:

<xsl:template name="generateVariableList">
    <variables>
      <xsl:for-each
        select="$DataDeclarationSection//
                  emf:entry/emf:*[@isCriterionInd='true']">
        <variable name="{../emf:localVariableName}">
          <xsl:if test="not(../emf:localVariableName)">
            <xsl:attribute name="name">
              <xsl:value-of select="generate-id(.)"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:attribute name="table">
            <xsl:value-of
              select="emf:sourceOf[@typeCode='INST']/*/emf:id/@extension"/>
          </xsl:attribute>
          <xsl:call-template name="criteria">
            <xsl:with-param name="act" select="."/>
          </xsl:call-template>
        </variable>
      </xsl:for-each>
    </variables>
  </xsl:template>

The criteria template turns the criterion in XML into an appropriate SQL WHERE Clause.  It turns a criterion containing a single code into this:
    CODE = '@code' AND CODESYSTEM = '@codeSystem
And a criterion with a value set specified into this:
    CODE IN (SELECT CODE FROM VALUESETS WHERE ID = '@codeSystem') 
And a criterion with a an effective time specified into this:
    EFFECTIVETIME > low AND EFFECTIVETIME < high 
And a criterion with a PQ range specified into this:
    AND VALUE &gt; 17 AND VALUE &lt; 64 AND UNITS = 'a' 
And so on.

You may have observed that somehow my expressions (originally in JavaScript in the hQuery implementation) have somehow magically transformed themselves into appropriate SQL code:

   EFFECTIVETIME < DATEADD(yyyy, -2, @EndDate)

It's a trick, I changed the EQMF sample to work with the SQL implementation.  I still haven't dealt with the expressions, but I think using the mimeType with expressions will be the short term solution.  An EQMF document using expressions may need to be able to specify the expression in multiple mime types to be able to work in multiple formats.  Not a BIG deal right now, but it could be one later.

Having created this XML to maintain the logic associated with my predicates, the next piece of this was to turn the measure population, numerator, denominator, and exclusions into expressions using those predicates.  For this I used views.  Here are two of the simpler ones:

CREATE VIEW POPULATION AS (/* ageBetween17and64*/
  SELECT DISTINCT PATIENTID FROM Demographics
    WHERE (
        CODE = '424144002' AND CODESYSTEM = '2.16.840.1.113883.6.96' AND 
        VALUE &gt; 17 AND VALUE &lt; 64 AND UNITS = 'a'
    )
);
CREATE VIEW NUMERATOR AS (/* HbA1C*/
  SELECT DISTINCT PATIENTID FROM LabResults
    WHERE (
        CODE IN (SELECT CODE FROM VALUESETS WHERE ID = '2.16.840.1.113883.3.464.1.72') AND 
        VALUE &gt; 9 AND UNITS = '%'
    )

If you recognize where I reused the predicates in the views, you might start to get an idea how I generated the SELECT statements.  I reused the processEntries template which walked the expression of the measure criteria in my last two implementations, and tweaked it to use UNION and INTERSECT operators to handle the AND/OR logic.  Negation was a little tricky.  Orginally I had put the entire SELECT clause with its WHERE clause in the variable XML I had generated previously.  But you cannot do a ... UNION NOT(SELECT ...) or ... INTERSECT NOT(SELECT ...).  So, that's why the table name is saved as an attribute in the XML, and just the WHERE clause is stored.  Now I could write ... INTERSECT (SELECT ... WHERE NOT(whereClause)) when I needed negation, and just ... INTERSECT (SELECT ... WHERE (whereClause)) when I didn't.

So here is that template:
  <xsl:template name="processEntries">
    <xsl:param name="entries"/>
    <xsl:param name="depth" select="1"/>
    <xsl:param name="vars"/>
    <xsl:if test="$entries[emf:conjunctionCode/@code='AND' or 
                           not(emf:conjunctionCode)]">
      <xsl:text>(</xsl:text>
      <xsl:call-template name="and">
        <xsl:with-param name="args"
          select="$entries[emf:conjunctionCode/@code='AND' or 
                           not(emf:conjunctionCode)]"/>
        <xsl:with-param name="depth" select="$depth"/>
        <xsl:with-param name="vars" select="$vars"/>
      </xsl:call-template>
      <xsl:text>)</xsl:text>
      <xsl:if test="$entries[emf:conjunctionCode/@code != 'AND']">
        <xsl:text> INTERSECT </xsl:text>
      </xsl:if>
    </xsl:if>
    <xsl:if test="$entries/emf:conjunctionCode/@code=&quot;OR&quot;">
      <xsl:text>(</xsl:text>
      <xsl:call-template name="or">
        <xsl:with-param name="args"
          select="$entries[emf:conjunctionCode/@code='OR']"/>
        <xsl:with-param name="depth" select="$depth"/>
        <xsl:with-param name="vars" select="$vars"/>
      </xsl:call-template>
      <xsl:text>)</xsl:text>
      <xsl:if test="$entries[emf:conjunctionCode/@code = 'XOR']">
        <xsl:text> INTERSECT </xsl:text>
      </xsl:if>
    </xsl:if>
    <xsl:if test="$entries/emf:conjunctionCode/@code='XOR'">
      <xsl:message terminate="yes">THIS DOESN'T WORK!</xsl:message>
    </xsl:if>
  </xsl:template>

 Now, about that message above for XOR conjunctions.  HL7 defines the XOR function differently than other languages.  Exclusive OR over more than two predicates is true if only one of the predicates is true.  Thus, XOR(A,B,C) is true if only one of A, B or C is true.  This is actually pretty useful, but isn't the usual way that XOR is defined (In the usual definition, XOR(A,B,C,...) is true if an odd number of predicates is true). The HL7 definition of this operation translates into (A AND NOT(B OR C OR ...)) OR (B AND NOT(A OR C OR ...)) ... so for N predicates, each one is repeated N times, for N2 predicates in the output.  I could have figured out a way to implement it, but it wasn't fun, and didn't matter.  There are some implementation specific ways that it could have been done.

Here is the EXCEPTIONS view which shows a little bit more complex logic being generated.  It matches patients who have a diagnosis of polycycstic ovaries without an accompanying diagnosis of diabetes, along with patients with diagnosis of steroid induced or gestational diabetes. These are the patients who are not counted in the denominator if they don't meet the numerator criteria.

CREATE VIEW EXCEPTIONS AS 
(
  /* HasPolycysticOvaries*/
  SELECT DISTINCT PATIENTID FROM Problem
    WHERE (
      CODE IN (SELECT CODE FROM VALUESETS 
               WHERE ID = '2.16.840.1.113883.3.464.1.98') AND
      EFFECTIVETIME > DATEADD(yyyy,-1,@StartDate) AND 
      EFFECTIVETIME < @EndDate    
    )
  INTERSECT 
  /* !HasDiabetes*/
  SELECT DISTINCT PATIENTID FROM Problem
    WHERE NOT(
      CODE IN (SELECT CODE FROM VALUESETS 
               WHERE ID = '2.16.840.1.113883.3.464.1.37'
    )    

UNION 
/* HasSteroidInducedDiabetes*/
SELECT DISTINCT PATIENTID FROM Problem
  WHERE (
      CODE IN (SELECT CODE FROM VALUESETS 
               WHERE ID = '2.16.840.1.113883.3.464.1.113') AND
      EFFECTIVETIME > @StartDate AND EFFECTIVETIME < @EndDate    
  )
UNION 
/* HasGestationalDiabetes*/
SELECT DISTINCT PATIENTID FROM Problem
  WHERE (
      CODE IN (SELECT CODE FROM VALUESETS 
               WHERE ID = '2.16.840.1.113883.3.464.1.67') AND
      EFFECTIVETIME > @StartDate AND EFFECTIVETIME < @EndDate    
  )
;


Now, you'll note that there are some opportunities to "optimize" this query.  The last two clauses can be merged to the equivalent:



SELECT DISTINCT PATIENTID FROM Problem
  WHERE (
      CODE IN (SELECT CODE FROM VALUESETS 
               WHERE ID = '2.16.840.1.113883.3.464.1.113'
                  OR ID = '2.16.840.1.113883.3.464.1.67'
      ) AND
      EFFECTIVETIME > @StartDate AND EFFECTIVETIME < @EndDate    
  )


Great, that's an optimization.  But I'm not going to write a SQL Query optimizer in XSLT.


Now, having created the four views, how can I get the results?  There's one more query to write, but like hQuery, this one would be the same for all measures:


SELECT COUNT POPULATIONIDS, COUNT DENOMINATORIDS, COUNT NUMERATORIDS,  COUNT EXCEPTIONIDS FROM
(
SELECT 
    P.PATIENTID  AS POPULATIONIDS ,
    D.PATIENTID AS DENOMINATORIDS,
    N.PATIENTID AS NUMERATORIDS,
    E.PATIENTID AS EXCEPTIONIDS
FROM POPULATION P
LEFT OUTER JOIN DENOMINATOR D
ON P.PATIENTID = D.PATIENTID /* column 2 contains null where X in P and not in D */
LEFT OUTER JOIN NUMERATOR N
ON D.PATIENTID = N.PATIENTID /* column 3 contains null where X in N and not in D */
LEFT OUTER JOIN EXCEPTION E
ON D.PATIENTID = E.PATIENTID AND E.PATIENTID NOT IN (SELECT PATIENTID FROM NUMERATOR) /* column 4 contains a PATIENTID if In the denominator but not found in numerator, NULL otherwise */
)

So, having now done SQL, XQuery and hQuery implementions, the next step as I said is to compute some other interesting queries.  In today's discussions with some members of the IHE QRPH committee, I mentioned the immunization correlation, and we decided that I would add age stratification to that.  So, I think my next measurement effort will be to produce age stratified correlations between patients with and without vaccinations that have had and not had flue.

After that, I need to do some work to figure out:

  • What cleanup and simplification is needed in HQMF
  • What the best practices are for writing a query in HQMF format
  • How to "green" the dang thing
By the way, IHE is profiling use of HQMF for public health measures in the QMD profile (link to Word Document on the IHE FTP site).  Looks like another project I could get sucked into.  The more I look at quality measures, the more opportunities I see for making it easier to both define and implement them.  Gah!  None of this was not on my list of things to accomplish this year.


3 comments:

  1. I think you are missing some of the constraints from the measure. E.g. the numerator select patients for whom the *most recent* HbA1c test result *during the measurement period* was > 9%. Your SQL for the numerator appears to just select any patient with a HbA1c result > 9%.

    ReplyDelete
  2. That should be fixable, but you are correct, I missed that in the measure definition. There are probably a few other errors in my implementation as well. That one is worth investigating to see what the representation should be.

    ReplyDelete
  3. Can you post the example xml input file and xsl transformation file for this post? It would also be illustrative if you could provide the original sourced xml so it can be compared to your updated file. Thanks.

    ReplyDelete