Thursday, November 3, 2011

When the XML Sucks

One of the complaints about HL7 is that their XML sucks.  One of the complaints about ebXML RIM specifications is that their XML sucks.  One of the points of GreenCDA is to get to less suckful XML in the content.  There are a lot of other XML schemas that "suck".

The evaluation of "Sucks" is a subjective measure, and is often an assessment made through the eyes of developers who have not yet drunk a particular flavor of kool-aid, be it HL7 RIM, ebXML RIM, whatever.  For most problems, the requirements imposed by model-oriented XML schemas are complex enough that, in fact, the XML does suck for the average developer.  Few real-world problems are truly simple.  Simple XML can probably solve 80% of the problem, but then we get into the dicey parts, and well, that's where solutions (and XML) get complicated.

Hey, even HTML sucks, but we've learned to live with it.

A long time ago, in a place far-far-away, I used to write code in a programming language that almost, but not quite, sucked.  But it had one very important feature: a pre-processor that let you write amazingly complex expressions in simple, easy-to-use terms.  It occurs to me that we could do the same with XML today; we just need to define the preprocessing mechanism.  Let's say that I wanted to take this:


<component>
  <section>
    <code code="57026-7" codeSystem="2.16.840.1.113883.6.1"/>
    <title>Population Criteria Section</title>
    <text>
      This section describes the Initial Patient Population, 
      Numerator, Denominator, Denominator Exceptions, and
      Measure Populations</text>
    <entry>
      <observation classCode='OBS' moodCode='EVN.CRT'>
        <id root="0"/>
        <code code="ASSERTION"
          codeSystem="2.16.840.1.113883.5.4"/>

        <value xsi:type="CD" code="IPP"
          codeSystem="2.16.840.1.113883.5.1063"/>

        <sourceOf typeCode="PRCN">
          <observation classCode="OBS" moodCode="EVN.CRT">
            ...
          </observation>
        </sourceOf>
      </observation>
    </entry>
  </section>
</component>

And turn it into this:


<PopulationCriteriaSection>
  <title>Population Criteria Section</title>
  <text>
    This section describes the Initial Patient Population, 
    Numerator, Denominator, Denominator Exceptions, and
    Measure Populations</text>
  <InitialPatientPopulation>
    <id root="0"/>
    <precondition>
      <observationCriteria>
       ...
      </observationCriteria>
    </precondition>
  </InitialPatientPopulation>
</PopulationCriteriaSection>

This is a pretty simple transformation of hard-to-read XML into something much more readable.  How could I do this in a way that was "standards-compliant"?  The keyword here is transformation.  I need some way to tell the receiver that it needs to transform the XML before using it.  Hey, there's even a standard for that (the W3C recommendation on associating style sheets with XML documents)!

What if, at the top of the document, I put in this little piece of XML:

<?xml-stylesheet type='text/xsl' href='http://myxmlisbetter.com/yxs.xsl'?>

And then put the following in that stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match='PopulationCriteriaSection'>
    <component>
      <section>
        <code code="57026-7" 
          codeSystem="2.16.840.1.113883.6.1"/>
        <xsl:apply-templates/>
      </section>
    </component>
  </xsl:template>
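  <xsl:template match='InitialPatientPopulation'>
    <!-- The original wraps each criterion observation in an entry -->
    <entry>
      <observation classCode='OBS' moodCode='EVN.CRT'>
        <xsl:copy-of select='id'/>
        <code code="ASSERTION"
          codeSystem="2.16.840.1.113883.5.4"/>
        <value xsi:type="CD" code="IPP"
          codeSystem="2.16.840.1.113883.5.1063"/>
        <!-- id was already copied above; skip it so it isn't duplicated -->
        <xsl:apply-templates select='node()[not(self::id)]'/>
      </observation>
    </entry>
  </xsl:template>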
  <xsl:template match='precondition'>
    <sourceOf typeCode='PRCN'>
      <xsl:apply-templates/>
    </sourceOf>
  </xsl:template>
  <xsl:template match='observationCriteria'>
    <observation classCode="OBS" moodCode="EVN.CRT">
      <xsl:apply-templates/>
    </observation>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>


If your application didn't understand my better XML, but did understand the geeky original, it could simply regenerate the original XML by applying the transformation (and it does work, I tested it).  BUT, if it did understand the "Better" XML, it could just process that directly.  OK, so this is pretty much the way that "GreenCDA" works today, except that it doesn't put an XML stylesheet processing instruction right up front in the "green" document.
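For comparison, here is a sketch of what the top of a "green" document would look like with the processing instruction right up front. The standard instruction target is `xml-stylesheet`, and it belongs in the prolog, after the XML declaration and before the root element. The body is abbreviated with `...`; the element names and stylesheet URL are the ones from the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Tells receivers that only understand the original format
     where to find the stylesheet that regenerates it -->
<?xml-stylesheet type="text/xsl" href="http://myxmlisbetter.com/yxs.xsl"?>
<PopulationCriteriaSection>
  <title>Population Criteria Section</title>
  <text>...</text>
  <InitialPatientPopulation>
    <id root="0"/>
    ...
  </InitialPatientPopulation>
</PopulationCriteriaSection>
```

A receiver that already understands the better XML loses nothing: an XML parser hands processing instructions to the application, which is free to ignore them.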

To go a little bit further, let's say I wanted to reuse some content over and over again.  In C (or later derivatives), I would say:
#include <stdio.h>

In XML, I can say (again using a standard, XInclude):
<xi:include href='mystuff.xml' xmlns:xi='http://www.w3.org/2001/XInclude'/>
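A slightly fuller sketch of the XInclude approach follows. The `xi` prefix has to be bound to the XInclude namespace, and the include is only expanded by an XInclude-aware processor (for example `xmllint --xinclude`); a processor that doesn't do XInclude just sees an `xi:include` element. The file name `mystuff.xml` comes from the text above; the surrounding section and the fallback text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xi prefix must be bound to the XInclude namespace -->
<PopulationCriteriaSection xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Population Criteria Section</title>
  <!-- An XInclude-aware processor replaces this element with
       what mystuff.xml contains, parsed as XML -->
  <xi:include href="mystuff.xml" parse="xml">
    <!-- Used only if mystuff.xml cannot be loaded (hypothetical text) -->
    <xi:fallback>
      <text>Shared content unavailable</text>
    </xi:fallback>
  </xi:include>
</PopulationCriteriaSection>
```

One thing to watch: XInclude processors may add an `xml:base` attribute to included elements (base URI fixup), which a strict schema can reject.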


So, if you have to deal with XML that sucks, you can fix it up using this technique (at least before you have to release for production).  There are a bunch of challenges in using this technique that I haven't addressed.

  1. Governance:  Who gets to define the transformations that are allowed in an exchange?
  2. Retrieval and security issues: The transformation resource becomes part of the communication, but specifying it as a URL seems to imply that it has to be retrieved separately (this isn't necessarily the case, but I digress).  The real issue is that if you've gone to the trouble of setting up a secure transport and then throw this into the mix, it messes with things, since the receiver may now have to fetch (and trust) a stylesheet from outside that channel.  BTW: There is a way to include the XSLT inline in the original XML using HL7 V3 extensions and XML fragment identifiers.
  3. Overkill:  For most purposes, XSLT is probably a little bit too powerful.  It certainly does the job, but there is probably a better way to represent the mapping for 80% of the cases (see above on solutions getting tricky after that first 80%).
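Item 2 mentions carrying the XSLT inline. The HL7 V3 extension approach isn't spelled out here, so the sketch below uses the generic mechanism from section 2.7 of the XSLT 1.0 recommendation instead: the `xml-stylesheet` instruction points at a fragment identifier, and the `id` attribute is declared as type ID in the internal DTD subset so that the fragment can be resolved. The id value `yxs` is hypothetical and the remaining templates are elided. Processor support for fragment-identifier stylesheets varies, and a schema-validating receiver would also have to accept the extra `xsl:stylesheet` element, so treat this as something to test rather than rely on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Point at the stylesheet embedded below via a fragment identifier -->
<?xml-stylesheet type="text/xsl" href="#yxs"?>
<!DOCTYPE PopulationCriteriaSection [
  <!-- id must be declared as an ID for "#yxs" to resolve -->
  <!ATTLIST xsl:stylesheet id ID #REQUIRED>
]>
<PopulationCriteriaSection>
  <xsl:stylesheet id="yxs" version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <!-- Keep the embedded stylesheet itself out of the output;
         this QName pattern outranks the "*" catch-all template -->
    <xsl:template match="xsl:stylesheet"/>
    <!-- ... the rest of the yxs.xsl templates go here ... -->
  </xsl:stylesheet>
  <title>Population Criteria Section</title>
  ...
</PopulationCriteriaSection>
```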
Anyway, this was an interesting digression, and gives me an excuse to use simpler XML as I start working through Query Health examples.  If I do it right, I can always transform them back to compliant implementations, and who knows, maybe the improvements I come up with could be used to improve the standard.  After all, HL7 is revamping HQMF, which is great timing for Query Health.

4 comments:

  1. In my experience, after a relatively small portion of kool-aid, the complaint changes from "the XML sucks" to "the XSD sucks" or "the tooling sucks". These statements are all technically correct, of course.

  2. You should consider the QuantalUEL (for Quantal Universal Exchange Language) technology that Quantal Semantics, Inc. is developing as an XML-like metadata language with a built-in probabilistic capability. You can check out our HealthCom '11 preprint here: http://quantalsemantics.com/documents/p173-robson-corrected.pdf.

    Tom

  3. I'm not sure the *better* XML really does very much to improve things. It is still arcane. You need to remember the right tags instead of the right attributes ...and why don't we all officially use shorthand anyway?

    Are you factoring out the complexity or are you just pushing it around? Seems like none of this goes away unless I no longer need to understand the transformations - the way I can use a modern programming language without being able to read binary or knowing my way around the registers of my CPU.

    So the real question is whether the XML is sucky because it has to be or if it is just sucky.

  4. Oh, the HQMF XML sucks, and not because it is necessary. Take as an example the observation that a patient is 18 years or older. The current representation of this observation (say in measure CMS-138, tobacco cessation counseling) takes 22 formatted lines and 6 levels of nesting.

    Now, be aware, this is the expression of the measure logic that developers are supposed to execute against the patient chart to evaluate the measure. Also note that this expression is just one simple conjoined clause in the initial patient population (probably the simplest clause). This level of complexity is mind-boggling and completely unnecessary to express the measure logic.

    Why would anyone develop something so arcane? Possibly, they just don't know better, they design by committee or back-fit to some other arcane specification (e.g., QDM, RIM), or higher complexity is simply more lucrative in this environment.
