Tuesday, March 6, 2012

Liberating Consolidated CDA Templates from the Trifolia Workbench Data

One of the major promises of the CDA Consolidation project is the ability to automate the creation of validation and development tools from the model driven data for the templates.  I had hoped that there would be stronger collaboration MDHT project, and eventually I expect that will happen.  A few weeks back, HL7 and Lantana released a copy of the Trifolia Workbench (pdf) with the CDA Consolidation data (zip).  Trifolia Workbench is released under the Eclipse Public License. Source code, as it stands today is the database schema, and a Microsoft Access front end, along with tools to load a MySQL database with the data from the CDA Consolidation guide.

As far as tools go, it's a start, but I need more.  My developers want a Schematron for the CDA Consolidation Guide.  I want the data in a format that I can compare with the source materials to perform a gap analysis with the HITSP C32.  So, I took a break over the last two days to work on getting the data into a format that I can use it (XML), and started building a Schematron from it.  The first part works very well and you can find it under the Download heading below.  The second is still a work in progress, so I won't make it available today.

To get the data out of the database into an XML Format, I used three tools:  
Oxygen includes Xalan, and that includes a SQL Extensions library, which means that I can perform  queries to unlock the data in the Trifolia Workbench tables.  I wrote a two pass transform where the first pass rips the tables, and the second pass builds the appropriate XML structure.  The Trifolia template database has five main tables:
  • Templates (template)
  • Constraints (template_constraint)
  • Code Systems (dictionarycodesystem)
  • Value Sets (valueset)
  • Value Set Members (valuesetmember)
The first two main tables are the most important from a validation perspective.  The remaining three support simple value set maintenance.  There are also a number of code tables containing human readable terms for various purposes:
  • Vocabulary Binding Types (vocbinding_type) [STATIC, DYNAMIC]
  • Contexts (dictionarycontext)
  • Conformance Types (conformance_type) [SHALL, SHALL NOT, SHOULD, SHOULD NOT, MAY]
  • Template Types (template_type) [document, section, entry, subentry, unspecified]
I don't deal with the implementation guide tables (implementationguide or associationtemplateimplementationguide), or the list of users (tdb_users), but you should know these tables are there.

In the first pass, the contents of these tables are copied into <templates>, <constraints>, <codeSystems>, <valueSets>, and <members> elements inside the <tdb> element.  Copying the data is pretty straightforward.  The first step is to perform the appropriate query against the tables, resolving  terms in the code tables with LEFT JOIN operations.

I wrote a template that processes the result set, and generates and attribute or element for each cell in each row using the column labels.  I reproduce it below because it can be used to liberate data from any SQL query in an XML format.

  <xsl:template name="SQLtoXML">
    <!-- resultXML is the rowset returned by the query -->
    <xsl:param name="resultXML" select="."/>
    <xsl:param name="result" select="exslt:node-set($resultXML)"/>
    <!-- map is a string containing either an a or an e for each cell
         indicating whether it should be output as an attribute or an
         element
    -->
    <xsl:param name="map"/>
    <xsl:param name="meta" select="$result/sql/metadata"/>

    <!-- name is the element name to use for each row, and defaults
         to the table name -->
    <xsl:param name="name" select="$meta/column-header[1]/@table-name"/>

    <!-- for each row in the result set -->
    <xsl:for-each select="$result/sql/row-set/row">
      <!-- generate an element for the row -->
      <xsl:element name="{$name}">
        <!-- for each column in the result set -->
        <xsl:for-each select="col">
          <xsl:variable name="pos" select="position()"/>
          <!-- if there is an a in the map position for the cell -->
          <xsl:if
            test="substring($map, $pos, 1) = 'a' and string(.) != ''">
            <!-- generate an attribute for it -->
            <xsl:attribute name="{@column-label}">
              <xsl:value-of select="."/>
            </xsl:attribute>
          </xsl:if>
        </xsl:for-each>
        <!-- And again for each column in the result set -->
        <xsl:for-each select="col">
          <xsl:variable name="pos" select="position()"/>
          <!-- if there is an e in the map position for the cell -->
          <xsl:if
            test="substring($map, $pos, 1) != 'a' and string(.) != ''">
            <!-- generate an element for it -->
            <xsl:element name="{@column-label}">
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:if>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each>
  </xsl:template> 

The first pass extracts the data to an XML Fragment, but the appropriate containment isn't established.  That fragment is restructured in the second pass.  In that pass, each constraint element is placed inside the appropriate <template> or <constraint> depending upon its parentConstraintID and templateID attributes, and each <member> is inserted inside the appropriate <valueSet> element based upon it's valueSetOID attribute (see notes below).

Ripping the data out took about half a day, and the XML format is remarkably close to what I suggested as a metamodel not quite a year ago. The next step is crafting a Schematron from this.  I've gotten that nearly working, but have to work around this notion of branches that is in Trifolia.  If I had a notion of what MDHT wanted as an import format for templates, I could also produce that.

Download
You can download the Stylesheet from Google Code.  It extracts the data from the Trifolia Workbench database into an XML Format.  Document on that format will come as soon as I'm happy with it.

Instructions
  1. Install the Trifolia Database according to the Instructions (requires MySQL Community Edition)
    Note: If MySQL is already installed, be sure to execute the set charset latin1 command before running the source command.
  2. Install the JDBC Driver for MySQL.  Copy the jar file to the appropriate lib folder.
  3. Modify the <xsl:param> elements at the top of the stylesheet to insert your user and password for creating the database connection.
  4. Run the stylesheet.  It will output a file containing the contents of the template database.
Issues and Questions
If you have issues or questions, please add comments below.  I'll do my best to address them here.

Notes
The stylesheet fixes up a couple of problems in the Trifolia Workbench data.  Contexts in the workbench data aren't namespace qualified, so you cannot use them directly in stylesheets or Schematron.  I've patched that for now in the stylesheet, and logged a bug to the Lantana JIRA Bug tracker via this e-mail (only use this e-mail to report bugs in the Trifolia Workbench).

Also, be aware that the valueSetOID field in template_constraint is actually the ID of the value set, not it's OID.  I patched that to in my stylesheet, and logged a separate bug for that.

3 comments:

  1. Hi Keith,

    Please tell me how to "run the stylesheet" in Oxygen. I have set up steps 1 - 3 but I can't complete step 4.

    Thank you,
    Gary

    ReplyDelete
  2. Hi Keith,

    Have they closed access to the data model. All link revert to the Lantana Consulting demo site with no indication of availability to download it. I tried at HL7.org as well.

    -Sanjay

    ReplyDelete
  3. Hi Keith,

    Do you know why they closed access to this data model?
    Can you send us the copy?

    ReplyDelete