Tuesday, December 4, 2018

A one day build for Version 2 FHIR StructureDefinition

I love Adam Savage's one day builds.  I also like pieces of useful software I can build in a day.  Most recently I've been investigating how to do conversions of legacy data types (e.g., HL7 Version 2) to FHIR, and one of the things that I discovered I might need was a way to represent a V2 message as a FHIR StructureDefinition (especially since I already have access to one Grahame built for CDA).

You see, I already have some tools that can plow through the details of a StructureDefinition to support conversion to FHIR.  If I want a reversible conversion, I probably need to work with the same data structures on either side.  Most of the data I need to populate the StructureDefinition can be found in various places, but the easiest (for me) to access and use are the HL7 Version 2 XML schemas.  You can get schemas for every Version 2 release from 2.1 all the way up through 2.8.2.  I chose to work with the 2.8.1 schemas because that's the highest version that HAPI HL7 V2 supports at the moment.

I'm working with 2.8.1 because it's generally backwards compatible (not completely, but nearly so, and I can fix the removed content) with all prior versions of HL7 V2, and I could be seeing many different versions.  Dumbing down a version is easier than augmenting one.

To convert a schema to a StructureDefinition, I'm going to need to pick some tools.  There are lots of ways to go from one to the other, but if you've been reading this blog, you already know I spend a lot of time (way too much in fact) using XSLT.  So, this build is going to use XSLT Version 2.0 as one of my tools, and the XSLT transformer will be Saxon Professional Edition (Saxon-PE) 9.8 (because that's the version that my XML Editor uses).  For XML editors, just about any will do, but I happen to like the tools from Oxygen.  These days I'm using XML Developer, though I have in the past also used XML Editor.

There are still a lot of different ways to handle this.  I could build something that generally understood XML Schema, but that is by no means a one day build.  Schema is way too complicated for that.  Fortunately, the V2 XML Schemas are produced from the V2 database, and have a pretty constrained use of XML Schema, which will make my work a great deal simpler.

The first step is to take a look at the schema, and figure out how to map the data to a FHIR StructureDefinition (if you think that having to create a mapping to create a mapping to ... is a bit recursive, you are surely correct).

There are actually a few dozen schemas, one for each message type.  That's ok, the computer doesn't really care how many of these it has to build.  Let's pick one, like ADT_A01, to start with.
<?xml version="1.0"?>
<!--   v2.xml Message Definitions for Version 2.8.2 - here: ADT_A01-->
<!--   Copyright (c) 1999-2016, Health Level Seven. All rights reserved.   -->
<!--   (generated on 17.11.2016 by HL7-Database) D:\Eigene Dateien\HL7\Datenbank\hl7_92.mdb-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="urn:hl7-org:v2xml"
    xmlns:hl7="urn:hl7-org:v2xml" targetNamespace="urn:hl7-org:v2xml" version="1.1">
    <!-- import segment definition for version -->
    <xsd:include schemaLocation="segments.xsd"/>
    <!-- MESSAGE ADT_A01 -->
    <!-- .. message definition ADT_A01 -->

Already we can see some values in comments that will be useful for the StructureDefinition data (things like copyright, dates, publisher, descriptions, version, et cetera).  Most of the other messages look the same.  
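
To make that concrete, here is a rough Python sketch (not part of the actual build, which uses XSLT) of pulling the version, publisher, and date out of those leading comments.  The function name header_metadata and the regular expressions are my own assumptions about how the comment text is laid out, based only on the ADT_A01 header shown above.

```python
import re

# Leading comments copied from the ADT_A01 schema shown above.
HEADER = r"""<?xml version="1.0"?>
<!--   v2.xml Message Definitions for Version 2.8.2 - here: ADT_A01-->
<!--   Copyright (c) 1999-2016, Health Level Seven. All rights reserved.   -->
<!--   (generated on 17.11.2016 by HL7-Database) D:\Eigene Dateien\HL7\Datenbank\hl7_92.mdb-->
"""


def header_metadata(text):
    """Pull version, publisher, date and copyright out of the schema comments."""
    comments = [c.strip() for c in re.findall(r"<!--(.*?)-->", text, re.S)]
    version_text = next(c for c in comments if "Version" in c)
    copyright_text = next(c for c in comments if "Copyright" in c)
    generated_text = next(c for c in comments if "generated on" in c)

    # Version number: dotted digits following the word "Version".
    version = re.search(r"Version (\d+(?:\.\d+)+)", version_text).group(1)
    # Publisher: the text between the first ", " and the next ".".
    publisher = re.search(r", ([^.]+)\.", copyright_text).group(1)
    # Date: dd.mm.yyyy in the comment, reordered to yyyy-mm-dd.
    day, month, year = re.search(
        r"generated on (\d{2})\.(\d{2})\.(\d{4})", generated_text
    ).groups()

    return {
        "version": version,
        "name": version_text,
        "publisher": publisher,
        "date": f"{year}-{month}-{day}",
        "copyright": copyright_text,
    }


meta = header_metadata(HEADER)
print(meta)
```

The date is reordered into yyyy-mm-dd form because that is the shape FHIR expects for date values.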

Then we need to look at segments, which again generally all look the same.
    <xsd:complexType name="MSH.CONTENT">
        <xsd:sequence>
            <xsd:element ref="MSH.1" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.2" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.3" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.4" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.5" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.6" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.7" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.8" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.9" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.10" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.11" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.12" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="MSH.13" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.14" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.15" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.16" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.17" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.18" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="MSH.19" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.20" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.21" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="MSH.22" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.23" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.24" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="MSH.25" minOccurs="0" maxOccurs="1"/>
            <xsd:any processContents="lax" namespace="##other" minOccurs="0"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="MSH" type="MSH.CONTENT"/>

Now I can see where cardinality comes from, and that I will have to traverse from element to complexType/sequence, following the type/name links, to get to the parts of a segment.
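
As an illustration of that traversal, here is a small Python sketch using only the standard library.  It follows the type link from the MSH element to MSH.CONTENT and reads the cardinality of each referenced field.  The schema text is a trimmed copy of the MSH definition above, and the function name cardinalities is hypothetical; the real work is done in XSLT.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# A trimmed-down copy of the MSH segment definition shown above.
SEGMENT_SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="MSH.CONTENT">
    <xsd:sequence>
      <xsd:element ref="MSH.1" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="MSH.3" minOccurs="0" maxOccurs="1"/>
      <xsd:element ref="MSH.18" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="MSH" type="MSH.CONTENT"/>
</xsd:schema>
"""


def cardinalities(schema_text, element_name):
    """Return (ref, min, max) for each child of the named element's type."""
    root = ET.fromstring(schema_text)
    # element/@type points at the complexType/@name that holds the sequence.
    element = root.find(f"{XSD}element[@name='{element_name}']")
    content = root.find(f"{XSD}complexType[@name='{element.get('type')}']")
    rows = []
    for child in content.findall(f"{XSD}sequence/{XSD}element"):
        # XML Schema defaults both minOccurs and maxOccurs to 1.
        min_occurs = int(child.get("minOccurs", "1"))
        max_occurs = child.get("maxOccurs", "1")
        rows.append(
            (child.get("ref"), min_occurs, "*" if max_occurs == "unbounded" else max_occurs)
        )
    return rows


rows = cardinalities(SEGMENT_SCHEMA, "MSH")
for row in rows:
    print(row)
```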

Next I look at Fields:
    <!-- FIELD MSH.2-->
    <xsd:attributeGroup name="MSH.2.ATTRIBUTES">
        <xsd:attribute name="Item" type="xsd:string" fixed="2"/>
        <xsd:attribute name="Type" type="xsd:string" fixed="ST"/>
        <xsd:attribute name="LongName" type="xsd:string" fixed="Encoding Characters"/>
        <xsd:attribute name="minLength" type="xsd:integer" fixed="4"/>
        <xsd:attribute name="maxLength" type="xsd:integer" fixed="5"/>
    </xsd:attributeGroup>
    <xsd:complexType name="MSH.2.CONTENT">
        <xsd:annotation>
            <xsd:documentation xml:lang="en">Encoding Characters</xsd:documentation>
            <xsd:documentation xml:lang="de">Weitere Trennzeichen</xsd:documentation>
            <xsd:appinfo>
                <hl7:Item>2</hl7:Item>
                <hl7:Type>ST</hl7:Type>
                <hl7:LongName>HL7Encoding Characters</hl7:LongName>
            </xsd:appinfo>
        </xsd:annotation>
        <xsd:simpleContent>
            <xsd:extension base="ST">
                <xsd:attributeGroup ref="MSH.2.ATTRIBUTES"/>
            </xsd:extension>
        </xsd:simpleContent>
    </xsd:complexType>
    <xsd:element name="MSH.2" type="MSH.2.CONTENT"/>
    <!-- FIELD MSH.3-->
    <xsd:attributeGroup name="MSH.3.ATTRIBUTES">
        <xsd:attribute name="Item" type="xsd:string" fixed="3"/>
        <xsd:attribute name="Type" type="xsd:string" fixed="HD"/>
        <xsd:attribute name="Table" type="xsd:string" fixed="HL70361"/>
        <xsd:attribute name="LongName" type="xsd:string" fixed="Sending Application"/>
    </xsd:attributeGroup>
    <xsd:complexType name="MSH.3.CONTENT">
        <xsd:annotation>
            <xsd:documentation xml:lang="en">Sending Application</xsd:documentation>
            <xsd:documentation xml:lang="de">Sendende Anwendung / Sendender
                Bereich</xsd:documentation>
            <xsd:appinfo>
                <hl7:Item>3</hl7:Item>
                <hl7:Type>HD</hl7:Type>
                <hl7:Table>HL70361</hl7:Table>
                <hl7:LongName>HL7Sending Application</hl7:LongName>
            </xsd:appinfo>
        </xsd:annotation>
        <xsd:complexContent>
            <xsd:extension base="HD">
                <xsd:attributeGroup ref="MSH.3.ATTRIBUTES"/>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

    <xsd:element name="MSH.3" type="MSH.3.CONTENT"/>

And stuff starts to get interesting, because some fields are complex and some are simple, and some data is actually in fixed attributes rather than schema or annotations.  And I'll have to navigate UP a type hierarchy through extension/@base.
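
Here is a hedged Python sketch of those two lookups for a single field: whether its content is simple or complex, which base type it extends, and what the fixed attributes in its attributeGroup say.  The schema text is a trimmed copy of MSH.3 from above, and describe_field is a name I made up for the illustration.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the MSH.3 field definition shown above (annotations removed).
FIELD_SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:attributeGroup name="MSH.3.ATTRIBUTES">
    <xsd:attribute name="Item" type="xsd:string" fixed="3"/>
    <xsd:attribute name="Type" type="xsd:string" fixed="HD"/>
    <xsd:attribute name="Table" type="xsd:string" fixed="HL70361"/>
    <xsd:attribute name="LongName" type="xsd:string" fixed="Sending Application"/>
  </xsd:attributeGroup>
  <xsd:complexType name="MSH.3.CONTENT">
    <xsd:complexContent>
      <xsd:extension base="HD">
        <xsd:attributeGroup ref="MSH.3.ATTRIBUTES"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
"""


def describe_field(schema_text, type_name):
    """Report content kind, base type and fixed attributes of a field type."""
    root = ET.fromstring(schema_text)
    ctype = root.find(f"{XSD}complexType[@name='{type_name}']")
    # A field extends its datatype through either simpleContent or complexContent.
    for kind in ("simpleContent", "complexContent"):
        extension = ctype.find(f"{XSD}{kind}/{XSD}extension")
        if extension is not None:
            break
    else:
        raise ValueError(f"{type_name} does not extend another type")
    # The metadata lives in fixed attributes of the referenced attributeGroup.
    group_ref = extension.find(f"{XSD}attributeGroup").get("ref")
    group = root.find(f"{XSD}attributeGroup[@name='{group_ref}']")
    fixed = {a.get("name"): a.get("fixed") for a in group.findall(f"{XSD}attribute")}
    return {"content": kind, "base": extension.get("base"), "fixed": fixed}


info = describe_field(FIELD_SCHEMA, "MSH.3.CONTENT")
print(info)
```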

And finally, we come to datatypes, the (eventually) terminal nodes of the element tree in the StructureDefinition.

For some things, I want to create the StructureDefinition the same way that Grahame did it for CDA, so I'll look at some key places there as well.  The parts I've highlighted are the ones I think I want to do the same way Grahame did.

  <extension
             url="http://hl7.org/fhir/StructureDefinition/elementdefinition-namespace">
    <valueUri value="urn:hl7-org:v3"/>
  </extension>
  <url value="http://hl7.org/fhir/cda/StructureDefinition/ClinicalDocument"/>
  <version value="0.0.1"/>
  <name value="CDAR2.ClinicalDocument"/>
  <title value="ClinicalDocument (CDA Class)"/>
  <status value="active"/>
  <experimental value="false"/>
  <date value="2018-07-26T05:39:34+00:00"/>
  <publisher value="HL7"/>
...
  <fhirVersion value="3.0.1"/>
  <kind value="logical"/>
  <abstract value="false"/>
  <type value="ClinicalDocument"/>
  <baseDefinition value="http://hl7.org/fhir/StructureDefinition/Element"/>
  <derivation value="specialization"/>

So here is my first cut at the mappings for the "header" of the StructureDefinition.

Field             Where does it come from in the Schema
----------------  -----------------------------------------------------------------
extension
  url
  value           /schema/@targetNamespace
url
identifier
version           String of the form #[.#+]+ in a comment containing the text version
name              Text of the version comment up through the version (for a message)
                  Value of complexType/annotation/appInfo/LongName for the CONTENT definition (for a field)
display           Same as name
status            active
experimental      false
publisher         From the Copyright line, between "," and "."
date              From the "generated on" comment, in dd.mm.yyyy form
description       Same as name
copyright         First comment containing the text Copyright
code              HL7 V2 Event code from the name of the first element
fhirVersion       1.0.2
kind              Same as for CDA
constrainedType   As for CDA
snapshot
  element         Start with the top level element and do a depth first traversal of the schema until you get to types (e.g., CE, ST)

OK, now I'm ready to populate the header, with something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:f="http://hl7.org/fhir"
    xmlns:v2="urn:hl7-org:v2xml" xmlns="http://hl7.org/fhir" exclude-result-prefixes="xs f v2"
    version="2.0">
    <xsl:output indent="yes"/>
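    <!-- Assumption: the message schema to convert is loaded by file name;
         override msgFile to process a different message. -->
    <xsl:param name="msgFile" select="'ADT_A01.xsd'"/>
    <xsl:variable name="msg" select="document($msgFile)"/>
    <xsl:variable name="segments" select="document('segments.xsd')"/>
    <xsl:variable name="fields" select="document('fields.xsd')"/>
    <xsl:variable name="datatypes" select="document('datatypes.xsd')"/>
    <xsl:variable name="parts" select="$segments | $fields | $datatypes"/>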

    <xsl:template name="start">
        <xsl:apply-templates select="$msg"/>
    </xsl:template>
    
    <xsl:template match="/">
        <StructureDefinition>
            <extension url="http://hl7.org/fhir/StructureDefinition/elementdefinition-namespace">
                <valueUri value="{/xs:schema/@targetNamespace}"/>
            </extension>
            <url value="http://hl7.org/fhir/v2/StructureDefinition/{//xs:element[1]/@name}"/>
            <xsl:variable name="versionText" select="normalize-space(/comment()[contains(., 'Version')])"/>
            <!-- version String of the form #[.#+]+ in a comment containing the text version -->
            <xsl:variable name="versionNumber"
                select="replace($versionText, '^.* ([0-9]+(\.[0-9]+)+).*$', '$1')"/>
            <version value="{$versionNumber}"/>
            <!-- Value of complexType/annotation/appInfo/LongName for CONTENT definition of Field -->
            <name value="{$versionText}"/>
            <display value="{$versionText}"/>
            <status value="active"/>
            <experimental value="false"/>
            <xsl:variable name="copyrightText" select="normalize-space(/comment()[contains(., 'Copyright')])"/>
            <xsl:variable name="publisherName"
                select="substring-before(substring-after($copyrightText, ', '),'.')"/>
            <xsl:variable name="publishedText" select="normalize-space(/comment()[contains(., 'generated on')])"/>
            <xsl:variable name="publishedDate"
                select="substring-before(substring-after($publishedText, 'generated on '), ' ')"/>
            <publisher value="{$publisherName}"/>
            <!-- FHIR dates are written yyyy-mm-dd, so reorder dd.mm.yyyy and add the hyphens -->
            <date
                value="{substring($publishedDate, 7, 4)}-{substring($publishedDate, 4, 2)}-{substring($publishedDate, 1, 2)}"/>
            <description value="{$versionText}"/>
            <copyright value="{$copyrightText}"/>
            <code>
                <system value="http://hl7.org/fhir/v2/0076"/>
                <code value="{//xs:element[1]/@name}"/>
            </code>
            <fhirVersion value="1.0.2"/>
            <kind value="logical"/>
            <abstract value="false"/>
            <constrainedType value="{//xs:element[1]/@name}"/>
            <baseDefinition value="http://hl7.org/fhir/StructureDefinition/Element"/>
        </StructureDefinition>
    </xsl:template>
    <xsl:template match="xs:complexType"/>
</xsl:stylesheet>

Next thing I need to figure out is how to populate elements in the snapshot.  Here is approximately what they look like.

Field                  Where does it come from/Comments
---------------------  ------------------------------------------------------------
path                   Use FHIR . notation for elements
representation         Only if an xml attribute, in which case it says "xmlAttr"
name                   Fixed up name (remember to clean up after dots)
label                  Probably same as name
short                  Concise definition (e.g., Admit Patient) from annotations in the schema
min                    element/@minOccurs (or 1, the XML Schema default, if not specified)
max                    element/@maxOccurs (* if unbounded)
base                   Duplicates definition
type
  code                 References URL to type's StructureDefinition, where ### comes from @Type
mustSupport            True if min > 0
maxLength              From the maxLength annotations in the schema
binding
  strength
  valueSetReference    Where #### is the table number, from the Table annotations in the schema
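
The core of that mapping can be sketched in a few lines of Python.  This is only an illustration of the rules for path, label, min, max, and mustSupport, not the XSLT itself; element_definition is a hypothetical name, and a missing minOccurs or maxOccurs is taken to be 1, which is the XML Schema default.

```python
def element_definition(parent_path, ref, min_occurs=None, max_occurs=None):
    """Map one xsd:element reference onto the basic snapshot element fields."""
    # FHIR paths use dots as separators, so dots inside the V2 name are replaced.
    name = ref.replace(".", "-")
    minimum = int(min_occurs) if min_occurs is not None else 1
    if max_occurs == "unbounded":
        maximum = "*"
    else:
        maximum = max_occurs if max_occurs is not None else "1"
    return {
        "path": f"{parent_path}.{name}",
        "label": name,
        "min": minimum,
        "max": maximum,
        "mustSupport": minimum > 0,
    }


repeating = element_definition("ADT_A01.MSH", "MSH.18", "0", "unbounded")
required = element_definition("ADT_A01.MSH", "MSH.9")
print(repeating)
print(required)
```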

After <baseDefinition>, I can put in the data for the first element in the snapshot.

            <baseDefinition value="http://hl7.org/fhir/StructureDefinition/Element"/>
            <snapshot>
                <element id="{//xs:element[1]/@name}">
                    <path value="{//xs:element[1]/@name}"/>
                    <min value="1"/>
                    <max value="1"/>
                    <base>
                        <path value="{//xs:element[1]/@name}"/>
                        <min value="1"/>
                        <max value="1"/>
                    </base>
                </element>
            </snapshot>

And then do a depth first traversal after that element.
                             ...
                </element>
                <xsl:apply-templates
                    select="$msg//xs:complexType[@name = //xs:element[1]/@type]">
                    <xsl:with-param name="path" select="//xs:element[1]/@name"/>
                    <xsl:with-param name="depth" select="2"/>
                </xsl:apply-templates>
            </snapshot>

For which I'll need a template to do some more work.
   <xsl:template match="xs:complexType">
        <xsl:param name="path"/>
        <xsl:param name="depth"/>
        <xsl:for-each select="xs:sequence/xs:element">
            <xsl:variable name="ref" select="@ref"/>
            <xsl:variable name="element"
                 select="($msg | $parts)//xs:element[@name = $ref]"/>
            <xsl:variable name="name" select="translate($element/@name,'.','-')"/>
            <xsl:variable name="min" select="if (@minOccurs) then (@minOccurs) else ('1')"/>
            <xsl:variable name="max" 
                 select="if (@maxOccurs='unbounded') 
                     then ('*') 
                     else if (@maxOccurs) 
                     then @maxOccurs else 1"/>
            <!-- Look in the message schema as well as the parts, since segment
                 groups are defined alongside the message itself -->
            <xsl:variable name="type" 
                 select="($msg | $parts)//(xs:simpleType|xs:complexType)[
                          @name = $element/@type
                         ]"/>
            <xsl:variable name="base" 
                 select="$type/(xs:complexContent|xs:simpleContent)/xs:extension/@base"/>
            <element id="{$path}.{$name}">
                <path value="{$path}.{$name}"/>
                <label value="{$name}"/>
                <min value="{$min}"/>
                <max value="{$max}"/>
                <base>
                    <path value="{$path}.{$name}"/>
                    <min value="{$min}"/>
                    <max value="{$max}"/>
                </base>
                <type>
                    <!-- concat() keeps line-break whitespace out of the generated URL -->
                    <code value="{concat('http://hl7.org/fhir/v2/StructureDefinition/',
                                 if ($type/xs:simpleContent) then ($base) else ($name))}"/>
                </type>
                <mustSupport value="{if (number($min) &gt; 0) then ('true') else ('false')}"/>
            </element>
            <xsl:apply-templates select="$type">
                <xsl:with-param name="path" select="concat($path,'.',$name)"/>
                <xsl:with-param name="depth" select="$depth + 1"/>
            </xsl:apply-templates>
        </xsl:for-each>
    </xsl:template>

Before the for-each, I figured out that I need to walk up the type hierarchy:
        <xsl:if test="xs:complexContent/xs:extension">
            <!-- Handle this
            <xsd:complexContent>
                <xsd:extension base="HD">
                    <xsd:attributeGroup ref="MSH.6.ATTRIBUTES"/>
                </xsd:extension>
            </xsd:complexContent>
            -->
            <xsl:variable name='base' select='xs:complexContent/xs:extension/@base'/>
            <xsl:apply-templates select="$parts//xs:complexType[@name=$base]">
                <xsl:with-param name="path" select="$path"/>
                <xsl:with-param name="depth" select="$depth"/>
            </xsl:apply-templates>
        </xsl:if>
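
To show the shape of the recursion without the XSLT plumbing, here is a minimal Python sketch of the same depth first traversal, including the walk up to the base type.  ELEMENTS and TYPES are small hand-written stand-ins for the parsed schemas (they are assumptions for the example, trimmed to a couple of entries, and are not generated from the real 2.8.1 files).

```python
# Hypothetical stand-in for the parsed schemas: element name -> type name.
ELEMENTS = {
    "ADT_A01": "ADT_A01.CONTENT",
    "MSH": "MSH.CONTENT",
    "MSH.3": "MSH.3.CONTENT",
    "HD.1": "HD.1.CONTENT",
    "HD.2": "HD.2.CONTENT",
}

# Type name -> base type (from extension/@base) and child element refs.
TYPES = {
    "ADT_A01.CONTENT": {"base": None, "children": ["MSH"]},
    "MSH.CONTENT": {"base": None, "children": ["MSH.3"]},
    "MSH.3.CONTENT": {"base": "HD", "children": []},
    "HD": {"base": None, "children": ["HD.1", "HD.2"]},
    "HD.1.CONTENT": {"base": "IS", "children": []},
    "HD.2.CONTENT": {"base": "ST", "children": []},
}


def walk(type_name, path):
    """Yield element paths depth first, inheriting children from the base type."""
    definition = TYPES.get(type_name)
    if definition is None:
        # Primitive types such as ST are not in TYPES: terminal node.
        return
    if definition["base"]:
        # Walk up the type hierarchy first; the path does not change.
        yield from walk(definition["base"], path)
    for ref in definition["children"]:
        child_path = f"{path}.{ref.replace('.', '-')}"
        yield child_path
        yield from walk(ELEMENTS[ref], child_path)


paths = list(walk(ELEMENTS["ADT_A01"], "ADT_A01"))
for p in paths:
    print(p)
```

Walking the base type with an unchanged path is what makes the components of HD show up directly under the MSH-3 field.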

To get the short descriptions in annotations, this needs to be added after the label element.
                <!-- handle this:
                    <xsd:annotation>
                        <xsd:documentation xml:lang="en">Triage Code</xsd:documentation>
                -->
                <xsl:if test="$type/xs:annotation/xs:documentation[@xml:lang='en']">
                    <short 
                         value="{normalize-space($type/xs:annotation/xs:documentation[@xml:lang='en'])}"/>
                </xsl:if>
                
To get the maximum length of the string for the data, I needed to add this before the mustSupport element.
                <!-- Get the maximum length of the string 
                <xsd:complexType name="HD.3.CONTENT">
                     ...
                    <xsd:simpleContent>
                        <xsd:extension base="ID">
                            <xsd:attributeGroup ref="HD.3.ATTRIBUTES"/>
                        </xsd:extension>
                    </xsd:simpleContent>
                </xsd:complexType>
                <xsd:attributeGroup name="HD.3.ATTRIBUTES">
                        ...
                    <xsd:attribute name="minLength" type="xsd:integer" fixed="1"/>
                        ...
                </xsd:attributeGroup>
                -->
                <xsl:variable name="atts" 
                     select="$parts//xs:attributeGroup[
                               @name = $type/xs:simpleContent/xs:extension/xs:attributeGroup/@ref
                             ]"/>
                <xsl:if test="$atts/xs:attribute[@name='maxLength']">
                    <maxLength value="{$atts/xs:attribute[@name='maxLength']/@fixed}"/>
                </xsl:if>

And table bindings to V2 value sets come after it:
                <!-- Handle binding to tables
                    <xsd:attributeGroup name="MSH.15.ATTRIBUTES">
                            ...
                        <xsd:attribute name="Table" type="xsd:string" fixed="HL70155"/>
                            ...
                    </xsd:attributeGroup>
                    -->
                <xsl:if test="$atts/xs:attribute[@name='Table']">
                    <xsl:variable name="tableNumber" 
                         select="substring-after($atts/xs:attribute[@name='Table']/@fixed,'HL7')"/>
                    <binding>
                        <strength value="extensible"/>
                        <valueSetReference>
                            <reference value="http://hl7.org/fhir/ValueSet/v2-{$tableNumber}"/>
                        </valueSetReference>
                    </binding>
                </xsl:if>
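
The table-to-value-set step is simple enough to sketch in Python as well.  This assumes the fixed attributes have already been collected into a dictionary; table_binding is a hypothetical helper name, and it only handles table names that start with HL7, as in the examples above.

```python
def table_binding(fixed_attributes):
    """Build a binding for a field from its fixed Table attribute, if any."""
    table = fixed_attributes.get("Table")
    if not table or not table.startswith("HL7"):
        return None
    # HL70155 -> 0155: keep the leading zeros of the table number.
    number = table[len("HL7"):]
    return {
        "strength": "extensible",
        "valueSetReference": {
            "reference": f"http://hl7.org/fhir/ValueSet/v2-{number}"
        },
    }


bound = table_binding({"Type": "ID", "Table": "HL70155"})
unbound = table_binding({"Type": "ST"})
print(bound)
print(unbound)
```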

After several more tweaks, I can now generate a StructureDefinition from the V2.8.1 schema.

I don't know how useful this will be, and it's very dependent upon the V2 XML tooling, but it works well enough for now.

Unlike Adam's builds, where if you like what you see you have to go build it yourself, my one day build is completely downloadable online.
