As far as tools go, it's a start, but I need more. My developers want a Schematron for the CDA Consolidation Guide. I want the data in a format that I can compare with the source materials to perform a gap analysis with the HITSP C32. So, I took a break over the last two days to work on getting the data into a format that I can use it (XML), and started building a Schematron from it. The first part works very well and you can find it under the Download heading below. The second is still a work in progress, so I won't make it available today.
To get the data out of the database into an XML Format, I used three tools:
- Oxygen XML Editor (my current favorite among XML Editors)
- Xalan XSLT Engine (the only XSL Transformer I want to use)
- MySQL JDBC Drivers
- An XSL Stylesheet which is really the subject of this post
Oxygen includes Xalan, and that includes a SQL Extensions library, which means that I can perform queries to unlock the data in the Trifolia Workbench tables. I wrote a two pass transform where the first pass rips the tables, and the second pass builds the appropriate XML structure. The Trifolia template database has five main tables:
- Templates (template)
- Constraints (template_constraint)
- Code Systems (dictionarycodesystem)
- Value Sets (valueset)
- Value Set Members (valuesetmember)
The first two main tables are the most important from a validation perspective. The remaining three support simple value set maintenance. There are also a number of code tables containing human readable terms for various purposes:
- Vocabulary Binding Types (vocbinding_type) [STATIC, DYNAMIC]
- Contexts (dictionarycontext)
- Conformance Types (conformance_type) [SHALL, SHALL NOT, SHOULD, SHOULD NOT, MAY]
- Template Types (template_type) [document, section, entry, subentry, unspecified]
I don't deal with the implementation guide tables (implementationguide or associationtemplateimplementationguide), or the list of users (tdb_users), but you should know these tables are there.
In the first pass, the contents of these tables are copied into <templates>, <constraints>, <codeSystems>, <valueSets>, and <members> elements inside the <tdb> element. Copying the data is pretty straightforward. The first step is to perform the appropriate query against the tables, resolving terms in the code tables with LEFT JOIN operations.
I wrote a template that processes the result set, and generates and attribute or element for each cell in each row using the column labels. I reproduce it below because it can be used to liberate data from any SQL query in an XML format.
<!-- resultXML is the rowset returned by the query -->
<xsl:param name="resultXML" select="."/>
<xsl:param name="result" select="exslt:node-set($resultXML)"/>
<!-- map is a string containing either an a or an e for each cell
indicating whether it should be output as an attribute or an
<xsl:param name="meta" select="$result/sql/metadata"/>
<!-- name is the element name to use for each row, and defaults
to the table name -->
<xsl:param name="name" select="$meta/column-header/@table-name"/>
<!-- for each row in the result set -->
<!-- generate an element for the row -->
<!-- for each column in the result set -->
<xsl:variable name="pos" select="position()"/>
<!-- if there is an a in the map position for the cell -->
test="substring($map, $pos, 1) = 'a' and string(.) != ''">
<!-- generate an attribute for it -->
<!-- And again for each column in the result set -->
<xsl:variable name="pos" select="position()"/>
<!-- if there is an e in the map position for the cell -->
test="substring($map, $pos, 1) != 'a' and string(.) != ''">
<!-- generate an element for it -->
The first pass extracts the data to an XML Fragment, but the appropriate containment isn't established. That fragment is restructured in the second pass. In that pass, each constraint element is placed inside the appropriate <template> or <constraint> depending upon its parentConstraintID and templateID attributes, and each <member> is inserted inside the appropriate <valueSet> element based upon it's valueSetOID attribute (see notes below).
Ripping the data out took about half a day, and the XML format is remarkably close to what I suggested as a metamodel not quite a year ago. The next step is crafting a Schematron from this. I've gotten that nearly working, but have to work around this notion of branches that is in Trifolia. If I had a notion of what MDHT wanted as an import format for templates, I could also produce that.
You can download the Stylesheet from Google Code. It extracts the data from the Trifolia Workbench database into an XML Format. Document on that format will come as soon as I'm happy with it.
- Install the Trifolia Database according to the Instructions (requires MySQL Community Edition)
Note: If MySQL is already installed, be sure to execute the set charset latin1 command before running the source command.
- Install the JDBC Driver for MySQL. Copy the jar file to the appropriate lib folder.
- Modify the <xsl:param> elements at the top of the stylesheet to insert your user and password for creating the database connection.
- Run the stylesheet. It will output a file containing the contents of the template database.
Issues and Questions
If you have issues or questions, please add comments below. I'll do my best to address them here.
The stylesheet fixes up a couple of problems in the Trifolia Workbench data. Contexts in the workbench data aren't namespace qualified, so you cannot use them directly in stylesheets or Schematron. I've patched that for now in the stylesheet, and logged a bug to the Lantana JIRA Bug tracker via this e-mail (only use this e-mail to report bugs in the Trifolia Workbench).
Also, be aware that the valueSetOID field in template_constraint is actually the ID of the value set, not it's OID. I patched that to in my stylesheet, and logged a separate bug for that.