Thursday, April 8, 2010

Validating CDA Documents

One of the benefits of using Schematron to validate CDA documents is that a CDA element's assertion of conformance to a template (its templateId) can be used to trigger the testing rules in a Schematron.  A question that recently came up is:  What Schematrons do I need to use to validate a document of type X (where X is an HL7 Implementation Guide, IHE Profile, HITSP specification or other guide)?

The best answer that I can give is in fact:  ALL of them.  The reason for this is that most CDA implementation guides permit inclusion of sections or entries not otherwise prohibited.  To verify an instance, you want to be able to check as much of what is in it as possible.  If the instance uses a template that isn't required of it, you'd still want to be sure that it used that template correctly.  So, when testing a CDA instance to see that it is valid, what you need is:

1.  For each template that you recognize, verify that the instance conforms to the template.
2.  For each template that you do not recognize, issue a warning that the test tool cannot verify that the instance conforms to the requirements of that template.
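The two steps above can be sketched in Python with the standard library's ElementTree. This is a minimal illustration, not how any particular test tool works: the validators registry (template OID to check function) is hypothetical, and only templateId/@root is used to identify a template.

```python
import xml.etree.ElementTree as ET

CDA = "{urn:hl7-org:v3}"  # CDA namespace in ElementTree's Clark notation

def check_templates(root, validators):
    """Run a validator for each recognized templateId; warn about the rest.

    `validators` is a hypothetical registry mapping a template OID
    (templateId/@root) to a function that takes the element asserting the
    template and returns a list of error messages.
    """
    errors, warnings = [], []
    for elem in root.iter():
        for tid in elem.findall(CDA + "templateId"):
            oid = tid.get("root")
            check = validators.get(oid)
            if check is None:
                # Step 2: the tool cannot verify this template, so say so.
                warnings.append(f"cannot verify conformance to template {oid}")
            else:
                # Step 1: the template is recognized, so apply its rules.
                errors.extend(check(elem))
    return errors, warnings
```

A Schematron typically expresses the same trigger declaratively, with a rule context that selects elements carrying a given templateId/@root value.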

If you just want to verify that an instance is valid against a specific implementation guide, that's a slightly different story.  At that point, you can just apply all the rules of that guide and any other rules that it also requires you to apply.
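Collecting "the rules of that guide and any other rules it requires" is a transitive walk over the guides each one builds on. A minimal sketch, where the guide names, the requires map, and the rule identifiers are all hypothetical placeholders rather than real IHE or HITSP data:

```python
def rules_to_apply(guide, requires, rules):
    """Collect the rules of `guide` plus those of every guide it (transitively) requires.

    requires: hypothetical map from a guide name to the guides it builds on.
    rules:    hypothetical map from a guide name to its rule identifiers.
    """
    seen, stack, collected = set(), [guide], []
    while stack:
        g = stack.pop()
        if g in seen:
            continue  # a guide required by two others is only applied once
        seen.add(g)
        collected.extend(rules.get(g, []))
        stack.extend(requires.get(g, []))
    return collected
```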

Two related issues also crop up with respect to CDA schema validation:
1.  A CDA Document must conform to the CDA Schema
2.  A CDA Document may contain extensions.

These two statements would appear to conflict with each other.  However, CDA is clearly extensible according to the standard: extensions are permitted, but must be defined in a namespace distinct from the urn:hl7-org:v3 namespace.  So the first statement really needs to say:

1.  A CDA Document, minus any extension attributes or elements (and their content), must conform to the CDA Schema.

There are some attributes in a CDA document that are not defined in the CDA schema, but which are not "extension" attributes.  The two most common non-extension attributes appearing in CDA instances are xsi:type and the namespace declarations (attributes matching the pattern: xmlns and xmlns:*).

A very simple XSLT transform will allow you to generate a CDA document minus extension elements and attributes.  The following template will copy all elements in a tree that are in the CDA namespace, and remove all that are not.  You will need to add templates to match and copy the attributes you want to keep (CDA's own attributes, which are unqualified, plus xsi:type and any others that are appropriate), an exercise I leave for the reader.

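<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:cda='urn:hl7-org:v3'>
  <!-- Copy each CDA element and its text; child elements outside urn:hl7-org:v3 are never selected, so they are dropped -->
  <xsl:template match='cda:*'>
    <xsl:copy>
      <!-- Attribute templates are the reader's exercise; without them the built-in rule outputs attribute values as text -->
      <xsl:apply-templates select='@*'/>
      <xsl:apply-templates select='cda:*|text()'/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
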
Having produced an extension-free CDA instance, you can validate it against the CDA Schema.
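For anyone who would rather not write XSLT, the stripping step can also be sketched in Python with ElementTree, including one possible answer to the attribute exercise. The assumptions here are mine: unqualified attributes and xsi:* attributes are kept, and everything else outside urn:hl7-org:v3 is treated as an extension.

```python
import xml.etree.ElementTree as ET

CDA_NS = "urn:hl7-org:v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def namespace_of(name):
    """Namespace URI of an ElementTree name in Clark notation ('{uri}local'); '' if unqualified."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""

def strip_extensions(elem):
    """Remove extension attributes and non-CDA child elements from `elem`, recursively, in place."""
    # CDA's own attributes are unqualified and xsi:type is legitimate; anything else is an extension.
    # ElementTree does not expose xmlns declarations as attributes, so they need no handling here.
    for name in list(elem.attrib):
        if namespace_of(name) not in ("", XSI_NS):
            del elem.attrib[name]
    kept = []
    for child in list(elem):
        elem.remove(child)
        if isinstance(child.tag, str) and namespace_of(child.tag) == CDA_NS:
            strip_extensions(child)
            kept.append(child)
        elif child.tail:
            # Dropped node: keep the text that followed it (narrative blocks have mixed content).
            if kept:
                kept[-1].tail = (kept[-1].tail or "") + child.tail
            else:
                elem.text = (elem.text or "") + child.tail
    elem.extend(kept)
    return elem
```

When writing the result back out, ElementTree generates prefixes such as ns0: unless you call ET.register_namespace first.  Since xsi:type values are QNames, register the same prefixes the original document uses so those values still resolve.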

Validating extension elements themselves is often done through Schematron, but can also be done by modifying the base CDA Schema to insert these elements at the appropriate locations.  This has been done for the NIST Validator when testing against the HITSP C32 specification.
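Whether it lives in a Schematron or in a modified schema, the core of that check is a statement of which extension elements may appear inside which CDA elements. A minimal Python sketch of the idea; the ALLOWED table is a hypothetical example, not the list used by the NIST Validator or required by HITSP C32:

```python
import xml.etree.ElementTree as ET

CDA_NS = "urn:hl7-org:v3"

# Hypothetical whitelist of (parent element, extension element) pairs, in Clark notation.
ALLOWED = {
    ("{urn:hl7-org:v3}patient", "{urn:hl7-org:sdtc}raceCode"),
}

def check_extension_placement(elem, allowed=ALLOWED, problems=None):
    """Report extension elements that appear where the whitelist does not permit them."""
    if problems is None:
        problems = []
    for child in elem:
        if not isinstance(child.tag, str):
            continue  # skip comments and processing instructions
        if child.tag.startswith("{" + CDA_NS + "}"):
            check_extension_placement(child, allowed, problems)
        elif (elem.tag, child.tag) not in allowed:
            # The extension's own content is not inspected; it would need rules of its own.
            problems.append(f"{child.tag} not permitted in {elem.tag}")
    return problems
```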

Finally, there are several notes appearing in the CDA model that are not verified by the schema or by the available Schematrons.  Some of these are checked by the Eclipse Instance Editor.  Many of these could be checked with Schematron; that's a project for you to work on if you are interested.

1 comment:

  1. Those implementers that rely on XML technologies and tools may be tempted to disregard Keith's reference to the Eclipse Instance Editor.

    Please don't: most invalid CDA documents are invalid because of invalid use of datatypes, something validated by neither schema nor Schematron. See for a whitepaper about that topic.

    A prepackaged version of the Eclipse editor for CDA is described here:

    A CDA document has to validate against the abstract class definition given in its specification. Using only schema and Schematron won't be sufficient.