Wednesday, November 30, 2016

Implementing Partial CDA Validation

In "Partial Rejection and Levels of Validity in CDA (or anything else for that matter)" I discussed levels of validation of CDA content.  Now I have to make that real.  There are two different ways to go about it.  As you might recall, these are a few of the levels in the partial validation hierarchy:

Level 0: Totally bogus content.  Is this even XML?
Level 1: The CDA Header is valid.
Level 2a: Level 1 + the narrative content is valid according to the CDA Schema.
Level 2b: Level 2a + the LOINC codes for documents and sections are recognized as valid.

The first level is just an XML parse without validation.  This ensures the content is well-formed XML.  If it fails this test, there's no need to go further.
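As a minimal sketch of the Level 0 check, here is a well-formedness test in Python using only the standard library's xml.etree.ElementTree. The function name is hypothetical, and it only answers "is this even XML?"; it performs no schema validation at all.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Level 0: return True if `data` parses as XML (no schema validation)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False  # not even XML; no point running the later levels
    return True


# Usage:
# is_well_formed(b'<ClinicalDocument xmlns="urn:hl7-org:v3"/>')   -> True
# is_well_formed(b'<ClinicalDocument><title></ClinicalDocument>')  -> False
```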

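The next level validates the header: everything up to, but not including, nonXMLBody or structuredBody.  This is easy.  Craft a new CDA Schema by editing POCD_MT000040.xsd as follows: at the end of the POCD_MT000040.ClinicalDocument sequence, delete the declaration of the component element and insert an xs:any wildcard in its place.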

  <xs:complexType name="POCD_MT000040.ClinicalDocument">
    <xs:sequence>
      <xs:element name="realmCode" type="CS" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="typeId" type="POCD_MT000040.InfrastructureRoot.typeId"/>
      <xs:element name="templateId" type="II" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="id" type="II"/>
      <xs:element name="code" type="CE"/>
      <xs:element name="title" type="ST" minOccurs="0"/>
      <xs:element name="effectiveTime" type="TS"/>
      <xs:element name="confidentialityCode" type="CE"/>
      <xs:element name="languageCode" type="CS" minOccurs="0"/>
      <xs:element name="setId" type="II" minOccurs="0"/>
      <xs:element name="versionNumber" type="INT" minOccurs="0"/>
      <xs:element name="copyTime" type="TS" minOccurs="0"/>
      <xs:element name="recordTarget" type="POCD_MT000040.RecordTarget" maxOccurs="unbounded"/>
      <xs:element name="author" type="POCD_MT000040.Author" maxOccurs="unbounded"/>
      <xs:element name="dataEnterer" type="POCD_MT000040.DataEnterer" minOccurs="0"/>
      <xs:element name="informant" type="POCD_MT000040.Informant12" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="custodian" type="POCD_MT000040.Custodian"/>
      <xs:element name="informationRecipient" type="POCD_MT000040.InformationRecipient" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="legalAuthenticator" type="POCD_MT000040.LegalAuthenticator" minOccurs="0"/>
      <xs:element name="authenticator" type="POCD_MT000040.Authenticator" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="participant" type="POCD_MT000040.Participant1" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="inFulfillmentOf" type="POCD_MT000040.InFulfillmentOf" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="documentationOf" type="POCD_MT000040.DocumentationOf" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="relatedDocument" type="POCD_MT000040.RelatedDocument" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="authorization" type="POCD_MT000040.Authorization" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="componentOf" type="POCD_MT000040.Component1" minOccurs="0"/>
      <xs:element name="component" type="POCD_MT000040.Component2"/> <!-- delete this line -->
      <xs:any processContents="skip"/> <!-- insert this line -->
    </xs:sequence>
    <xs:attribute name="nullFlavor" type="NullFlavor" use="optional"/>
    <xs:attribute name="classCode" type="ActClinicalDocument" use="optional" fixed="DOCCLIN"/>
    <xs:attribute name="moodCode" type="ActMood" use="optional" fixed="EVN"/>
  </xs:complexType>

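This should result in the schema processor ignoring anything after the CDA Header.  Or will it? Actually, this will fail, because the schema now violates the Unique Particle Attribution (UPA) constraint of XML Schema 1.0: every element declared between custodian and the wildcard is optional, so an element such as componentOf could be matched either by its own declaration or by the wildcard.  If you could be sure that componentOf would be present, setting minOccurs="1" on that declaration would resolve the problem.  But not every C-CDA document type requires componentOf, so that little fix won't work.  OK, what if we instead keep the component declaration and change the definition of its type, POCD_MT000040.Component2, so that it can contain anything?  Yep, that works.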

It should look something like this:
  <xs:complexType name="POCD_MT000040.Component2">
    <xs:sequence>
      <xs:any processContents="skip" />
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

So, now <component> can contain any sort of well-formed XML content, and your "header validator" won't care.

An alternative implementation would use a specialized XSLT identity template that copies the header but skips any unrecognized content after componentOf, and then validate the result against a schema in which the component element declaration has simply been deleted from POCD_MT000040.ClinicalDocument.
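A rough Python stand-in for that identity template (ElementTree swapped in for XSLT, and assuming the standard CDA namespace urn:hl7-org:v3) might look like the sketch below. The names strip_body and BODY are hypothetical. It copies the document and drops the top-level component element, which is the only thing a valid document can contain after componentOf. Validating the result against the trimmed schema still needs a schema-aware validator, which the Python standard library does not provide.

```python
import copy
import xml.etree.ElementTree as ET

HL7 = "urn:hl7-org:v3"          # namespace used by CDA documents
BODY = f"{{{HL7}}}component"    # ClinicalDocument/component holds the body


def strip_body(doc: ET.Element) -> ET.Element:
    """Return a copy of a ClinicalDocument with its body <component> removed.

    Only direct children of ClinicalDocument are examined, so elements named
    component deeper in the tree are left untouched. The input is not modified.
    """
    header = copy.deepcopy(doc)
    for child in list(header):  # iterate over a snapshot while removing
        if child.tag == BODY:
            header.remove(child)
    return header


# Usage: register the default namespace so the output keeps xmlns="urn:hl7-org:v3".
ET.register_namespace("", HL7)
src = ('<ClinicalDocument xmlns="urn:hl7-org:v3"><id root="1.2.3"/>'
       '<component><structuredBody/></component></ClinicalDocument>')
print(ET.tostring(strip_body(ET.fromstring(src)), encoding="unicode"))
```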

The next challenge is validating narrative-only content.  For that, you want to tweak the section definitions within the document so that you don't validate any content within <section> other than <text>, <title>, or perhaps <code>, but you do still validate subsection content.

That's a bit trickier.  For this case, you could define a global (top-level) <component> element, which would be overridden by the local <component> declarations within the header or entries (which you really don't care about), but which would be used when matched by <xs:any processContents='lax'>.  However, rather than do that, my recommendation would be to create a specialized identity template that copies only what you want to validate within sections and skips everything else.  Then you can just use the standard CDA Schema to validate the content without any changes (because all content within a section is optional according to the schema).

In doing so, you've eliminated the potentially invalid content.  There's extra value there, because what you now have is a transform of the original content which, if "narrative valid", is probably safe to keep around for viewing and transformation by a stylesheet.

That identity template is a simple exercise in software engineering.  I'll leave it to the interested reader to figure it out.
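For anyone who wants a starting point, here is one possible shape for that identity template, again written in Python with ElementTree rather than XSLT. It is a sketch under assumptions: the CDA namespace is urn:hl7-org:v3, and the set of elements kept inside a section (code, title, text, and component for subsections) is a hypothetical choice based on the list above. Every other child of a section, such as entry or author, is skipped, which the standard schema allows because all section content is optional.

```python
import xml.etree.ElementTree as ET

HL7 = "urn:hl7-org:v3"
SECTION = f"{{{HL7}}}section"
# Hypothetical choice of section children worth validating; everything else
# directly under a <section> (entries, authors, subjects, ...) is dropped.
KEEP_IN_SECTION = {f"{{{HL7}}}{name}" for name in ("code", "title", "text", "component")}


def narrative_only(elem: ET.Element) -> ET.Element:
    """Identity-style copy of `elem` that prunes every <section> to its narrative.

    Elements outside sections are copied unchanged; subsections reached via
    section/component/section are pruned the same way.
    """
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if elem.tag == SECTION and child.tag not in KEEP_IN_SECTION:
            continue  # skip content we do not want to validate
        out.append(narrative_only(child))
    return out
```

The output can then be serialized with ET.tostring and handed to whatever schema validator you already use with the unmodified CDA Schema.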

Oh, one final thing: Don't be dumb and validate in easy-to-hard order.  Validate in the other order, strictest first, because it will cost less processing time for good documents, which pass on the first try.  Let the bad ones pay the performance penalty for multiple validation stages.
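A minimal sketch of that ordering, assuming the levels are nested (passing a stricter level implies passing the weaker ones) and that each level is represented by a hypothetical Python callable returning True when a document passes: run the checks strictest first and stop at the first one that passes, so a good document costs a single validation.

```python
from typing import Callable, Optional, Sequence, Tuple

# A validator takes raw document bytes and reports whether they pass.
Validator = Callable[[bytes], bool]


def validity_level(doc: bytes, levels: Sequence[Tuple[str, Validator]]) -> Optional[str]:
    """Return the name of the first (strictest) level that `doc` passes.

    `levels` must be ordered strictest first, e.g. full document, then
    narrative, then header, then well-formed. Good documents stop after one
    check; only bad ones fall through to the weaker stages.
    Returns None if the document fails every level (totally bogus content).
    """
    for name, check in levels:
        if check(doc):
            return name
    return None
```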

   -- Keith



3 comments:

  1. Do you have a library of CDA documents that can be used for testing a validation tool? That is, good and bad files?

    What about validation to IHE profile constraints?

    Thanks

    Replies:
      1. No and yes... do you? Yes, I have one. Can I share it? No. Does it have positive and negative testing? It's a work in progress, but yes.

      2. It's the same kind of stuff. For that, you may be able to access a repository of good and bad stuff in Gazelle.
