Tuesday, March 15, 2011

Why there is no W3C Schema for CCD

One of the callers on today's CDA Consolidation call asked whether there would be an XSD for the Consolidated CDA Implementation Guide.  This is a fairly common request, and I hear it from a lot of different sources.  The CDA specification does provide a W3C Schema for the HL7 CDA standard.  But there is no W3C Schema for CCD itself, or for other implementation guides built on top of CDA or CCD.

CDA is a standard that is capable of representing a wide variety of clinical documents.  CCD and other implementation guides place further constraints on CDA, but those constraints cannot be expressed in a W3C Schema (an XSD file).  The reason is that W3C Schema carries forward a restriction that was present in XML and in its predecessor, SGML: once the parser has reached a particular point in a document, the element name alone must determine the content model that applies there.  Even though parsing technology has advanced quite a bit since SGML was specified, that requirement remains.  As a result, it is difficult or impossible in W3C Schema even to make the content constraints on an element depend on the value of one of that element's attributes.
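To make the name-keyed restriction concrete, here is a toy Python sketch of a DTD-style grammar in which each element name maps to exactly one content model. The element names and required-children sets are simplified, hypothetical stand-ins for CDA, not anything taken from the actual CDA schema, and "content model" is reduced here to a set of required children.

```python
"""A toy, DTD-style grammar: one content model per element name.

Element names and required-children sets are simplified, hypothetical
stand-ins for CDA; they are not taken from the real CDA schema.
"""


class ContentModelConflict(Exception):
    """Raised when one element name would need two different content models."""


def declare(models: dict[str, frozenset[str]], name: str, required: set[str]) -> None:
    """Record the children that an element with this name must contain.

    The grammar is keyed on the element name alone, so a second, different
    model for a name that is already declared cannot be expressed.
    """
    wanted = frozenset(required)
    existing = models.get(name)
    if existing is not None and existing != wanted:
        raise ContentModelConflict(
            f"<{name}> already requires {sorted(existing)}; "
            f"cannot also require {sorted(wanted)}"
        )
    models[name] = wanted


models: dict[str, frozenset[str]] = {}

# Base CDA: a generic section (simplified).
declare(models, "section", {"title", "text"})

# A problem-list section in an implementation guide is still a <section>,
# but it must also carry a templateId and entries. A name-keyed grammar
# has no slot for this second model.
try:
    declare(models, "section", {"templateId", "title", "text", "entry"})
except ContentModelConflict as err:
    print(err)
```

Running it reports a conflict on the second `section` declaration: the grammar can hold only one model per element name, which is the limitation described above.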

In CCD, the content model is determined not by the element name, nor even by an attribute of that element, but by the value of the root attribute of a templateId element that appears inside it.  An XML parser would need some form of lookahead to determine which set of constraints to apply.  SGML was designed so that markup could be parsed without lookahead, and that restriction made its way into XML DTDs and subsequently into the W3C XML Schema language.  So there is no way, using W3C Schema, to describe how CCD further constrains the CDA XML.  The same is true for constraining XHTML, DocBook, or any other general-purpose markup language.
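This is why CDA implementation guides commonly express their rules as assertions evaluated against an already-parsed document (Schematron is the usual tool) rather than as a grammar. The Python sketch below imitates that style with `xml.etree.ElementTree`: it reads each section's `templateId` children first and only then decides which rules apply. The rule table, the `check_sections` function, and the sample document are hypothetical; the template OIDs and LOINC codes are used for illustration only and should be checked against the CCD guide before relying on them.

```python
"""Sketch: choose constraints from a child <templateId>, Schematron-style.

Assumptions: the template OIDs, LOINC codes, and rule sets below are
illustrative and simplified; check them against the CCD guide before use.
"""
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}

# Hypothetical rule table: template root OID -> (expected section code,
# child elements that must be present). Illustrative values only.
RULES = {
    "2.16.840.1.113883.10.20.1.11": ("11450-4", ["cda:title", "cda:text", "cda:entry"]),
    "2.16.840.1.113883.10.20.1.8": ("10160-0", ["cda:title", "cda:text"]),
}


def check_sections(doc: ET.Element) -> list[str]:
    """Return error messages for every <section> that claims a known template."""
    errors = []
    for section in doc.iter("{urn:hl7-org:v3}section"):
        # The rules depend on a *child* element's attribute, so the section
        # has to be read before we know which model applies.
        roots = [t.get("root") for t in section.findall("cda:templateId", NS)]
        for root in roots:
            if root not in RULES:
                continue
            code, required = RULES[root]
            code_el = section.find("cda:code", NS)
            if code_el is None or code_el.get("code") != code:
                errors.append(f"template {root}: section code must be {code}")
            for path in required:
                if section.find(path, NS) is None:
                    errors.append(f"template {root}: missing {path}")
    return errors


SAMPLE = """
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <templateId root="2.16.840.1.113883.10.20.1.11"/>
      <code code="11450-4" codeSystem="2.16.840.1.113883.6.1"/>
      <title>Problems</title>
      <text>No known problems.</text>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>
"""

for message in check_sections(ET.fromstring(SAMPLE.strip())):
    print(message)
```

With this sample, the only message is that the problem section is missing an `entry`. A tree-based check has the whole element in hand, so the lookahead that a grammar cannot express becomes an ordinary `find()` call.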

One of the benefits of W3C Schema support, and the reason it is often requested, is that it enables easy translation between the XML and databases or other persistence layers, driven by the XML content model.  Some would argue that this limitation of CCD is a problem.  I happen to like the ability to layer constraints, because it supports incremental interoperability.  GreenCDA is one effort to work through the layers of constraints and generate XML that does have a schema.  It may offer a way through the challenge, since transforms from GreenCDA to normalized CDA would still provide incremental interoperability.  I'd like to see a more algorithmic approach (read: repeatable and programmable) used to derive GreenCDA schemas from the layered constraints.
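One way to read the "algorithmic approach" idea is as a flattening step: walk each template's chain of constraints back to base CDA, union the requirements, and give the result its own element name so that a name-keyed schema can describe it. The Python sketch below shows only that idea. The template IDs, the green element names, the `Template` class, and the constraint format are all invented for illustration and do not describe how GreenCDA was actually produced.

```python
"""Sketch: derive flat, name-keyed models from layered template constraints.

Hypothetical throughout: template IDs, green element names, and the
constraint format are invented for illustration.
"""
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    template_id: str            # would appear in templateId/@root
    green_name: str             # element name for the flattened model
    parent: Optional[str]       # template this one further constrains
    required: set[str] = field(default_factory=set)  # required children


def flatten(templates: dict[str, Template]) -> dict[str, set[str]]:
    """Map each template's green element name to the union of the
    required children declared along its chain of parent templates."""
    flat: dict[str, set[str]] = {}
    for tpl in templates.values():
        required: set[str] = set()
        seen: set[str] = set()
        current: Optional[Template] = tpl
        while current is not None:
            if current.template_id in seen:
                raise ValueError(f"cycle at {current.template_id}")
            seen.add(current.template_id)
            required |= current.required
            current = templates[current.parent] if current.parent else None
        flat[tpl.green_name] = required
    return flat


templates = {
    t.template_id: t
    for t in [
        Template("cda.section", "section", None, {"title", "text"}),
        Template("ig.problems", "problemSection", "cda.section", {"templateId", "entry"}),
        Template("ig.meds", "medicationSection", "cda.section", {"templateId"}),
    ]
}

for name, req in sorted(flatten(templates).items()):
    print(name, sorted(req))
# medicationSection ['templateId', 'text', 'title']
# problemSection ['entry', 'templateId', 'text', 'title']
# section ['text', 'title']
```

Each derived model lands under its own element name, which is what a W3C Schema needs, while the `template_id` kept on each entry is the information a transform back to normalized CDA would use to emit the right `templateId`.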

But there it is.  If you want to know why there isn't a schema, it's because of some fundamental limits placed on XML, and those limits apply to any general-purpose markup language.  The same problem appears in DocBook, XHTML, and any other markup language that you want to further constrain for a particular purpose.  It's also the reason there isn't an XML schema specific to XDS metadata (and in fact, a repeatable algorithm for developing GreenCDA could also be used to develop a GreenXDS metadata package).  Hmm, I'm seeing a trend here...


1 comment:

  1. Despite the good reasons behind not having a schema, I still have the same problem: my client needs to communicate a CCD with other institutions and can't because there's no standard. I think it's unfortunate that after we've spent months preparing the EMR to export XML, establishing connectivity, sorting through legal issues, that we ALSO have to meet with programmers to validate our XML. Let's fix this!
