Thursday, January 19, 2012

Microdata Shot Down but still flying for CDA

Grahame Grieve shot down my HTML5 + Microdata proposal on Tuesday with a well-placed comment on the ease of writing Microdata (or actually the lack thereof). I’m borrowing from Calvin Beebe’s suggestion (he’s an HL7 Structured Document Workgroup Chair) that there be a way to transform back and forth between formats, to support some of the capabilities provided by HTML5 + Microdata while still maintaining the ease of writing (and validating) CDA documents in XML.

I’m still convinced that we need to use HTML5 for the text portion of the document, and I still really like what Microdata does for you with the document in the browser. But now I acknowledge that we will probably need to allow HTML5 and some RIM-based (or perhaps FHIR-based [pronounced Fire]) XML representation of content. Whether it be RIM-based or FHIR-based, call it HL7 XML and let’s move on.

In order to resolve Grahame’s issue though, I’ll need a way to translate back and forth between HTML5 + HL7 XML and HTML5 + Microdata. The only thing I can count on being in common between FHIR and another HL7 XML format is that they are XML, so I’m just going to work on a transformation from Microdata to XML and back.

Here is some of my initial thinking on the transformation:

  • The transformation may be guided by an XML schema, but this is not a requirement.
  • Transformation to Microdata and back should be possible.
  • The itemprop attribute represents the name of the element or attribute. When the value of itemprop begins with an @ character, it will be represented as an attribute in the XML. When it begins with any other character it will be represented as an element. Since @ isn’t a legal name start in XML, this seems like it should work.
  • The itemtype attribute must represent the Schema type of the element or attribute in the form URL#typename, where URL is the namespace URL associated with the element or attribute in the schema, and typename is the name of the complex or simple type in the schema.
  • When there is no namespace URL associated with the Schema, the URL shall be TBD.
  • Properties which are represented in the XML as attributes are given names of the form: @name where name is the attribute name.
  • An attribute can be assigned more than one property value if its schema simple type is defined using
  • Properties which are represented in the XML as elements are given names of the form: name where name is the element name.
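
To make the naming rules above concrete, here is a minimal sketch in Python. The function names are hypothetical, picked just for illustration, and the sketch assumes a namespace URL is available (the no-namespace case is still TBD above, so it is not handled here).

```python
def property_name(xml_name, is_attribute):
    """Build an itemprop value: '@name' for an XML attribute, 'name' for an element."""
    return "@" + xml_name if is_attribute else xml_name


def parse_property_name(itemprop):
    """Reverse of property_name: return (xml_name, is_attribute)."""
    if itemprop.startswith("@"):
        return itemprop[1:], True
    return itemprop, False


def item_type(namespace_url, type_name):
    """Build an itemtype value in the form URL#typename."""
    return f"{namespace_url}#{type_name}"


def parse_item_type(itemtype):
    """Split an itemtype at the last '#' into (namespace_url, type_name)."""
    namespace_url, _, type_name = itemtype.rpartition("#")
    return namespace_url, type_name
```

Because @ can never start an XML name, parse_property_name can always tell the two cases apart, which is what makes the round trip possible.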

What is cool about this transformation is that I also get (for free), a JSON and RDF interpretation along with the Microdata representation of the machine readable metadata.

Associating an XML Document with an HTML 5 element as microdata without a Schema is pretty easy. You traverse the DOM of the XML Document and:
  1. Start with the HTML 5 element to which the XML document is to be associated as Microdata.
  2. Add itemscope to the HTML 5 element.
  3. Add itemprop to the HTML 5 element using the name of the element.
  4. If there is an xsi:type attribute associated with the element, set the itemtype to be the namespace URL for the type, followed by # followed by the type name.
  5. For each attribute of the element, add a new item to the item associated with the HTML 5 element. That item will have an itemprop value of @ plus the attribute name. Set itemValue to be the value of the attribute.
  6. For each child element, add a new item to the item associated with the HTML 5 element. Recurse at step 1 above.
There are some details still to be worked out, like the tricky bits of Schemas without a namespace (because the itemtype would become #typename), but it is not too hard even for a freshman programming project. Reversing this transformation is just a little bit harder, but not impossible. I’ll be working out the details a bit later this year, and leave it as an exercise for the reader until then. If you figure it out, do let me know.
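
For the forward direction, the numbered steps might be sketched roughly like this in Python with the standard library's xml.etree.ElementTree. Several things here are my assumptions rather than part of the steps: to_microdata and split_tag are hypothetical names, attribute values are carried on meta elements with a content attribute, child items become div elements, xsi:type is consumed into itemtype instead of also becoming an @ property, an unprefixed xsi:type is resolved against the element's own namespace, and prefixed ones are looked up in a caller-supplied prefixes dictionary. Text content is ignored, since the steps do not cover it.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def to_microdata(xml_el, html_el, prefixes=None):
    """Attach xml_el to html_el as microdata (step 1 is the call itself)."""
    prefixes = prefixes or {}
    namespace, local = split_tag(xml_el.tag)
    html_el.set("itemscope", "")    # step 2
    html_el.set("itemprop", local)  # step 3

    xsi_type = xml_el.get(XSI_TYPE)
    if xsi_type is not None:        # step 4: itemtype = URL#typename
        prefix, _, type_name = xsi_type.rpartition(":")
        # Assumption: no prefix means "same namespace as the element".
        type_ns = prefixes.get(prefix, namespace) if prefix else namespace
        html_el.set("itemtype", f"{type_ns}#{type_name}")

    for name, value in xml_el.attrib.items():  # step 5
        if name == XSI_TYPE:
            continue  # assumption: already represented by itemtype
        _, attr_local = split_tag(name)
        meta = ET.SubElement(html_el, "meta")
        meta.set("itemprop", "@" + attr_local)
        meta.set("content", value)

    for child in xml_el:            # step 6: recurse on a new child item
        to_microdata(child, ET.SubElement(html_el, "div"), prefixes)
    return html_el


# Illustrative input only, loosely in the style of a CDA observation.
source = ET.fromstring(
    '<observation xmlns="urn:hl7-org:v3" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" classCode="OBS">'
    '<value xsi:type="PQ" value="70" unit="kg"/></observation>'
)
html = to_microdata(source, ET.Element("div"))
print(ET.tostring(html, encoding="unicode", method="html"))
```

ElementTree throws away the prefix declarations after parsing, which is why the prefixes map has to be passed in separately; a real implementation would want a parser that keeps the namespace context around.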

