Wednesday, July 27, 2011

A Wickedly Fresh look at CDA through Microdata and HTML5

So I said yesterday that I had an idea forming in the back of my head around Microdata and CDA Release 3. It ties into a previous post on using XHTML to represent CDA R3 Narrative, and goes even one step further to eliminate Version 3 XML representation from the document altogether, replacing the V3 modeling with Microdata.

HTML5 Microdata adds five important new attributes that can appear on any element.

  • itemscope - A boolean attribute indicating the presence of a new semantic item in the content.
  • itemtype - A URL giving the definition of the item's type, only allowed on elements that define items.
  • itemprop - The name of a property on an item.  It may be used in conjunction with itemscope and itemtype to define a property returning an item.
  • itemid - A globally unique identifier (in URL form) for the item.
  • itemref - A pointer to a set of identifiers where additional properties can be found (vitally important).
If you understand how objects are put together in JavaScript as bags of named properties, you should be able to understand Microdata pretty easily. 
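
To make the property-bag idea concrete, here is a minimal sketch in Python that walks a small HTML fragment and collects microdata items as nested dictionaries. The itemtype URLs and property names (code, value, referenceRange, low, high) are made up for illustration and are not part of any HL7 or W3C specification. The sketch only handles itemscope, itemtype and itemprop with text values; itemid, itemref and the element-specific value rules are left out.

```python
from html.parser import HTMLParser

# Hypothetical markup: the itemtype URLs and property names are invented.
SAMPLE = """
<div itemscope itemtype="http://example.org/hl7/Observation">
  <span itemprop="code">Glucose</span>:
  <span itemprop="value">95</span>
  <span itemprop="referenceRange" itemscope
        itemtype="http://example.org/hl7/Range">
    <span itemprop="low">70</span>-<span itemprop="high">99</span>
  </span>
</div>
"""

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class MicrodataParser(HTMLParser):
    """Collects items as {"type": ..., "properties": {name: [values]}}."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the document
        self.stack = []   # one frame per currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)   # boolean attributes such as itemscope map to None
        frame = {"prop": a.get("itemprop"), "item": None, "text": []}
        if "itemscope" in a:
            frame["item"] = {"type": a.get("itemtype"), "properties": {}}
        self.stack.append(frame)

    def handle_data(self, data):
        # Text counts toward every element that is currently open.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        frame = self.stack.pop()
        # A property's value is either a nested item or the element's text.
        if frame["item"] is not None:
            value = frame["item"]
        else:
            value = "".join(frame["text"]).strip()
        owner = self._nearest_item() if frame["prop"] else None
        if owner is not None:
            owner["properties"].setdefault(frame["prop"], []).append(value)
        elif frame["item"] is not None:
            self.items.append(frame["item"])

    def _nearest_item(self):
        # The closest enclosing element that carries itemscope owns the property.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None


def extract_items(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(extract_items(SAMPLE))
```

Each item comes out as a bag of named properties whose values are either strings or further bags, which is the same shape a JavaScript object would have.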

Now, what we need to be able to do is to map an HL7 Version 3 model onto microdata.  That is what an ITS does, by the way: it maps the HL7 models into a representation.

The property bags that microdata creates can represent just about anything you want them to, including the full RIM-based model.

There are some things, though, that HTML5 already does that are CDA-like and should be preserved.  For example, there are elements to handle sections and their titles correctly so you can ignore the H1-H6 problem.  There are other elements that handle timestamps well.  So, you can interweave your V3 model into the narrative with a well-constructed Microdata ITS, and that model can come from any V3 Domain, and you can have a standardized RIM representation of it as well.  That RIM representation can be constructed directly from the HTML5 representation using some of the developing APIs.  This is remarkably simple stuff that could eventually make CDA R3 simply a profile of HTML5 + Microdata + an ITS.
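
As a rough illustration of what HTML5 already gives you, the Python sketch below derives section depth from the nesting of section elements rather than from the heading level, and picks up machine-readable timestamps from the datetime attribute of time elements. The sample markup and its section titles are invented for the example, and the parser is deliberately naive.

```python
from html.parser import HTMLParser

# Hypothetical narrative: both headings are h1, depth comes from nesting.
NARRATIVE = """
<section>
  <h1>Results</h1>
  <p>Collected <time datetime="2011-07-27">July 27, 2011</time>.</p>
  <section>
    <h1>Chemistry</h1>
  </section>
</section>
"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0           # how many <section> elements are open
        self.in_heading = False
        self.outline = []        # (depth, title) pairs in document order
        self.times = []          # values of <time datetime="...">
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag in HEADINGS:
            self.in_heading = True
            self._buf = []
        elif tag == "time":
            stamp = dict(attrs).get("datetime")
            if stamp:
                self.times.append(stamp)

    def handle_data(self, data):
        if self.in_heading:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag in HEADINGS:
            self.in_heading = False
            # The heading level is ignored; only section nesting matters.
            self.outline.append((self.depth, "".join(self._buf).strip()))


def outline_of(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline, parser.times


if __name__ == "__main__":
    print(outline_of(NARRATIVE))
```

The section title and the timestamp come straight from the narrative markup, so a Microdata ITS would only need to layer the V3 properties on top.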

It will be an interesting thing to explore in more detail.  BUT don't expect a specification in 90 days folks.  Let's let Microdata get out of the W3C process before we go gung ho into the NEXT S&I Framework process.  What I think we should be doing is experimenting with this now so that as the W3C further develops microdata, we can provide valuable feedback.

One thing I know I don't like (but could get used to) is the way itemref works.  I come from an NLP background.  In that space, you usually build semantic objects from the bottom up, not the top down.  When building from the bottom up, it's easy to find different parts of a thing in different places in the text.  For example, you could have three spans of text, one containing a test result, the next the normal range, and the last the interpretation of it.  Spans 1 and 3 are part of the observation.  Span 2 is part of a reference range associated with it.  Each of these parts might be discovered by a separate NLP process, and would want to be linked BACK to the item that it belongs to, rather than having the item link to all of its parts.  Either way works from a modeling standpoint, because an ID/IDREF link can be traversed in either direction.  Using itemref makes it easy (and possibly more efficient in existing parser implementations) to find the parts later.  Reversing the linkage so that the part points to the whole keeps the parts that you are examining (and possibly annotating with microdata) in one place.  The efficiency argument is based on the fact that many parser implementations will index items by their id.  But you could just as easily index an IDREF using a particular name to find all things that point to a particular id.  It's just that parsers today typically already maintain the ID index, and don't necessarily maintain one for IDREFs.
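
The two directions can be compared with a small Python sketch. It uses plain dictionaries in place of parsed HTML, and every id and property name in it is hypothetical. In particular, microdata defines no attribute for a part to point back at its whole, so the part_of field here is an invented stand-in for that reversed link.

```python
from collections import defaultdict

# Three spans, as separate NLP passes might find them (ids are hypothetical).
spans = {
    "s1": {"prop": "value", "text": "95 mg/dL"},
    "s2": {"prop": "range", "text": "70-99 mg/dL"},
    "s3": {"prop": "interpretation", "text": "normal"},
}

# Top down, the way itemref works: each item lists the ids of its parts.
items_forward = {
    "obs1": {"itemref": ["s1", "s3"]},
    "range1": {"itemref": ["s2"]},
}

# Bottom up: each part names the item it belongs to.
# "part_of" is an invented field, not a microdata attribute.
part_of = {"s1": "obs1", "s2": "range1", "s3": "obs1"}


def resolve_forward(item_id):
    """Follow itemref through the ID index that parsers usually keep."""
    refs = items_forward[item_id]["itemref"]
    return {spans[r]["prop"]: spans[r]["text"] for r in refs}


def build_backref_index():
    """Index the reverse links once: item id -> ids of the parts pointing at it."""
    index = defaultdict(list)
    for span_id, owner in part_of.items():
        index[owner].append(span_id)
    return index


def resolve_backward(item_id, index):
    """Find the parts of an item by looking up who points at it."""
    return {spans[s]["prop"]: spans[s]["text"] for s in index.get(item_id, [])}


if __name__ == "__main__":
    idx = build_backref_index()
    for item_id in items_forward:
        print(item_id, resolve_forward(item_id), resolve_backward(item_id, idx))
```

Both lookups yield the same property bags. The only real difference is that the reverse direction needs the extra index built up front, which is the piece parsers do not typically maintain today.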


  1. Oh No!!!. You used too many key-terms and cool sounding buzz-words. The next S&I Framework project will be based on this blog... Actually, the next 'Summercamp'.

  2. I think it's a very interesting idea! Maybe a less-RIMy mapping would be adequate, more in line with some of the "green" approaches. There would be no more "wire format" debates :)

    The ability for a user agent to be able to extract encoded data from any web page is a powerful bonus, not to mention the enabling of more specific search capabilities. (Sound familiar?)

  3. Ideally you'd have a separate RIM-based model that fully expresses all of the semantics including the 7 levels of nesting through various relationships to properly express your semantics. However, your item. You'd then have a collapsed/green view of that that defines the items actually of interest that could be referenced in your HTML. The expression of the base model with RIM mappings could be made available at the itemtype URL . . .

  4. If green to RIM can be automated and vice versa, this is a clear winner in the ease of use vs. semantic consistency/complexity battle. There is some promising work from Robert Warden in the UK on the automation front. I like the idea that Green types map to transforms for RIM semantics as well as human understandable meanings.

  5. I just had a PCAST Déjà vu while reading this blog post.

    If certain IHE folks read this they may come up with a long list again :-)


    Michael Planchart