Thursday, July 22, 2010

XHTML for CDA Release 3

CDA Release 2 uses an HTML-like markup language to support narrative text. The table below shows the correspondence between the two:



| CDA | HTML |
|---|---|
| ‹content› | ‹SPAN› |
| ‹linkHTML› (attributes are identical) | ‹A›, ‹LINK› |
| ‹sub› | ‹SUB› |
| ‹sup› | ‹SUP› |
| ‹br› | ‹BR› |
| ‹footnote›, ‹footnoteRef› | (not available) |
| ‹renderMultimedia› | ‹IMG› |
| ‹paragraph› | ‹P› |
| ‹list listType="ordered"› | ‹OL› |
| ‹list listType="unordered"› | ‹UL› |
| styleCode="className" | class="className" |
| ID="value" | ID="value" |
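To show how mechanical this correspondence is, here is a minimal Python sketch that rewrites a CDA narrative fragment into XHTML-style elements. The function name `cda_to_xhtml`, the `item` to `li` mapping, the lowercase `id` attribute, and the `div` fallback for unmapped elements (such as footnotes) are my assumptions for illustration; they are not taken from the table or from any specification.

```python
import xml.etree.ElementTree as ET

# Element mapping from the table above. Keys are lowercased so that the
# exact casing used by the schema does not matter for this sketch.
CDA_TO_XHTML = {
    "content": "span",
    "linkhtml": "a",
    "sub": "sub",
    "sup": "sup",
    "br": "br",
    "rendermultimedia": "img",
    "paragraph": "p",
    "item": "li",  # assumption: list items are not in the table
}


def cda_to_xhtml(elem):
    """Return an XHTML-style copy of a CDA narrative element (hypothetical helper)."""
    tag = elem.tag.lower()
    if tag == "list":
        new_tag = "ol" if elem.get("listType") == "ordered" else "ul"
    else:
        # Assumption: anything without an equivalent becomes a generic <div>.
        new_tag = CDA_TO_XHTML.get(tag, "div")

    out = ET.Element(new_tag)
    for name, value in elem.attrib.items():
        if name == "styleCode":
            out.set("class", value)   # styleCode -> class
        elif name == "ID":
            out.set("id", value)      # assumption: lowercase id in XHTML
        elif name != "listType":      # listType is consumed by the tag choice
            out.set(name, value)

    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(cda_to_xhtml(child))
    return out


source = ET.fromstring(
    '<paragraph styleCode="Bold">See <content ID="c1">note</content> here</paragraph>'
)
print(ET.tostring(cda_to_xhtml(source), encoding="unicode"))
```

A transform like this is roughly what a "default" stylesheet does for CDA Release 2 today, which is why carrying XHTML directly would make it unnecessary.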



For CDA Release 3 there's been a formal proposal to use XHTML to support the narrative text. I'm very much in support of this for several reasons. You can take a class on XHTML, buy a book on it, and hire engineers with experience in this markup language. From a technical perspective, this also eliminates the need for a "default" stylesheet, since the XHTML is already there.

There are a couple of places where I suspect we will want to "profile" XHTML to limit some of its capabilities. The W3C developed XHTML Modularization for this sort of purpose.

What are the challenges and their implications? The biggest issue has to do with where and how the document structure is stored.

The structure of the CDA document today is a list of sections represented using a RIM class called an Organizer. These sections can contain subsections, recursively. All of these are intermingled into the RIM structure of the CDA. Using XHTML would likely separate the organization of the text from the structured entries describing it. The current document structure organizes content in a certain way, and that organization carries semantic meaning (not all aspects of presentation are fluff!). This change would require some thought to determine whether:
  1. The XHTML structure is used to organize the content, with sections attached to XHTML organizing elements such as ‹div›.
  2. A parallel organization structure is incorporated into the structured entries.
There are strengths and weaknesses to either of these. The biggest weakness of using the XHTML representation for the organizational structure of the content is the loss of that structure in the RIM classes. One reason this matters is something called context conduction. Context conduction allows certain pieces of context (e.g., the subject, author or performer of an act in a clinical statement) to be conveyed from larger structural components. Losing the organizational structure in the structured entries would make it more difficult for systems to determine who was the author, subject or performer associated with a given act using fairly simple programmatic constructs.
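The "fairly simple programmatic construct" that context conduction enables can be sketched as a walk up the containment hierarchy. This is a simplified illustration, not the RIM's actual rules: the `Node` class and `effective_author` function are hypothetical names, and the sketch ignores the finer controls the RIM provides over which context is conducted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A document, section, or entry in a containment tree (hypothetical model)."""
    name: str
    author: Optional[str] = None
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def effective_author(node: Optional[Node]) -> Optional[str]:
    """Return the nearest author found on the node or any of its ancestors."""
    while node is not None:
        if node.author is not None:
            return node.author
        node = node.parent
    return None


document = Node("document", author="Dr. A")
medications = document.add(Node("Medications"))
entry = medications.add(Node("medication entry"))
nursing = document.add(Node("Nursing Note", author="Nurse B"))
nursing_entry = nursing.add(Node("observation"))

print(effective_author(entry))          # inherited from the document
print(effective_author(nursing_entry))  # overridden at the section level
```

Without the containment tree in the structured entries there is no `parent` chain to walk, which is exactly the difficulty described above.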

Duplicating the organizational structure in the structured entries has other problems.  Any time you duplicate information in two different places, you have an associated risk that the duplication process was not carried out correctly.  The question then becomes how to determine which structure is correct.

Fortunately, I think there is a solution to both of these problems. The organizational structure of the document is implicitly represented by block elements in the content. That structure can be made explicit through an algorithmic transform of the content. Specifying the algorithm by which the structure is duplicated enables use of that structure to convey context information. The remaining question is whether it would be better to convey the structured entries in the CDA Release 3 document using that algorithmically developed structure (which could require a validation step), or to let the application use the algorithm when it needs to make inferences about context.
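One possible shape for such an algorithmic transform is sketched below. It assumes, purely for illustration, that headings (`h1` to `h6`) delimit sections and that heading level determines nesting; the proposal does not specify this, and a real algorithm might key off `div` or other block elements instead. The function name `outline` and the dictionary layout are hypothetical, and the input is un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def new_section(title, level):
    return {"title": title, "level": level, "blocks": [], "sections": []}


def outline(body):
    """Turn a flat run of headings and blocks into an explicit section tree."""
    root = new_section(None, 0)
    stack = [root]  # stack[-1] is the section currently being filled
    for elem in body:
        level = HEADING_LEVELS.get(elem.tag)
        if level is None:
            # Not a heading: the block belongs to the current section.
            stack[-1]["blocks"].append(elem)
            continue
        # Close any sections at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        section = new_section("".join(elem.itertext()).strip(), level)
        stack[-1]["sections"].append(section)
        stack.append(section)
    return root


body = ET.fromstring(
    "<body>"
    "<h1>Medications</h1><p>Aspirin</p>"
    "<h2>Discontinued</h2><p>Warfarin</p>"
    "<h1>Allergies</h1><p>None</p>"
    "</body>"
)
tree = outline(body)
for section in tree["sections"]:
    print(section["title"], [sub["title"] for sub in section["sections"]])
```

Because the transform is deterministic, a receiver could either validate a transmitted structure against it or run it on demand, which are the two options weighed above.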

I think at this point, I'm in favor of duplicating the organizational structure in the entries, but could be readily convinced that it isn't necessary.  An advantage to not carrying the implicit structure is that you can have other parallel structures in the structured entries.  This gives you another "view" of the data that is separated from the presentation view.  An example of where this could be valuable is in developing treatment plans.  A component of a treatment plan is the particular condition or conditions which it treats.  That could be the context for that portion of the treatment plan, which could simplify the transmission of the treatment plan content.

An additional benefit of using XHTML to convey document structure is that it allows lists and tables to also be used to convey structure.  Tables and lists are organizers of information even more so than document sections.  Why should they not be treated in the same fashion in CDA?

I expect the discussion and development of this formal proposal will take quite a bit of time over the coming months.  It will certainly result in some challenging problems that need to be solved if we are to move forward in this direction.  I look forward to those challenges.

1 comment:

  1. I see this as a very encouraging development. It's easy to fall into the attitude that *my particular data* is too complex to be represented in something as pedestrian as (X)HTML. But I think it is important to question that assumption. As you point out, everyone can author and manage HTML and I think the microformat people have shown us that complex data can peacefully coexist with HTML.

    Of course you would need to be careful. HTML authors are very accustomed to (mis)using markup in purely presentational ways. Tables are for tabular data, please!
