Monday, March 4, 2013

Displaying a NonXMLBody in HL7 CDA LateNightAtHIMSS

While I'm at HIMSS, I'm planning on doing some catch-up.  While my days and evenings will be filled with HIMSS13 stuff, I expect that the hours between arriving back in my room, and when I finally head to bed will include a bunch of standards work that I've been lagging on.  As usual, I try to include that in blog posts as I can. These will be tagged with LateNightAtHIMSS.  Here's the first:

One of my to-dos from the HL7 Working Group Meeting back in January was to write some text for the claims attachment workgroup on how to display the non-XML body in CDA.  I'm just getting around to that now, and have realized that I'll probably need to radically shorten the content for the HL7 specification, but didn't want to lose the thinking that went into it, so I'm putting it out here.  It's not a complete how-to, but it should give you some ideas about how to accomplish the task.


Display of CDA is usually a matter of transforming the narrative content using an XSLT stylesheet. However, when information is exchanged as an unstructured document using a <nonXMLBody>, this mechanism does not work without additional engineering. The header can be readily transformed using a stylesheet. The body of the document, however, must either be made reference-able by the browser in a URL scheme it recognizes, or be separately decoded into its binary format.  All content must be rendered using an appropriate viewer for the MIME type of the document.

This requires several steps, including configuring the browser to display the non-HTML content if needed (e.g., for application/pdf, application/msword or text/rtf content), linking to externally referenced content, or linking to and decoding the embedded base-64 encoded content.  In addition, you must address security concerns that might be introduced by displaying content such as application/pdf, application/msword, text/rtf or text/html which could include scripts.

Configuring the Browser

In order to display the content of the <nonXMLBody> element within a CDA document, the browser component must be configured to support display of the various MIME types that are allowed within the unstructured document. While displaying the CDA header of such a document can be done using an XSLT stylesheet, displaying the body content is not just a simple matter of transforming it using XSLT.

The mediaType attribute of the <text> element identifies the type of content. The following types are supported in the unstructured document for word processing/text formats:
  • application/msword
  • application/pdf
  • text/plain
  • text/rtf
  • text/html
The following MIME types are supported for graphic formats:
  • image/gif
  • image/tiff
  • image/jpeg
  • image/png
Most modern browsers can display text/plain, text/html, image/gif, image/jpeg, and image/png without any difficulty. However, they must be configured (e.g., with plugins or other software) in order to properly display application/msword, application/pdf, text/rtf and/or image/tiff files. The browser component used by your application must be configured to properly handle the display of these files. Files in the image/tiff format are often challenging because there is a great deal of variation in how that standard is used (e.g., multi-page TIFF images), and few plugins handle all the variations well.
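To make the split above concrete, here is a minimal Python sketch that sorts a mediaType value into "the browser can probably show it", "needs a plugin or other viewer", or "not allowed in the unstructured document". The function name and the three labels are my own inventions for illustration; only the MIME type lists come from the text above.

```python
# MIME types that most modern browsers display without help (per the text above).
NATIVE_TYPES = {"text/plain", "text/html", "image/gif", "image/jpeg", "image/png"}

# MIME types that need a plugin or other software to be configured.
NEEDS_VIEWER = {"application/msword", "application/pdf", "text/rtf", "image/tiff"}


def display_strategy(media_type: str) -> str:
    """Classify a mediaType value; the labels are hypothetical, not from any spec."""
    normalized = media_type.strip().lower()
    if normalized in NATIVE_TYPES:
        return "native"
    if normalized in NEEDS_VIEWER:
        return "plugin"
    return "unsupported"
```

An application could call display_strategy() on the mediaType attribute before deciding whether to hand the content to the browser component at all.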

In addition to configuring the browser, you must also address other issues. There are two ways that content can be included within the CDA document: by reference or by value.

Linking to Content Included By Reference

The unstructured document can reference the narrative text via a URL, or embed base-64 encoded content directly into the <text> element of the CDA document. When the content is referenced via URL, the URL appears in a <reference value="URL"> element inside the <text> element of the <nonXMLBody>. This can be transformed into an <iframe src="URL"> in the XSLT transformation. Transformations may be needed on the URL to correctly point to the location of the referenced content. The URL used inside the CDA document may use links that refer to part of the transfer package used to encode the CDA document and its referenced files. However, these could use a URL scheme or syntax not supported by the browser being requested to display the content.
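The post describes doing this step in XSLT; purely as an illustration of the same idea, here is a rough Python sketch that pulls the <reference> value out of the <nonXMLBody> and emits an <iframe>. The rewrite callback stands in for whatever URL translation your application needs, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

CDA_NS = {"cda": "urn:hl7-org:v3"}  # namespace used by CDA documents

SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><nonXMLBody>
    <text mediaType="application/pdf"><reference value="report.pdf"/></text>
  </nonXMLBody></component>
</ClinicalDocument>"""


def iframe_for_reference(cda_xml, rewrite=lambda url: url):
    """Return an <iframe> for by-reference content, or None if there is none."""
    root = ET.fromstring(cda_xml)
    text = root.find(".//cda:nonXMLBody/cda:text", CDA_NS)
    if text is None:
        return None
    reference = text.find("cda:reference", CDA_NS)
    if reference is None or not reference.get("value"):
        return None
    # Translate the URL into something the browser can resolve, then escape it
    # so it is safe to place inside an HTML attribute.
    url = rewrite(reference.get("value"))
    return '<iframe src="%s"></iframe>' % escape(url, quote=True)
```

For example, iframe_for_reference(SAMPLE, lambda u: "/attachments/" + u) points the frame at a made-up /attachments/ path instead of the raw relative URL.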

For example, CDA Release 2 describes a way in which a CDA document can be sent using the multipart/related MIME type.  The first component is the CDA document, and subsequent parts include the resource used to contain the nonXMLBody.  Each component of the multipart can be given a URI in the Content-Location: header associated with it.  That URI applies within the context of the multipart resource itself, but those content locations will not be understood by common web browser components.  So, the application software needs to address how those URIs will be translated to enable access to each component as a separate resource.
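One way the translation might look, sketched with Python's standard email package: index each part of the multipart/related package by its Content-Location, then map that location to a URL your application can actually serve. The /attachments/ layout, the function names and the sample package are assumptions made up for this sketch.

```python
from email import message_from_bytes
from urllib.parse import quote

RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="b1"; type="text/xml"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/xml\r\n"
    b"Content-Location: cda.xml\r\n"
    b"\r\n"
    b"<ClinicalDocument/>\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Location: body.txt\r\n"
    b"\r\n"
    b"Hello\r\n"
    b"--b1--\r\n"
)


def parts_by_location(raw_message: bytes) -> dict:
    """Map each Content-Location value to (media type, decoded bytes)."""
    message = message_from_bytes(raw_message)
    parts = {}
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        location = part.get("Content-Location")
        if location:
            parts[location] = (part.get_content_type(), part.get_payload(decode=True))
    return parts


def browser_url(doc_id: str, location: str) -> str:
    """Hypothetical URL layout under which the application serves each part."""
    return "/attachments/%s/%s" % (quote(doc_id, safe=""), quote(location, safe=""))
```

The same browser_url() mapping would then be applied to the <reference> value when the header is rendered, so that the link in the page and the stored part agree.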

Linking and Decoding Content Included By Value

The content of the unstructured body can also appear base-64 encoded in the CDA document. The <text> element will have a mediaType attribute indicating the type of content, and a representation attribute (set to the value B64) to indicate that the content is base-64 encoded. The <text> element will contain the base-64 encoded content as text, as in the following example:

<text mediaType="text/rtf" representation="B64">e1xydGY...</text>
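Decoding that element outside the browser is straightforward. The following is a minimal Python sketch, assuming the whole CDA document is available as a string; the function name and the sample document are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

CDA_NS = {"cda": "urn:hl7-org:v3"}  # namespace used by CDA documents

# Sample document whose body is a tiny RTF fragment, base-64 encoded.
SAMPLE = (
    '<ClinicalDocument xmlns="urn:hl7-org:v3"><component><nonXMLBody>'
    '<text mediaType="text/rtf" representation="B64">%s</text>'
    "</nonXMLBody></component></ClinicalDocument>"
    % base64.b64encode(b"{\\rtf1 Hello}").decode("ascii")
)


def decode_embedded_body(cda_xml):
    """Return (mediaType, raw bytes) for a B64 body, or None if not present."""
    root = ET.fromstring(cda_xml)
    text = root.find(".//cda:nonXMLBody/cda:text", CDA_NS)
    if text is None or text.get("representation") != "B64":
        return None
    # Base-64 text inside XML is often line-wrapped, so drop all whitespace.
    payload = "".join((text.text or "").split())
    return text.get("mediaType"), base64.b64decode(payload)
```

Calling decode_embedded_body(SAMPLE) hands back the media type together with the original bytes, which can then be passed to whatever viewer handles that type.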

Embedded base-64 encoded content cannot reliably be transformed to HTML via an XSLT stylesheet for all content types in all browsers. One mechanism that works to some degree with image formats is use of the data: URL scheme.  This URL scheme allows base-64 encoded data to be sent to a browser in a URL in the form:

data:[<mediatype>][;base64],<data>

Most modern browsers support the data: URL, but many (especially those in common use) limit either the size of the data that can appear or the type of content that can be shown. For example, some versions of Internet Explorer only allow use of image formats in an <img> tag, which prevents the scheme from being used to support application/msword, application/pdf, text/plain and text/rtf.
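A small Python sketch of building such a URL while respecting both kinds of limits. The 32,000-character cap and the image-only restriction are placeholder assumptions, not the actual limits of any particular browser; substitute whatever your target browsers really allow.

```python
def data_url(media_type, b64_payload, max_len=32_000, allowed_prefixes=("image/",)):
    """Build a data: URL, or return None when the assumed limits rule it out.

    max_len and allowed_prefixes are illustrative defaults only.
    """
    if not media_type.startswith(allowed_prefixes):
        return None  # content type the target browser may refuse to show
    # Remove line wrapping that is common in base-64 text inside XML.
    payload = "".join(b64_payload.split())
    url = "data:%s;base64,%s" % (media_type, payload)
    return url if len(url) <= max_len else None
```

When data_url() returns None, the application has to fall back to some other way of getting the content to the browser.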

One way to display these is to process the content of the CDA body twice.  The first pass renders the header, and generates an appropriate <iframe src="URL"> link to call the second pass.  When the browser attempts to render the content of the <iframe>, it calls back on the server to get the contents of the specified URL.  That URL is encoded during the first pass to initiate the second pass.  The second pass base-64 decodes the binary content and returns it to the browser as a binary stream of data, which it (or its plug-in) renders appropriately.
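Here is a rough sketch of the two passes as a tiny WSGI application in Python. Everything specific in it is an assumption for illustration: the in-memory DOCUMENTS store, the /cda/<id>/body URL layout, and the function names. Rendering the header in the first pass is left out; only the link generation is shown.

```python
import base64
import xml.etree.ElementTree as ET
from html import escape

CDA_NS = {"cda": "urn:hl7-org:v3"}

# Hypothetical in-memory store (document id -> CDA XML); a real application
# would look documents up wherever it keeps them.
DOCUMENTS = {
    "doc1": (
        '<ClinicalDocument xmlns="urn:hl7-org:v3"><component><nonXMLBody>'
        '<text mediaType="text/rtf" representation="B64">%s</text>'
        "</nonXMLBody></component></ClinicalDocument>"
        % base64.b64encode(b"{\\rtf1 Hello}").decode("ascii")
    )
}

NOT_FOUND = ("404 Not Found", [("Content-Type", "text/plain")], b"not found")


def first_pass_iframe(doc_id):
    """First pass: emit the link that will trigger the second pass."""
    # Assumes doc_id is a simple URL-safe token.
    return '<iframe src="%s"></iframe>' % escape("/cda/%s/body" % doc_id, quote=True)


def second_pass(doc_id):
    """Second pass: decode the embedded body and return it as binary."""
    cda_xml = DOCUMENTS.get(doc_id)
    if cda_xml is None:
        return NOT_FOUND
    text = ET.fromstring(cda_xml).find(".//cda:nonXMLBody/cda:text", CDA_NS)
    if text is None or text.get("representation") != "B64":
        return NOT_FOUND
    body = base64.b64decode("".join((text.text or "").split()))
    headers = [
        ("Content-Type", text.get("mediaType", "application/octet-stream")),
        ("Content-Length", str(len(body))),
    ]
    return "200 OK", headers, body


def app(environ, start_response):
    """WSGI entry point: route /cda/<id>/body to the second pass."""
    segments = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(segments) == 3 and segments[0] == "cda" and segments[2] == "body":
        status, headers, body = second_pass(segments[1])
    else:
        status, headers, body = NOT_FOUND
    start_response(status, headers)
    return [body]
```

The Content-Type header on the second-pass response is what lets the browser (or its plug-in) pick the right viewer for the decoded bytes.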

Security Concerns with the NonXMLBody

The nonXMLBody of CDA can include application/msword, application/pdf, text/rtf and text/html. Each of these formats can include scripts that could be run by the plugin running in the browser.  It may not be easy to disable scripting in commonly available plugins, and in some cases (such as when displaying base-64 decoded text/html), it may not be possible to turn off scripting at all, because it may be used by the browser to enable other parts of your application.

3 comments:

  1. I always appreciate your insightful, ad-hoc tutorials, Keith.

    Thanks,
    TJL

  2. Hi Keith, was reading your post and looking at the v3 specs (ed. 2010), and I can't find a "representation" attribute for ED, is that correct?
    Maybe we should use "compression" to specify the B64. What do you think?

  3. Hi Keith ,

    I have been trying to embed/show a PDF in a CCD document. I am able to show an image using the XML code given in this link: https://justpaste.it/10xab

    However, when I replace the mediatype with "application/pdf" and the reference value with "http://www.pdf995.com/samples/pdf.pdf", the PDF doesn't appear. The same XML code works for an image but not for a PDF or any other type of document (e.g., text/html). Could you please guide me in the right direction as to what I am missing here?

    Thanks a lot!
