Friday, November 12, 2010

Frustrated by Lack of Standards Support in IE...Again

Recently I posted on this blog about Self-Displaying CDA.  I'll be writing more about that next week.  An equal challenge exists for Level 1 wrapped content using the CDA standard.  Numerous people have tried to create a simple XSLT transform that would allow a CDA document containing a nonXMLBody to be displayed.  Unless that body happens to be text/plain, it's been almost impossible to do, especially if Base 64 encoded.

I managed to handle base 64 encoded text/plain pretty easily in an XSLT stylesheet that called out to Java using Xalan.  The trick is to call out to a Base64 decoding function, passing in the content of the ‹cda:text› element.  The same technique should also work in .Net with any XSLT processor that can call out to a .Net object that decodes base64 content.  You should also be able to do it with an XSLT processor that supports JavaScript (see here for a decoder function).

‹!-- 
  Given textual data, Base64 decode it.  This template is used to 
  Base 64 decode information found in XDS-SD (text) format into 
  a string.
  Requires xmlns:java="http://xml.apache.org/xalan/java" (the Xalan-J
  Java extension namespace) to be declared on the stylesheet.
--›
‹xsl:template name="BASE64Decode"›
  ‹!-- The data to base 64 decode. --› 
  ‹xsl:param name="data"/› 
  ‹!-- Force it to be a string (just in case) --› 
  ‹xsl:variable name="theData" select="java:java.lang.String.new(string($data))"/›
  ‹!-- Create a new Base 64 decoder.  sun.misc.BASE64Decoder is an internal
       JDK class; it was removed in Java 9. --› 
  ‹xsl:variable name="decoder" select="java:sun.misc.BASE64Decoder.new()"/›
  ‹!-- Get the decoded bytes (BASE64Decoder's method is decodeBuffer) --› 
  ‹xsl:variable name="theBytes" select="java:decodeBuffer($decoder, $theData)"/›
  ‹!-- Turn it into a String (using default character set) --› 
  ‹xsl:variable name="result" select="java:java.lang.String.new($theBytes)"/›
  ‹!-- Return the string --› 
  ‹xsl:value-of select='$result'/›
‹/xsl:template›
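
For comparison, here is a minimal Java sketch of the same decoding step outside the stylesheet. It assumes Java 8 or later and uses java.util.Base64 rather than the internal sun.misc.BASE64Decoder that the template calls. The class name Base64TextDecoder is hypothetical, and decoding as UTF-8 (instead of the platform default character set) is an assumption about the payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Sketch of the decoding the BASE64Decode template delegates to Java.
 * Uses java.util.Base64 (Java 8+), not sun.misc.BASE64Decoder.
 */
public class Base64TextDecoder {

    /** Decodes Base64 text (line breaks and indentation allowed) into a string. */
    public static String decodeToString(String base64Text) {
        // The MIME decoder ignores characters outside the Base64 alphabet,
        // such as the line breaks and indentation often found inside cda:text.
        byte[] bytes = Base64.getMimeDecoder().decode(base64Text);
        // Assumption: the payload is UTF-8; the template used the platform default.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = "SGVsbG8s\n    IENEQSE=";
        System.out.println(decodeToString(encoded)); // prints "Hello, CDA!"
    }
}
```

The MIME decoder is the design choice that matters here: the basic decoder would reject the wrapped lines that CDA producers commonly emit.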

But base-64 encoded PDF in XSLT eluded me for quite some time.  The key problem was converting the content so that the browser could read it.  It just cannot be done that way, because the browser expects image or object data to come from a separate URL.  HTML doesn't support embedding the content directly, which is what I'd have to do with an XSLT.  I knew that I could create a protocol interpreter for IE that would solve the problem by including the data in the URL itself, but never had the time to build it.  Of course, the best URL protocol to support would be the data: URI scheme specified in RFC 2397.

IE 8 added data: URI support for images and objects, but only for image formats, with a 32K limit on the size of the URI content.  I thought I'd try it out this evening after my discussion today, especially since I'd loaded up a number of browsers on my machine for the self-displaying CDA project.
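
To get a feel for that limit, the sketch below does the arithmetic.  Base64 turns every 3 bytes into 4 characters, so the encoded length of n bytes is 4 × ⌈n/3⌉.  It assumes the IE 8 limit is 32,768 characters for the whole URI, including the data:application/pdf;base64, prefix; the class and constant names are hypothetical.

```java
/**
 * Back-of-the-envelope check: does a Base64 payload fit in IE 8's data: URI limit?
 * Assumption: the limit is 32,768 characters for the entire URI, prefix included.
 */
public class DataUriSizeCheck {

    /** Assumed IE 8 maximum data: URI length, in characters. */
    static final int IE8_MAX_URI_CHARS = 32 * 1024;

    /** Length of unwrapped Base64 text for rawBytes bytes: 4 * ceil(rawBytes / 3). */
    static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /** True if "data:" + mediaType + ";base64," + payload fits within the assumed limit. */
    static boolean fitsInIe8(String mediaType, long rawBytes) {
        long prefix = "data:".length() + mediaType.length() + ";base64,".length();
        return prefix + base64Length(rawBytes) <= IE8_MAX_URI_CHARS;
    }

    public static void main(String[] args) {
        // 20,480 bytes encode to 27,308 characters; 30,720 bytes encode to 40,960.
        System.out.println(fitsInIe8("application/pdf", 20 * 1024)); // true
        System.out.println(fitsInIe8("application/pdf", 30 * 1024)); // false
    }
}
```

Under that assumption, the largest PDF that would fit is roughly 24 KB (32,740 usable characters / 4 × 3 = 24,555 bytes), which is small for a real clinical document.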

Here's the template for the stylesheet that should work:

‹!-- Assumes xmlns:cda="urn:hl7-org:v3" is declared on the stylesheet. --›
‹xsl:template match="cda:nonXMLBody"›
  ‹xsl:variable name='url'›
    ‹xsl:text›data:‹/xsl:text›
    ‹xsl:value-of select='cda:text/@mediaType'/›
    ‹xsl:if test='cda:text/@representation="B64"'›;base64‹/xsl:if›
    ‹xsl:text›,‹/xsl:text›
    ‹xsl:value-of select='cda:text'/›
  ‹/xsl:variable›
  ‹object width='600' height='800' data='{$url}'›‹/object›
‹/xsl:template›
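
The same URI construction, sketched in Java, may make the RFC 2397 layout easier to see: data:, the media type, an optional ;base64 flag, a comma, and the payload.  This is an illustration rather than an exact port of the template: it strips whitespace from Base64 content (producers often wrap it across lines), treats a missing mediaType as text/plain (assumed to match the HL7 ED default), and Base64-encodes non-B64 text instead of percent-escaping it.  The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Sketch of the URI the nonXMLBody template assembles:
 * "data:" + mediaType + [";base64"] + "," + payload (RFC 2397).
 */
public class NonXmlBodyDataUri {

    public static String build(String mediaType, String representation, String content) {
        // Assumption: a missing mediaType means text/plain, the HL7 ED default.
        String type = (mediaType == null || mediaType.isEmpty()) ? "text/plain" : mediaType;
        String payload;
        if ("B64".equals(representation)) {
            // Wrapped Base64 is common in CDA; drop all whitespace.
            payload = content.replaceAll("\\s+", "");
        } else {
            // Plain text would need percent-escaping; Base64-encoding it avoids that.
            if (!type.contains("charset=")) {
                type = type + ";charset=UTF-8";
            }
            payload = Base64.getEncoder()
                    .encodeToString(content.getBytes(StandardCharsets.UTF_8));
        }
        return "data:" + type + ";base64," + payload;
    }

    public static void main(String[] args) {
        System.out.println(build("application/pdf", "B64", "JVBERi0x\n    LjQK"));
        // data:application/pdf;base64,JVBERi0xLjQK
        System.out.println(build(null, "TXT", "Hi"));
        // data:text/plain;charset=UTF-8;base64,SGk=
    }
}
```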

I've verified that it does in fact work on Windows in Opera and Safari, but fails in Firefox, Chrome (Chrome doesn't even run the XSL transform!) and Internet Explorer.  The really frustrating part is that Microsoft crippled support for the data: URL in IE 8.0.  It only works for image formats, and since PDF is not an image format, it doesn't work.  I can understand why Microsoft did this, because the data: URL is a security risk.  The key problem is that the base 64 encoded data can contain scripts (even in PDF) that might pass through security filters that don't inspect the content of the URL, and that can open the browser to a number of new attacks.

Internet Explorer 9.0 is currently in beta.  Microsoft has raised the size limit on data: URIs to address other issues, and allowed their use in other elements (including SCRIPT), but doesn't seem to have any plans to support non-image content in the OBJECT tag.  I hope that will change.  Allowing the data: URL in the SCRIPT tag is at least as big a security risk as allowing application/pdf content in an OBJECT tag.  From a technical perspective, they've already got most of the code to support it in the product; they just need to find a way to address the security risks.  An adequate solution might be to A) allow users / system administrators to determine the content types that could be transmitted using the data: URI format, and B) invoke anti-virus/spyware scanners on the data: URI content before allowing it to be used, much as they can be invoked before opening a PDF document from the web.
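
Suggestion A) could start as little more than an allowlist check on the media type in the URI header.  The sketch below is hypothetical (no browser exposes this class or API); it assumes Java 9+ for Set.of, expects the configured types in lower case, and treats an omitted media type as text/plain, per RFC 2397.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Sketch of suggestion A): an administrator-configured allowlist of media
 * types that may be rendered from a data: URI. All names are hypothetical.
 */
public class DataUriPolicy {

    private final Set<String> allowedTypes; // expected in lower case

    public DataUriPolicy(Set<String> allowedTypes) {
        this.allowedTypes = allowedTypes;
    }

    /** Returns the lower-cased media type without parameters, or null if not a data: URI. */
    public static String mediaTypeOf(String uri) {
        if (!uri.regionMatches(true, 0, "data:", 0, 5)) {
            return null;
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            return null;
        }
        String header = uri.substring(5, comma);
        int semi = header.indexOf(';');
        String type = (semi < 0 ? header : header.substring(0, semi)).trim();
        // RFC 2397: an omitted media type means text/plain.
        return type.isEmpty() ? "text/plain" : type.toLowerCase(Locale.ROOT);
    }

    public boolean isAllowed(String uri) {
        String type = mediaTypeOf(uri);
        return type != null && allowedTypes.contains(type);
    }

    public static void main(String[] args) {
        DataUriPolicy policy = new DataUriPolicy(Set.of("image/png", "application/pdf"));
        System.out.println(policy.isAllowed("data:application/pdf;base64,JVBERi0xLjQK")); // true
        System.out.println(policy.isAllowed("data:text/html,hello")); // false
    }
}
```

A check like this only covers A); B) would still need the decoded bytes handed to a scanner before rendering.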

Now if someone brilliant were to write a pluggable protocol handler for IE that implements the data URI format, it should work in versions of IE that don't already support it, and it could be hacked (by changing the URL scheme to something like x-data: ) to work in IE 8.  It's a pretty straightforward engineering job.  My .Net chops aren't up to it yet (I like Java).  A good protocol handler should be able to run in IE versions from about 6 on.
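
The URI-decoding half of such a handler is small.  The Java sketch below shows only that part, not IE's actual pluggable protocol interfaces.  It accepts both data: and the hypothetical x-data: alias, applies the RFC 2397 default of text/plain;charset=US-ASCII when the media type is omitted, Base64-decodes when the header ends in ;base64, and otherwise percent-decodes by hand, because java.net.URLDecoder turns + into a space (form encoding, not URI encoding).  It needs Java 16+ for the record type; all names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Sketch of RFC 2397 decoding for "data:" and a hypothetical "x-data:" scheme. */
public final class DataUriParser {

    /** Media type (with any parameters) and the decoded payload bytes. */
    public record Parsed(String mediaType, byte[] data) {}

    public static Parsed parse(String uri) {
        String rest;
        if (uri.regionMatches(true, 0, "data:", 0, 5)) {
            rest = uri.substring(5);
        } else if (uri.regionMatches(true, 0, "x-data:", 0, 7)) {
            rest = uri.substring(7);
        } else {
            throw new IllegalArgumentException("Not a data URI: " + uri);
        }

        int comma = rest.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' in data URI");
        }
        String header = rest.substring(0, comma);
        String body = rest.substring(comma + 1);

        boolean base64 = header.toLowerCase(Locale.ROOT).endsWith(";base64");
        if (base64) {
            header = header.substring(0, header.length() - ";base64".length());
        }
        // RFC 2397 defaults: no media type means text/plain;charset=US-ASCII,
        // and a bare ";charset=..." implies text/plain.
        if (header.isEmpty()) {
            header = "text/plain;charset=US-ASCII";
        } else if (header.startsWith(";")) {
            header = "text/plain" + header;
        }

        byte[] data = base64
                ? Base64.getMimeDecoder().decode(body) // tolerates line breaks
                : percentDecode(body);
        return new Parsed(header, data);
    }

    /** Decodes %XX escapes; unlike URLDecoder, leaves '+' alone. */
    private static byte[] percentDecode(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == '%' && i + 2 < raw.length) {
                int hi = Character.digit(raw[i + 1], 16);
                int lo = Character.digit(raw[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(raw[i]);
        }
        return out.toByteArray();
    }
}
```

For example, parsing "x-data:,A%20brief%20note" should yield the media type text/plain;charset=US-ASCII and the bytes of "A brief note".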

If the browser vendors had implemented the standards and specifications that were available more than a decade ago (the data URI was defined in 1998, XSLT in 1999) we wouldn't even have to worry about it.  But they haven't, and so we have to worry about it.  At least now you know how to make it work, and have most of the pieces you need, including free code.  I'll leave it up to someone else to make it happen, because I have another project I'm still struggling with right now.

It seems that IE 8 still lacks support for applying CSS table display formats to XML, even though Microsoft claims to support them for HTML (which I haven't verified), and so I'm looking for ways to hack around it.  Like I said, I'll have more to report on that project next week.  It was approved by the SSD steering division in HL7 this week, and I may just finish the technical details before it gets TSC approval in the coming weeks.  Wouldn't that be a hoot.

3 comments:

  1. Why not use XDS transforms to register a bare PDF that is related by the transform to the XDS-SD? I understand the benefit of XDS-SD is that there is ONE object that contains the CDA header information that holds valuable context. My solution does not eliminate that use for those that can and want to handle it that way; but for those that can't or don't want to they get the bare PDF.

  2. I've got a concern for future use. Some XDS documents are intended for presentation-preserving display. A key example of this is many of the cardiology-related documents, where layout, color, and symbology are important. Indications of presentation preservation requirements should not be ignored. The concepts of presentation preservation are very new to the typical CDA user.

    These are of more than academic interest. Clinical trials have demonstrated that variation in the choice of symbols made a reproducible, clinically significant, and statistically significant difference in understanding the results of mammography CAD. Reportable significant patient harm events have been traced to changes in user interface layout. Human interface work in other fields confirms the importance of presentation controls for human understanding. The cardiologists chose PDF because it was their opinion that differences in presentation would have clinical significance.

    CDA lacks presentation controls at this level. CSS is nothing more than presentation hints, interpreted differently by different browser vendors. For example, rather than use an interoperable color standard like CIE, the CSS controls for colors use RGB numbers, with a statement that these are assumed to conform to the sRGB color space, although nothing is said about viewing environment or calibration. It can matter that what one machine calls "green" is seen as "yellow" on another machine.

    In the absence of more information, I would assume that use of an encapsulated PDF is an indication that the presentation controls defined by PDF are needed and should be part of the display process.

  3. The point of this post is to show an XSLT that worked (almost) for display of CDA Wrapped PDF, not discussions about whether PDF or Structured CDA content is better for what purpose. Clearly there are use cases for both. I just wish the browsers supported the standards in a way that would make it work cheaply.
