Saturday, August 29, 2015

Stupid XSLT Tricks for OID and UUID recognition

I'm building a FHIR to CDA translator to convert a FHIR Composition in a Bundle to a CDA Document.  One of my challenges is recognizing identifiers that are already in OID or UUID form.

This is a simplified token matching problem.
A UUID is in the form ########-####-####-####-############ (groups of 8, 4, 4, 4, and 12), where each # is one of the hexadecimal digits in [0-9a-fA-F].  To test for this, I can take the string, translate all hex digits into # characters and then test for a match to the form.  This test can be used in a choice as follows:

<xsl:when test="translate($value,'0123456789abcdefABCDEF','######################') = '########-####-####-####-############'">
  <!-- ... stuff to do when $value is a UUID -->
</xsl:when>
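
For anyone who wants to poke at the translate trick outside of a stylesheet, here is a rough Python sketch of the same idea. The helper xpath_translate is my own stand-in for the XPath 1.0 translate() function (it is not part of any library), and the shape used is the standard 8-4-4-4-12 UUID layout.

```python
def xpath_translate(s, src, dst):
    """Stand-in for XPath 1.0 translate(): src[i] maps to dst[i];
    characters of src with no counterpart in dst are removed."""
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:  # first occurrence wins, as in XPath
            table[ch] = dst[i] if i < len(dst) else None
    return "".join(table.get(c, c) for c in s if table.get(c, c) is not None)


HEX_DIGITS = "0123456789abcdefABCDEF"
UUID_SHAPE = "########-####-####-####-############"  # 8-4-4-4-12


def looks_like_uuid(value):
    # Turn every hex digit into '#', then compare against the expected shape.
    return xpath_translate(value, HEX_DIGITS, "#" * len(HEX_DIGITS)) == UUID_SHAPE
```

One thing the sketch makes easy to see: a value that already contains # characters in the right places also passes, because # is left untouched by the translation. If that matters for your data, reject values containing # before applying the test.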

Handling OIDs is a little more difficult.  The pattern there is number[.number]*, where number matches the pattern 0|[1-9][0-9]* (ensuring no leading zeros in the number).

First off, we can reject anything that is not solely made up of digits or the . character.  That's an easy task for translate again.  The expression translate($value,'0123456789.','') removes every digit and every . from $value, so the result is the empty string only when $value is made up entirely of those characters.

We also need to make sure that the OID neither starts with, nor ends with a . character.  The first just uses not(starts-with($value,'.')).  It would be nice if XSLT Version 1.0 supported ends-with, but it doesn't.  So we have to find the last character using substring, and check to see that it isn't a . character.  That expression is substring($value,string-length($value))!='.'.

Next, we need to make sure that no sequence of digits starts with 0 except the single digit sequence containing 0.  Let's create a new string called testValue as follows:

<xsl:variable name="testValue" select="translate($value,'123456789','#########')"/>

If testValue contains a .0#, then we have a problem, because it contains a number with a leading 0. But we need to go a bit further than that, because two leading zeros are also a problem, so we need to check to see if it contains .00.  That also catches three or more leading zeros, so we've solved that case.  Oh, and we need to check for the case where the first number contains leading zeros, as it won't have a preceding '.'.  We could either check that one separately, or we could force testValue to contain a leading ., and that would let us reuse the previous test.

Leading to this test for OIDs:
<xsl:variable name="testValue" select="concat('.', translate($value,'123456789','#########'))"/>
<xsl:when test="
    string-length(translate($value,'0123456789.',''))=0 and
    not(contains($value,'..')) and
    not(contains($testValue,'.0#') or contains($testValue,'.00')) and
    not(starts-with($value,'.')) and
    substring($value,string-length($value))!='.'">
  <!-- ... stuff to do when $value is an OID -->
</xsl:when>
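
Because it is so easy to get one of these conditions subtly wrong, it can help to mirror the checks in a scripting language and throw test values at them. Below is a rough Python sketch of the conditions described above, not the stylesheet itself. The helper xpath_translate is my own stand-in for XPath's translate(), and the explicit empty-string check is an addition of mine: an empty $value would otherwise slip through every condition.

```python
def xpath_translate(s, src, dst):
    """Stand-in for XPath 1.0 translate(): src[i] maps to dst[i];
    characters of src with no counterpart in dst are removed."""
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:  # first occurrence wins, as in XPath
            table[ch] = dst[i] if i < len(dst) else None
    return "".join(table.get(c, c) for c in s if table.get(c, c) is not None)


def looks_like_oid(value):
    # Non-zero digits become '#'; a forced leading '.' lets the first
    # arc be checked with the same patterns as the later ones.
    test_value = "." + xpath_translate(value, "123456789", "#########")
    return (
        value != ""                                           # my addition
        and xpath_translate(value, "0123456789.", "") == ""   # digits and dots only
        and ".." not in value                                 # no empty arcs
        and ".0#" not in test_value                           # no leading zero
        and ".00" not in test_value                           # no run of leading zeros
        and not value.startswith(".")
        and not value.endswith(".")
    )
```

Values such as 2.16.840.1.113883, 0 and 1.0.2 pass, while 00.1, 1.02, 1..2, .1.2 and 1.2. are rejected.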

Using translate to match character classes can also help with other test patterns, for example matching dates, phone numbers, etc., without needing to rely on an external regular expression library (such libraries do exist, though; see EXSLT).
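
As a quick illustration of the same trick applied to dates, here is a small Python sketch. The helper names and the ####-##-## shape are my own choices for the example, and note that it only checks the shape of the string, not whether the date is a real one.

```python
def xpath_translate(s, src, dst):
    """Stand-in for XPath 1.0 translate(): src[i] maps to dst[i];
    characters of src with no counterpart in dst are removed."""
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:  # first occurrence wins, as in XPath
            table[ch] = dst[i] if i < len(dst) else None
    return "".join(table.get(c, c) for c in s if table.get(c, c) is not None)


def matches_digit_shape(value, shape):
    # Replace every digit with '#' and compare with the expected shape.
    return xpath_translate(value, "0123456789", "##########") == shape


# Shape check only: "2015-99-99" has the right shape but is not a valid date.
is_iso_like = matches_digit_shape("2015-08-29", "####-##-##")
```

In a stylesheet the equivalent test would be translate($value,'0123456789','##########') = '####-##-##'.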

You have to be careful to get this kind of matching right.  You can see the evolution of my OID pattern, which, if I hadn't written it out, might very well have let patterns like 00.1 through incorrectly.

When I use patterns like these in code targeted for production use, I'm very careful to document what the code is doing, because it sure as hell isn't obvious.  If you use these tricks, do the same for the poor slob who has to maintain your code after you have moved on.


P.S.  Why is recognizing OID or UUID important in FHIR translations?  I'll leave that to your imagination until I cover the bigger challenge (FHIR to CDA) in detail.

Friday, August 28, 2015

IHE Quality, Research and Public Health Supplements Published

IHE Quality, Research and Public Health Technical Framework Supplements Published for Trial Implementation

The IHE Quality, Research and Public Health Technical Committee has published the following supplements to the IHE Quality, Research and Public Health Technical Framework for trial implementation as of August 27, 2015:
  • Aggregate Data Exchange (ADX)
  • Data Element Exchange (DEX)
  • Healthy Weight (HW)
  • Quality Measure Execution - Early Hearing (QME-EH)
These profiles may be available for testing at subsequent IHE Connectathons. The documents are available for download. Comments on all documents are invited at any time and can be submitted at Quality, Research and Public Health Comments.



Wednesday, August 26, 2015

IHE NA Connectathon 2016 Kickoff Webinar--September 14


What is an IHE Connectathon?

Check out the IHE NA Connectathon testing floor.



IHE NA Connectathon 2016 registration opens soon!

Can your crew brave five days of intense interoperability testing with 100+ vendors and 550+ engineers at the IHE North American Connectathon? Discover if you have what it takes to participate — register now for the Kickoff Webinar on Monday, September 14.

Monday, September 14, 2015 |  10:30 am — 11:30 am CT.

Showcase your achievements at the HIMSS Interoperability Showcase™ at HIMSS16
It's go time. Take your game to the next level and demonstrate live interoperability at the HIMSS Interoperability Showcase.
Reserve your spot today. Send in your contract by September 1 for early bird renewal rates.


Monday, August 24, 2015

On ONC's Ten-Year Vision

A post about this document showed up on Facebook today, and I was asked what I thought about it. So here are my thoughts. Note: The document referred to in this post is about a year old.
ONC recently released its 10-Year Vision to Achieve an Interoperable Health IT Infrastructure.  The first comment I have on this document is that they have covered about the right timeframe for implementing the vision.  Much of ONC's prior vision was very much tied to the deadlines specified in the HITECH Act, and those timelines were based on some pretty unrealistic assumptions about the ability of provider organizations to adopt new technologies.  It's not their fault that Congress had some very aggressive timelines; that's often what happens when too much of a program gets tied to legislation, and especially when you are talking about something like ARRA, principally a spending bill.

At 13 pages the document is quite short, and frankly it doesn't say much. Most of the material in the initial two thirds of the document is aspirational goals being set by an organization that won't have an effective leader until 6 months after the next presidential election.  Those will almost assuredly change under the next administration.  Even so, there are some good ideas in it.  Figure 1 should be familiar to anyone who's seen Doug Fridsma's presentation talking about orders of magnitude in Health IT a couple of years ago now.

I think some of the aspirations could be moved around in time depending on various opportunities in the industry.  For example, the ability to integrate patient data into the medical record could occur sooner rather than later.  Many people and organizations are actively working on this.  However, I'm quite happy to let that work continue on without ONC trying to drive it.  It will likely work out for the better in the long run.

The section on "How will we get there?" is interesting.  Some general comments on that section follow:

Building Block #1: Core Technical Standards and Functions
In general, the proposed activities seem reasonable, but only if they are integrated with existing efforts (e.g., granular data access using HL7 FHIR, authorization and authentication with Blue Button Pull and the Argonaut efforts).  We don't need more disparate efforts muddying up the works.  To suggest that the JASON report proposed or suggested an actual architecture is laughable. Marchitecture maybe, but there is no architecture there.

S&I Framework was designed at a point in time when ONC had $100 million to throw at the interoperability standards problem.  That figure is nearly twice their annual budget today, and there's little going on.  Continuing that paradigm won't work.  Neither will funding coordination with standards organizations $20K at a time.  Something else is desperately needed, but we still have no progress on what it should look like, and it's a political hot potato that nobody really wants to take on.

Building Block #2: Certification
This building block suggests that there is a demand for more certification, but doesn't suggest who is making that demand.  I'd be interested in the answer to that question, because as far as I can tell, it is neither healthcare providers nor vendors who are requesting more certification.  While this building block suggests that certification will assist in expanding exchange to other settings, there are still no incentives for those other settings to do so (see #4 below).

Building Block #3: Privacy and Security Protections
I'm surprised this isn't number one, but also glad.  It seems that ONC has recognized that there are existing technical protections for security and privacy.  The more challenging issue here is not the technical issues, but rather the policy issues, especially in the understanding and implementation of policy (such as the oft-misquoted HIPAA).  All too often, security and privacy is perceived as being far too restrictive about sharing of data WITH patients, as we all know.  That's my number one issue that should be addressed.

Building Block #4: Supportive Business, Clinical, Cultural, and Regulatory Environment
I agree that more incentives are needed in different settings, but that was so softly stated in this building block that I'm wondering who they are trying to hide it from.  On the barriers to access, it is NOT the patient's lack of knowledge that is the most significant barrier.  It is more the provider's failure to provide patient access.  If data were more accessible and easier to access, we'd be doing it.  I don't have my health data for lack of knowledge about how to ask for it.  I don't have it for lack of knowledge about how to be successful in asking for it in a way that will actually work interoperably. That's not a lack of knowledge on my part; that's really due to a lack of implementation on the provider's part.  And much of that stems from a lack of knowledge for them about how to accomplish that safely (without fear of regulatory repercussions).

Building Block #5: Rules of Engagement
I think the biggest focus on rules of engagement here should be patient engagement in establishing the rules of the road.  The voice of the healthcare provider is important, but more attention needs to be paid to the voice of the most disenfranchised stakeholder in all of this effort: The patient.  I'd like to see a world where ePatient Dave and Regina Holliday were able to set policy about health information exchange, rather than the exalted C-levels of the healthcare providers or HIT vendors. Let patients design it, and then give the others the necessary tools to implement it.  The world (and my healthcare) would be in a better place for it.

Friday, August 21, 2015

It's time again...

It's that time again to award the next Ad Hoc Harley. This one goes out to a young fellow who has been very instrumental in developing CDA standards for years.  I've watched him grow from the new kid in town to being a very effective leader in HL7 and elsewhere.

He does a ton of work, is very thorough and organized, and just a general pleasure to work with all around.  He makes sure every voice is heard, and is able to deal with cranky people like me with a great deal of grace.

This particular award goes out to him for his leadership in the development of the C-CDA Release 2.1 DSTU.  When Structured Documents started looking at the backwards compatibility problems (at my urging), he joined in and helped a good deal with our initial analysis.  When we decided that we could, and therefore must do something to improve the backwards compatibility of the C-CDA 2.0 specification, he stepped up to the plate in a very big way.

He convinced his stakeholders to let him lead this effort, and also got them to provide resources to help with the detailed analysis.  His familiarity with Trifolia and the folks who developed it also helped a great deal in the early stages of the C-CDA 2.1 DSTU Update project to get us all going. This DSTU, by the way, was started in a project that was initiated less than 3 months ago.  It was a mission critical problem for Health IT, and is probably the quickest DSTU Update in HL7 history. Quite honestly, the healthcare industry owes this fellow a great deal of thanks for development of the latest edition of the DSTU.  We quite literally couldn't have done this without him.

This certifies that  
Brett Marquard of River Rock Associates

Has hereby been recognized for his contributions towards Meaningful Interoperability using the HL7 CDA Standard

P.S. C-CDA DSTU 2.1 is now available to HL7 Members and will be publicly available to non-members in 90 days (or very likely sooner if formally named in the ONC Certification and Standards Final rule expected in the next couple of months).

Tuesday, August 18, 2015

Heuristic methods and ad hoc tools for little (big) data

I started off my career in software development writing mailing list data entry and management tools for a small marketing/communications firm in Tallahassee, Florida.  I never realized that the skills I would learn at that time to clean up a mailing list would be so useful later in my career.  The process we used for cleaning up the lists was to use various ad hoc tools, sed/grep/sort/cut and pattern matching processes to parse and organize the lists in various ways.  Then we'd start looking for patterns of things we could clean up, mostly ways to deduplicate the list, or correct systemic data entry problems.  Later I would work alongside some folks who would take documents (very large files) from electronically formatted reference works using macros from troff or nroff, and turn them into SGML (the predecessor to XML for those of you before my generation) using very similar techniques.

I would later see similar techniques used in natural language processing, patient matching (very similar to mailing list deduplication, almost identical in fact), and a variety of other uses.

Last night I used one of these techniques again: find an outlier, determine the cause for it, and then systematically look for others like it based on the cause.  Those other cases often don't stand out without understanding the problem.  I find myself amused that techniques I learned doing very simple computer programming, drudgery almost, very early in my career, still find their way back into my daily work doin high-falutin archy-teckture.

It's such a simple process, really: use a simple and fast tool to fix 80% of the problem, do it again on the remaining mess, do it once more, and then manually review the last 0.8% for anything else.  Five thousand data items is quite a bit to process manually.  In the grander scheme of things it is still little data, but for the person (me) who has to do it, it can feel like big data.  But after applying these techniques you can often finish such a problem in much shorter time than you'd think.
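
To make the arithmetic concrete, here is a tiny Python sketch of how three 80% passes whittle down a list. The function name and its parameters are just for illustration, and the 80% figure is the rule of thumb from above, not a measurement.

```python
def remaining_after_passes(total_items, fix_percent=80, passes=3):
    """Items left for manual review if each pass fixes fix_percent
    of whatever is still left over from the previous pass."""
    remaining = total_items
    for _ in range(passes):
        remaining -= remaining * fix_percent // 100
    return remaining


left_over = remaining_after_passes(5000)  # 5000 -> 1000 -> 200 -> 40
```

Forty items out of five thousand is the 0.8% left for manual review, which is a far more pleasant afternoon than going through all five thousand by hand.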


Thursday, August 13, 2015

Hawaii Day 4