Tuesday, November 15, 2011

A Short Primer on XQuery

I was introduced to the XQuery standard over 10 years ago, when it was still in development as a W3C draft.  Back then, it was a mess and I swore I would never use it for anything.  It's ten years later, and what finally emerged looks nothing like those early drafts.  I discovered last week that my favorite XML Editor has an XQuery debugger built in, so I decided that I would use it as the basis for my next demonstration of implementing Query Health over the HL7 Quality Measure Format standard.  I'm still in the process of building that implementation (as you might recall, I plan on querying a collection of CCD documents in it).

So, I figured that it was probably time to learn XQuery, and while I'm learning, you can learn with me.  The first query is very simple:

<result xmlns:cda="urn:hl7-org:v3">
{  /cda:ClinicalDocument/cda:recordTarget/cda:patientRole/cda:id  }
</result>

The query returns an XML <result> element that contains a copy of every XML element matching the XPath expression between the { and } braces.  I could have used any element name to wrap the results, because XQuery is like XSLT in that it enables you to construct XML elements as results using declarative expressions.  What I learned here is that any XPath expression can be used as an XQuery expression.

The next thing I learned about was let statements.  The syntax of the let statement is also pretty straightforward:

let $variableName := expression

So, if you want to declare variables, you can do that too.  The expression can be just about any XQuery expression.  You can bind several variables in one let statement by separating the bindings with commas.
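
To make the comma form concrete, here is a minimal sketch of my own (the variable names and values are made up for illustration).  A let statement on its own is not a complete query; it needs a return to produce a value:

```xquery
(: Two variables bound in a single let, separated by a comma.
   Both names and values are illustrative assumptions. :)
let $root := "2.16.840.1.113883.19.5.1091",
    $extension := "12345"
(: return turns the bindings into a result, here one string :)
return concat($root, "^", $extension)
```

This evaluates to the single string "2.16.840.1.113883.19.5.1091^12345".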

So far, so good.  Now let's do something with the if statement:

if (expression) then expression else expression
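
As a small sketch of how that reads in practice (the $age value and the labels are assumptions of mine, not part of any measure), the if is itself an expression, so it can sit directly in a return.  Note that the else branch is required:

```xquery
(: $age is a made-up value for illustration :)
let $age := 70
return
  if ($age >= 65)
  then "65 or older"
  else "under 65"
```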

So, now we can make some simple decisions.  Next is dealing with complex selections.  XQuery was intended to be to XML as SQL-based report writing languages were to SQL Databases.  So, you need something like the SQL "SELECT WHERE" clauses.  This uses the "FLWOR" structure.  FLWOR stands for "for, let, where, order by, return".  Actually, the for and let clauses can appear in either order (and more than once), and I tend to write the let first, but LFWOR is harder to say than FLWOR.

let $v := expression
for $v2 in expression2
where expression3
order by expression4
return expression5

The for clause iterates over each item returned by expression2, assigning it to $v2.  The optional where clause selects those items for which expression3 evaluates to true.  The optional order by clause provides an expression which is used to sort the selected items.  The return clause is evaluated once for each remaining item and indicates what is returned for it.

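Let's look at a simple example.

(: the cda prefix must be declared before it can be used in a path :)
declare namespace cda = "urn:hl7-org:v3";

let $v := fn:doc("myccd.xml"),
    $o := "2.16.840.1.113883.19.5.1091"
for $i in $v//cda:id
where $i/@root = $o
order by $i/@extension
return $i/..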

This creates two variables, one named $v for a CCD document, and another one named $o containing a particular OID.  Then, for each cda:id element in the document, it keeps those where the OID of the id (the root attribute of the id element) matches the particular OID I mentioned.  The matching ids are sorted by their extension attribute, and for each one the query returns the element that contains it.

One cool thing about XQuery is that expressions can be simple XPath expressions, static XML, or XQuery statements, or various combinations of the three.  Another way I could have set $o is as follows:

let $o := <cda:id root='2.16.840.1.113883.19.5.1091'/>/@root

It makes for complicated reading at times (especially when writing XSLT to produce an XQuery program).  One thing that can make your XQuery program simpler is functions.  XQuery and XPath 2.0 already define over 100 functions that you can use.  But you can also create your own functions using the following syntax:

declare function local:functionName($arg1, $arg2) { expression };
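
Here is a rough sketch of what such a function might look like for the id lookup from the earlier example.  The function name local:ids-with-root and the tiny inline document are my own inventions for illustration, not part of any real CCD or of the implementation I'm building:

```xquery
declare namespace cda = "urn:hl7-org:v3";

(: Hypothetical helper: returns the cda:id elements under $doc
   whose root attribute equals $oid. :)
declare function local:ids-with-root($doc as node(), $oid as xs:string)
    as element(cda:id)* {
  $doc//cda:id[@root = $oid]
};

(: A made-up fragment standing in for a real CCD document :)
let $doc :=
  <cda:ClinicalDocument>
    <cda:recordTarget>
      <cda:patientRole>
        <cda:id root="2.16.840.1.113883.19.5.1091" extension="12345"/>
        <cda:id root="1.2.3.4" extension="67890"/>
      </cda:patientRole>
    </cda:recordTarget>
  </cda:ClinicalDocument>
return local:ids-with-root($doc, "2.16.840.1.113883.19.5.1091")/@extension/string()
```

With the fragment above, only the first id matches, so the query returns the string "12345".  The type annotations (as node(), as xs:string) are optional, but they let the processor catch a wrong argument early.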

Believe it or not, this is almost all the XQuery that you need to know to build an implementation of a Query created from an EQMF input.

There are two important XPath (2.0) functions that you will also need: fn:doc() and fn:collection().

The first works like the document() function described in XSLT 1.0. It takes a URI as an argument and returns the document node of the XML document at that URI.  That's really useful for accessing value sets using the SVS RESTful API.

The collection() function also takes a URI, but it returns a sequence of nodes that can include multiple XML documents (how the URI maps to a set of documents is up to the XQuery processor).  That's the key to processing a collection of CCD documents, because it is that function that will help us access the set of CCD documents to process.
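
As a sketch of where this is heading, a query over a collection might look something like the following.  The collection URI "ccd-documents" is purely a placeholder of mine, and the selection criterion (administrative gender code "F") is just an example; how a URI resolves to documents depends on the processor or XML database you use:

```xquery
declare namespace cda = "urn:hl7-org:v3";

(: "ccd-documents" is a hypothetical collection URI; substitute whatever
   your XQuery processor maps to the set of CCD documents. :)
for $ccd in fn:collection("ccd-documents")
let $patientRole := $ccd/cda:ClinicalDocument/cda:recordTarget/cda:patientRole
(: example criterion only: keep patients whose gender code is "F" :)
where $patientRole/cda:patient/cda:administrativeGenderCode/@code = "F"
return $patientRole/cda:id
```

The for clause visits each document in the collection, and the rest is the same FLWOR pattern used against a single document.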

I'm not quite ready to show the work on the XQuery implementation, but I will say this:  It turns out that only a very simple model of patient data will be needed across all of the hQuery, XQuery and SQL implementations that I manage to put together.  That model is one where you have a patient associated with a number of demographic fields (e.g., addresses, gender, age, et cetera), and links from that patient to collections of encounters, procedures, problems, medications (and immunizations), results (labs and vital signs) and other observations.  So, I've stopped looking for publicly available SQL database models for clinical data, and will just use a brain-dead simple model.

1 comment:

  1. Nice! I started using XQuery as a way of plumbing together multiple XSL transforms against data in a native XML database, but like XSLT 2.0, XQuery is very much a vehicle for the power of XPath.

    Like any other client-server technology, there is a balance point between client-side presentation, server-side privacy concerns, and where to address these things - XQuery can be run in the browser through a GWT port, for instance (XQIB.js) - which is admittedly odd.

    Running XQuery directly in an XML editor is great for data mining.