Saturday, July 6, 2013

Deduplicating Lists in XSLT

I'm in Saudi Arabia for a couple of days before I head off to a 10-day vacation in England with my family.  I'm not sure how much I'll be writing on my vacation; we'll see how it goes.

I have a few projects to finish up before I get to take some well-deserved time off.  In one of those projects I needed to generate a set of lab results, ordered by the date and type of test performed.  However, the XML I was presented with was not normalized in a way that would make that easy.  Instead of the results being grouped into panels, with each panel reflecting the test performed (a complete blood count, for example) and containing its results, the data was organized as a flat table in which each row included the panel, the result, and the date performed.

That can be pretty challenging to handle in XSLT.  What I wanted to do was loop over each separate panel, which could be identified by the panel type and date performed, and then within that list, iterate over the separate results.  I could do it using the EXSLT set:distinct function, but this was one of those cases where the code I'm writing doesn't allow me to use EXSLT.  I suppose I could have changed the rules, seeing as how I was the one who made them, but I had gotten pretty far into the code without needing EXSLT and I didn't want to add third-party dependencies.

I've done this before, but it always relied on some rather tricky code using the preceding and following axes. I knew there had to be a better way, so I started searching and found the solution. It shows up in the XSLT: Programmer's Reference by Michael Kay, but you have to know where to look for it.

The key, as it were, is in the key element and the key() function.  The key element allows you to define an index on a set of elements that you want to find.  Its syntax is:

<xsl:key name="name" match="match pattern" use="key expression"/>

The name specifies the name of the key and is used later in the key function.  The match pattern selects the elements for which you want to generate keys.  The key expression is evaluated in the context of each matched node and generates one or more key values for it.

Later, you use the key function, giving it the name of the key that you are looking things up from, and an expression that generates one (or more) keys to locate.  The function returns the list of nodes matching the match pattern that have one or more of the specified keys.
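To make the declaration and the lookup concrete, here is a minimal sketch of a complete stylesheet.  The item element, its code and label attributes, and the key name itemsByCode are hypothetical names made up for this illustration; they are not from the project described here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input for this sketch:
       <items>
         <item code="A" label="first"/>
         <item code="B" label="second"/>
         <item code="A" label="third"/>
       </items> -->

  <!-- Declare the index: every item element is keyed by its code attribute -->
  <xsl:key name="itemsByCode" match="item" use="@code"/>

  <xsl:template match="/">
    <!-- Look up all item elements in the document whose key value is 'A' -->
    <xsl:for-each select="key('itemsByCode', 'A')">
      <xsl:value-of select="@label"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that xsl:key has to be a top-level element of the stylesheet, while key() can be called from any XPath expression or match pattern within it.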

For my example, we'll pretend I had a list of items like this (it was more complex than this, but this is sufficient to show you the technique):

<test name="name" date="date" result="result" value="value"/>

I created a key like the following:
<xsl:key name="myKey" match="test" use="concat(@date,@name)"/>

Thus, each row was indexed by name and date.  

The next step was to select all the tests and deduplicate them based on their keys, producing a list of elements with unique key values.  Here is the code that does that:

  <xsl:variable name="tests" select="./test"/>
  <xsl:variable name="distinctTests" 
    select="$tests[generate-id() = 
                   generate-id(key('myKey', concat(@date,@name))[1])]"/>

The tests variable defines a list of tests that I want to deduplicate.  The distinctTests variable iterates over each test element in $tests, selecting it if the unique id of the node matches the unique id of the first matching test that has the same key identifier.
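Putting the pieces together, the two-level loop I was after (panels on the outside, results on the inside) might look something like the sketch below.  This pattern is commonly known as Muenchian grouping.  The results root element and the panels, panel and result output elements are assumptions made up for illustration, and sorting on @date as plain text only works if the dates are in a format that sorts lexically, such as YYYY-MM-DD.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index each test row by its date and name -->
  <xsl:key name="myKey" match="test" use="concat(@date,@name)"/>

  <!-- Assumes a hypothetical results root element whose children are the test rows -->
  <xsl:template match="/results">
    <xsl:variable name="tests" select="./test"/>
    <!-- Keep only the first test for each distinct date/name combination -->
    <xsl:variable name="distinctTests"
      select="$tests[generate-id() =
                     generate-id(key('myKey', concat(@date,@name))[1])]"/>
    <panels>
      <!-- Outer loop: one pass per distinct panel -->
      <xsl:for-each select="$distinctTests">
        <xsl:sort select="@date"/>
        <xsl:sort select="@name"/>
        <panel name="{@name}" date="{@date}">
          <!-- Inner loop: every row sharing this panel's key value -->
          <xsl:for-each select="key('myKey', concat(@date,@name))">
            <result name="{@result}" value="{@value}"/>
          </xsl:for-each>
        </panel>
      </xsl:for-each>
    </panels>
  </xsl:template>
</xsl:stylesheet>
```

The same key does double duty here: once to pick a single representative row per panel, and again to fetch all the rows that belong to that panel.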

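One problem with this technique is that it fails when your selection context and your key contexts aren't aligned.  The key indexes every matching test element in the document, so if the first node with a given key value lies outside of $tests, no node in $tests passes the comparison and that group silently drops out of the result.  I didn't run into that issue with the problem I was working on.  I'm sure there is a way around it, but I do need to get some sleep.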
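One possible way around it, offered as a sketch under assumptions rather than something from the project above, is to fold the scope into the key itself, for instance by including the generated id of the parent element in the key value.  The patient element and the key name scopedKey are hypothetical, and the sketch assumes $tests is exactly the set of test children of a single parent.  The '|' separator is also an addition; it keeps different date and name combinations from concatenating to the same string.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Key each test by its parent's id plus date and name, so that
       lookups never match rows that live under a different parent -->
  <xsl:key name="scopedKey" match="test"
           use="concat(generate-id(..), '|', @date, '|', @name)"/>

  <!-- Hypothetical: each patient element holds its own test rows -->
  <xsl:template match="patient">
    <xsl:variable name="tests" select="./test"/>
    <!-- The first node returned by key() now always shares this parent,
         so it is guaranteed to be a member of $tests -->
    <xsl:variable name="distinctTests"
      select="$tests[generate-id() =
                     generate-id(key('scopedKey',
                       concat(generate-id(..), '|', @date, '|', @name))[1])]"/>
    <xsl:for-each select="$distinctTests">
      <xsl:value-of select="concat(@date, ' ', @name)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If $tests is narrowed by some other condition (a predicate on the result, say), the key would need to account for that condition too, or the same misalignment comes back.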

