Wednesday, December 15, 2010

Another XSLT Trick

Today I had to process an XML document made up of several flat sequences of elements, each of which needed to be grouped inside an outer element.  This is a pretty common task for me when converting between formats.  Usually I use the EXSLT sets module to deal with node-set intersections, but this time the XSLT processor I was deploying to didn't support it, and I didn't want to have to test another implementation.

So, looking at this XML:

‹items›
‹item›Outer Item 1‹/item› 
‹item›Inner Item 1.1‹/item›
‹item›Inner Item 1.2‹/item› 
‹item›Inner Item 1.3‹/item›
‹item›Outer Item 2‹/item› 
‹item›Inner Item 2.1‹/item›
‹item›Inner Item 2.2‹/item›
‹item›Outer Item 3‹/item› 
‹item›Outer Item 4‹/item›
‹item›Inner Item 4.1‹/item›
‹item›Outer Item 5‹/item›
‹/items›

Assume you want to produce this:
‹items›
 ‹outer›‹title›Outer Item 1‹/title› 
  ‹item›Inner Item 1.1‹/item›
  ‹item›Inner Item 1.2‹/item› 
  ‹item›Inner Item 1.3‹/item›
 ‹/outer›
 ‹outer›‹title›Outer Item 2‹/title›
  ‹item›Inner Item 2.1‹/item›
  ‹item›Inner Item 2.2‹/item›
 ‹/outer›
 ‹outer›‹title›Outer Item 3‹/title›‹/outer›
 ‹outer›‹title›Outer Item 4‹/title›
  ‹item›Inner Item 4.1‹/item›
 ‹/outer›
 ‹outer›‹title›Outer Item 5‹/title›
 ‹/outer›
‹/items›

  1. The outer items wrapper stays the same.  Inside it you have more work to do.
  2. The first step is to process all item children of items that contain the text "Outer Item" (or whatever other matching criterion signals the start of a new list).
  3. The next step is to find the end of the list of inner components.  That's simply the next "Outer Item" that follows this one in sequence.
  4. Now, the trick.  You process each following sibling of the Outer Item that also precedes the end point.
Now for the XSLT that does the work.
‹xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"›
    ‹xsl:template match="items"›
        ‹xsl:copy›‹!-- 1 --›
            ‹xsl:apply-templates 
              select="item[contains(.,'Outer')]"/›‹!-- 2 --›
        ‹/xsl:copy›
    ‹/xsl:template›
    ‹xsl:template match="item"›
        ‹outer›‹title›‹xsl:value-of select="."/›‹/title›
            ‹xsl:variable name="endPoint" 
                select="following-sibling::item[contains(.,'Outer')][1]"
                /›‹!-- 3 --›
            ‹xsl:variable name="beforeEnd"
                select="$endPoint/preceding-sibling::item"/›
            ‹!-- 4: "=" on node-sets compares string values, so test node
                 identity with count() instead; when there is no end point,
                 take every remaining sibling. --›
            ‹xsl:for-each 
                select="following-sibling::item[
                  not($endPoint) or
                  count(. | $beforeEnd) = count($beforeEnd)
                ]"›
                ‹item›‹xsl:value-of select="."/›‹/item›
            ‹/xsl:for-each›
        ‹/outer›
    ‹/xsl:template›
‹/xsl:stylesheet›

Now, if you think about what this is doing, it doesn't seem to be the MOST efficient way to process the list, because for every outer item you are creating two lists using the preceding-sibling and following-sibling axes in XPath, and then intersecting them.  But:
1) the evaluation of these lists is likely to be delayed until they are actually needed, and
2) a smart XSLT processor can and SHOULD recognize this idiom and use a more efficient evaluation.
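
If the intersection does turn out to be a bottleneck, one alternative that stays within plain XSLT 1.0 is to index the inner items with xsl:key.  This is only a sketch, not the stylesheet I deployed: the key name innerByOuter is made up for illustration, and it assumes the same rule as above, that "Outer" in the text marks the start of a group.

```xml
‹xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"›
    ‹!-- Index every non-Outer item under the generated id of the
         nearest Outer item that precedes it. --›
    ‹xsl:key name="innerByOuter"
        match="item[not(contains(.,'Outer'))]"
        use="generate-id(preceding-sibling::item[contains(.,'Outer')][1])"/›
    ‹xsl:template match="items"›
        ‹xsl:copy›
            ‹xsl:apply-templates select="item[contains(.,'Outer')]"/›
        ‹/xsl:copy›
    ‹/xsl:template›
    ‹xsl:template match="item"›
        ‹outer›‹title›‹xsl:value-of select="."/›‹/title›
            ‹!-- Look up the inner items that belong to this Outer item. --›
            ‹xsl:copy-of select="key('innerByOuter', generate-id())"/›
        ‹/outer›
    ‹/xsl:template›
‹/xsl:stylesheet›
```

Each inner item is filed under the nearest Outer item before it, so the last group needs no special handling.  Whether this is actually faster depends on how the processor builds its key index.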

It's a handy thing to know how to do when you have to process lists of stuff that doesn't use list-type XML markup to indicate list boundaries, and you don't have access to Java or EXSLT extensions to support procedural programming.
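
For comparison, here is roughly what the same selection looks like when the EXSLT sets module is available.  Treat it as a sketch: it assumes the processor implements set:intersection from the http://exslt.org/sets namespace, and the union with a second path is my own addition to cover the last group, which has no end point.

```xml
‹xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets"
    exclude-result-prefixes="set" version="1.0"›
    ‹xsl:template match="items"›
        ‹xsl:copy›
            ‹xsl:apply-templates select="item[contains(.,'Outer')]"/›
        ‹/xsl:copy›
    ‹/xsl:template›
    ‹xsl:template match="item"›
        ‹xsl:variable name="endPoint"
            select="following-sibling::item[contains(.,'Outer')][1]"/›
        ‹outer›‹title›‹xsl:value-of select="."/›‹/title›
            ‹!-- Siblings after this item that also come before the end point;
                 the second operand picks up everything when no end point exists. --›
            ‹xsl:copy-of
                select="set:intersection(following-sibling::item,
                                         $endPoint/preceding-sibling::item)
                        | following-sibling::item[not($endPoint)]"/›
        ‹/outer›
    ‹/xsl:template›
‹/xsl:stylesheet›
```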

Now, can you guess what I was doing?  It's related to a previous post on some other blog...
