
Tuesday, December 27, 2011

Element Order Schema Important Not IS XML

A discussion on the IHE XDS Implementors Google Group spawned this question:

Why is order important in XML Schemas in cases where the order really doesn't matter with respect to data structuring requirements?  After all, the real issue is just that you have some number of child elements of a particular type.  Why should order matter?

There are a couple of answers to this question.

First is simply that the order is important when the schema says it is.  There are cases where the order of a collection of items has meaning.  This usually occurs in narrative.  Language is quite sensitive to order -- at least in "proper" construction, but as my wife often notes in our communication, "Order word important not is."

It does make sense to put the table header element (<thead>) before the body of the table (<tbody>), and it eases processing (it also simplifies table formatting to put the <tfoot> before the <tbody>).  There are also cases where order really doesn't matter.  Compare, for example, dates in the European locale to those in the US locale.  Today can be encoded as either <month>12</month> <day>27</day> <year>2011</year>, or <day>27</day> <month>12</month> <year>2011</year>, or even <year>2011</year> <month>12</month> <day>27</day> without any loss of meaning.

In SGML DTDs there are three different operators for content models:

  • The comma (,) created lists (xsd:sequence in XML Schema).
  • The vertical bar (|) created choices (xsd:choice in XML Schema).
  • And the ampersand (&) created conjunctions where all elements needed to be present in any order (xsd:all in XML Schema).

The ampersand operator was not actually supported by the XML DTD content specification.
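To make the three operators concrete, here is a minimal Python sketch that treats a content model as a check over the list of child element names. The function names are hypothetical, and occurrence indicators and nested groups are deliberately left out, so this is an illustration of the idea rather than of any real parser.

```python
def matches_sequence(children, names):
    """Comma / xsd:sequence: every name, in exactly the declared order."""
    return list(children) == list(names)


def matches_choice(children, names):
    """Vertical bar / xsd:choice: exactly one of the declared names."""
    return len(children) == 1 and children[0] in names


def matches_all(children, names):
    """Ampersand / xsd:all: every name exactly once, in any order.

    Assumes the declared names are distinct.
    """
    return sorted(children) == sorted(names)


date_parts = ["year", "month", "day"]
us_order = ["month", "day", "year"]

print(matches_sequence(us_order, date_parts))  # False
print(matches_all(us_order, date_parts))       # True
print(matches_choice(["day"], date_parts))     # True
```

The same three children pass or fail depending only on which operator the content model uses, which is the whole of the difference between the operators.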

Why would you make order important when there is no other requirement for it to be so?

Another reason why order is important is that it makes parsing XML easier to do.  Most XML Schema constructs can be parsed quite simply without any look-ahead using finite automata.  While the "xsd:all" construct can be readily converted to a data structure that can support parsing, you cannot use a finite automaton indiscriminately.  Written out as explicit alternatives, "xsd:all" expands into N! orderings, where N is the number of particles in the list of elements allowed, and even an automaton that only remembers which particles it has already seen needs on the order of 2^N states.  For example, in the date example given above: the first element could be year, month or day.  After that, there are two ways left to choose the next element, and then only one to choose the last, for 3 × 2 × 1 = 6 orderings.  See the list below.

  1. <Year>
    1. Y M D
    2. Y D M
  2. <Month>
    1. M Y D
    2. M D Y
  3. <Day>
    1. D Y M
    2. D M Y
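The list above can be reproduced mechanically, which also shows how quickly it grows. This is a small Python sketch using only the standard library; the helper name `orderings` is made up for the illustration.

```python
from itertools import permutations
from math import factorial


def orderings(names):
    """Every order in which the given particles could legally appear."""
    return [" ".join(p) for p in permutations(names)]


date_orderings = orderings(["Y", "M", "D"])
print(date_orderings)
# ['Y M D', 'Y D M', 'M Y D', 'M D Y', 'D Y M', 'D M Y']

# The count is N!, so a group of ten particles already has
# millions of legal orderings.
for n in (3, 5, 10):
    print(n, factorial(n))
# 3 6
# 5 120
# 10 3628800
```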

XML (and XML Schema) is designed to be parsed and validated without using look-ahead because SGML (its predecessor) had the same constraint.  So parsers that deal with "xsd:all" typically keep a list of the particles, and validate afterwards that each of them was used no more than once.
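The bookkeeping approach just described might look something like the following Python sketch. It is an assumption about the general shape of such a validator, not the code of any particular parser; `validate_all` and its parameters are hypothetical names.

```python
def validate_all(children, required, optional=()):
    """Check an xsd:all-style group in one pass, with no look-ahead."""
    allowed = set(required) | set(optional)
    seen = set()
    for name in children:
        if name not in allowed:
            return False  # element is not part of the group
        if name in seen:
            return False  # particle used more than once
        seen.add(name)
    # The "afterwards" step: every required particle must have appeared.
    return set(required) <= seen


print(validate_all(["day", "month", "year"], ["year", "month", "day"]))  # True
print(validate_all(["day", "day", "year"], ["year", "month", "day"]))    # False
print(validate_all(["day", "year"], ["year", "month", "day"]))           # False
```

The work grows with the number of children rather than with the number of possible orderings, at the cost of carrying the extra "seen" state that a plain sequence never needs.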

Even so, it's much simpler to create a parser that doesn't need to worry about this sort of stuff.  This is why the & content model construct does not appear in the XML 1.0 DTD content model, and was only reintroduced with XML Schema.

Another reason why order is important has to do with how elements are extended in XML Schema.  A complex type can be defined that extends another complex type by appending elements to the end.  This makes it easy for the parser to figure out what goes where.  Essentially what it does is create an xsd:sequence containing the content model of the base type followed by the content model of the new type.  Which means that sequences extend naturally (because a sequence of two sequences is the same as the one sequence with all the particles of the two sequences put together in order), but xsd:all groups do not (because a sequence of two xsd:all groups is not the same as one xsd:all group containing the particles of the two).
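The difference can be checked by enumerating what each content model accepts. In this Python sketch a content model is reduced to the set of child orderings it allows; the helper names and the added particles baz and qux are invented for the illustration.

```python
from itertools import permutations


def sequence_language(names):
    """A sequence accepts exactly one ordering."""
    return {tuple(names)}


def all_language(names):
    """An all group accepts every ordering of its particles."""
    return set(permutations(names))


def concat(first, second):
    """Extension: base content model followed by the added one."""
    return {a + b for a in first for b in second}


base, added = ["foo", "bar"], ["baz", "qux"]

# Sequence followed by sequence is the same as one flattened sequence.
seq_extended = concat(sequence_language(base), sequence_language(added))
print(seq_extended == sequence_language(base + added))  # True

# All group followed by all group is much narrower than one merged group.
all_extended = concat(all_language(base), all_language(added))
merged = all_language(base + added)
print(len(all_extended), len(merged))  # 4 24
```

An instance such as baz, foo, bar, qux is valid against the merged group but not against the extended type, which is why extension by appending only behaves intuitively for sequences.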

Now, a brief note on how to create extensible Schemas.  The trick is to use wild cards.  You will typically have a complex type definition for the content of an item, and that will contain some sort of group (usually a sequence).

<xsd:complexType name="extendableElement">
  <xsd:sequence>
    <xsd:element name="foo" type="fooType"/>
    <xsd:element name="bar" type="barType"/>
    <!-- Element wildcard: anything may follow the defined elements -->
    <xsd:any minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
  <!-- Attribute wildcard: any additional attributes are allowed -->
  <xsd:anyAttribute/>
</xsd:complexType>

What the wildcard at the end does is allow any element to be included at the end of your sequence, or any attribute to be added to the extendable element.  You could include namespace='##other' to say that the element (or attribute) has to be from a namespace other than your schema's target namespace.  This is in fact what I proposed as being the best way to extend HL7 V3 XML these days.
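As a rough illustration of what a receiver gains from this pattern, the Python sketch below hand-checks instances against the same shape: foo, then bar, then any number of elements from other namespaces. It uses only the standard library and is not a schema validator; the namespace URIs, the `accepts` function and the sample documents are all made up, and the attribute wildcard is not checked.

```python
import xml.etree.ElementTree as ET

TARGET_NS = "urn:example:base"  # hypothetical target namespace


def namespace_of(tag):
    """ElementTree writes qualified names as '{namespace}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def accepts(xml_text):
    children = list(ET.fromstring(xml_text))
    expected = ["{%s}foo" % TARGET_NS, "{%s}bar" % TARGET_NS]
    if [c.tag for c in children[:2]] != expected:
        return False  # the fixed part of the sequence must come first
    # Trailing children fall under the wildcard. With ##other they must
    # be namespace-qualified and outside the target namespace.
    return all(namespace_of(c.tag) not in ("", TARGET_NS)
               for c in children[2:])


extended = ('<x xmlns="urn:example:base" xmlns:e="urn:example:ext">'
            '<foo/><bar/><e:extra/></x>')
misplaced = ('<x xmlns="urn:example:base" xmlns:e="urn:example:ext">'
             '<e:extra/><foo/><bar/></x>')

print(accepts(extended))   # True
print(accepts(misplaced))  # False
```

Because the extension can only appear after the known elements, a receiver that understands just foo and bar can read those two and skip whatever follows.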

And so, now you know why order is important in most XML Schemas, even when it is not.