Friday, April 29, 2011

A Template Meta-model

-- And miles to go before I sleep.      Robert Frost "Stopping by Woods on a Snowy Evening" 
When I have a really difficult task to finish that my brain is not ready to work on (some would call it writer's block), I try to work around it by looking at related stuff.  I'm now into the analysis phase on the CDA Consolidation ballot (which I have to have a draft of by Wednesday).  But I'm still struggling, so I finished a quick analysis of HL7, Health Story, IHE and HITSP specifications to come up with a document model this evening.  I also looked at previous work from the HL7 Templates workgroup, including the Template Registry Requirements, and the data model I had proposed for a pilot registry project that I was never able to finish for lack of time and assistance.  It went quicker than I thought it would, but since I've been paying attention to templates from these organizations for the past five years, perhaps I shouldn't be surprised.

Let's start with the document model first.

An implementation guide has a title page, table of contents, front matter, a body and back matter.  I focused all my attention on the body.  Within the body there are 4 different kinds of templates which are commonly organized around the CDA Header, the CDA Document, Sections and Entries.  Document and section templates are very easy to analyze, and fit into the same general structure.  It turns out that header and entry templates are very similar as well.

Included in the analysis are my opinions, which appear in italics.


I hereby contribute the remainder of this post to the public domain.

Template
Template Name
This is a short, human-readable name that quickly describes the template.  Changing the name of a template DOES NOT affect its use.
Identifier
This is a single identifier that must be present and is always valued.  HL7, IHE and HITSP all recommend use of an OID as the identifier, with no extension, and so do I.  My first CDA template used extensions, and I've seen several that do also.  It's not a show-stopper either way.
Open or Closed Status
If a template is open, everything not explicitly prohibited is allowed.  If closed, everything not explicitly allowed is prohibited.  This is USUALLY the same for all templates within an implementation guide.  As such, it could be documented in the front matter rather than with each general template.  The current model for the CDA Consolidation guide is to explicitly document it for each template.  I find that extra "gunge" to be distracting, but not so much that I would object to making it available.

A closed template is like a "final" class in UML.  It can no longer be extended to support new use cases (although see implicit inheritance below).

In more than 1000 templates I've looked at or helped develop for CDA (yes, there are THAT many and I've looked at them), I've never seen ONE closed template that I can recall.  I know they exist, just not in the CDA universe .. kind of like anti-matter.   I don't like them because they interfere with reuse, just as I avoid "final" classes in Java. 
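The open/closed rule is easy to state in code.  Here is a minimal Python sketch of it; the names (`TemplateRules`, `allowed`, `prohibited`, `permits`) are made up for illustration and don't come from MDHT, TDB or any HL7 tooling.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TemplateRules:
    """Hypothetical holder for what a template allows or prohibits."""
    is_open: bool
    allowed: Set[str] = field(default_factory=set)
    prohibited: Set[str] = field(default_factory=set)


def permits(rules: TemplateRules, component: str) -> bool:
    """Apply the open/closed rule to one component name."""
    if component in rules.prohibited:
        return False  # explicitly prohibited, whether open or closed
    if rules.is_open:
        return True   # open: everything not explicitly prohibited is allowed
    return component in rules.allowed  # closed: only what is explicitly allowed


# Illustrative rule sets; the component names are just examples.
open_rules = TemplateRules(is_open=True, prohibited={"reference"})
closed_rules = TemplateRules(is_open=False, allowed={"code", "title", "text"})
```

With these rule sets, an `author` component passes the open template and fails the closed one, which is exactly why closed templates get in the way of reuse.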

Parent Templates:
A template can inherit constraints from one or more parent templates.  HITSP used multiple parents to harmonize across IHE and HL7 overlaps, and avoided them elsewhere.  IHE PCC uses inheritance from only one parent template.  A PCC template may have multiple ancestors, so there can still be multiple template identifiers.  Inheritance is optional.  A template need not have any parent templates.
I like the one template inheritance rule, but when you get to real-world implementation, it may need to be relaxed.  


  Parent Template ID:
    This is the identifier of the parent template.
  Explicit or Implicit Inheritance:
    This is a boolean flag indicating whether the inheritance must be explicitly expressed in the instance, or whether it is implied by the template that inherits from the parent.  If implied, the template need not report its parents.  If explicit, then it must.  This allows reuse of sets of constraints in a template -- and introduces the idea that a template could also be abstract -- never directly instantiated without further constraint.  A closed template can only be inherited from implicitly, never explicitly.

When explicit inheritance is present, you have a couple of options in documentation:

  1. Copy the constraints from the parent template into the documentation of its child, indicating that template as the source
  2. Don't copy them, but do include a link to them from the child.  
IHE and HITSP used the latter model.  The CDA Consolidation guide uses the former.  

I like explicit inheritance because it enables incremental interoperability without having knowledge of the "inheritance" rules. The CDA Consolidation guide templates won't provide any incremental interoperability with epSOS work even if they shared the same constraints because the shared constraints aren't enumerated.  You could still build a document that enabled incremental interoperability by using them, but you don't "get it for free".
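To make the explicit/implicit distinction concrete, here is a rough Python sketch that works out which template identifiers an instance would have to assert.  Everything in it is an assumption for illustration: the class names, the placeholder OIDs, and in particular the choice that an implicit link contributes nothing further up the chain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParentLink:
    parent_id: str
    explicit: bool  # True: the instance must also assert the parent's identifier


@dataclass
class Template:
    identifier: str
    name: str
    parents: List[ParentLink] = field(default_factory=list)


def declared_template_ids(template_id: str, registry: Dict[str, Template]) -> List[str]:
    """Identifiers an instance must assert, nearest template first."""
    ordered: List[str] = []

    def visit(tid: str) -> None:
        if tid in ordered:
            return
        ordered.append(tid)
        for link in registry[tid].parents:
            if link.explicit:  # assumption: implicit parents are never reported
                visit(link.parent_id)

    visit(template_id)
    return ordered


# Placeholder OIDs, not registered identifiers.
registry = {
    "1.2.3.1": Template("1.2.3.1", "Base"),
    "1.2.3.2": Template("1.2.3.2", "Middle", [ParentLink("1.2.3.1", True)]),
    "1.2.3.3": Template("1.2.3.3", "Leaf", [ParentLink("1.2.3.2", True)]),
    "1.2.3.4": Template("1.2.3.4", "Quiet", [ParentLink("1.2.3.2", False)]),
}
```

A receiver that only knows the "Base" template can still recognize an instance of "Leaf", because the base identifier is asserted in the instance.  An instance of "Quiet" asserts only its own identifier, so that receiver gets nothing for free.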


Scope:
Scope indicates where the template is applicable.  It can constrain where the template may appear, or describe the scope of the use case where it is applicable.
  Scope Narrative:
    A narrative description of the scope.  It need not be present.
  Class: [1..*]
    The set of classes (model artifacts) to which the template applies (e.g., this template only applies to sections in CDA).

Description:
A narrative description that talks about the template.  This is commonly present in IHE and HITSP templates, and is missing from many of the CDA Consolidation templates.  I think this must be present for all templates, but could be argued into "should".
The narrative can include references to other important stuff in the documentation (much like the HL7 PubDB format allows today), including material not generated by a template development tool (e.g., MDHT or TDB).

Model Diagram:
This is a diagram generated from the template model expressing the template in UML form.
Model Table:
This is a table describing the template using data in the template's model.  It too is machine generated.
Examples:
This is a list containing at least one example of the template content.  Two forms are often used:  A skeletal example showing only what the template constrains, and a full example with additional stuff that shows the XML with sensible clinical data.

IHE and HITSP use a skeletal model.  Skeletons are very easy to generate using model data, and are also easy for non-clinical users to create manually (e.g., me).  Full examples that make clinical sense often need human assistance for sample values, et cetera.  Template development tools can support creation of full examples.

Examples really should be present for all but the most obvious stuff, and if you are someone like me, you cannot trust your own intuition about what is obvious.  If it can be machine generated, then why worry, just DO it.
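As a sketch of how little it takes to machine-generate a skeleton, here is a small Python function that builds a skeletal section from model data.  The function name, the OIDs and the code value are placeholders I made up, and the CDA namespace is left out to keep it short.

```python
import xml.etree.ElementTree as ET


def skeletal_section(template_oid: str, code: str, code_system: str) -> str:
    """Build a skeleton showing only what a section template constrains."""
    section = ET.Element("section")
    ET.SubElement(section, "templateId", {"root": template_oid})
    ET.SubElement(section, "code", {"code": code, "codeSystem": code_system})
    ET.SubElement(section, "title")  # left empty: a skeleton carries no content
    ET.SubElement(section, "text")
    return ET.tostring(section, encoding="unicode")


# Placeholder identifiers, not registered OIDs or real codes.
skeleton = skeletal_section("1.2.3.4.5", "EXAMPLE-CODE", "1.2.3.4.6")
```

A full example would start from the same skeleton and then need a human (or a smarter tool) to fill in clinically sensible values.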

Negative Examples:
Negative examples are helpful when there are obvious ways to mess things up.  They need to be clearly labeled as WRONG, BAD, et cetera.  One of the things I've learned is that the easiest way to show people how to not make the obvious mistake is to show them what it looks like.  It's how I went from this:

<code><originalText>SARS</originalText></code>

to this:

<code nullFlavor='UNK'><originalText>SARS</originalText></code>

Good bad examples are hard to automatically generate.  Tooling could help, but this might be a case where you'd just include external content.  Negative examples are only needed where you want to point out a problem case that isn't obvious.
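The one mistake shown above is at least easy to detect.  This is a small Python sketch, assuming the first fragment is the mistake and the second is the corrected form; the function name is mine, and the fragments are written with ordinary angle brackets and no namespace.

```python
import xml.etree.ElementTree as ET


def uncoded_without_null_flavor(xml_fragment: str) -> bool:
    """True when a code element has originalText but neither @code nor @nullFlavor."""
    code = ET.fromstring(xml_fragment)
    has_original_text = code.find("originalText") is not None
    return (
        has_original_text
        and code.get("code") is None
        and code.get("nullFlavor") is None
    )


bad = "<code><originalText>SARS</originalText></code>"
good = "<code nullFlavor='UNK'><originalText>SARS</originalText></code>"
```

A check like this could flag candidate negative examples, but deciding which mistakes are worth showing still takes a person.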

Constraints:
A template has at least one, and usually more than one constraint (if none, there is no reason for it to exist).  Constraints are the next "reusable" object I found in the analysis.  These are the numbered things in the CDA Consolidation Guide.

Constraint
Identifier:
A constraint has an identifier that allows it to be referenced.  Constraints are reusable (IHE, HL7 and HITSP did it quite often), e.g. X shall be an interval of time containing a low/@value and a high/@value.
Target Component Name:

You need to identify what you are constraining, and most of the time it is only one element or attribute.  The component is either a class attribute of the class being constrained by the template, a component of the data type of a class attribute, or an association with another class.  It could also go deeper.

This is where TDB performed well and MDHT did not.  The "long list" of general header constraints produced by MDHT was a result of several issues, one of which this solves.  The other half of the solution has to do with guidelines (governance) about where to put template boundaries.  The template should begin and end within a single class, using other templates to enforce business rules on associated classes.  This rule can be broken in some cases because sometimes you may need to "go deep" just once, and you don't want to create extra templates just to enforce a rule.  The entryRelationship class in CDA uses three lines of XML.  Why would I want to create templates to say act X must contain entryRelationship Y and entryRelationship Y must contain act Z, when I could more simply say: act X must contain entryRelationship/act Z?


Target Component Definition:
You often need to explain what the target component is used for.

This is what I've been doing for the recent Reconciliation profile in IHE, building from similar earlier work, and something that the CDA Consolidation guide did not do well.

Trigger Condition:
Sometimes a constraint is triggered by a pre-condition.  If trigger then constraint is a common pattern in several templates found in the CDA Consolidation guide.

Constraint: [0..*]
Constraints can have "sub-constraints" ... and those can have sub-constraints as well.
This is something that TDB does well.
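Here is one way the trigger and sub-constraint ideas could fit together, as a Python sketch.  The shape of `Constraint`, the dictionary stand-in for an instance, and the rule that sub-constraints are skipped once their parent fails are all my assumptions, not how TDB or MDHT actually work.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Instance = Dict[str, Any]  # stand-in for a parsed CDA fragment


@dataclass
class Constraint:
    identifier: str
    target: str
    check: Callable[[Instance], bool]
    trigger: Optional[Callable[[Instance], bool]] = None  # pre-condition, if any
    children: List["Constraint"] = field(default_factory=list)


def failed_constraints(constraint: Constraint, instance: Instance) -> List[str]:
    """Return identifiers of failed constraints, walking sub-constraints recursively."""
    if constraint.trigger is not None and not constraint.trigger(instance):
        return []  # trigger not met: the constraint does not apply
    if not constraint.check(instance):
        return [constraint.identifier]  # assumption: skip children of a failed parent
    failures: List[str] = []
    for child in constraint.children:
        failures.extend(failed_constraints(child, instance))
    return failures


# Hypothetical rules: effectiveTime is required and must have a low;
# a high is required only when statusCode is "completed".
has_time = Constraint(
    "C-1", "effectiveTime",
    check=lambda i: "effectiveTime" in i,
    children=[
        Constraint("C-1a", "effectiveTime/low",
                   check=lambda i: "low" in i["effectiveTime"]),
        Constraint("C-1b", "effectiveTime/high",
                   check=lambda i: "high" in i["effectiveTime"],
                   trigger=lambda i: i.get("statusCode") == "completed"),
    ],
)
```

Because each constraint keeps its own identifier, the same object can be reused across templates and still be reported by number when it fails.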


Content Description [0..1]:
This is human-readable text explaining what the constraint is doing.  It is needed when the effect of the constraint is not obvious.  For example, the XPath representation of a constraint on effectiveTime/@value fixing the precision to be at least the second would look like this:


string-length(substring-before(translate(concat(@value,'+'),'-','+'),'+')) >= 14

But the description can simply say: must be precise to the second.

For implementation purposes, I'd recommend use of ISO Schematron and XSLT 2.0 because then you can define functions for precision of dates which are simpler to read (at least I think you can).
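To show what a named precision function buys you in readability, here is the same idea sketched in Python rather than XSLT 2.0.  The function names and the precision labels are mine; the only thing assumed about the data is the usual HL7 timestamp layout of YYYYMMDDHHMMSS, optionally followed by fractional seconds and a time zone offset.

```python
import re

# Number of leading digits -> precision label (labels are illustrative).
_PRECISION_BY_LENGTH = {
    4: "year", 6: "month", 8: "day", 10: "hour", 12: "minute", 14: "second",
}


def ts_precision(value: str) -> str:
    """Return the precision of an HL7-style timestamp string."""
    # Keep only the digits before any fractional part or time zone offset.
    digits = re.split(r"[.+-]", value, maxsplit=1)[0]
    if not digits.isdigit() or len(digits) not in _PRECISION_BY_LENGTH:
        raise ValueError(f"not a recognizable timestamp: {value!r}")
    return _PRECISION_BY_LENGTH[len(digits)]


def at_least(value: str, required: str) -> bool:
    """True when the timestamp is at least as precise as the required label."""
    order = list(_PRECISION_BY_LENGTH.values())  # coarse to fine
    return order.index(ts_precision(value)) >= order.index(required)
```

A rule written as `at_least(value, "second")` reads a lot closer to the content description than a nest of string functions does.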


Data Type Restriction [0..*]:
You might want to say that effectiveTime must always be an IVL_TS, or that code must be constrained to CE.  There need not be a data type restriction.  It can also be tricky: consider substanceAdministration/effectiveTime, which IHE limited to [0..2].


Nullable: [0..1]
Can this item be null?  Yes or No.


Cardinality: [0..1]
What are the upper and lower bounds?  If unspecified, the base model governs.
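The fallback to the base model can be sketched in a few lines of Python.  The function names are mine, and the `[0..*]` base cardinality in the usage note is an assumption used only for illustration.

```python
from typing import Optional, Tuple

Cardinality = Tuple[int, Optional[int]]  # (lower, upper); upper None means "*"


def parse_cardinality(text: str) -> Cardinality:
    """Parse the bracket notation used in this post, e.g. "[0..*]" or "[1..1]"."""
    lower, upper = text.strip("[]").split("..")
    return int(lower), (None if upper == "*" else int(upper))


def effective_cardinality(template: Optional[str], base: str) -> Cardinality:
    """Use the template's cardinality if it states one, else the base model's."""
    return parse_cardinality(template if template is not None else base)


def count_ok(count: int, cardinality: Cardinality) -> bool:
    """Check an occurrence count against the bounds."""
    lower, upper = cardinality
    return count >= lower and (upper is None or count <= upper)
```

For example, `effective_cardinality("[0..2]", "[0..*]")` narrows an assumed base of `[0..*]` to at most two occurrences, while passing `None` for the template leaves the base bounds in force.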


Conformance [1..1]:  Shall|Should|May|Shall Not|Should Not|Need Not
What is the conformance verb used?


Reason:
A text explanation of WHY the constraint is present.  What does it accomplish?  Why is it here?
Let's stop relying on our collective memories.  Frankly, over time, they aren't all that great (which isn't what I said before the second rewrite).

Precision:
If this is a QTY data type, what is its precision?  Note that TS is a quantity data type.


Value Set [0..1]:
If coded, what is the value set?  The value set is another reusable artifact.


Value Set
Name:
What do we call this thing?
Identifier:
How does the computer identify it?
Scope:
What does it apply to?  This value set applies to procedures, encounters, lab results, et cetera.
Definition:  
X and all subordinate children from SNOMED CT ... or X, Y and Z, or a combination of these.
Purpose:  
Why does it exist?  What is it used for?
Intensional/Extensional:
How is it defined?  By an operational definition, or an explicit list.
Static or Dynamic:
Is it fixed to a specific vocabulary version, or can it change when a new version is released?
By the way: I really don't like it that we put static/dynamic next to every value set in the conformance rules.  Static/dynamic are bindings that apply to the value set, not its use.  At least in my world.  I've never seen an implementation guide that would use static in one place for a value set, and dynamic in another.  It's just a REALLY BAD idea.  More gunge to ignore.
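The difference between an extensional and an intensional definition is easy to see in code.  This Python sketch uses a toy hierarchy of made-up codes (not SNOMED CT) and hypothetical class names; a real implementation would ask a terminology service rather than walk a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Toy hierarchy: child code -> parent code.  Made-up codes, not SNOMED CT.
PARENT_OF: Dict[str, str] = {"B": "A", "C": "B", "D": "A", "Z": "Y"}


def descends_from(code: str, ancestor: str) -> bool:
    """True when code is the ancestor itself or one of its subordinate children."""
    current: Optional[str] = code
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT_OF.get(current)
    return False


@dataclass(frozen=True)
class ExtensionalValueSet:
    """Defined by an explicit list: X, Y and Z."""
    name: str
    codes: FrozenSet[str]

    def contains(self, code: str) -> bool:
        return code in self.codes


@dataclass(frozen=True)
class IntensionalValueSet:
    """Defined by an operational rule: X and all subordinate children."""
    name: str
    root: str

    def contains(self, code: str) -> bool:
        return descends_from(code, self.root)
```

Note that the intensional set changes membership whenever the hierarchy changes, which is why the static or dynamic binding matters most for value sets defined this way.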

General Notes
From the work above, I can create a beautiful CDA Implementation Guide with rich multi-media, XML Schema for GreenXMI templates, a UML Meta-model for templates, or a MIF-based expression of templates.  I'm going to leave all of that in the hands of others.  I have to get back to my review of the CDA Consolidation guide.  


What's not in here are the components of the meta-model important to governance and provenance.  That's great stuff, but not a requirement for MDHT to be able to do what we need yet. 


Having run through this exercise, I can now exchange my axe for a scalpel -- and when we go through reconciliation, I absolutely won't take an arbitrary deadline as an excuse for not doing it right.  It takes as much as it will take -- if you want quality, you have to give it time.


I don't know that I'm back to done yet, but I sure hope I'm getting things smart.  I have three more deliverables before Monday before I call myself done again.

1 comment:

  1. And if you want to follow the evolution of this meta-model, check back with the SI Framework Documentation Workgroup Page. I'm hoping they'll move forward with it, but as I said ... I have miles to go yet...
