Thursday, March 18, 2010

More on UCUM

The Unified Code for Units of Measure, or UCUM, is a code system maintained by the Regenstrief Institute.  It was principally developed by Drs. Gunther Schadow and Clem McDonald.  UCUM supports the representation of units of measure with less ambiguity than the existing combinations of the ISO and ANSI units of measure used in HL7 Version 2.

UCUM is an interesting code system because:
A. It is infinite in size.
B. It describes a grammar for creating codes.
C. There are multiple codes for representing effectively identical concepts.
D. There are reductions to canonical forms.
Note: The same things can be said for unit expressions using either the ISO or ANSI unit standards, so UCUM isn't unique in this regard.

Infinite Size
In information processing systems we tend to like complete lists, but units have mathematical properties that defy such list making.  Units are compositional.  Any unit can be multiplied or divided by another unit, raised to a power, or multiplied or divided by a constant to produce another unit expression.

UCUM allows you to compose two unit expressions with . or / to perform multiplication or division, respectively, or to append an exponent to a unit expression to raise it to an arbitrary power (e.g., m3 for cubic meters).  You can multiply or divide by integer constants (e.g., s = min/60).  There are shorthand representations for constant powers of 10 (e.g., 10 to the third power is 10*3).  Even better, common metric prefixes can be used with metric units to generate, for example, kg or mg from g.
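To make the compositional idea concrete, here is a minimal sketch in Python.  It assumes a unit can be modeled as a scale factor times integer powers of base units; the Unit class and the variable names are my own illustration and are not defined by UCUM.

```python
from fractions import Fraction


class Unit:
    """A unit modeled as a scale factor times integer powers of base units.

    Illustrative model only; UCUM does not define this data structure.
    """

    def __init__(self, factor, dims=None):
        self.factor = Fraction(factor)
        # Drop zero exponents so that equal units compare equal.
        self.dims = {b: e for b, e in (dims or {}).items() if e != 0}

    def __mul__(self, other):
        dims = dict(self.dims)
        for base, exp in other.dims.items():
            dims[base] = dims.get(base, 0) + exp
        return Unit(self.factor * other.factor, dims)

    def __pow__(self, n):
        return Unit(self.factor ** n, {b: e * n for b, e in self.dims.items()})

    def __truediv__(self, other):
        # Division is multiplication by the inverse.
        return self * other ** -1

    def __eq__(self, other):
        return (self.factor, self.dims) == (other.factor, other.dims)

    def __repr__(self):
        return f"Unit({self.factor}, {self.dims})"


m = Unit(1, {"m": 1})
g = Unit(1, {"g": 1})
s = Unit(1, {"s": 1})
minute = Unit(60, {"s": 1})      # UCUM code: min
kilo = Unit(1000)                # the metric prefix k as a pure number

cubic_meter = m ** 3             # UCUM code: m3
kg = kilo * g                    # UCUM code: kg
print(minute / Unit(60) == s)    # checks s = min/60
print(cubic_meter, kg)
```

Because every product, quotient, and power of a Unit is again a Unit, there is no finite list that could enumerate them all.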

A Grammar
Computer geeks like me learn about grammars for languages pretty early on in our education.  The grammar for a programming language can run on for pages.  The grammar for UCUM is remarkably short, containing about 10 productions.

The key things to remember are that:
1.  Parentheses override precedence rules.
2.  Exponentiation has higher precedence than multiplication or division (e.g., 10.m2 means 10 times a square meter, not 10 meters, squared).
3.  Constants may only be positive integers when used with multiplication or division, but exponents may be positive or negative integers.
4.  Anything inside {} is an annotation with an effective unit value of 1.
5.  Metric prefixes may only be used with metric units.  You cannot say "kilo-inch" in UCUM.
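The rules above can be sketched as a small recursive-descent parser.  This is a rough sketch based on my reading of the grammar, not a reference implementation: the atom and prefix tables are tiny hand-picked subsets, and the tuple-based tree format and function names are assumptions made for illustration.

```python
import re

# Tiny illustrative subsets; the real UCUM tables are far larger.
METRIC_ATOMS = {"m", "g", "s", "L"}
NONMETRIC_ATOMS = {"min", "h", "[in_i]", "10*"}
PREFIXES = {"k", "c", "m", "u"}

# An annotation, an operator or parenthesis, or a run of other characters.
TOKEN = re.compile(r"\{[^{}]*\}|[./()]|[^./(){}]+")


def annotatable(tok):
    """Split a token into (prefix, atom, exponent); enforces rule 5."""
    match = re.fullmatch(r"(.*?)([+-]?[0-9]+)?", tok)
    sym, exp = match.group(1), int(match.group(2) or 1)
    if sym in METRIC_ATOMS or sym in NONMETRIC_ATOMS:
        return ("unit", None, sym, exp)
    for p in PREFIXES:
        if sym.startswith(p) and sym[len(p):] in METRIC_ATOMS:
            return ("unit", p, sym[len(p):], exp)
    raise ValueError(f"not a valid unit: {sym!r}")


def parse(expr):
    """Parse a unit expression into a nested tuple tree."""
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError("unbalanced annotation braces")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def term():
        # Components joined left to right by "." or "/".
        node = component()
        while peek() in (".", "/"):
            op = take()
            node = (op, node, component())
        return node

    def component():
        tok = take()
        if tok is None or tok in (".", "/", ")"):
            raise ValueError("expected a unit, factor, or '('")
        if tok == "(":                       # rule 1: parentheses group
            node = term()
            if take() != ")":
                raise ValueError("missing ')'")
            return node
        if tok.startswith("{"):              # rule 4: annotation, value 1
            return ("annotation", tok)
        if re.fullmatch(r"[0-9]+", tok):     # rule 3: positive integer factor
            return ("factor", int(tok))
        node = annotatable(tok)              # rule 2: exponent binds to the unit
        if peek() is not None and peek().startswith("{"):
            take()                           # trailing annotation adds no value
        return node

    if peek() == "/":                        # a leading "/" means 1/term
        take()
        tree = ("/", ("factor", 1), term())
    else:
        tree = term()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return tree


print(parse("10.m2"))
print(parse("kg/(m.s2)"))
```

Parsing 10.m2 attaches the exponent to m alone, and parse("k[in_i]") raises a ValueError because the inch is not a metric unit.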

Multiple Representations and Canonical Forms
From an engineering perspective you can say kilometer a bunch of different ways in UCUM:
1000.m
10*3.m
km

This may or may not be a "defect"; I won't even bother arguing the point.  What is important is that UCUM tells you how to determine that these are identical, because it gives you all the information you need to rationalize these expressions into a canonical form.  The one thing that UCUM doesn't do is define a canonical form, but one can be readily developed algorithmically.  I leave that exercise to the reader.
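As a starting point for that exercise, here is one possible sketch.  It assumes the canonical form is a pair: an exact scale factor plus a sorted tuple of base-unit exponents.  That choice, the small tables, and the function names are my own assumptions, and the sketch skips parentheses and annotations to stay short.

```python
import re
from fractions import Fraction

# Tiny illustrative subset of prefixes and atoms.
PREFIXES = {"k": Fraction(1000), "m": Fraction(1, 1000), "c": Fraction(1, 100)}
# atom -> (is_metric, scale factor, base-unit exponents)
ATOMS = {
    "m": (True, Fraction(1), {"m": 1}),
    "g": (True, Fraction(1), {"g": 1}),
    "s": (True, Fraction(1), {"s": 1}),
    "min": (False, Fraction(60), {"s": 1}),
    "10*": (False, Fraction(10), {}),
}


def resolve(sym):
    """Look up an atom, optionally preceded by a metric prefix."""
    if sym in ATOMS:
        _, factor, dims = ATOMS[sym]
        return factor, dims
    for prefix, scale in PREFIXES.items():
        rest = sym[len(prefix):]
        if sym.startswith(prefix) and rest in ATOMS:
            metric, factor, dims = ATOMS[rest]
            if not metric:
                raise ValueError(f"prefix on non-metric unit: {sym!r}")
            return scale * factor, dims
    raise ValueError(f"unknown unit symbol: {sym!r}")


def parse_component(tok):
    """Return (factor, dims) for one factor or unit with optional exponent."""
    if tok == "":                        # e.g. before a leading "/"
        return Fraction(1), {}
    if re.fullmatch(r"[0-9]+", tok):     # integer constant
        return Fraction(int(tok)), {}
    match = re.fullmatch(r"(.*?)([+-]?[0-9]+)?", tok)
    sym, exp = match.group(1), int(match.group(2) or 1)
    factor, dims = resolve(sym)
    return factor ** exp, {b: e * exp for b, e in dims.items()}


def canonical(expr):
    """Reduce a flat unit expression to (factor, sorted base exponents)."""
    factor, dims = Fraction(1), {}
    sign = 1                             # +1 after ".", -1 after "/"
    for tok in re.split(r"([./])", expr):
        if tok == ".":
            sign = 1
        elif tok == "/":
            sign = -1
        else:
            f, d = parse_component(tok)
            factor *= f ** sign
            for base, e in d.items():
                dims[base] = dims.get(base, 0) + sign * e
    return factor, tuple(sorted((b, e) for b, e in dims.items() if e))


print(canonical("1000.m") == canonical("10*3.m") == canonical("km"))
print(canonical("min/60") == canonical("s"))
```

Two expressions denote the same unit exactly when their canonical pairs are equal, so the three spellings of kilometer all compare equal here.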

But just in case you need to use UCUM units before you work that exercise out, you can check out some worked examples.

P.S.  Thanks Paul for the inspiration...

1 comment:

  1. Also used extensively throughout the DICOM standard. For many more examples see the templates in DICOM Part 16.