Convert your FHIR JSON -> XML and back here. The CDA Book is sometimes listed for Kindle here and it is also SHIPPING from Amazon! See here for Errata.

Friday, June 15, 2012

Simple Math - A Language for Expressions

One of the ongoing issues in Query Health and in HL7 in general is what language should be used in expressions.  Should it be JavaScript, GELLO, XPath or perhaps even an XML representation.  My answer is none of the above.

Most of the execution environments that Query Health can work in already support computation.  I don't want to introduce a new language interpreter unnecessarily.  I can imagine the discussion I'd have with a few data center folks explaining that I need them to modify their SQL environment to include a JavaScript interpreter, for example.  And GELLO, while it is a standard, lacks implementations.  Other representations have similar challenges.  Finally, the idea of using XML to represent simple mathematical expressions that people are already familiar with bugs me.  Yes, I do program in XML (XSLT to be specific).  I can tell you personally, that I hate the additional noise provided by that extra syntax.  It makes code really hard to read.

For Query Health, I can see a few cases where we need some fairly simple arithmetic computations.  To move beyond counting and support higher statistical functions we would need addition, subtraction, multiplication, division and exponentiation.  We probably don't need transcendental functions, but there are some operations for which we do need some simple functions over one or two variables.

What I really want is a simple mathematical expression language that I can easily implement in programming environments as diverse as SQL, XQuery, XSLT/XPath, C, C#, C++, Java, JavaScript, Perl, Ruby and any other programming language you can think of.  It needs to be simple enough that one could perform a series of search and replace operations to turn it into executable code.  Fortunately, most programming languages today share quite a bit of common syntax when it comes to expressions.

So, I decided to create a small language, I'm going to call simple math.  The point of simple math is to make it easy to write math expressions in a way that can be translated into a variety of execution languages.  Simple math is NOT designed to be computed by itself (although it could be).  It is designed to be transformed into something that can be executed in a variety of programming environments.

From an implementation perspective, the following are my requirements for "Simple Math":
  1. Definition of variables that can be bound to an implementation specific object (a database table, or class instance).  
  2. Definitions of variables that can be bound to an implementation specific object containing fixed (constant) values.
  3. The ability to call a function to perform some operation.
  4. Simple Arithmetic (addition, subtraction, multiplication, division, exponentiation)
  5. Parenthesis
  6. No side effects
  7. Easily translatable to executable code using simple search and replace operations.
Having worked in over a dozen programming languages, there are some common capabilities across most that could be simply reused.

Constants:
So far, I'm just doing math, so all I need for literals are the usual representations of numbers.  There are two basic types:  Integer and real in Simple Math.  Integers start with a digit and are followed by consecutive digits, and may be preceded by a - sign.  Real numbers are represented as integers (minimally a single digit including 0) that are followed by an optional decimal point and decimal part (minimally .0) and an optional exponent part separated by the letter e (or E) and a required sign and decimal number indicating the power of 10 to which the number is raised.

Implementations must support at least IEEE single precision arithmetic on real numbers, and follow the IEEE rules for computation.

Operators:
The operators for addition +, subtraction -, multiplication *, and division /, as well as parenthesis are pretty commonly used.  Some languages have operators for exponentiation, and others use function calls.  The languages that have these operators also have a function call notation, so I won't try to choose an operator for exponentiation.

The integer division and modulus operators are available in some programming languages, but do not have a consistent syntax.  Since they can be supported by function calls, I'd skip picking operators for them.

Parenthesis () are commonly understood when used to change operator precedence across almost all programming languages.

We don't need array operators for what I'm calling "simple computation".  I could be convinced otherwise, but I think we can skip this for most cases.  If we do get into arrays, we'd also need to understand what the index is for the first item, which varies by language.  Skipping arrays also avoids that mess.

We do, I think, need to have an operator for member access to an object, and most languages already support the . as the member access operator.  We don't need to get into differences between pointers, references and values, because as your recall, one of the requirements is to be side effect free.

I think we can skip the comparators, but again, I could possibly be convinced otherwise with a good use case for them.  They usually have a boolean result (but can be otherwise in the presence of exceptional values like NaN, Infinity or NULL).  And given that they have a boolean result, at least in the QH context, we can use existing boolean and range selection capabilities in the model to achieve the same result as comparison operators in an expression language.

Identifiers
Identifiers for variables and constants is also not too tricky (until you get to SQL). Most programming languages require an initial letter, and can be followed by any number of letters and digits.  In addition to the usual letters, most also allow an underscore or other punctuation characters in identifiers, but these characters can also have special meaning in other contexts.  For example, Java allows $ and _, C only allows _.  XML allows the _ and the : in names, but : has special meaning in many cases (as a namespace delimiter for a qualified name).  In SQL, it gets a bit more complicated, because the _ can have special meaning at the beginning of a name, and case could be significant or not in the usual transformation.  It is possible to "quote" identifiers, so that case significance can be preserved, and it is also possible to create a regular expression to support translation of identifiers into quoted identifiers.

One of the challenges of identifiers is avoiding reserved names in the various programming languages.  In SQL, if we "quote" the identifiers, then this is no longer a problem.  Looking across C, C++, C#, Java, JavaScript, Perl, Ruby, SQL, and VB I came up with a list of about 300 distinct keywords (ignoring case).  Then I realized that making a list of prohibited keywords would likely not work because some language that I hadn't considered wouldn't be supported.  The simple answer then is to ensure that there is some transformation of identifiers to a non-keyword form that works in general.  An example would be the arbitrary rule that no "simple math" identifier be permitted to end in a double underscore.  One could then append a double underscore to every identifier that matched a keyword in your chosen implementation language, and be sure that it would not collide with another identifier.

Functions
Function calls provide some of the more complicate arithmetic, including min/max functions, floor/ceiling/rounding, computing powers and logarithms, modular arithmetic, et cetera.  I like the JavaScript Math object as a basic starting point, but I think I'd limit min/max to the two argument forms.  I don't know that we need the trignometric functions for the kind of things we need to compute, but again, I could be convinced otherwise.  To that, I'd add a div() and mod() function to support modulo arithmetic.

These basic math functions would be preceded by an identifier (Math) and a dot . to indicate that they come from the basic math library.

There, now I have the basic idea behind Simple Math written down.  Now to standardize it.  Anyone looking to create a new standard?   I'm going to be speaking at OMG's Healthcare Information Day in Cambridge next week, maybe I can get them to take it on.