Friday, November 9, 2012

An expression language for use in HQMF


What appears below is what I just finished writing as a suggested addition to the HQMF Release DSTU that is being balloted until next Wednesday.  It's a more formal definition of what I called Simple Math previously.  But it goes beyond that because it also defines the binding of variables, and defines how classes and attributes are named in the context of HQMF.
I'm not fully satisfied by this solution because it still requires quite a bit of mapping from RIM semantics to implementation detail, but I think it's the best choice to resolve the issues of computation.  For those of you who are going to say again that I should be using JavaScript, GELLO or XPath, or your other favorite language, PLEASE READ the second paragraph below.  I already am.

Appendix B: An expression language for use in HQMF
The HQMF specification does not describe an expression language for use in the <expression> element of values found in MeasureObservationDefinition, nor in the <value> element of the joinCondition.  HQMF implementations may be based on a number of different programming languages and execution environments.  Several prototype HQMF interpreters have been created.  Some use JavaScript, others SQL, and others XQuery.  Thus, a single choice for the expression language is not obvious.  This presents a challenge for implementers, as the lack of a platform neutral expression language means that there is no single expression of a quality measure that could be implemented on multiple platforms.  

The goal of the language described below is to offer a solution to this challenge.  It provides a way to include computable expressions in HQMF.  The language is designed so that simple regular expression substitutions might be all that is necessary to turn an expression into executable code on a variety of platforms.  This language is not intended to supersede C, C++, GELLO, Java, JavaScript, Perl, Ruby or XQuery.  In fact, the expressions allowed by this language should all be legal expressions in each of these languages, with identical evaluations given an appropriate binding mechanism.  Thus, it becomes a (nearly) common subset for writing expressions that can be translated to a variety of implementation platforms.

While there are many implementations of JavaScript, GELLO, and other programming languages available, it is not always feasible to integrate these implementations into existing execution environments. The feasibility is not just based on technical capability.  For example, while JavaScript interpreters are widely available, and many can be used in conjunction with, or have already been integrated into SQL databases, some data centers would object to installations or use  of software that has not undergone stringent testing.  However, the same data center may allow use of existing SQL language capabilities to achieve similar goals.

This appendix demonstrates the feasibility of defining an expression language that is a subset of many common programming languages.  This can be done in such a way as to allow implementations to simply reuse expressions found inside an HQMF instance to execute inside their programming environment.

B.1 Identifiers
Identifiers in the language start with an alphabetic character, and may be followed by alphabetic characters, numeric characters, and the _ symbol.  Identifiers are used to reference bound variables, class members and functions.

Implementations are required to recognize the ASCII alphabetic characters (A-Z and a-z), Arabic numerals (0-9), and the _ character.  The alphabetic characters A and a must be distinct (no case folding).

While some SQL implementations may case fold identifiers used in tables and columns, it is possible to quote these identifiers to ensure exact matches.

There is no length limit on identifiers.  It is up to an implementation to address implementation language specific length limitations when translating identifiers in an  HQMF expression to an appropriate value.

identifier ::=  [a-zA-Z][a-zA-Z0-9_]*
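As a quick illustration, the identifier rule maps directly onto a regular expression. The Python sketch below is not part of the proposal; `is_identifier` is a hypothetical helper name, and the check is case-sensitive because the language does no case folding.

```python
import re

# B.1: an ASCII letter, then any mix of ASCII letters, digits and "_".
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")

def is_identifier(text: str) -> bool:
    """Return True if the entire string is a legal identifier (case-sensitive)."""
    return IDENTIFIER.fullmatch(text) is not None

print(is_identifier("myLocalAct"))  # True
print(is_identifier("_hidden"))     # False: must begin with a letter
print(is_identifier("2ndVisit"))    # False: must begin with a letter
```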

B.2 Literal Constants
Literal constants can be integers, real numbers or time stamps.  There are no string or character constants in this language subset.  While strings are useful in a general programming context, they are not needed in use cases where expression evaluation is necessary for HQMF.

literal ::= integer | real | timestamp

B.2.1 Integers
Integers are represented using an optional sign (-), followed by a sequence of digits.  The sequence of digits must be either a single 0, or a non-zero digit followed by zero or more additional digits.

integer ::= (-)? (0|[1-9][0-9]*)

Implementations must support at least 32-bit precision for integers.
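A minimal Python sketch of recognizing an integer literal follows. The 32-bit range check is an assumption about how a translator might flag literals that are not guaranteed to be portable; the specification only sets a minimum, so an implementation with wider integers could accept them. `parse_integer` is a hypothetical name.

```python
import re

# B.2.1: optional "-", then either a single 0 or a non-zero digit followed by digits.
INTEGER = re.compile(r"-?(0|[1-9][0-9]*)")
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def parse_integer(text: str) -> int:
    """Parse an integer literal, rejecting leading zeros and a leading '+'."""
    if INTEGER.fullmatch(text) is None:
        raise ValueError(f"not an integer literal: {text!r}")
    value = int(text)
    # Values outside 32 bits are only portable if every target platform supports them.
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{text} is outside the guaranteed 32-bit range")
    return value

print(parse_integer("-42"))  # -42
```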

B.2.2 Real Numbers
Real numbers are expressed using an optional negative sign, an integer component, a decimal point, and a decimal component, followed by an optional exponent.

real ::= (-)? (0|[1-9][0-9]*)\.[0-9]+ ((e|E)(+|-)[1-9][0-9]*)?

Implementations must support at least IEEE double-precision real numbers.
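The real-number rule can be checked the same way. This sketch reads the rule as: an optional minus sign, the integer part, a mandatory decimal point and fraction, and an optional exponent whose sign is required. Python's float is a C double, which on mainstream platforms is IEEE 754 double precision, so it meets the stated minimum. `parse_real` is a hypothetical name.

```python
import re

# B.2.2: -?int.frac with an optional exponent such as e+3 or E-12.
REAL = re.compile(r"-?(0|[1-9][0-9]*)\.[0-9]+([eE][+-][1-9][0-9]*)?")

def parse_real(text: str) -> float:
    """Parse a real literal; forms like '1.', '.5' or '1.0e5' are rejected."""
    if REAL.fullmatch(text) is None:
        raise ValueError(f"not a real literal: {text!r}")
    return float(text)

print(parse_real("-0.5e+3"))  # -500.0
```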

B.2.3 Time Stamps
Time Stamps are represented in ISO 8601 notation without punctuation (as used in HL7 Version 3 rather than in W3C Schema), enclosed in double quotes.  Thus, 6:30AM ET on January 20th, 1965 would appear as "196501200630-0500".

timestamp ::= '"' [0-9]{1,14}(\.[0-9]{1,6})?((+|-)[0-9]{1,4})? '"'
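Turning a time stamp literal into a native value takes a little more work. The Python sketch below assumes the HL7 Version 3 layout YYYYMMDDHHMMSS (so at most 14 digits before any fractional seconds), accepts only precisions that stop at a whole component, fills in missing trailing components with their earliest value, and requires a four-digit +HHMM or -HHMM offset when one is present. Those are all assumptions of this sketch, and `parse_timestamp` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

TIMESTAMP = re.compile(r'"([0-9]{1,14})(\.[0-9]{1,6})?([+-][0-9]{4})?"')
DEFAULTS = "0101000000"  # month 01, day 01, hour 00, minute 00, second 00

def parse_timestamp(literal: str) -> datetime:
    """Convert a quoted literal such as "196501200630-0500" to a datetime."""
    match = TIMESTAMP.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a timestamp literal: {literal!r}")
    digits, fraction, offset = match.groups()
    if len(digits) not in (4, 6, 8, 10, 12, 14):
        raise ValueError("precision must end on a whole component")
    if fraction and len(digits) != 14:
        raise ValueError("fractional seconds require full precision")
    # Pad missing trailing components, e.g. "1965" -> "19650101000000".
    value = datetime.strptime(digits + DEFAULTS[len(digits) - 4:], "%Y%m%d%H%M%S")
    if fraction:
        value = value.replace(microsecond=int(fraction[1:].ljust(6, "0")))
    if offset:
        sign = -1 if offset[0] == "-" else 1
        delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
        value = value.replace(tzinfo=timezone(sign * delta))
    return value

# 6:30 AM Eastern Standard Time (UTC-05:00) on January 20, 1965
print(parse_timestamp('"196501200630-0500"').astimezone(timezone.utc))
# 1965-01-20 11:30:00+00:00
```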

B.3 Operators
B.3.1 Arithmetic Operators
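Arithmetic operators include +, -, * and /, supporting addition, subtraction and negation, multiplication and division.  Negation has the highest precedence, followed by multiplication and division (which share a precedence level), and then addition and subtraction (which also share a level).  Operators at the same level are evaluated from left to right.  The parentheses characters ( and ) are used to override the order of operations.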
Implementations must support these operators, and are permitted to support other arithmetic operators.

add-op                    => '+' | '-'
mult-op                   => '*' | '/'

B.3.2 Logical Operators
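The logical operators are AND, OR and NOT().  NOT is a unary operator whose operand is enclosed in parentheses, and it has higher precedence than AND and OR.  AND has higher precedence than OR.  All of the logical operators have lower precedence than the comparison operators.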

B.3.3 Comparison Operators
Comparison operators include <, >, >=, <=, ==, and !=.  The precedence of these operators is == and !=, followed by <,>,<= and >=.   These operators are of lower precedence than arithmetic operators.  

Note:  == and != were chosen rather than = and <> to simplify substitution.  Replacing a lone = with == is harder to do correctly than replacing == with =.  Given that == is used for equality, != becomes the natural symbol (from the C and Java languages) for inequality.

eq-op                     => == | !=
rel-op                    => <= | >= | < | >
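The substitution argument in the note above can be made concrete. The Python sketch below rewrites operators with ordered regular-expression substitutions, once for SQL and once for a C-family target such as JavaScript. The rule lists and the `translate` name are hypothetical; a real translator would also have to map identifiers, as described in B.5.

```python
import re

# SQL already uses AND, OR and NOT, so only the comparison operators change.
# A lone "=" never occurs in the language, so replacing "==" with "=" is safe.
TO_SQL = [(r"==", "="), (r"!=", "<>")]

# C-family targets keep == and != but spell the logical operators differently.
# \b keeps identifiers such as ANDROID from being rewritten.
TO_JAVASCRIPT = [(r"\bAND\b", "&&"), (r"\bOR\b", "||"), (r"\bNOT\s*\(", "!(")]

def translate(expression: str, rules) -> str:
    """Apply each (pattern, replacement) rule in order."""
    for pattern, replacement in rules:
        expression = re.sub(pattern, replacement, expression)
    return expression

expr = "NOT(Seen.id == 3) AND Arrival.id != 4"
print(translate(expr, TO_SQL))         # NOT(Seen.id = 3) AND Arrival.id <> 4
print(translate(expr, TO_JAVASCRIPT))  # !(Seen.id == 3) && Arrival.id != 4
```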

B.4 Grammar
The intent of the grammar specification below is not to enable implementors to “parse” expressions using this language, but to express the intended effects of evaluations of expressions used in this language.

expression                => or-expression
or-expression             => and-expression (OR and-expression)*
and-expression            => not-expression (AND not-expression)*
not-expression            => NOT(not-expression)
                          |  relational-expression
relational-expression     => equality-expression (rel-op equality-expression)*
equality-expression       => addition-expression (eq-op addition-expression)*
addition-expression       => multiplication-expression (add-op multiplication-expression)*
multiplication-expression => primary-expression (mult-op primary-expression)*
primary-expression        => literal
                          |  - primary-expression
                          |  ( expression )
                          |  identifier-expression ( arg-list )
                          |  identifier-expression
arg-list                  => expression (, expression)*
identifier-expression     => identifier (. identifier)*
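Although the grammar is meant to express intended effects rather than to drive a parser, a small recursive-descent evaluator is a handy way to check the precedence it implies. The Python sketch below has one method per rule and rests on several assumptions: variables come from a flat dictionary keyed by the full identifier-expression (for example Seen.effectiveTime), functions come from a caller-supplied dictionary, NOT( ... ) accepts any parenthesized expression, time stamp literals are left out, and / is Python's true division (C and Java truncate when both operands are integers). The names `evaluate` and `Evaluator` are hypothetical.

```python
import operator
import re

# One token per match: a numeric literal, a dotted identifier-expression, or an operator.
TOKEN = re.compile(r"""\s*(?:
      (?P<num>(?:0|[1-9][0-9]*)(?:\.[0-9]+(?:[eE][+-][1-9][0-9]*)?)?)
    | (?P<name>[a-zA-Z][a-zA-Z0-9_]*(?:\.[a-zA-Z][a-zA-Z0-9_]*)*)
    | (?P<op>==|!=|<=|>=|[-<>+*/(),])
    )""", re.VERBOSE)

OR = {"OR": lambda a, b: bool(a) or bool(b)}
AND = {"AND": lambda a, b: bool(a) and bool(b)}
REL = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}
EQ = {"==": operator.eq, "!=": operator.ne}
ADD = {"+": operator.add, "-": operator.sub}
MUL = {"*": operator.mul, "/": operator.truediv}

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

class Evaluator:
    def __init__(self, text, bindings, functions):
        self.tokens, self.pos = tokenize(text), 0
        self.bindings, self.functions = bindings, functions

    def peek(self):
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of expression")
        kind, text = self.tokens[self.pos]
        if expected is not None and text != expected:
            raise SyntaxError(f"expected {expected!r} but found {text!r}")
        self.pos += 1
        return kind, text

    def binary(self, operand, table):
        # Shared shape of every "x (op x)*" rule, applied left to right.
        value = operand()
        while self.peek() in table:
            op = table[self.take()[1]]
            value = op(value, operand())
        return value

    def or_expr(self):   return self.binary(self.and_expr, OR)
    def and_expr(self):  return self.binary(self.not_expr, AND)
    def eq_expr(self):   return self.binary(self.add_expr, EQ)
    def add_expr(self):  return self.binary(self.mul_expr, ADD)
    def mul_expr(self):  return self.binary(self.primary, MUL)

    def not_expr(self):
        if self.peek() == "NOT":
            self.take()
            self.take("(")
            value = self.or_expr()
            self.take(")")
            return not value
        return self.binary(self.eq_expr, REL)  # relational-expression

    def primary(self):
        kind, text = self.take()
        if kind == "num":
            return float(text) if "." in text else int(text)
        if text == "-":
            return -self.primary()
        if text == "(":
            value = self.or_expr()
            self.take(")")
            return value
        if kind == "name" and self.peek() == "(":  # function call
            self.take("(")
            args = [self.or_expr()]
            while self.peek() == ",":
                self.take(",")
                args.append(self.or_expr())
            self.take(")")
            return self.functions[text](*args)
        if kind == "name":
            return self.bindings[text]
        raise SyntaxError(f"unexpected {text!r}")

def evaluate(text, bindings=None, functions=None):
    evaluator = Evaluator(text, bindings or {}, functions or {})
    value = evaluator.or_expr()
    if evaluator.peek() is not None:
        raise SyntaxError(f"unexpected {evaluator.peek()!r}")
    return value

print(evaluate("Seen.effectiveTime - Arrival.effectiveTime",
               {"Seen.effectiveTime": 45, "Arrival.effectiveTime": 30}))  # 15
print(evaluate("NOT(1 > 2) AND 2 + 3 * 4 == 14"))  # True
```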

B.5 Language Binding
Variables are bound to data in the implementation model through use of the localVariableName element in data criteria.  The scope of localVariableName values is the entire document, so all localVariableName values must be unique.

Each local variable in an HQMF expression represents an object structured according to the RIM class from which it is derived.  Thus, the language can provide access to RIM attributes and to the navigational capabilities of the RIM.  Implementations must map these accesses into appropriate references in their implementation model.

B.5.1 Class Attributes
Attributes of a class are accessed using the . operator.  For example, to access the Act effectiveTime attribute of a class using the “myLocalAct” local variable name, you would reference it as myLocalAct.effectiveTime.  Data type properties are accessed in a similar fashion.  To obtain the low value of the effectiveTime, one would write myLocalAct.effectiveTime.low.  An implementation would then map references to myLocalAct.effectiveTime.low to the appropriate language and implementation specific representation.
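In an implementation whose data are already objects shaped like the RIM, dotted access can be resolved one attribute at a time. The Python sketch below uses hypothetical `IVL_TS` and `Act` stand-in classes that carry only the attributes used in the example; `lookup` is likewise a hypothetical name, and most implementations would instead map these references onto their own model.

```python
from dataclasses import dataclass
from datetime import datetime
from functools import reduce
from typing import Optional

@dataclass
class IVL_TS:            # hypothetical stand-in for the HL7 interval of time stamps
    low: Optional[datetime] = None
    high: Optional[datetime] = None

@dataclass
class Act:               # hypothetical: only the attribute used below
    effectiveTime: IVL_TS

def lookup(identifier_expression: str, bindings: dict):
    """Resolve 'variable.attr.attr' by starting at the bound variable and
    following each attribute name in turn."""
    variable, *path = identifier_expression.split(".")
    return reduce(getattr, path, bindings[variable])

bindings = {"myLocalAct": Act(IVL_TS(low=datetime(2012, 11, 9, 8, 0)))}
print(lookup("myLocalAct.effectiveTime.low", bindings))  # 2012-11-09 08:00:00
```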

The use of references to models in the various act criteria enables implementations to provide implementation specific models for different kinds of information references.  Thus, an implementation could map encounter.effectiveTime.low into a reference to the admitDate column of a visit table, but would map procedure.effectiveTime.low into the startTime column of the procedure table.
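For a relational implementation, the same references can be rewritten into column names. The sketch below follows the encounter and procedure example in the paragraph above; the variable names EDVisit and Surgery, both mapping tables, and the `to_sql` function are all hypothetical, and identifiers that are not bound variables (such as function names) are left untouched.

```python
import re

# Which kind of act criteria each localVariableName was declared with (hypothetical).
VARIABLE_KINDS = {"EDVisit": "encounter", "Surgery": "procedure"}

# Implementation-specific column for each (kind, attribute path) (hypothetical).
COLUMN_MAP = {
    ("encounter", "effectiveTime.low"): "visit.admitDate",
    ("procedure", "effectiveTime.low"): "procedure.startTime",
}

IDENTIFIER_EXPRESSION = re.compile(r"\b([a-zA-Z][a-zA-Z0-9_]*)((?:\.[a-zA-Z][a-zA-Z0-9_]*)+)")

def to_sql(expression: str) -> str:
    def replace(match):
        variable, path = match.group(1), match.group(2)[1:]
        kind = VARIABLE_KINDS.get(variable)
        if kind is None:
            return match.group(0)  # not a bound variable; leave as written
        return COLUMN_MAP[(kind, path)]
    return IDENTIFIER_EXPRESSION.sub(replace, expression)

print(to_sql("Surgery.effectiveTime.low - EDVisit.effectiveTime.low"))
# procedure.startTime - visit.admitDate
```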

B.5.2 Associations
While RIM attributes and the properties of data types are uniquely named, a further challenge is accessing information from an associated RIM class while another RIM class is in scope.  For example, consider the case of computing the average waiting time for a patient during an ED visit.  Suppose that two observations are captured in the EHR, one being the patient arrival time, and the other being the time that they were first seen by a healthcare provider.  These data of interest could be represented as shown below in the DataCriteriaSection.

<entry>
  <localVariableName>Arrival</localVariableName>
  <actCriteria>
    ...
     <code code="441968004" codeSystem="2.16.840.1.113883.6.96"
      displayName="time of arrival at healthcare facility" />
  </actCriteria>
</entry>
<entry>
  <localVariableName>Seen</localVariableName>
  <actCriteria>
    ...
    <code code="308930007" codeSystem="2.16.840.1.113883.6.96"
      displayName="seen by health professional" />
  </actCriteria>
</entry>

In order to compute a quality measure which reports the average wait time, the measureObservationDefinition could be defined as follows:
<measureObservationDefinition>
  ...
  <code code='AGGREGATE' codeSystem='2.16.840.1.113883.5.4'/>
  <value><expression>Seen.effectiveTime - Arrival.effectiveTime</expression></value>
  <methodCode code='AVERAGE' codeSystem='2.16.840.1.113883.5.84' />
  <precondition>
    <joinCondition>
      <value>Seen.componentOf.encounter.id == Arrival.componentOf.encounter.id</value>
    </joinCondition>
  </precondition>
  ...
</measureObservationDefinition>
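To make the intended effect of this definition concrete, the Python sketch below evaluates it over a few in-memory rows. Each row pairs the encounter identifier reached through componentOf with the observation's effectiveTime, the joinCondition becomes an equality test on those identifiers, and AVERAGE becomes a mean. The data and the `average_wait_minutes` name are hypothetical.

```python
from datetime import datetime
from statistics import mean

# (Seen.componentOf.encounter.id, Seen.effectiveTime), and likewise for Arrival.
seen = [("enc1", datetime(2012, 11, 9, 8, 40)), ("enc2", datetime(2012, 11, 9, 9, 20))]
arrivals = [("enc1", datetime(2012, 11, 9, 8, 0)), ("enc2", datetime(2012, 11, 9, 9, 0))]

def average_wait_minutes(seen, arrivals):
    """AVERAGE of Seen.effectiveTime - Arrival.effectiveTime over joined pairs."""
    waits = [
        (seen_time - arrival_time).total_seconds() / 60
        for seen_encounter, seen_time in seen
        for arrival_encounter, arrival_time in arrivals
        if seen_encounter == arrival_encounter  # the joinCondition
    ]
    return mean(waits)

print(average_wait_minutes(seen, arrivals))  # 30.0
```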
The joinCondition ensures that the Seen and Arrival effective times being compared are from the same encounter. To do so, however, the condition must reference the encounter identifier, which is found in a different class than the observation.  In the XML representation of the RIM classes, the encounter is referenced via the component act relationship.  The RIM ActRelationship class has a typeCode attribute; the value “COMP” identifies the relationship being referenced as a component.  This typeCode attribute contains one of the values from the ActRelationshipType vocabulary.  However, COMP is neither convenient nor memorable as used in an expression.

This issue was resolved in the HL7 R-MIM Designer by creating a set of rules for naming relationships used in an R-MIM diagram.  This language proposes to use a subset of those names to access related classes from their parents or children.

There are four associations for which names are needed.  Each type of association relates a source class and a target class.  The name of the association depends upon the direction in which the association is traversed.  For example, to relate a document to one of its addenda, you would say document.addendum, but to go from the addendum to the parent document, you would say addendum.addendumOf.

B.5.2.1 Act Relationship Association Members
Table 1 below lists the names to use to reference from one act to another via the Act Relationship association class.  The first column of this table provides the ActRelationshipType.  The second column provides the name of this association in the usual traversal direction, and the third column, when the association is read in the reverse direction.

The RIM Act Relationship class has members of its own.  These are accessed via the name of the relationship.  To access the attributes of the associated act, use the appropriate RIM act class name (act, encounter, observation, procedure, substanceAdministration, or supply). While not strictly necessary to disambiguate between the relationship and the target (or source) act, use of the RIM class helps to clarify relationships, enabling appropriate mapping in the implementation environment.
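The Python sketch below shows one way an implementation might resolve an expression such as Seen.componentOf.encounter.id against an in-memory, RIM-like object model. The relationship name is looked up in an excerpt of Table 1, which gives the ActRelationshipType and the direction to traverse, and the RIM class name then steps to the act at the other end. The classes, the class-code mapping, and the helper names `relate`, `follow` and `other_act` are hypothetical.

```python
from dataclasses import dataclass, field

# Excerpt of Table 1: association name -> (ActRelationshipType, traversal direction).
ACT_RELATIONSHIP_NAMES = {
    "component": ("COMP", "source->target"),
    "componentOf": ("COMP", "target->source"),
    "reason": ("RSON", "source->target"),
    "reasonOf": ("RSON", "target->source"),
}
# RIM act class codes for class names that may follow a relationship name.
ACT_CLASS_CODES = {"encounter": "ENC", "observation": "OBS", "procedure": "PROC",
                   "substanceAdministration": "SBADM", "supply": "SPLY"}

@dataclass(eq=False)
class Act:
    classCode: str
    id: str
    outbound: list = field(default_factory=list)  # relationships where this act is the source
    inbound: list = field(default_factory=list)   # relationships where this act is the target

@dataclass(eq=False)
class ActRelationship:
    typeCode: str
    source: Act
    target: Act

def relate(type_code, source, target):
    """Record a relationship on both acts so it can be walked either way."""
    rel = ActRelationship(type_code, source, target)
    source.outbound.append(rel)
    target.inbound.append(rel)

def follow(act, name):
    """act.<name>: the first ActRelationship of the named type and direction."""
    type_code, direction = ACT_RELATIONSHIP_NAMES[name]
    candidates = act.outbound if direction == "source->target" else act.inbound
    for rel in candidates:
        if rel.typeCode == type_code:
            return rel
    raise LookupError(f"no {name} relationship")

def other_act(rel, act, class_name):
    """rel.<class_name>: the act at the other end, checked against its RIM class."""
    other = rel.target if rel.source is act else rel.source
    if other.classCode != ACT_CLASS_CODES[class_name]:
        raise LookupError(f"related act is not of class {class_name}")
    return other

encounter, seen = Act("ENC", "enc1"), Act("OBS", "obs1")
relate("COMP", encounter, seen)
# Seen.componentOf.encounter.id
print(other_act(follow(seen, "componentOf"), seen, "encounter").id)  # enc1
```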

Table 1 Act Relationship Member Names
| ActRelationshipType | Source->Target | Target->Source |
|---|---|---|
| APND | addendum | addendumOf |
| ARR | arrivedBy | arrivalFor |
| AUTH | authorization | authorizationOf |
| CAUS | causeOf | cause |
| CHRG | charge | chargeFor |
| CIND | contraindication | contraindicationFor |
| COMP | component | componentOf |
| COST | cost | costOf |
| COVBY | coverage | coverageOf |
| CREDIT | credit | creditTo |
| CTRLV | controlVariable | controlVariableFor |
| DEBIT | debit | debitTo |
| DEP | departedBy | departureFor |
| DOC | documentationOf | documentation |
| DRIV | derivedFrom | derivation |
| ELNK | links | linkedBy |
| EXPL | explanation | explanationFor |
| FLFS | inFulfillmentOf | fulfillment |
| GEN | generalization | specialization |
| GEVL | evaluationOf | evaluation |
| GOAL | goal | goalOf |
| INST | definition | instantiation |
| ITEMSLOC | itemStorage | itemStorageFor |
| LIMIT | limitation | limitationOf |
| MFST | manifestationOf | manifestation |
| MITGT | mitigates | mitigatedBy |
| MTCH | matchOf | match |
| NAME | conditionNamed | assignedConditionName |
| OBJC | maintenanceGoal | maintenanceGoalOf |
| OBJF | finalGoal | finalGoalOf |
| OCCR | occurrenceOf | occurrence |
| OPTN | option | optionFor |
| OREF | referencedOrder | referencedBy |
| OUTC | outcome | outcomeOf |
| PERT | pertinentInformation | pertainsTo |
| PRCN | precondition | preconditionFor |
| PREV | predecessor | successor |
| REFR | reference | referencedBy |
| REFV | referenceRange | referenceRangeFor |
| REV | reversalOf | reversal |
| RISK | risk | riskOf |
| RPLC | replacementOf | replacement |
| RSON | reason | reasonOf |
| SAS | startsAfterStartOf | startsBeforeStartOf |
| SCH | scheduleRequest | requestedBy |
| SEQL | sequelTo | sequel |
| SPRT | support | supportOf |
| SPRTBND | boundedSupport | boundedSupportOf |
| SUBJ | subject | subjectOf |
| SUCC | predecessor | successor |
| SUMM | summary | summaryOf |
| TRIG | trigger | triggerFor |
| UPDT | updateOf | update |
| VRXCRPT | verbatimExcerptFrom | verbatimExcerpt |
| XCRPT | excerptFrom | excerpt |
| XFRM | transformationOf | transformation |

B.5.2.2 Participation Association Members
Participations are associations between an Act and a Role.  Using the wait time example, suppose that instead of capturing the “time seen by a provider” in an observation, this information was (more correctly) modeled using the participation time of the encounter performer and the participation time of the service delivery location.  In this case, the expression being computed would be:
  <value>
    <expression>EDVisit.performer.time.low - EDVisit.location.time.low</expression>
  </value>
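A sketch of the same idea for participations, again with hypothetical Python classes: the participation name, looked up in an excerpt of Table 2, selects a ParticipationType code, and the expression above becomes the difference of two participation start times. `Interval`, `Participation`, `Act` and `participation` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime

# Excerpt of Table 2: Act->Participant name -> ParticipationType code.
PARTICIPATION_TYPES = {"performer": "PRF", "location": "LOC", "author": "AUT"}

@dataclass
class Interval:          # hypothetical stand-in for IVL_TS
    low: datetime

@dataclass
class Participation:
    typeCode: str
    time: Interval

@dataclass
class Act:               # hypothetical: only participations are modeled
    participations: list

def participation(act, name):
    """act.<name>: the first participation of the matching type."""
    code = PARTICIPATION_TYPES[name]
    return next(p for p in act.participations if p.typeCode == code)

ed_visit = Act([
    Participation("LOC", Interval(datetime(2012, 11, 9, 8, 0))),
    Participation("PRF", Interval(datetime(2012, 11, 9, 8, 45))),
])
# EDVisit.performer.time.low - EDVisit.location.time.low
wait = participation(ed_visit, "performer").time.low - participation(ed_visit, "location").time.low
print(wait)  # 0:45:00
```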

Table 2 below shows the names used to reference the participation class from the act in the second column, or from the role in the third column based on the participation type in the first column.
Unlike Act Relationships, participations are typically traversed in the direction from act to participation.

Table 2 Participation Type
| ParticipationType | Act->Participant | Role->Participation |
|---|---|---|
| ADM | admitter | admission |
| ATND | attender | attenderOf |
| AUT | author | origination |
| AUTHEN | authenticator | authenticated |
| BBY | baby | babyOf |
| BEN | beneficiary | beneficiaryOf |
| CALLBCK | callBackContact | callBackAvailability |
| CON | consultant | consultation |
| COV | coveredParty | coveredPartyOf |
| CSM | consumable | consumedIn |
| CST | custodian | custodianship |
| DEV | device | deviceOf |
| DIR | directTarget | directTargetOf |
| DIS | discharger | discharge |
| DIST | distributer | distributed |
| DON | organDonor | organDonation |
| DST | destination | destinationOf |
| ELOC | dataEntryLocation | dataEntryLocationOf |
| ENT | dataEnterer | dataEntry |
| ESC | escort | escort |
| HLD | holder | contractHeld |
| IND | indirectTarget | indirectTargetOf |
| INF | informant | informationGiven |
| IRCP | informationRecipient | informationReceived |
| LA | legalAuthenticator | legallyAuthenticated |
| LOC | location | locationOf |
| NOT | notificationContact | contactFor |
| NRD | nonReusableDevice | nonReusableDeviceOf |
| ORG | origin | originOf |
| PPRF | primaryPerformer | performance |
| PRCP | primaryInformationRecipient | informationReceived |
| PRD | product | productOf |
| PRF | performer | performance |
| RCT | recordTarget | recordTargetOf |
| RCV | receiver | receiverOf |
| RDV | reusableDevice | reusableDeviceOf |
| REF | referrer | referral |
| REFB | subjectReferrer | subjectReferral |
| REFT | subjectReferredTo | referral |
| RESP | responsibleParty | responsibleFor |
| RML | remoteLocation | remoteLocationOf |
| SBJ | subject | subjectOf |
| SPC | specimen | specimenOf |
| SPRF | secondaryPerformer | performance |
| TRANS | transcriber | transcription |
| TRC | tracker | tracking |
| VIA | via | viaOf |
| VRF | verifier | verification |
| WIT | witness | witness |

B.5.2.3 Role Link Association Members
Like Act Relationships, Role links are associations between two classes of the same type.  They appear infrequently in HL7 Version 3 models.  Table 3 below provides the names of role links from source to target in the second column, and from target to source in the third column, based on the role link type found in the first column.

Table 3 Role Link Association Names

| Role Link Type | Source->Target | Target->Source |
|---|---|---|
| BACKUP | backupFor | backup |
| DIRAUTH | directAuthorityOver | directAuthority |
| INDAUTH | indirectAuthorityOver | indirectAuthority |
| PART | part | partOf |
| REL | relatedTo | related |
| REPL | replacementOf | replacedBy |

B.5.2.4 Player and Scoper Associations
A role is associated with two entities.  The first entity, known as the “player” of the role, is a person, place, organization or thing which performs or participates in the act.  The second entity, known as the “scoper”, defines the scope in which the player acts.  The name of the association is determined by the role class and by whether the entity plays or scopes the role.

Table 4 below provides the names of role relationships to the playing entity in the second column, or the scoping entity in the third column, based on the role class found in the first column.

Table 4 Player and Scoper Associations
| Role Class | Playing Entity | Scoping Entity |
|---|---|---|
| ACCESS | access | accessed |
| ACTI | activeIngredient | activeIngredientOf |
| ACTM | activeMoiety | moietyOf |
| ADMM | product | administeringParty |
| ADTV | additive | additiveOf |
| AFFL | affiliate | affiliator |
| AGNT | agent | representedEntity |
| ALQT | aliquot | aliquotSource |
| ASSIGNED | assignedPerson | representedOrganization |
| BASE | base | baseOf |
| BIRTHPL | birthplace | birthplaceFor |
| CAREGIVER | careGiver | careGiverOf |
| CASEBJ | caseSubject | caseMonitor |
| CASESBJ | caseSubject | caseReporter |
| CERT | certifiedParty | certifyingParty |
| CHILD | child | parent |
| CIT | citizenPerson | politicalEntity |
| COLR | color | colorAdditiveOf |
| COMPAR | commissioningParty | commissionedParty |
| CON | contactParty | representedParty |
| CONT | content | container |
| COVPTY | coveredParty | underwriter |
| CRED | credentialedPerson | credentialIssuer |
| CRINV | investigator | sponsoringOrganization |
| CRSPNSR | researchSponsor | researchAuthorizer |
| DEPO | deposited | location |
| DST | distributedProduct | distributor |
| ECON | emergencyContact | representedParty |
| EMP | employee | employer |
| EXPR | exposedParty | exposingParty |
| FLVR | flavor | flavorAdditiveOf |
| GEN | specializedKind | generalizedKind |
| GRIC | specializedKind | genericKind |
| GUAR | guarantor | promisor |
| GUARD | guardian | ward |
| HCFAC | healthcareFacility | identifyingAuthority |
| HLD | held | holder |
| HLTHCHRT | healthChart | subjectPerson |
| IACT | inactiveIngredient | inactiveIngredientOf |
| IDENT | identifiedEntity | identifyingAuthority |
| INGR | ingredient | ingredientOf |
| INST | instance | kindOf |
| INVSBJ | investigativeSubject | investigatorSponsor |
| ISLT | isolate | source |
| LIC | licensedPerson | licenseIssuer |
| LOCE | locatedEntity | location |
| MANU | manufacturedProduct | manufacturer |
| MBR | member | group |
| MIL | militaryPerson | militaryServiceOrganization |
| MNT | maintainedEntity | maintainer |
| NOK | nextOfKinContact | representedParty |
| NOT | notaryPublic | politicalEntity |
| OWN | ownedEntity | owner |
| PART | part | whole |
| PAT | patient | provider |
| PAYEE | payee | invoicingParty |
| PAYOR | invoicePayor | underwriter |
| POLHOLD | policyHolder | underwriter |
| PROV | healthcareProvider | identifyingAuthority |
| PRS | relationshipHolder | personalRelationshipWith |
| PRSN | presentingPerson | location |
| PRSV | preservative | preservativeOf |
| QUAL | qualifiedEntity | qualificationGrantingEntity |
| RESBJ | researchSubject | researchSponsor |
| RET | retailedProduct | retailer |
| RGPR | regulatedProduct | regulator |
| ROL | player | scoper |
| SCHED | schedulableEntity | schedulingEntity |
| SCHOOL | educationalFacility | identifyingAuthority |
| SDLOC | location | serviceProvider |
| SGNOFF | signingAuthority | representedParty |
| SPEC | specimen | specimenSource |
| SPNSR | sponsor | underwriter |
| STAK | stakeholder | entityInWhichStakeIsHeld |
| STBL | stabilizer | stabilizerOf |
| STD | student | school |
| STOR | storedEntity | storageLocation |
| SUBS | prevailingEntity | subsumedEntity |
| SUBY | subsumingEntity | subsumedEntity |
| TERR | territory | governingEntity |
| THER | manufacturedProduct | manufacturer |
| UNDWRT | underwriter | underwrittenParty |
| WRTE | warranteedProduct | warrantingEntity |

B.6 Language Runtime
TBD
See Simple Math for a proposal.  Basically, steal JavaScript's Math package, and support a few constants (e.g., PI, E).
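Purely as an illustration of that suggestion, a runtime could be a table that maps JavaScript Math names onto the target platform's own functions. The Python sketch below assumes the constants and functions keep their JavaScript names (Math.PI, Math.max, and so on); that naming and the `MATH_RUNTIME` table are assumptions, not part of this appendix. Math.round is re-implemented because Python's built-in round() rounds halves to even, while JavaScript rounds them toward positive infinity.

```python
import math

# Hypothetical runtime table: JavaScript Math names mapped onto Python equivalents.
MATH_RUNTIME = {
    "Math.PI": math.pi,
    "Math.E": math.e,
    "Math.abs": abs,
    "Math.ceil": math.ceil,
    "Math.floor": math.floor,
    # Close to JavaScript's rule for typical values: halves round toward +infinity.
    "Math.round": lambda x: math.floor(x + 0.5),
    "Math.max": max,
    "Math.min": min,
    "Math.pow": math.pow,
    "Math.sqrt": math.sqrt,
    "Math.exp": math.exp,
    "Math.log": math.log,  # natural logarithm, as in JavaScript
}

print(MATH_RUNTIME["Math.round"](-2.5))  # -2
```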

B.7 Extensibility
This language is intended to be extended to support implementation capabilities not specified in this appendix.  Implementations are free to provide additional runtime capabilities (e.g., specialized functions), or language syntax (e.g., octal representation of numeric constants), or semantic constructs (e.g., a ternary conditional operator, such as the C/Java ?: operator).  However, such extensions should not be expected to be interoperable across implementation platforms.


B.8 Notes on Syntax 

  • Whitespace is allowed where you would expect it to be (e.g., between operators and operands).  
  • There are no multi-step expressions. Line breaks have no syntactic meaning since each expression is expected to result in one value in the context of HQMF where these expressions are used.  
  • There are no comments because comments can be included in the XML where the expressions appear using existing XML commenting capabilities.
  • Extra syntactic sugar (such as an optional + sign before a numeric literal to indicate that it is positive) has been eliminated to ensure compatibility across the widest variety of implementation platforms.
  • There are no bit-wise operators because these were not necessary in the HQMF use cases expressed in developing this specification.
  • There are no strings because these were similarly not necessary.
  • There are no unsigned integers because integers sufficed for the computations in the HQMF use cases explored.
  • There was no need to distinguish between float and double, so only one "real" type is specified.
  • Float, unsigned, short and similar variations on numeric types are language optimizations which only further complicate things, so they were dropped.

5 comments:

  1. Have a look at Xtext in Eclipse:
    http://www.eclipse.org/Xtext/

    You can use it to define your grammar, generate a parser, an editor and an EMF-based AST. You can then use one of the many Model-to-Model (M2M) or Model-to-Text (M2T) transformation technologies in Eclipse to translate your AST to an implementation language of your choice:

    http://www.eclipse.org/m2m/
    http://www.eclipse.org/modeling/m2t/

    Also check out Xbase -- an Xtext based language that could serve as a foundation for your expression language:

    http://wiki.eclipse.org/Xbase

    Replies
    1. Forget Compilers. The point of this language subset is to ELIMINATE the necessity for a compiler EXCEPT after translation of language constructs to a target implementation language. And then, the compiler is the same as is used in your implementation (e.g., C, C++, Perl, Ruby, SQL, XSLT, XQuery).

      The point of this language is to have something that can be SIMPLY transformed into language expressions found in other programming languages with the expected effects.

  2. It was a great run, and I hope the readers who discovered us found our content useful.

  3. Right, which is why something like Xtext can be very useful -- you get an editor and a parser into an EMF-based AST that you can translate into the target implementation language of your choice.
