Wednesday, February 28, 2018

Hello CQL

So you want to learn CQL.  So do I, so I thought I'd probably write a book about it ;-)

Somewhere in the book there will need to be the CQL Hello World program, which I'll repeat below for the uninitiated:

define Result: 'Hello World!'

CQL doesn't have assignment statements.  You define things and having defined them, you can later refer to them.  But that's it.  Values are never changed by the program.

That's an essential feature of declarative programming.

Because CQL programs are side-effect free, an executor can evaluate them in whatever order makes the most sense for performance.  Another commonly used language that works this way is XSLT, which might explain why I like CQL. 
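To make the "order doesn't matter" point concrete, here is a toy sketch in Java. It is not how any real CQL engine works, and DefineLibrary, define() and get() are made-up names. Each definition is a named expression that is evaluated on first use and then cached, and a name can't be redefined, so neither the order in which definitions are written nor the order in which they are evaluated changes the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of CQL-style definitions (hypothetical names, not a real CQL engine).
// Definitions are never reassigned, so each one can be evaluated lazily, cached,
// and computed in whatever order the executor chooses.
class DefineLibrary {
    private final Map<String, Function<DefineLibrary, Object>> definitions = new HashMap<>();
    private final Map<String, Object> cache = new HashMap<>();

    // Register a definition; defining the same name twice is an error (no assignment).
    void define(String name, Function<DefineLibrary, Object> expression) {
        if (definitions.putIfAbsent(name, expression) != null) {
            throw new IllegalStateException(name + " is already defined");
        }
    }

    // Evaluate a definition on first use and remember the result.
    // (Circular definitions are not detected in this sketch.)
    Object get(String name) {
        if (cache.containsKey(name)) {
            return cache.get(name);
        }
        Function<DefineLibrary, Object> expression = definitions.get(name);
        if (expression == null) {
            throw new IllegalArgumentException("Unknown definition: " + name);
        }
        Object value = expression.apply(this);
        cache.put(name, value);
        return value;
    }

    public static void main(String[] args) {
        DefineLibrary library = new DefineLibrary();
        // "Greeting" refers to "Result" before "Result" is defined; that's fine,
        // because nothing is evaluated until it is asked for.
        library.define("Greeting", lib -> lib.get("Result") + " from CQL");
        library.define("Result", lib -> "Hello World!");
        System.out.println(library.get("Greeting")); // prints Hello World! from CQL
    }
}
```

The memoization is only safe because nothing is ever reassigned; in a language with assignment, caching a value like this could hand back stale results.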

CQL has four primitive data types: Boolean, Integer, Decimal and String, along with the not-quite-primitive DateTime and Time types.  Boolean uses the traditional true and false values.
It also has complex data types including Quantity, Code, Concept, ValueSet and CodeSystem. 
Beyond that, everything else is either a complex class referencing an information model, or is defined in a Tuple.  And then there is null, which isn't a data type.

Strings are sequences of characters wrapped in single quotes.  Special characters are escaped using \ as in the C and Java language families, with all the common escapes (\', \", \n, \t, \\ and so on) plus Unicode escapes (\u followed by four hex digits).
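Decoding those escapes is the kind of detail an interpreter has to get right. Here is a minimal Java sketch that decodes the body of a CQL string literal (the part between the single quotes). It assumes the escapes are \', \", \r, \n, \t, \f, \\ and \u followed by exactly four hex digits; CqlStrings and unescape() are hypothetical names, not part of any CQL engine.

```java
// Minimal sketch: decode the body of a CQL string literal (without its quotes).
// Assumed escape set: backslash followed by ' " r n t f or another backslash,
// plus backslash-u followed by exactly four hex digits. Hypothetical names throughout.
class CqlStrings {
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary character, copy as-is
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("Dangling backslash at end of string");
            }
            char next = body.charAt(++i);
            switch (next) {
                case '\'': out.append('\''); break;
                case '"':  out.append('"');  break;
                case 'r':  out.append('\r'); break;
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'f':  out.append('\f'); break;
                case '\\': out.append('\\'); break;
                case 'u': {
                    // i now points at 'u'; the four hex digits follow it
                    if (i + 4 >= body.length()) {
                        throw new IllegalArgumentException("Truncated unicode escape");
                    }
                    int code = 0;
                    for (int k = 1; k <= 4; k++) {
                        int digit = Character.digit(body.charAt(i + k), 16);
                        if (digit < 0) {
                            throw new IllegalArgumentException("Bad hex digit in unicode escape");
                        }
                        code = code * 16 + digit;
                    }
                    out.append((char) code);
                    i += 4;
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown escape: \\" + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("It\\'s"));   // It's
        System.out.println(unescape("\\u0048i")); // Hi
    }
}
```

Rejecting unknown escapes, rather than silently keeping the backslash, is a design choice; it makes malformed literals fail loudly instead of producing surprising strings.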

Double quotes delimit quoted identifiers, which are most commonly used to name complex things (Code, Concept, ValueSet, and CodeSystem definitions).

Math is math.  Logic is three-valued.  Time is complicated, but less so in CQL than in anything else.  CQL turns time from a great big ball of timey-wimey stuff into a linear progression that lets non-Time Lords express logic within it.

One of the chapters will have to be about the history of CQL.  In "A theory of everything" written in 2013 I quickly listed some of that history.  Later history includes FHIR, QUICK, QICore, and some other bits and bobbles.  The meeting described in that post reads to me much like the begats in the Bible, and CQL may in fact be the messiah for CDS.  But right now it probably still has to spend its 40 days (or is it weeks, hopefully not months or years) in the wilderness.

Five years.  This is probably the second time in my life where I sat down and looked at a piece of health IT history and went oh shit.  Was it really that long ago?

Anyway, I probably am going to write that book, but don't expect it soon. I still have a lot to learn.


Tuesday, February 27, 2018

Logic in the Presence of Unknowns Just Isn't

... logical?  ... executing? ... or as in my case, even vaguely working to my expectations.

CQL today has to work in the presence of unknown values.  We call these nulls.  Null has this weird property of taking over everything in tri-value oriented languages (those where null is expected), and blowing up everything in bi-value oriented languages (those where null is not so much expected).

How can you tell if your language is oriented towards tri-valued logic or bi-valued logic?  Well, the simple answer is to see what happens when you evaluate null OR true.  If an exception is thrown, you are definitely dealing with bi-valued logic; if you get true, you are dealing with tri-valued logic; and if you get null, someone screwed up.
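Here is what the tri-valued answer looks like when you have to build it yourself in a bi-valued language. This is a minimal sketch of Kleene-style three-valued logic in Java, with a null Boolean standing in for unknown; TriLogic and its methods are names I made up. Note that in plain Java, `unknown || true` on a null Boolean doesn't give you true: it throws a NullPointerException while unboxing, which is exactly the bi-valued behavior described above.

```java
// Minimal sketch of three-valued (Kleene) logic; a null Boolean means "unknown".
// Class and method names are hypothetical.
class TriLogic {
    // OR: true if either side is true, even when the other side is unknown.
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return Boolean.TRUE;
        }
        if (a == null || b == null) {
            return null; // no true operand, and at least one unknown
        }
        return Boolean.FALSE;
    }

    // AND: false if either side is false, even when the other side is unknown.
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return Boolean.FALSE;
        }
        if (a == null || b == null) {
            return null; // no false operand, and at least one unknown
        }
        return Boolean.TRUE;
    }

    // NOT: unknown stays unknown.
    static Boolean not(Boolean a) {
        if (a == null) {
            return null;
        }
        return !a;
    }

    public static void main(String[] args) {
        System.out.println(or(null, true));   // true
        System.out.println(or(null, false));  // null
        System.out.println(and(null, false)); // false
        System.out.println(and(null, true));  // null
    }
}
```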

So what happens when you try to build a language interpreter for a tri-valued logic system (like, say, CQL) in a language that is generally bi-valued (say, Java)?  You run into problems around null values.  In the real world, null is a thing.  It happens.  People don't fill out all the fields in a form, some values are simply unknown, and others depend on workflow that hasn't happened yet.  But we still have to reason with them.

Here are some interesting things you need to think about:
When you sort a list of objects on a field, where do the objects whose field is null end up?  XSLT got this right by making a decision, even if you don't like it, so the behavior is defined:
"The set of sort key values (after any conversion) is first divided into two categories: empty values, and ordinary values. The empty sort key values represent those items where the sort key value is an empty sequence. These values are considered for sorting purposes to be equal to each other, but less than any other value. The remaining values are classified as ordinary values."
CQL doesn't actually cover this case.  Here's what it has to say about sorting:
"After the iterative clauses are executed for each element of the query source, the sort clause, if present, specifies a sort order for the final output. This step simply involves sorting the output of the iterative steps by the conditions defined in the sort clause. This may involve sorting by a particular element of the result tuples, or it may simply involve sorting the resulting list by the defined comparison for the data type (for example, if the result of the query is simply a list of integers)."
So NOW what?  Well, I think a minor adjustment along the lines of what XSLT says is in order here.
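Java's standard library already encodes the XSLT choice: Comparator.nullsFirst treats nulls as equal to each other and less than every non-null value. A minimal sketch, assuming a hypothetical Observation record whose value may be null (records need Java 16 or later):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: sort records by a possibly-null field, placing nulls first as XSLT does.
class NullSort {
    // Hypothetical stand-in for a query result tuple with an optional value.
    record Observation(String id, Integer value) {}

    static List<Observation> sortByValue(List<Observation> input) {
        List<Observation> sorted = new ArrayList<>(input);
        // nullsFirst: nulls compare equal to each other and less than any non-null value.
        // List.sort is stable, so items with null values keep their original relative order.
        sorted.sort(Comparator.comparing(Observation::value,
                Comparator.nullsFirst(Comparator.naturalOrder())));
        return sorted;
    }

    public static void main(String[] args) {
        List<Observation> input = List.of(
                new Observation("a", 3),
                new Observation("b", null),
                new Observation("c", 1));
        System.out.println(sortByValue(input)); // b (null) first, then c (1), then a (3)
    }
}
```

Swapping in Comparator.nullsLast would give the opposite convention; the point is that one of them has to be chosen and written down.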

Type conversion is another issue. If you have a defined process for converting from one type to another, then you should also have a defined process for converting null things that might have been a basic data type into other null things that could be a different data type.  For example, the null string should convert to the null date.

Taking type conversion a step further, the string "asdf192832340asdfa8", when converted to a date, might return a null value to indicate that there is no conversion.  Or it could raise an error.  That's a decision that needs deciding.
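Here is one way that decision could be made, sketched in Java: null in gives null out, and text that doesn't parse also gives null instead of an exception, with a strict variant alongside for comparison. The class and method names are hypothetical, and treating unparseable input as null is an assumption for illustration, not something I'm claiming CQL requires.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Sketch of two conversion policies (hypothetical names).
class NullSafeConvert {
    // Lenient: a null String becomes a null date, and unparseable text also
    // becomes null ("no conversion") rather than raising an error.
    static LocalDate toDate(String s) {
        if (s == null) {
            return null;
        }
        try {
            return LocalDate.parse(s); // ISO-8601, e.g. 2018-02-27
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Strict: null still propagates, but bad text raises DateTimeParseException.
    static LocalDate toDateStrict(String s) {
        return s == null ? null : LocalDate.parse(s);
    }

    public static void main(String[] args) {
        System.out.println(toDate(null));                  // null
        System.out.println(toDate("asdf192832340asdfa8")); // null
        System.out.println(toDate("2018-02-27"));          // 2018-02-27
    }
}
```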

What happens when you union or intersect two lists where one of the lists is itself null?  At the very least, the behavior needs to be defined.  To see where the problem lies, consider the following:

List<String> l = null;

Is l an empty list, or simply null?  People who build collections are in the habit of returning an empty collection rather than null, but sometimes the collection builder itself returns null, perhaps because it doesn't even know the element type at execution time.  That case is easy to fix: an empty collection doesn't need an element type, so just return Collections.EMPTY_LIST (or the type-safe Collections.emptyList()), which for iteration purposes behaves pretty much identically to Collections.EMPTY_SET.
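One way to take the question off the table is to decide that a null list behaves as an empty list, which is the return-an-empty-collection habit applied at the operator boundary. The sketch below assumes that policy (returning null whenever either input is null would be an equally defensible, but different, choice); NullSafeLists and its methods are hypothetical names. It uses the type-safe Collections.emptyList() rather than the raw EMPTY_LIST constant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of one policy: a null list is treated as an empty list (an assumption).
class NullSafeLists {
    private static <T> List<T> orEmpty(List<T> list) {
        return list == null ? Collections.emptyList() : list;
    }

    // Union without duplicates, preserving first-seen order.
    static <T> List<T> union(List<T> a, List<T> b) {
        Set<T> result = new LinkedHashSet<>(orEmpty(a));
        result.addAll(orEmpty(b));
        return new ArrayList<>(result);
    }

    // Elements of a that also appear in b, without duplicates, in a's order.
    static <T> List<T> intersect(List<T> a, List<T> b) {
        Set<T> inB = new LinkedHashSet<>(orEmpty(b));
        Set<T> result = new LinkedHashSet<>();
        for (T item : orEmpty(a)) {
            if (inB.contains(item)) {
                result.add(item);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> l = null;
        System.out.println(union(l, List.of("a", "b")));     // [a, b]
        System.out.println(intersect(List.of("a", "b"), l)); // []
    }
}
```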

Life gets dicey around nulls.  There are no easy rules; you have to think about it.

BTW: This isn't me dissing CQL. I quite like the language.  But then again, I've been known to write tremendous volumes of code in XSL as well, so that isn't necessarily great praise from someone sane ;-).  I'm simply reporting on some of the challenges I'm having in the hopes that they can be fixed, and that others trying to use it can watch out for the hidden (you might even say unknown) pitfalls that are still being worked out.


Friday, February 23, 2018

How workflow can affect data and reasoning

I've been playing around a bit with the Clinical Quality Language lately.  One of the interesting challenges I had to solve was applying logic written with one particular data representation in mind to data stored in a different representation.

To simplify the problem, I'll look at something that's pretty typical.
Consider the patient history form, a section commonly appearing on the "Clipboard" given to new patients:

Has anyone in your family ever had:
Cancer          [ ?]
Hypertension    [   ]
Stroke          [X]

You'll note here that the patient might not use the form the way it was intended, due to uncertainty about one of the answers. So the form might later be changed to:

Has anyone in your family ever had:
Cancer          Yes   No   Unknown
Hypertension    Yes   No   Unknown
Stroke          Yes   No   Unknown

When encoding this information, there are a number of ways to store it in the EHR system. If using pre-coordinated terms, you can simply list SNOMED CT expressions for all the positive items. This is one way to encode the information. However, no single vocabulary pre-coordinates every possible case.  You cannot say in a single SNOMED CT term that you don't know whether the patient has a family history of cancer, although it can be stated in a post-coordinated SNOMED CT expression. So this kind of result is often captured in question/answer form.

There are at least three additional ways to codify this information in question/answer form:
  1. You can codify the overall question, and give a list of codified answers.
    Q1: Family History? A: Hypertension
    This tells you nothing about cancer or stroke.
  2. You can codify each individual question, and list the answers as a yes/no for each checked / unchecked box respectively.
    Q1: Family History of Cancer? A: No
    Q2: Family History of Hypertension? A: No
    Q3: Family History of Stroke? A: Yes
    This doesn't capture the uncertainty about cancer.
  3. You can codify each individual question, and list the answers as yes, no, or unknown.
    Q1: Family History of Cancer? A: Unknown
    Q2: Family History of Hypertension? A: No
    Q3: Family History of Stroke? A: Yes
    This captures the fine detail across the board.

For an application to be able to reason with the data, you have to consider the various ways in which the question could be asked, and how to detect the appropriate response.

The challenge with clinical decision support and quality measurement is then to determine how to map the questions you have answers to into the questions the decision support is asking.  Sometimes, there isn't a clean match (as for cases 1 and possibly 2 above, as well as coding using precoordinated terms).

In CQL, you can easily map codes used to answer questions into a particular form, making it rather easy to change the code systems.

Code systems and codes are specified symbolically, as in:

codesystem "SNOMEDCT": ''
code "FH of Cancer": '275937001' from "SNOMEDCT" 

To change from SNOMEDCT to ICD-10, you might use:
codesystem "ICD10": ''
code "FH of Cancer":  'Z80.9' from "ICD10" 

You could also create a single value set containing both codes.  Presently, CQL does not have a way to define a value set, only to reference them (they are defined elsewhere).  
valueset "FH of Cancer": 'some OID or URL reference'

Your logic would then only need to address one thing, "FH of Cancer", either as a single code or as a value set.  

You might cheat here and use a Concept as a way to create a faux value set across different terminologies, but this is actually discouraged in CQL.  It might be better to create a CQL list of codes and use the contains operator to determine whether the code you have found is in that list.
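The list-of-codes approach works because a code is identified by its code system together with its code value, so the same code string in another system must not match. Here is a minimal Java sketch of that check; the Code record, the list name, and the use of the symbolic names "SNOMEDCT" and "ICD10" in place of real code system identifiers are all assumptions for illustration.

```java
import java.util.List;

// Sketch: a code matches only if both its system and its code value match.
class CodeList {
    // Hypothetical stand-in for CQL's Code type (version and display omitted).
    record Code(String system, String code) {}

    // Hypothetical list playing the role of "FH of Cancer" across two terminologies.
    static final List<Code> FH_OF_CANCER = List.of(
            new Code("SNOMEDCT", "275937001"),
            new Code("ICD10", "Z80.9"));

    static boolean isFamilyHistoryOfCancer(Code found) {
        // Record equality compares both system and code, like a contains test on Codes.
        return FH_OF_CANCER.contains(found);
    }

    public static void main(String[] args) {
        System.out.println(isFamilyHistoryOfCancer(new Code("ICD10", "Z80.9")));    // true
        System.out.println(isFamilyHistoryOfCancer(new Code("SNOMEDCT", "Z80.9"))); // false
    }
}
```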

This doesn't get at negation/unknown logic that might also be needed in formulations 2 and 3 above.  To do that, you can define a function that checks for varying formats, and might also use various value sets for "Yes"/"No"/"Unknown" as possible variations.
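A function like that might look roughly like the following Java sketch, which answers "family history of cancer?" as true, false, or null (unknown) regardless of which of the three question/answer formulations was used. The Answer record and the literal question and answer strings are hypothetical stand-ins for real codes and value sets.

```java
import java.util.List;

// Hypothetical sketch: answer "family history of cancer?" as TRUE, FALSE, or null
// (unknown) from whichever question/answer formulation the data happens to use.
class FamilyHistoryCheck {
    // Question and answer strings stand in for codes / value set membership.
    record Answer(String question, String answer) {}

    static Boolean familyHistoryOfCancer(List<Answer> answers) {
        for (Answer a : answers) {
            switch (a.question()) {
                case "Family History of Cancer?": // formulations 2 and 3
                    if (a.answer().equals("Yes")) return true;
                    if (a.answer().equals("No")) return false;
                    return null; // "Unknown", or anything unrecognized
                case "Family History?": // formulation 1: a list of positive findings
                    if (a.answer().equals("Cancer")) return true;
                    break;
                default:
                    break;
            }
        }
        // Formulation 1 only records positives, so not finding "Cancer" is not a "No".
        return null;
    }

    public static void main(String[] args) {
        System.out.println(familyHistoryOfCancer(List.of(new Answer("Family History?", "Hypertension")))); // null
        System.out.println(familyHistoryOfCancer(List.of(new Answer("Family History of Cancer?", "No")))); // false
        System.out.println(familyHistoryOfCancer(List.of(new Answer("Family History of Cancer?", "Unknown")))); // null
    }
}
```

Note that formulation 2 still can't tell a real "No" from an unchecked box the patient was unsure about; no amount of mapping logic recovers that.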

When you get right down to it though, the decision that determines how reasoning needs to be done is often taken long before the reasoning is ever implemented.  And you cannot expect that to change quickly, because simply changing the form from the first example to the second can take quite a bit of time in a provider organization.

CQL goes a long way towards making clinical decision and quality measure logic reusable and mappable to provider workflows, but it is still missing a few pieces to make it truly easy to separate logic from data.

Monday, February 12, 2018

Mapping from a FHIR Composition to an XDS DocumentEntry

Rene Spronk asked this one on one of the HL7 lists: basically, what is the mapping from XDSDocumentEntry to Composition?  I give the mapping backwards (how you produce an XDS DocumentEntry from a Composition) rather than forwards, but it can be used in either direction.  The third column shows the corresponding element of the CDA ClinicalDocument.

| XDSDocumentEntry | Composition | ClinicalDocument |
|---|---|---|
| formatCode | (meta.profile) | ClinicalDocument.templateId |
| uniqueId | (meta.versionId) | |
| languageCode | (language) | ClinicalDocument.languageCode |
| | identifier | .setId |
| | status | n/a |
| classCode (mapped from type), typeCode | type | .code |
| | class | n/a |
| patientId, sourcePatientId, sourcePatientInfo | subject | .recordTarget |
| healthcareFacilityType, practiceSettingCode (via reference to encounter/location/physicalType and encounter/location/type) | encounter | .componentOf.encompassingEncounter |
| creationTime | date | .effectiveTime |
| authorInstitution, authorSpecialty, authorPerson, authorRole (via reference to author resource) | author | .author.assignedAuthor |
| title | title | .title |
| confidentialityCode | confidentiality | .confidentialityCode |
| | attester | .authenticator/.legalAuthenticator |
| | attester.mode | implied by .authenticator/.legalAuthenticator |
| | attester.time | .authenticator.time |
| legalAuthenticator | attester.party | .authenticator.assignedEntity |
| | custodian | .custodian.assignedCustodian |
| | relatesTo | .relatedDocument |
| parentDocumentRelationship | relatesTo.code | .relatedDocument.typeCode |
| parentDocumentId | relatesTo.target[x] | |
| | event | .documentationOf.serviceEvent |
| eventCodeList | event.code | .documentationOf.serviceEvent.code |
| | event.period | .documentationOf.serviceEvent.effectiveTime |
| serviceStartTime | event.period.start | .documentationOf.serviceEvent.effectiveTime.low |
| serviceStopTime | event.period.end | .documentationOf.serviceEvent.effectiveTime.high |
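Most of these rows are straight copies, but a few need conversion. For example, XDS expects creationTime in UTC in the form YYYYMMDDhhmmss (shorter precisions are also allowed), while Composition.date carries its own time zone offset. Here is a minimal Java sketch of a few of these copies, using made-up Composition and DocumentEntry records rather than a real FHIR or XDS library, and assuming Composition.date always has a full time and offset (a date-only value would need extra handling). Coded values like confidentiality are simplified to plain strings.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch of a few Composition -> DocumentEntry field copies (hypothetical types).
class CompositionToDocumentEntry {
    record Composition(String title, String date, String language, String confidentiality) {}
    record DocumentEntry(String title, String creationTime, String languageCode, String confidentialityCode) {}

    private static final DateTimeFormatter XDS_TIME = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // FHIR dateTime carries an offset; XDS wants UTC, so normalize before formatting.
    static String toXdsTime(String fhirDateTime) {
        return OffsetDateTime.parse(fhirDateTime)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .format(XDS_TIME);
    }

    static DocumentEntry map(Composition c) {
        return new DocumentEntry(c.title(), toXdsTime(c.date()), c.language(), c.confidentiality());
    }

    public static void main(String[] args) {
        Composition c = new Composition("Discharge Summary", "2018-02-12T10:15:30-05:00", "en-US", "N");
        System.out.println(map(c).creationTime()); // 20180212151530
    }
}
```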