Monday, April 27, 2020

A SANER $convert operation for Converting a FHIR Resource to CSV format

One of the points of SANER is to make it easier for organizations to integrate with a FHIR Server to support measure reporting.  To address this, I introduced the concept of Coping Mechanisms, but need to take it a step further.  We also need to address some of the other missing interfaces to query for Measure and MeasureReport resources in a few different ways.

A MeasureReport is defined by a Measure, and carries with it a very small set of important data for the receiver in public health context.  Take for example, the Measure based on the CDC/NHSN COVID-19 Patient Impact and Hospital Capacity Module.  In CSV form, this needs only a few bits of information to be communicated:

  1. The facility identifier of the reporting facility.
  2. A unique identifier for the report.
  3. The date associated with the reported data.
  4. - 17. The set of reported values
How would one define an API to convert the MeasureReport to CSV form that would 
  1. Fit with the FHIR OperationDefinition concept, and 
  2. Yet be fairly simple for someone NOT understanding FHIR to use.
The $convert Operation seems to be a good starting point for this.

Obviously, _format (thinking in FHIR) or the Accept header (thinking in REST) should be text/csv for this operation to be acted upon as a CSV converter.

CSV output is well defined, but the conversion process isn't so well defined.  
One could arguably just dump the JSON path as the column header names, and the leaf node values (e.g., @value or @url).  In fact, there are several JSON to CSV converters that do just that.  While that would work, it misses requirement #2 by a long shot, and doesn't allow the user to control the column names.

This gets back to allowing the user control over naming things in the CSV header, and mapping each header to values in the report.  There are a number of values in the report that were mentioned above.  I could easily add a composite parameter like header=name$value to the report where name is a string giving the header name, and value is an expression giving the "path to the value" in the MeasureReport.
  1. For CDC/NHSN, this is the NHSN identifier.  We have an idea what this looks like, but no real examples.  It's fairly obviously either reporter.identifier or subject.identifier (and more likely the latter for reasons discussed later).  If I were to use FHIR Path, I could say something like: subject.identifier.where(system = 'NHSN Facility Identifier System URL').value and that would give me the right thing.
  2. This is either MeasureReport.id or perhaps better, MeasureReport.identifier.value (but for which identifier).  Since these are NHSN generated values (probably OIDs but we don't really know yet), the FHIR Path is probably something like MeasureReport.identifier.where(system='NHSN Report Identifier System URL').value.
  3. Easy: MeasureReport.date, or better yet, just date.
  4. Each of these is a group.measureValue, or a group.population.count (and note, I've started to drop MeasureReport from the FHIR Path, which is still valid, since MeasureReport is the context for the conversion).  But I have to identify which of these ... so group.where(code.where(coding.where(system = 'ick' and code='blech'))).measureValue or group.population.where(code.where(coding.where(system = 'ick' and code='blech'))).count.  Ick and blech are simply syntactic meta-variables which I named that way because as I'm writing these expression, my stomach is churning for the poor non-FHIR-aware end-user.
So, I like where this is heading, but FHIRPath is still to FHIRy for some.  How can we simplify this?

If MeasureReport is the context, can we relax FHIRPath a little bit so that it would be more friendly?

To start with, I really like Sushi's # notation for codes, so I could say something like group.where(code=ick#blech).measureValue.  And since I honestly don't care about system (since MeasureReport references Measure, and code systems are pretty well defined), I could further simplify to group.where(code=#blech).measureValue, which is getting better.

Could this be further simplified?  The measureValue field only appears in group, could there be another syntax to identify it?  In XPath, I might say //f:measureValue[../f:group/code ...], or *[code=...]/f:measureValue.  FHIRPath doesn't have a .. like traversal.  However, I might define *to mean any field, and thus use *.where(code=#blech) (where * is simply and alternate name for descendants()).

So, now we have *.where(code=#blech).measureValue, but why stop there?  How about making *S mean mean *.where(S) instead?  Now we have *code=#blech.measureValue.  This is getting better, but still not as good as it can get.  Code is an obvious index in FHIR, so in fact is system, and a few other "slicers".  In fact, the slicing pattern is a fairly common one.

What if * were a "slice" operator, where slices were attempted starting from near and heading to farther away, so that *#blech implicitly meant, a thing whose slice is identified uniquely by an identifier or code whose value or code was blech.  And *name# could mean, that thing whose slice is identified with an identifier or code system with a system URL of name.

There might be many such possible slices.  Each possibly slice would be ordered in size from smallest to largest (from inner to outer slices).  And we could provide a selection operator that worked on picking the right one using the . operator.  So *#bleck.measureValue simply means that measureValue who slicer is a code or identifier with the code or identifier of bleck.

Now, I might say something like:
header=*#positive.measureValue$positive,...

Can I go JUST one tiny step further and say that if you are slicing, take the simplest content you can for the slice, so that if the slicer is a code in a field X (e.g., found in X.code, X.category, et cetera), then the sliced value is very likely the next simple value you could report?

This might even be prioritized: 
  1. (.*V|v)alue if there is one, 
  2. (.*C|c)ode if there is one
  3. (.*N|n)ame if there is one
  4. the first primitive type if none of the above match.
And furthermore, if since *S always has # in it, I might further simplify to say that if S has #, then * is not needed.

Now I might say something like:
MeasureReport/32/$convert?_format=text/csv&headers=facilityId$NSHNFacilityIdentifierSystem#,reportId$NSHNReportIdentifierSystem,collectionDate$date,numTotBeds$numTotBeds...

And finally, if X$X, then I need say only X, so:
MeasureReport/32/$convert?_format=text/csv&headers=facilityId$NSHNFacilityIdentifierSystem#,reportId$NSHNReportIdentifierSystem#,collectionDate$date,numTotBeds,Numbeds,numBedsOcc...

But # means something special and needs escaping, so just use | like FHIR does in query parameters.

And finally, if X$X, then I need say only X, so:
MeasureReport/32/$convert?_format=text/csv&headers=facilityId$NSHNFacilityIdentifierSystem|,reportId$NSHNReportIdentifierSystem|,collectionDate$date,numTotBeds,Numbeds,numBedsOcc...

AND NOW that's an API that has a well defined meaning, maps back to FHIRPath with some additional rules, and makes some sense to the common user who's not seen FHIR yet.

Testing this against a few other resources:  If the slicer is a LOINC code, this would get Obserevation.value.  If the observation is a panel (e.g., urine test), I could get a CSV of the component values quite cleanly.

If the slicer is a string (for item.linkId in Questionnaire), it would get item.text, which is NOT quite what we want.  Let's add answer[X] to the prioritized list but what happens when there can be more than one answer?  Hmm, have to think about that one.  Perhaps that CSV answer is a comma separated list of values in the cell?  That's not clear.  It's probably good enough though for now.

I'll have to work up some grammar for this.









0 comments:

Post a Comment