Thursday, January 30, 2020

Querying with Dates in FHIR, dateTime to date

In a previous post I talked about querying from date to dateTime in FHIR, and indexing strategies to deal with that.  I also mentioned that I'd go the other way for it.

So let's look at the problem in reverse.

Consider a dateTime for an event: January 20th, 2020, at 6:00am EST.   Consider also that some systems may not store the timestamp for this event (so it would simply be 2020-01-20 with no timestamp or time zone).

Let's say that something important happened at that time, and I want to find all events that had a dateTime after that event.

What if there are five Observation resources, with effectiveDateTime values of 

  1. 2020-01-20T05:59:00-05:00
  2. 2020-01-20T06:01:00-05:00
  3. 2020-01-19
  4. 2020-01-20
  5. 2020-01-21

And I query for le2020-01-20T06:00-05:00, which ones should I get?

Well, pretty clearly I should get #1, and I shouldn't get #2.
But what about 3-5?

Here's what FHIR has to say about the le operator in this case:


lethe value for the parameter in the resource is less or equal to the provided valuethe range below the search value intersects (i.e. overlaps) with the range of the target value or the range of the search value fully contains the range of the target value

Ok, so the search value is 2020-01-20T06:00-05:00.  What is the range below it?  It covers all time before 2020-01-20T06:00-05:00.  Assuming a maximum clock precision in milliseconds, that includes all times up to and including 2020-01-20T05:59:59.999-05:00.

What is the range of the search value?  It's 2020-01-20T06:00:00.000-05:00 to 2020-01-20T06:00:59.999-05:00 inclusive  (using the same assumption on precision) .

We can start off by recognizing that the range of the search value can safely be ignored.  It doesn't fully contain any day of any year.  So we only need to worry about the range below the search value.

If we assume the user and the date associated with the event are in the same time zone (a probably, but not great assumption), then arguably #3 is before, #4 is unknown, and #5 is after this timestamp.  But if we cannot make that assumption, then what?  Time zones range from UTC-12 to UTC+14 (a 26 hour range possibly covering 3 different dates).

As it happens, in UTC-12, this event occurred on 2020-01-19, in Boston on 2020-01-20, and in UTC+14, on 2020-01-21.  So, without any assumptions on time zone, we have no clue whether these events occurred before, after on on any of the given dates in 3 - 5.  We could say for certain when the date is 2020-01-18 or 2020-01-22 that it's before or after, because given the CURRENT set of time zones (which could change based on local legislation), that these dates MUST be before or after, but cannot say otherwise without making an assumption about time zone.

This isn't a problem when both items to compare are dates (in the evil twin scenario of my previous post, my evil twin was born a day before me EVEN though we were born at the exact same time).

There is really only one way around this problem, and that is to assign a time zone to use in these comparisons.  There are two time zones that we already know about in this situation when making the comparison.  The first is the time zone given in the user's query parameter, and the second is the time zone of the server.  And then there's the argument that every time comparison should be made with respect to UTC.

I find the latter argument about UTC to be based on the needs of developers, rather than users, and thus inadequate.  Similarly, I find the server argument to be based on technical rather than user-oriented requirements.  So, my preferred solution would be to use the time zone specified in the users query parameter.

That means that 2020-01-19 would be "less than" 2020-01-20T06:00-05:00, and 2020-01-21 would be "greater than" it, and 2020-01-20 would fit into the categories of "could be less than" and "could be greater than".  And from the user's perspective, this would be as close as we can get in responding.

It's NOT a satisfying answer, because of the "could be" category.  This results from the definition of the eq operator.

We generally think that if A = B, that B = A.  However, FHIR defines equals in this way:
eqthe value for the parameter in the resource is equal to the provided valuethe range of the search value fully contains the range of the target value
If A is the same range as B, then A = B and B = A, but if the range of A is larger than the the range of B, this is not true.

clock, doctor who, and gif image


0 comments:

Post a Comment