Thursday, August 15, 2019

Getting FHIR Data from mHealth devices and applications

I've been spending a good bit of my time working on understanding health data in mobile apps and devices.  Most of my research tells me we need to look at what the problems really are, rather than to assert that ___ will solve the problem.

There's not really a good collection of FHIR data coming from mobile apps and devices that could be used for any sort of analysis.  To address this problem, the Mobile Health Workgroup in HL7 is sponsoring a track at the HL7 September FHIR Connectathon to explore what kind of FHIR resources come out of these devices, and produce that collection for analysis.

The workgroup is hosting a meeting on August 23rd at 11am Eastern to discuss this track if you would like to learn more.  Coordinates are below:

 Web Meeting :
Dial-in Number (US): (515) 604-9930
Access Code: 836039

Saturday, July 13, 2019

Optimizing Inter-Microservice Communications using GZip Compression in HAPI on FHIR

It's widely known that XML and JSON both compress really well.  It's also pretty widely known that one should enable GZip compression on server responses to improve server performance.  Not quite as widely known: you can also compress content being sent to the server (for POST or PUT requests).  Also, most people can tell you that JSON is smaller than XML.

And size matters when communicating over a network.

So it should be obvious that one should always GZip compress data whenever possible when connecting between two servers, right?

Uhmm, not so much, but you could already see that coming, because what would I write a blog post about if it were true?

Here's the thing.  Compression saves time for two reasons:

  1. It takes less time to transmit less data.
  2. There's less packet overhead with less data.
But it also takes CPU time to compress the data.  So long as the CPU time taken to compress the data on one side, and uncompress it on the other side, is LESS than the savings in transmission and packet overhead, it's a net win for performance.

Let's look at data transmission:

A maximum transmission unit (MTU) is about 1400 bytes.  This takes a certain amount of time to transmit over the network.  Here are some values based on different networking speeds:

Speed (Mbps)    Time per packet (ms)
10              1.12
20              0.56
100             0.112
200             0.056
300             0.037
1000            0.012
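
The figures above follow from simple arithmetic: a 1400-byte packet is 11,200 bits, and dividing by the link speed gives the time on the wire.  Here is a minimal sketch of that calculation, assuming the first number on each line is the link speed in Mbps and the second is milliseconds per packet.  The class and method names are made up for illustration.

```java
class PacketTime {
    /** Milliseconds needed to put the given number of bytes on a link of the given speed (in Mbps). */
    static double transmitMillis(int bytes, double mbps) {
        double bits = bytes * 8.0;
        // 1 Mbps is 1000 bits per millisecond.
        return bits / (mbps * 1000.0);
    }

    public static void main(String[] args) {
        int packetBytes = 1400; // approximate packet size used in this post
        for (double mbps : new double[] {10, 20, 100, 200, 300, 1000}) {
            System.out.printf("%6.0f Mbps  %.4f ms%n", mbps, transmitMillis(packetBytes, mbps));
        }
    }
}
```

This only covers time on the wire.  It ignores headers, latency and acknowledgements, which are discussed next.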

Depending on network speeds, the time saved by sending one less packet can be anywhere from 12 µs to 2.2ms.  This isn't very much, but if you have to send more than one packet, then you have interpacket latency, which is basically dealing with round-trip times from client to server for acknowledgements.  (ACKs don't need to be immediate in TCP; a certain number of ACKs can be outstanding at once, so there's not latency introduced on every packet sent.)  But your network latency also has an impact on the throughput (network latency is generally measured on the order of 10s of ms).

I ran an experiment to see which method was fastest when sending data in a POST/PUT request, using GZip or not using GZip, and the results were interesting.  I sent 200 create requests in which I controlled for the size of the resource being sent in terms of the number of packets it would require to be sent over, from 1 to 10 packets of data (where by packet I mean the size of a single TCP segment transmission, controlled by maximum MTU size).  I sent the requests in two different formats (XML and JSON), over three different networks.

For a control network, I used localhost, which actually involves no transmission time or effective latency.  I also did the transmission over my local network, so that it actually went from my system, to my router, and then back to my system.  And then finally, I transmitted from my system to my external IP address (so it left the router, went to my cable modem and came back through it).

I burned the first batch of 200 requests to initialize the code through the JIT compiler.
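
If you want to see the size side of this trade-off for your own payloads, something like the following works as a starting point.  It is a rough sketch and not the test harness used for these results: it gzips a payload in memory with the JDK's GZIPOutputStream, counts how many ~1400-byte packets the payload needs before and after, and times the compression.  The sample payload and all names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipPackets {
    static final int PACKET_BYTES = 1400; // approximate packet size used in this post

    /** GZip-compress a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Number of packets needed to carry the given number of bytes, rounded up. */
    static int packets(int bytes) {
        return (bytes + PACKET_BYTES - 1) / PACKET_BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical, repetitive JSON standing in for a FHIR resource.
        StringBuilder sb = new StringBuilder("{\"resourceType\":\"Observation\",\"note\":[");
        for (int i = 0; i < 200; i++) {
            sb.append(i == 0 ? "" : ",").append("{\"text\":\"sample note ").append(i).append("\"}");
        }
        sb.append("]}");
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] zipped = gzip(raw);
        long micros = (System.nanoTime() - start) / 1000;

        System.out.printf("raw: %d bytes (%d packets), gzip: %d bytes (%d packets), %d us to compress%n",
                raw.length, packets(raw.length), zipped.length, packets(zipped.length), micros);
    }
}
```

A single timing like this is noisy, especially before the JIT has warmed up, so repeat it many times before drawing conclusions.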

Here's what I found out:

  1. Don't bother compressing on localhost, you are just wasting about 2ms of compute on a fast machine. 
  2. Don't bother compressing within your local network (i.e., to a switch and back).  Again, about 2ms loss in compute on a fast machine.
  3. Going across a network boundary, compress JSON after 3 packets, and XML always*.
  4. Use JSON rather than XML if you are using a HAPI server.  JSON is ALWAYS faster for the same content.  For smaller resources, the savings is about 20%, which is fairly significant.

What does this mean for your microservices running in your cloud cluster?  If they are talking to each other over a fast network in the same cluster (e.g., running on the same VM, or within the same zone with a fast network), compression isn't warranted.  If they are communicating across regions (or perhaps even different zones within the same region), then it might be worth it if your content is > 4.5K, but otherwise not.  A single resource will generally fit within that range, so generally, if what you are compressing is a single resource, you probably don't need to do it.
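
The findings above can be condensed into a small decision function.  This is a sketch of one reading of these results, not a general rule: the three-packet JSON threshold and the "XML always" rule come from this one experiment on one machine, and the names are hypothetical.

```java
class CompressionPolicy {
    enum Format { JSON, XML }

    static final int PACKET_BYTES = 1400; // approximate packet size used in this post

    /** Decide whether to gzip an outbound request body. */
    static boolean shouldCompress(boolean crossesNetworkBoundary, Format format, int bodyBytes) {
        if (!crossesNetworkBoundary) {
            // localhost or the same fast network: the CPU cost outweighs the savings
            return false;
        }
        if (format == Format.XML) {
            // across a boundary, XML came out ahead at every size tested
            return true;
        }
        int packets = (bodyBytes + PACKET_BYTES - 1) / PACKET_BYTES;
        // JSON: compress only once the body needs more than three packets
        return packets > 3;
    }

    public static void main(String[] args) {
        System.out.println(shouldCompress(false, Format.XML, 20000));
        System.out.println(shouldCompress(true, Format.JSON, 2000));
        System.out.println(shouldCompress(true, Format.JSON, 8000));
    }
}
```

Measure on your own network before hard-coding thresholds like these.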

It won't hurt much: you'll lose a very little bit of performance (less than 5% for a single request if it doesn't take much work), and much less if you do something like a database store [all my work was just dumping the resource to a hash table].

Turning outbound compression on in the client when making an interservice request swaps compute time (what you pay for) for network time (which is generally free within your cloud), and saves you precious little in performance of a single transaction.  So any savings you get actually comes at a financial cost, and provides very little performance benefit.  Should your cloud service be spending money compressing results? Or delivering customer functionality?

Remember also when you compress, you pay in compute to compress on one end, and decompress on the other.


* My results say the compression is faster, but that the difference in results (< 2%) isn't statistically significant for less than 2 packets for XML.  I had to bump the number of tests run from 20 to 200 to get consistent results in the comparison, so it probably IS a tiny bit faster, I'd just have to run a lot more iterations to prove it.

Wednesday, July 10, 2019

What's your Point of View when writing code

Apparently I take code as a collaboration with the system I'm writing the code for.  I was reviewing my comments in some code I'd written, and all the comments were written first person plural.  We, us, et cetera.  When I review other code or documentation, it's all third person: the system, the component, et cetera.  I'm also writing for my team, and the we/us includes them in the conversation, but the we/us in my head is me and the computer.

Does anyone else have the same feel for this?  Do your comments talk to others, yourself, or you and the computer system that's running it?


Wednesday, June 26, 2019

Set Theory Much? Yeah ... me too.

Not is sometimes Knotty, or perhaps nutty.
As I'm building out queries in my FHIR Server, I recall one of the challenges I had in interpretation of negation in relationship to tests the last time I did this. 

Here are a couple of queries, see if you can sort them out the same way I did.  For context, assume that you've got some labs, some radiology, and some other stuff, and perhaps the only way you can find the other stuff (e.g., an EKG) is that it isn't actually coded in any way.  That's sometimes the case for the other stuff after all.


If you recall, DiagnosticReport.category is multi-valued as well making it even more interesting.

Before diving in, let's talk about some queries and some data.  Perhaps you have some tests that are EKG results (neither labs, nor radiology).

Now, let's look at it the other way first:

Returns any report where DiagnosticReport.category is coded using LAB, OR is coded using RAD, or is coded both ways.

Since category is a list (effectively a set of codes), the interpretation here is DiagnosticReport.category intersect (LAB, RAD) is non-null.  Another way to say this is |DiagnosticReport.category intersect (LAB, RAD)| > 0 (where |set| is the cardinality or size operator).

Returns any report where DiagnosticReport.category is coded both as LAB, and as RAD.

And the interpretation here is DiagnosticReport.category intersect (LAB) is non null AND DiagnosticReport.category intersect (RAD) is non null.  We could also say DiagnosticReport.category is a superset of (LAB) AND DiagnosticReport.category is a superset of (RAD).  Which allows us to join this second one as DiagnosticReport.category is a superset of (RAD, LAB) or yet another way: |DiagnosticReport.category intersect (LAB, RAD)| = 2.

Now, throw :not at the problem, and it becomes knotty indeed.

The way I want to read this is that DiagnosticReport.category contains neither LAB, nor RAD (or DiagnosticReport.category intersect (LAB, RAD) is null OR |DiagnosticReport.category intersect (LAB, RAD)| = 0).

But what then is this?
Well, follow the logic (bomb).  DiagnosticReport.category is NOT a superset of (RAD, LAB), or yet another way |DiagnosticReport.category intersect (LAB, RAD)| != 2.  These are the reports that aren't both.

Did that all make sense to you?  Because I'm still scratching my head.

Oh but wait, there's more:  If DiagnosticReport.category is missing, does this work?  Actually, yes, because it would be returned for both queries using :not, which would be correct.
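
The four set conditions described above can be written down directly.  This is a minimal sketch in plain Java, assuming DiagnosticReport.category has already been reduced to a set of code strings.  The method names are made up, and it says nothing about how any particular FHIR server evaluates :not.

```java
import java.util.Set;

class CategoryMatch {
    /** |category intersect wanted| > 0: coded with at least one of the wanted codes. */
    static boolean anyOf(Set<String> category, Set<String> wanted) {
        return wanted.stream().anyMatch(category::contains);
    }

    /** category is a superset of wanted: coded with every one of the wanted codes. */
    static boolean allOf(Set<String> category, Set<String> wanted) {
        return category.containsAll(wanted);
    }

    /** |category intersect wanted| = 0: coded with none of the wanted codes. */
    static boolean noneOf(Set<String> category, Set<String> wanted) {
        return !anyOf(category, wanted);
    }

    /** category is NOT a superset of wanted: the reports that aren't coded with all of them. */
    static boolean notAllOf(Set<String> category, Set<String> wanted) {
        return !allOf(category, wanted);
    }

    public static void main(String[] args) {
        Set<String> labAndRad = Set.of("LAB", "RAD");
        Set<String> uncoded = Set.of(); // e.g., an EKG report with no category at all
        System.out.println(noneOf(uncoded, labAndRad));
        System.out.println(notAllOf(Set.of("LAB"), labAndRad));
    }
}
```

A report with no category at all is an empty set, so it satisfies both negated conditions, which matches the observation about missing categories.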

But that probably isn't how you thought you'd write those queries in FHIR, is it?  Yeah, me either.  Or is it me neither?  Either way, I think I've got it right now (and written too).

Monday, June 17, 2019

Telling Time the HL7 Way

If you've never been to an HL7 Working group meeting, you'll run into some shorthand that long-time HL7'ers know that you'll have to catch up on.  The first is how we split up the day.

Officially, the day has 4 quarters, with breakfast, lunch and two breaks:

  • Breakfast starts 8-ish, and goes until 9:00.
  • Q1 goes from 9-10:30am
  • Morning Break is 10:30-11am.
  • Q2 is from 11am - 12:30pm
  • Lunch goes from 12:30-1:45pm.  There's plenty of time to eat, call home, and take a short meeting.
  • Q3 is 1:45-3pm and is a "short" quarter by 15 minutes.
  • Cookie break is 3-3:30pm.
  • Q4 is 3:30-5pm.

A "Q0" meeting (not part of the official nomenclature, but still well-understood) is before breakfast, usually 7-ish, but could also be "overlapping" with breakfast.

"Q5" and "Q6" are generally "after 5" till about 6:30-ish, and after "Q5" till whenever...  This is often where some good work happens (some would even say "the real work").

If you are doing HL7 meeting stuff from Q0 to Q6, you still have 12 hours for your day job and sleep.  Your mileage may vary.

Monday after 5pm is the cochairs' dinner.  If you want to hang with a cochair, they are likely busy from 5-7:30 or so Monday night.

Wednesday starting around 5:30 is the HL7 Reception.  This goes until about 7:30.

The first half of Monday in September is the Plenary session.

Monday and Tuesday at the WGM in January are the two Payer Summit days.

Connectathons are Saturday and Sunday before the Working group meeting.  Quarters?  Yeah, kinda.  We have them, food shows up at the right times.  But it's a Connectathon, software is ready when it's ready.  Some have been known to work until Q8 or 9, and maybe even start at Q -1.

I wanna say board meetings happen somewhere in Q3 and 4 on Tuesdays, but it's really up to the chair.

Technical Steering Committee (a governance committee) meets Saturday and Sunday.
International Council is Sunday and Thursday Afternoon.
Education Facilitators Lunch is Monday most meetings.


Friday, June 7, 2019

What's your Field of View?

When you look at something under a microscope, what you see varies based on the level of magnification.  How much you can see and distinguish fine detail depends essentially upon your field of view.

One of the things that I've been looking at recently is personal health data stored in consumer apps and wearable devices.  Most of the details here amount to a FHIR Observation of some sort, with a code to describe the data element (and a value as a code, or quantity, or perhaps even a waveform).  We know that codes are computer friendly, but they aren't people friendly (and software developers ARE people, regardless of what others might tell you).

So, when everything is an observation, it gets messy for software developers who want nice, easy to remember mnemonics and JSON stuff that is focused right where they are focused.  Things that FHIR can capture and store, but maybe FHIR isn't actually the right place for those working in this space.

PCHA and Continua have some specifications in this space too, but again, NOT easy for developers to use, because once again, too much focus on the terminology, and not on what the developer is trying to do.

We need to find a way to move terminology out of the way.  Open mHealth looks like it's at a better place for this space, but folks who've invested heavily in FHIR and other standards don't agree.  But wait, what if those developers aren't my audience?  What then?

It all depends on your field of view.  And mine, as usual, is many and varied.

   -- Keith

Wednesday, June 5, 2019

Best practices for Logging and Reporting errors in FHIR

Over the years I've developed a number of micro-services implementing and using FHIR APIs.  I've developed a number of best practices for logging and reporting on errors that occur.  Some of these follow.


  1. If a call to your API is not validly formed, log this as a warning in your service's log.  You detected an error in user input, and handled it properly.  This is NOT an error in your application, it is an error in the calling application.  You DO want to WARN someone that the calling application isn't calling your application correctly.  You don't want to alarm them that your application isn't working right, because in fact, it is working just fine.
  2. If something happened in a downstream API call that prevents the proper functioning of your application (e.g., a database read error), this is improper operation of the system, and is an ERROR preventing your service from operating (even though there's nothing wrong in the service itself), and should be logged as such.  
  3. IF you implement retry logic, then:
    1. Log as warnings any operation that failed but finally succeeded through retry logic.
    2. Log as errors any operation that failed even after retrying.
  4. If an exception was the cause of an error, consider:
    1. If you KNOW the root cause (a value is malformed), say so in the log message, but don't report the stack trace. This will cut unneeded information from your logs, which you will be thankful for later. For example:
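      try {
         int value = Integer.parseInt(fooQueryParameter.getValue());
      } catch (NumberFormatException nfex) {
         // Known root cause: log the offending value, skip the stack trace.
         LOGGER.warn("Foo query parameter ({}) must be a number.",
            fooQueryParameter.getValue());
      }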
    2. If you don't know why the error occurred (there could be multiple reasons), do report the stack trace in the log:
      try (PreparedStatement st = con.prepareStatement(query)) {
         ResultSet result = st.executeQuery();
      } catch (SQLException jex) {
         // Unknown root cause: pass the exception last so the stack trace is logged.
         LOGGER.error("Unexpected SQL Exception executing {}",
            query, jex);
         throw new InternalErrorException(...);
      }
    3. Consider pruning the stack trace at the top or bottom.  From the bottom because you know your entry points, and the infrastructure before that probably isn't that useful to you (e.g., tomcat, wildfly).  From the top because the details after your code made the call that threw the exception aren't necessarily something you can deal with.
  5. DO report the query used (and where possible, parameter values in the query) in the log. Consider also reporting the database name when using multiple databases. I have often seen database exceptions like "parameter 1 has invalid type" with no query included, and no values.
  6. Consider how you might implement retry logic in cases of certain kinds of exceptions (e.g., database connection errors).
  7. Use delimiters in your logging output format to make it easier to read them in other tools (e.g., spreadsheets).  I often use tab delimiters between the different items in my logging configuration: e.g.,
    %d{yyyy-MM-dd HH:mm:ss.SSS}\t[%thread]\t%-5level\t%logger{36}\t- %msg%n
  8. Consider reporting times in the log in a timezone that makes sense for your implementation (and more importantly, to your customer).  When your customer reports they had a problem at 9:33am, you want to be able to find issues at that time in the logs without having to compute offsets (e.g., from GMT ... do you know yours?).
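
Items 3 and 6 above talk about retry logic and which log level goes with which outcome.  Here is a minimal sketch of how that might look.  It uses java.util.logging only so that it has no dependencies (the snippets above assume an SLF4J-style LOGGER).  The helper name, its parameters and the choice to wrap the final failure in an IllegalStateException are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class Retry {
    private static final Logger LOGGER = Logger.getLogger(Retry.class.getName());

    /**
     * Run an operation up to maxAttempts times.  Logs a warning if it succeeded
     * only after retrying, and an error (with stack trace) if every attempt failed.
     */
    static <T> T withRetry(String what, int maxAttempts, Callable<T> op) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = op.call();
                if (attempt > 1) {
                    // Failed at first but finally succeeded: a warning, not an error.
                    LOGGER.log(Level.WARNING, "{0} succeeded on attempt {1} of {2}",
                            new Object[] {what, attempt, maxAttempts});
                }
                return result;
            } catch (Exception e) {
                last = e; // remember the failure and retry if attempts remain
            }
        }
        // Failed even after retrying: an error, logged with the last stack trace.
        LOGGER.log(Level.SEVERE, what + " failed after " + maxAttempts + " attempts", last);
        throw new IllegalStateException(what + " failed", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String value = withRetry("flaky lookup", 3, () -> {
            if (++calls[0] < 2) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "ok";
        });
        System.out.println(value + " after " + calls[0] + " calls");
    }
}
```

A real implementation would probably retry only on exceptions known to be transient (e.g., connection errors) and wait between attempts.  Both are left out here to keep the sketch short.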

Reporting Errors in OperationOutcome

  1. Use 400 series errors like 400 or 422 when the fault is on the part of the client (e.g., invalid operation syntax, or unsupported query combination).  NOTE: HAPI on FHIR will report unsupported query combinations as a 500 series error (which I fix using an interceptor or filter).
  2. Use 500 series errors like 500, 503 or 507 when the fault is on the part of your service.
  3. DO tell the calling user what the problem is in easy to understand language, and if possible, include corrective action they can perform to address the issue.
  4. DO NOT include content from Exception.getMessage().
    I sometimes see this:
    catch (Exception e) {
      throw ...
    This is not good behavior.  You often have no clue what is in e.getMessage(), and often no control.  It can leak information about your technology implementation back to the API user, which can expose vulnerabilities (see below).
  5. DO NOT include the stack trace in the OperationOutcome.  This belongs in your logs, but not in the user response.  See the OWASP Error Handling cheat sheet.
  6. For database errors, you might want to report the kind of database (e.g., patient chart, provider list), but not the exact name of the database.  Again, you want to be clear, but avoid leaking implementation details.
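
The rules above can be sketched as a small translation step that sits between the exception and the response.  This is an illustration under stated assumptions, not HAPI's API: the SafeError class is hypothetical, IllegalArgumentException stands in for "the client sent something bad", and java.util.logging is used only to keep the example dependency-free.  The point is that the exception detail goes to the log and only fixed, readable text goes back to the caller.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class SafeError {
    private static final Logger LOGGER = Logger.getLogger(SafeError.class.getName());

    final int status;     // HTTP status to return
    final String message; // text that is safe to show the caller

    SafeError(int status, String message) {
        this.status = status;
        this.message = message;
    }

    /** Translate an exception into a status and a message that leaks no implementation detail. */
    static SafeError from(Exception e) {
        if (e instanceof IllegalArgumentException) {
            // Assumption for this sketch: this exception type marks invalid client input.
            LOGGER.log(Level.WARNING, "Rejected invalid request: {0}", e.getMessage());
            return new SafeError(400,
                    "The request could not be understood. Check the query parameters and try again.");
        }
        // Anything else is a fault in the service: full detail in the log, none in the response.
        LOGGER.log(Level.SEVERE, "Request failed", e);
        return new SafeError(500,
                "The service could not complete the request. Please try again later.");
    }

    public static void main(String[] args) {
        SafeError err = from(new IllegalStateException("no listener on db-host-7"));
        System.out.println(err.status + " " + err.message);
    }
}
```

In a FHIR server the status and message would be carried in an OperationOutcome.  That mapping is left out here.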

Use Error Codes

Finally, consider creating error codes (which can be reported to the user).  Report the error code WITH the human readable message.  The value of unique error codes is that:
  1. The error code does tell you where in your code the error occurred, but doesn't expose implementation details.
  2. Error codes can be associated with messages in ways that enable translation to multiple languages.
  3. Error codes can also be associated with actions that users can take to correct the error (if it is on their part), or which your operations staff can take to either further diagnose OR correct the error.  For example:  DB001: Cannot access Provider database.
    Then, in your operations guide, you can say things like: "DB001: This message indicates a failure to connect to the Provider database.  Verify that the database services are up and running for the provider database for the customer site ..."
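
One simple way to keep codes, user messages and operator actions together is an enum.  The sketch below uses the DB001 example from above.  The class layout and method names are made up, and a real system would likely load the text from resource bundles to support translation.

```java
class ErrorCodes {
    enum Code {
        DB001("Cannot access Provider database.",
              "Verify that the database services are up and running for the provider database.");

        final String userMessage;    // safe to show to the API user
        final String operatorAction; // belongs in the operations guide, not in the response

        Code(String userMessage, String operatorAction) {
            this.userMessage = userMessage;
            this.operatorAction = operatorAction;
        }

        /** The code together with the human readable message, as reported to the user. */
        String forUser() {
            return name() + ": " + userMessage;
        }
    }

    public static void main(String[] args) {
        System.out.println(Code.DB001.forUser());
        System.out.println("Operations guide entry: " + Code.DB001.operatorAction);
    }
}
```

Because each code is used in exactly one place in the code base, finding it in a log or support ticket tells you where the failure occurred without exposing a stack trace.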