Saturday, July 13, 2019

Optimizing Inter-Microservice Communications using GZip Compression in HAPI on FHIR

It's widely known that XML and JSON both compress really well.  It's also pretty widely known that one should enable GZip compression on Server responses to improve server performance.  Not quite as widely known , you can also compress content being sent to the server (for POST or PUT requests).  Also, most people can tell you: JSON is smaller that XML.

And size matters when communicating over a network.

So it should be obvious that one should always GZip compress data whenever possible when connecting between two servers, right?

Uhmm, not so much, but you could already see that coming, because what would I write a blog post about if if were true.

Here's the thing.  Compress saves time for two reasons:

  1. It takes less time to transmit less data.
  2. There's less packet overhead with less data.
But it also takes CPU time to compress the data.  So long as the CPU time taken to compress the data on one size, and uncompress it on the other side, is LESS than the savings in transmission and packet overhead, it's a net win for performance.  

Let's look at data transmission:

A maximum transmission unit (MTU) is about 1400 bytes.   This takes a certain amount of time to transmit over the network.  Here are some values based on different networking speeds:
Bandwidth
(Mbps)
Time
(ms)
2.24 
10 1.12 
20 0.56 
100  0.112 
200 0.056 
300 0.037 
1000 0.012 

Depending on network speeds, time saving on sending a single packet can save anywhere from  12 µs to 2.2ms.  This isn't very much, but if you have to send more than one packet, then you have interpacket latency, which is basically dealing with round-trip times from client to server for acknowledgements.  ACKs don't need to be immediate in TCP, a certain number of ACKs can be outstanding at once, there's not latency introduced on every packet) sent.  But your network latency also has an impact (network latency is generally measured on the order of 10s of ms) on the throughput.

I ran an experiment to see which method was fastest when sending data in a POST/PUT Request, using GZip or not using GZip, and the results were interesting.  I send 200 create requests in which I controlled for the size of the resource being sent in terms of the number of packets it would required to be sent over, from 1 to 10 packets of data (where I mean packet, the size of a single TCP segment transmission, controlled by maximum MTU size).  I sent the request in two different formats (XML and JSON), over three different networks.

For a control network, I used localhost, which actually involves no transmission time or effective latency.  Also also did the transmission over my local network, so that it actually went from my system, to my router, and then back to my system.  And then finally, I transmitted from my system, to my external IP address (so it left the router, went to my cable model and came back through it).

I burned the first batch of 200 requests to initialize the code through the JIT compiler.

Here's what I found out:

  1. Don't bother compressing on localhost, you are just wasting about 2ms of compute on a fast machine. 
  2. Don't bother compressing within your local network (i.e., to a switch and back).  Again, about 2ms loss in compute on a fast machine.
  3. Going across a network boundary, compress JSON after 3 packets, and XML always*.
  4. Use JSON rather than XML if you are using a HAPI server.  JSON is ALWAYS faster for the same content.  For smaller resources, the savings is about 20%, which is fairly significant.

What does this mean for your microservices running in your cloud cluster?  If they are talking to each other over a fast network in the same cluster (e.g., running on the same VM, or within the same zone with a fast network), compression isn't warranted.  If they are communicating across regions (or perhaps even different zones within the same region), then it might be worth it if your content is > 4.5K, but otherwise not.  A single resource will generally fit within that range, so generally, if what you are compressing is a single resource, you probably don't need to do it.

It won't hurt, you'll lose a very little bit of performance (less than 5% for a single request if it doesn't take much work), and much less if you do something like a database store or something like that [all my work was just dumping the resource to a hash table].

That very limited savings you get for turning outbound compression on in the client when making an interservice request is swapping compute time (what you pay for) for network time (which is generally free within your cloud), and saves you precious little in performance of a single transaction.  So any savings you get actually comes at a financial cost, and provides very little performance benefit.  Should your cloud service be spending money compressing results? Or delivering customer functionality?

Remember also when you compress, you pay in compute to compress on one end, and decompress on the other.

    Keith

* My results say the compression is faster, but that the difference in results (< 2%) isn't statistically significant for less than 2 packets for XML.  I had to bump the number of tests run from 20 to 200 to get consistent results in the comparison, so it probably IS a tiny bit faster, I'd just have to run a lot more iterations to prove it.

Wednesday, July 10, 2019

What's your Point of View when writing code

Apparently I take code as a collaboration with the system I'm writing the code for.  I was reviewing my comments in some code I'd written, and all the comments were written first person plural.  We, us, et cetera.  When I review other code or documentation, it's all third person, the system, the component, et cetera.  I'm also writing for my team, and the we/us includes them in the conversation, but the we/us in the head is me and the computer.

Does anyone else have the same feel for this?  Do you comments talk to others, yourself, or you and the computer system that's running it?

   Keith