Friday, February 19, 2021

Load Testing is the Most Expensive Effort You Will Regret Not Doing

Photos of the I-5 Skagit River Bridge

Time and time again I’ve seen major failures in information systems that fail under real load.  This happens over and over again. It happened again today in my home state when the state’s vaccine scheduling system failed shortly after being opened up for scheduling.  I tried using it at least five different times today to schedule myself and my wife.  I finally gave up, realising that the system simply was under way too much strain.  Thirty second page loads, timeouts,  mystic error messages never meant to be exposed to an end user, it’s all classic symptoms of a system under way more load that it can handle.

The solution is very simple, you have to test systems under the load that they are designed to be used.

The problem is, that’s one of the most expensive tests software developers perform.  It can take a month and more just to prepare.  A big part of that is simply getting enough synthetic data to test with.  The data has to be good enough to be used with your validation checks.  The system has to be designed so that you can run such a test at scale without committing other systems to act on the test data.  It’s hard, it’s expensive, and often by the time you are ready to load test, the project is already late, and product needs to ship, or at least that’s what management says.

And so, against better judgement, the product ships without being tested.  And it fails.  Badly.

Somewhere, a resource gets locked for long than it should, and that causes contention, and the system slows to an unusable crawl.  It’s an unneceaary table lock in a database.  A mutex on a singleton in the runtime library.  A semaphore wrapped inappropriately around a long running cleanup operation.  Diagnosis takes days, and then weeks.  Sometimes a critical component needs to be rearchitected, or worse, replaced.  At other times, the fix is simple, once you finally find the error.  And sometimes there’s a host of optimizations that are needed.  What would have delayed the delivery of the product by weeks now delays it by months... or even years.  Some never recover.

Often, the skills necessary to engineer the system to a sustained load are simply absent, to the point that the expected loads and response times were never actually provided as design inputs. Nobody computed them, b/c the system is new, and nobody knew how to without any real world experience.

Load testing is the most engineering intensive effort of “software engineering”.  It’s the kind of effort that distinguishes a true software engineer or system architect from a “computer programmer”.  

Discovering that your system won’t operate under the expected load is not the worst thing that can happen to you.  Doing so after you’ve “gone live” is.  Now you have three expensive efforts simultaneously to address:

  1. The political fallout of the disaster to manage.
  2. The recovery effort on the broken data streams created by the failed system.
  3. The load testing you should have done in the first place, including the accompanying remediation of issues found from that effort.
Don’t ever skip load testing, at least if you want to continue to call yourself a software engineer.


Friday, February 12, 2021

Enhancing Search capabilities in HAPI using SUSHI and SearchParameter


I've been using HAPI on FHIR for several years across multiple projects, sometimes with my own back end for storage, and at other times using the vanilla HAPI JPA Server with a database back end.  One of the features of the JPA Server is that you can enhance search capabilities by telling the server how to implement an enhanced search capability by creating a SearchParameter resource on the server.

The FHIR SANER Implementation Guide defines several SearchParameter resources.  The simplest of these enables searching by Encounter.hospitalization.dispositionCode and is a good starter example for those who just need to search on a single field value for a resource where the core FHIR specification doesn't define a search capability.  A more complex example can be found in the Search by Code which enables search on several resource types for a single field.  While technically correct, this example is more complex than HAPI JPAServer can handle, and I'll talk later in this post about how it might be simplified to enable JPAServer to handle it.

If you are defining a SearchParameter resource for an HL7 or IHE Implementation Guide, the first thing you need to do is specify a bunch of metadata associated with the resource.  If, like me, you have to do this for a number of different resources, SUSHI has a syntax that enables you to effectively create a macro, a set of instructions that can be included in any resource definition.

The set of instructions I use is:

RuleSet: SanerDefinitionContent
 * status = #draft      // draft until final published
 * experimental = true  // true until ready for pilot, then false
 * version = "0.1.0"    // Follow IG Versioning rules
 * publisher = "HL7 International"
 * contact[0].name = "HL7 Public Health Workgroup"
 * contact[0].telecom.system = #url
 * contact[0].telecom.value = "http://hl7.org/Special/committees/pher/index.cfm"
 * contact[1].name = "Keith W. Boone"
 * contact[1].telecom.system = #email
 * contact[1].telecom.value = "mailto:my-e-mail-address"
 * jurisdiction.coding = http://unstats.un.org/unsd/methods/m49/m49.htm#001

We'll change status to #active, experimental will be set to false, and the version will be updated when we publish as a DSTU.  We follow the HL7 conventions for the first contact (the web page for the workgroup responsible for publishing the IG), and I add my contact information as the editor.  The jurisdiction.coding value is set to the value commonly used for Universal guides.

Other metadata you have to create describes the specific search parameter.  I put that directly in the SearchParameter instance:

Instance: SearchParameter-disposition
InstanceOf: SearchParameter
Title: "Search by hospitalization.dispositionCode in Encounters"
 * insert SanerDefinitionContent
 * url = "http://hl7.org/fhir/uv/saner/SearchParameter/SearchParameter-disposition"
 * description = "This SearchParameter enables query of encounters by disposition to support automation of measure computation."
 * name = "disposition"
 * code = #disposition

The instance name should start with "SearchParameter-", and should be followed by the value of the name you are going to use for search.  This is what you would expect to appear in the query parameter.  For this example, the search would look like "GET [base]/Encounter?disposition=...", so we use disposition as both the name and the code for this parameter.  I'd recommend keeping the name and code the same.

Finally, you need to specify some of the technical details about the Search parameter.  For this example, it applies only to the Encounter resource, and operates against a code.  Here are the additional settings you would need to specify:

 * base[0] = #Encounter
 * type = #token
 * expression = "hospitalization.dispositionCode"
 * xpath = "f:hospitalization/f:dispositionCode"
 * xpathUsage = #normal
 * multipleOr = true
 * multipleAnd = false

  1. The base parameter allows you to identify the resources to which the search parameter applies.
  2. The type parameter indicates which kind of search parameter type to support.
  3. The expression parameter describes how to find the data used for the search in the FHIR Resource using FHIRPath.
  4. The xpath parameter describes how to find the data used for the search using an XPath expression over the FHIR XML.  For the most part, this is simply the same as expression, using different syntax.
  5. Generally, you want to leave xpathUsage set to #normal as above.
  6. If you want the parameter to be repeatable using a logical or syntax, set multipleOr to true.  For this case:
    GET [base]/Encounter?disposition=01,02 would mean: Get all dispositions where the code value is 01 or 02.
  7. If you want the parameter to be repeatable using And, set multipleAnd to true, otherwise set it to false.  If the field in the resource is not repeatable, you can very likely leave this set to false.
For more complex search parameters, you can add multiple resource types for the base parameter.  We did that in the SANER IG to support search by codes in Measure and MeasureReport.

Here's an example of how we did that, but there are some caveats which I'll get into below:

 * code = #code
 * base[0] = #Measure
 * base[1] = #MeasureReport
 * type = #token
 * expression = """
 descendants().valueCodeableConcept | descendants().valueCoding | descendants().valueCode | code | descendants().ofType(Coding).not().code
"""

 * xpath = """
 descendant::f:valueCodeableConcept | descendant::f:valueCoding | descendant::f:valueCode | f:code | f:descendant::f:code[ends-with(local-name(..),'oding')]
"""
* xpathUsage = #normal
* multipleOr = true
* multipleAnd = true

Technically, the above expression and xpath are accurate.  However, this search parameter appears to be too complex for HAPI JPA Server to handle.  I expect that is because it operates over a simplified syntax of either the XPath or expression content, so I will be digging into that.  I think the ofType() or [ends-with...] expressions might be causing problems.  There are other ways that I can make these more explicit which would likely work better.