Friday, August 20, 2021

Stratifying Race and Ethnicity for SANER

Variation is the bane of standards.  Eliminating needless variation is part of my job.  Doing it in a way that doesn't increase provider (or developer) burden is an indication that it's been done right.

I've looked at a lot of state and national dashboards while working on the SANER Project, and one thing I notice is the variation in reporting for data with respect to race and ethnicity classifications (strata).  Often, when reported publicly, these two different categories are combined into smaller sets, with groupings like multiple race, other, and unknown.

ONC National Coordinator Micky Tripathi noted Health IT reporting variation for this kind of data in his keynote delivered at a recent Strategic Health Information Exchange Collaborative (SHIEC) conference.

Federal reporting uses separate fields for race and ethnicity, and allows multiple values to be reported for race.  According to OMB reporting requirements, there are five possible values for race (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White), not counting various flavors of unknown, and two values for ethnicity (Hispanic or Latino, and Not Hispanic or Latino).

Reporting multiple races means that there are several ways to report none (flavors of null, including unknown, refused to answer, and did not ask), 5 ways to report one race, 10 ways to report two, 10 ways to report three, 5 ways to report four, and 1 way to report all five, resulting in around 33 categories.
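
The counts by number of races are just binomial coefficients; as a quick check of the arithmetic (in LaTeX form):

    \sum_{k=1}^{5} \binom{5}{k} = 5 + 10 + 10 + 5 + 1 = 2^5 - 1 = 31

Add the several flavors of null and you land at the roughly 33 categories above; multiplying by the five possible ethnicity responses (two values plus the flavors of none) gives the roughly 165 categories discussed next.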

Combining that with the various ways to report ethnicity (again with flavors of none) results in about 165 possible reporting categories.  Looking at the actual statistics, only about 50 categories would generally be needed for a given facility (i.e., those occurring with frequency greater than a few tenths of a percent) to stratify populations by race and ethnicity, if empty groupings are not reported, and perhaps even fewer for smaller facilities.  It wouldn't be possible, for example, for a 100-bed hospital to even use all of the category combinations.

The data is generally rolled up into a much smaller number of reporting categories that vary between states, and these often also differ from how federal dashboards report the same data.  Different states have different racial and ethnic makeups, and public reporting of race and ethnicity data at these levels is designed to address potential disparities relevant to the state.

Given that many state departments of health also support reporting to federal agencies, how does one normalize reporting without having to maintain 51 separate specifications?

The best way to handle this is to stratify by the combination of race and ethnicity, and report all combinations that actually occur.  In other words, don't report 0 values for combinations that don't exist; those can be inferred from the data.  This lets states roll the data up into a smaller set of categories for their public reporting while retaining the detail needed for federal reporting, and lets federal reporting roll it up differently.  When computed automatically, this level of stratification adds no reporting burden for providers.
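
As a sketch of the idea (hypothetical types and names, not the SANER implementation itself), counting strata over the cross-product and emitting only the combinations that actually occur might look like this:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class RaceEthnicityStrata {
        // Hypothetical stand-in for a patient's reported demographics; the
        // (possibly multi-valued) race codes are pre-joined into one key.
        record Demographics(String raceCombination, String ethnicity) {}

        // Count each observed (race combination, ethnicity) pair.  Pairs that
        // never occur are simply absent from the map: no zero-valued rows are
        // reported, and consumers can roll the strata up however they need.
        public static Map<String, Integer> stratify(List<Demographics> patients) {
            Map<String, Integer> counts = new HashMap<>();
            for (Demographics d : patients) {
                String stratum = d.raceCombination() + "|" + d.ethnicity();
                counts.merge(stratum, 1, Integer::sum);
            }
            return counts;
        }
    }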



Tuesday, August 3, 2021

Thinking Ahead

A very long time ago, when I worked at Florida State University, I had two rules for programming written on my board:

  1. Just get it to work.
  2. If it works, don't mess with it.
It became a test of the people who read it: anyone who visibly reacted while working through the ramifications was definitely someone I wanted around me.

If you just get it to work, and then don't mess with it, you have something built with the least possible effort.  If you ever need to change it, good luck with that.

Along the same lines, when I was growing up, a kid who lived on my street had a small car, I think it was a Dodge of some sort, and he wanted to beef it up. So he took out the engine and transmission, and replaced them with MUCH bigger ones.  Somehow he managed to make it all fit together after modifying the drive shaft, but he forgot one very important thing: engine torque.  When he finally started the car, after spending the better part of a year on the project, he wound up warping the frame, something like the picture to the right.  The car was a total loss.  The point of the story is that you can only do so much with a limited infrastructure.

One of the challenges with FHIR adoption for some use cases is that there are existing HL7 interfaces, labs, ADT feeds, immunizations, et cetera, that are already widely deployed, adopted, and working, in fact working well.  There's little desire to replace these interfaces with FHIR-based interfaces because:
  1. What we have is good enough for what we are doing.
  2. It's working right now.
But as we keep pushing the interoperability needle higher and higher, eventually we will have to replace these interfaces. When should we do that?
  1. When what we have isn't good enough for what we want to do.
  2. Or we can't easily make what we want to do work with what we have right now.
The HL7 V2-to-FHIR project is an example of what happens when interfaces get stuck in these situations: we cannot easily connect them to newer infrastructure to do more with them, so we build things that enable us to convert from one to the other.  The very existence of the project demonstrates that there's more we want to be able to do with the data present in HL7 Version 2 messages.  This might include things like:
  • Aggregating data from multiple sources
  • Providing more sophisticated searching capabilities
  • Enabling data subscriptions
There's a lot of effort and cost associated with replacing something that works with something else, and it's hard to justify that when the thing that's working is in fact still working.  But if there were a way to upgrade, replacing your scooter with a Corvette (and you could justify the need for a Corvette), then it might in fact be worthwhile.

When interface standards are mandated by regulatory policy, it's pretty difficult to upgrade.  Consider what happened with the X12 5010 standards, or the whole discussion around CCDA 1.1 and CCDA 2.1 backwards compatibility.  It's even more difficult when it all has to happen in a very short time frame.  We need to consider how policy can enable these kinds of shifts, over REASONABLE time frames.  Two years is not enough time to roll out a new standard without severely impacting an industry's capacity to do anything else but roll it out.  We know that from experience (or at least I hope we do).

But what would the next generation of ADT, lab, immunization, or other standards look like?  And what would they enable us to do that the current ones don't?  It's time to start thinking about that.


Monday, August 2, 2021

YAML as a FHIR Format


YAML (short for "YAML Ain't Markup Language", though I simply prefer "Yet Another Markup Language") is a file format that simplifies writing structured data files even further.  A while back I struggled with writing measures for the SANER Project because both the XML and JSON formats have minor issues that make it hard to hand-code the expressions.

XML sucks because, well, XML sucks.  Whitespace is really valuable for formatting code, but XML just wants it gone.

JSON isn't much better, because you have to escape newlines, and once again you cannot see your code structure in expressions.

I had dinked around a bit with YAML input and output, and now that I'm creating a new measure, I wanted to get it working properly.  I have now done so, at least insofar as my YamlParser correctly round-trips from JSON to YAML and back to JSON.

The key to making this work is using Jackson to convert between JSON and YAML, and configuring YAML quoting so that strings that look like numbers (e.g., "001") aren't incorrectly treated as numbers when converting between the two.

The method newYAMLMapper() creates a correctly configured Jackson YAMLMapper:

    import com.fasterxml.jackson.dataformat.yaml.YAMLGenerator;
    import com.fasterxml.jackson.dataformat.yaml.YAMLMapper;

    public static YAMLMapper newYAMLMapper() {
        YAMLMapper m = new YAMLMapper();
        return
          // Literal block style (|) keeps multi-line strings readable.
          m.enable(YAMLGenerator.Feature.LITERAL_BLOCK_STYLE)
          // Always quote strings, so values like "001" stay strings.
           .disable(YAMLGenerator.Feature.MINIMIZE_QUOTES)
          // Always emit \n line breaks, regardless of platform.
           .disable(YAMLGenerator.Feature.USE_PLATFORM_LINE_BREAKS)
          // Don't wrap long lines.
           .disable(YAMLGenerator.Feature.SPLIT_LINES);
    }

Methods for converting between YAML and JSON are fairly simple:

    public static String fromYaml(String yaml)
      throws JsonMappingException, JsonProcessingException
    {
        // parse YAML into a generic object tree
        ObjectMapper yamlReader = newYAMLMapper();
        Object obj = yamlReader.readValue(yaml, Object.class);

        // write the object tree back out as JSON
        ObjectMapper jsonWriter = new ObjectMapper();
        return jsonWriter.writeValueAsString(obj);
    }
    public static String toYaml(String jsonString) throws IOException {
        // parse JSON
        JsonNode jsonNodeTree = new ObjectMapper().readTree(jsonString);
        // save it as YAML
        String jsonAsYaml = newYAMLMapper().writeValueAsString(jsonNodeTree);
        return jsonAsYaml;
    }
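
A quick round trip illustrates the quoting behavior described earlier (hypothetical input):

    String json = "{\"id\":\"001\",\"count\":1}";
    String yaml = toYaml(json);   // yields: id: "001" (quoted) and count: 1
    String back = fromYaml(yaml); // back to {"id":"001","count":1}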

Converting from streams and readers works similarly.
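
Since the YamlParser below hands an InputStream directly to fromYaml, a stream overload presumably sits alongside the String version; a minimal sketch (assumed signature, not necessarily the actual utility code):

    public static String fromYaml(InputStream yamlStream) throws IOException {
        // Read the YAML stream into a generic object tree ...
        Object obj = newYAMLMapper().readValue(yamlStream, Object.class);
        // ... then re-serialize that tree as JSON.
        return new ObjectMapper().writeValueAsString(obj);
    }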

The YamlParser class implements IParser.  It contains an embedded jsonParser to convert resources back and forth between Java classes and JSON, and then uses the toYaml and fromYaml methods in its encodeResourceToString and parseResource methods to read and write YAML.  It's NOT the most efficient way to read/write YAML for FHIR, but it works (correctly, as best I can tell).

    public static class YamlParser implements IParser {
        private final IParser jsonParser;
        YamlParser(FhirContext context) {
            jsonParser = context.newJsonParser();
        }

        @Override
        public String encodeResourceToString(IBaseResource theResource) 
            throws DataFormatException
        {
            try {
                return toYaml(jsonParser.encodeResourceToString(theResource));
            } catch (IOException e) {
                throw new DataFormatException("Error Converting to YAML", e);
            }
        }
        @Override
        public <T extends IBaseResource> T 
            parseResource(Class<T> theResourceType, InputStream theInputStream)
            throws DataFormatException {
            try {
                return jsonParser.parseResource(
                    theResourceType, fromYaml(theInputStream));
            } catch (IOException e) {
                throw new DataFormatException("Error Converting from YAML", e);
            }
        }
        ...
    }

All of the setter/getter methods on YamlParser delegate the work to the embedded JsonParser, as shown in the examples below.  

        @Override
        public void setEncodeElementsAppliesToChildResourcesOnly(
            boolean theEncodeElementsAppliesToChildResourcesOnly) {
            jsonParser.setEncodeElementsAppliesToChildResourcesOnly(
                theEncodeElementsAppliesToChildResourcesOnly);
        }

        @Override
        public boolean isEncodeElementsAppliesToChildResourcesOnly() {
            return jsonParser.isEncodeElementsAppliesToChildResourcesOnly();
        }


A full-blown implementation can be found at YAML Utilities for FHIR.