Thursday, December 17, 2009

Vocabulary

Healthcare IT products need to deal with terminology for ICD-9-CM, ICD-10-CM, ICD-10-PCS, SNOMED-CT, RXNORM, LOINC, NDC, CPT, HCPCS, UMLS, the Healthcare Provider Taxonomy and a number of proprietary vocabularies as well.  Most of these use different file formats to exchange the data about the vocabulary.

What I'd really like to see is everyone use a standard format to exchange this information.  Preferably I'd like that format to be XML-based to make it easier to process.  But I'd also like that representation to be fairly compact, so I might be able to live with a delimited text format.  I can readily create an XML reader that will import common delimited text formats into an XML document for processing, so it's not a huge problem if the format isn't XML-based.
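To give a rough idea of how little work that import is, here is a minimal sketch in Python of turning delimited text into an XML document. The function name, the element names (vocabulary, concept) and the two-column layout are all assumptions made up for illustration; none of them come from any of the vocabularies above.

```python
import csv
import io
import xml.etree.ElementTree as ET


def delimited_to_xml(text, columns, delimiter="|",
                     root_tag="vocabulary", row_tag="concept"):
    """Convert delimited text into an XML tree, one child element per row.

    `columns` supplies the element name for each field; all names here
    are hypothetical and would be chosen per vocabulary.
    """
    root = ET.Element(root_tag)
    # QUOTE_NONE: treat quote characters as ordinary data, not as quoting.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, fields):
            ET.SubElement(row, name).text = value
    return root


# Illustrative input only: two rows in an assumed code|display layout.
sample = (
    "47039-3|Admission History and Physical Note\n"
    "34094-7|Cardiology Hospital Admission Note\n"
)
doc = delimited_to_xml(sample, ["code", "display"])
print(ET.tostring(doc, encoding="unicode"))
```

Once the rows are in a tree like this, the usual XML tooling (XPath, XSLT and so on) can be applied regardless of how the vocabulary was originally delivered.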

I'd also like everyone to agree on some very common concepts (e.g., "is a") that need to be expressed so that these concepts have the same meaning across terminologies.  Ensuring that we have a set of commonly accepted (standard) relationships will certainly help us get to a point where we can reason across terminology boundaries.
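As a sketch of what a shared "is a" buys you, the Python below stores relationships from any number of terminologies under one agreed relationship identifier and then answers subsumption questions by walking the graph. Everything in it is hypothetical: the IS_A constant, the RelationshipGraph class, and the made-up vocabulary names and codes in the example. It also assumes that an asserted "is a" really does mean the same thing in every source, which is exactly the agreement being asked for.

```python
from collections import defaultdict

IS_A = "is_a"  # assumed identifier that every terminology agrees to use


class RelationshipGraph:
    """Holds "is a" links between (vocabulary, code) pairs."""

    def __init__(self):
        self._parents = defaultdict(set)

    def add(self, child, relation, parent):
        # Only the shared "is a" relation is used for subsumption here.
        if relation == IS_A:
            self._parents[child].add(parent)

    def is_a(self, concept, ancestor):
        """True if `ancestor` is reachable from `concept` via "is a" links.

        A concept is treated as being "a" itself (reflexive).
        """
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen:
                continue  # guards against cycles in the source data
            seen.add(current)
            stack.extend(self._parents.get(current, ()))
        return False


# Made-up vocabularies and codes, for illustration only.
graph = RelationshipGraph()
graph.add(("VOCAB_A", "A1"), IS_A, ("VOCAB_A", "A0"))
graph.add(("VOCAB_B", "B7"), IS_A, ("VOCAB_A", "A1"))  # crosses a boundary
print(graph.is_a(("VOCAB_B", "B7"), ("VOCAB_A", "A0")))
```

The point of the sketch is that the traversal never needs to know which vocabulary a concept came from; that only works if the relationship means the same thing everywhere.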

The US Federal government is responsible in some way for maintenance, delivery or mandated use of some of these vocabularies (RXNORM, UMLS, ICD-9 and 10 variants used in the US, NDC, HCPCS and the Healthcare Provider Taxonomy), and yet almost all of them require different file formats for distribution.  It's what I've come to expect from my government, but I wish it would stop.  At least the vocabularies maintained by NLM (RXNORM and UMLS) share a common file format.  The Rich Release Format is used for both of these and uses | as a delimiter to separate columns.  In fact, it might even be worthwhile to have a number of SDOs get together and agree to use that format (or perhaps a modification of it) to deliver vocabulary information.
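Reading a |-delimited row takes only a few lines. The Python sketch below assumes that every field, including the last one, is terminated by a |, so each line ends with a delimiter; check that against the actual release documentation before relying on it. The function name and the column names in the example are made up and are not the real column layout of any Rich Release Format file.

```python
def parse_pipe_line(line, columns):
    """Split one |-terminated line into a dict keyed by column name.

    Assumes each field is followed by "|", so a valid line ends with "|".
    """
    text = line.rstrip("\r\n")
    if not text.endswith("|"):
        raise ValueError("expected line to end with '|': %r" % text)
    fields = text[:-1].split("|")  # drop the final terminator, then split
    if len(fields) != len(columns):
        raise ValueError(
            "expected %d fields, found %d" % (len(columns), len(fields))
        )
    return dict(zip(columns, fields))


# Hypothetical three-column layout; the empty middle field is preserved.
row = parse_pipe_line("C001||Example concept name|\n", ["id", "flag", "name"])
print(row)
```

Checking the field count on every line is cheap and catches the most common problem with delimited files, a stray delimiter inside a value.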

Some of the vocabularies I mention are published in books with a lot of ancillary material that should also be part of the downloads.  For example, the ICD-9-CM vocabulary contains a rather large index which is incredibly valuable, along with a number of inclusions and exclusions.  But to really make good use of the vocabulary you need the data associated with these additional parts incorporated into the downloads.

Finally, I'd like to see some of the hierarchical relationships in some of these terminologies be formally expressed within them.  LOINC, for example, contains numerous concepts describing clinical documents, but the LOINC data itself doesn't actually include some of the important relationships between the different types.  For example, the Admission History and Physical Note (47039-3) doesn't show up as being related in the document hierarchy to the Cardiology Hospital Admission Note (34094-7).  The same is also true for relationships between the various laboratory results.

As we in the US continue to talk about simplification and debate some of the really hard IT topics, this seems like a really simple problem to solve that could be addressed with just a little bit of the right attention.

2 comments:

  1. Part of the dilemma is in people's understanding of the different vocabularies under discussion and how they relate to one another. There is usually NOT a one-to-one relationship between the cross linkages in terms of semantics between the various terminologies, so one cannot assume semantic equivalency just because there is a cross mapping. So there is usually not an IS_A between most of the terminologies mentioned. There may be MANY SNOMED CT concepts that cross map to a SINGLE ICD9CM code. So, there must be extreme caution when one proposes inter-relating these different vocabularies. They have different purposes and the relationships between them do not translate into semantic equivalency.
    Gerry Wade

  2. Sometimes reasoning needs to be done across vocabulary domains. For example, implementing a decision support rule about reviewing the patient's dosing regimen for insulin might require understanding problems using SNOMED CT, medications using RXNORM, and lab results using LOINC. There it would be very nice if the same assumptions about ISA held true across different vocabularies, perhaps even including the terminology used to describe the concept of ISA.
