Thursday, December 17, 2009


Healthcare IT products need to deal with terminology from ICD-9-CM, ICD-10-CM, ICD-10-PCS, SNOMED-CT, RXNORM, LOINC, NDC, CPT, HCPCS, UMLS, the Healthcare Provider Taxonomy and a number of proprietary vocabularies.  Most of these use a different file format to exchange their vocabulary data.

What I'd really like to see is everyone using a standard format to exchange this information.  Preferably, I'd like that format to be XML-based to make it easier to process.  But I'd also like the representation to be fairly compact, so I could live with a text-delimited format.  I can readily create an XML reader that imports common text-delimited formats into an XML document for processing, so it's not a huge problem if the format isn't XML-based.
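To show how small that reader can be, here is a minimal sketch that wraps pipe-delimited rows in XML using Python's standard `xml.etree.ElementTree`. The function name `delimited_to_xml`, the element names and the sample rows are all hypothetical; a real vocabulary file would supply its own column list.

```python
import xml.etree.ElementTree as ET
from typing import List


def delimited_to_xml(text: str, columns: List[str], delimiter: str = "|",
                     root_tag: str = "concepts", row_tag: str = "concept") -> ET.Element:
    """Wrap each delimited row in an XML element whose children are named after `columns`."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(delimiter)
        row = ET.SubElement(root, row_tag)
        # zip() stops at the shorter sequence, so the empty field produced by a
        # trailing delimiter is simply ignored.
        for name, value in zip(columns, fields):
            ET.SubElement(row, name).text = value
    return root


if __name__ == "__main__":
    # Hypothetical sample rows; not taken from any real vocabulary.
    sample = "X1|ENG|Example term A|\nX2|ENG|Example term B|\n"
    doc = delimited_to_xml(sample, ["id", "lang", "label"])
    print(ET.tostring(doc, encoding="unicode"))
```

Once the rows are elements, the usual XML tooling (XPath, XSLT) applies to them, which is why a compact delimited format is an acceptable compromise.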

I'd also like everyone to agree on some very common concepts (e.g., "is a") that need to be expressed, so that these concepts carry the same meaning in every terminology.  Having a set of commonly accepted (standard) relationships will certainly help us get to a point where we can reason across terminology boundaries.
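A minimal sketch of what such an agreement buys: each vocabulary's native relationship identifier is mapped to one shared name, and edges from different vocabularies can then be merged into a single graph. The `COMMON_RELATIONSHIPS` table, the "LOCAL" vocabulary and its `subtype_of` label, and all concept identifiers are hypothetical; 116680003 is the SNOMED CT "Is a" attribute.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical agreement: (source vocabulary, native relationship id) -> shared name.
# 116680003 is the SNOMED CT "Is a" attribute; ("LOCAL", "subtype_of") stands in for
# a proprietary vocabulary that expresses the same idea under a different label.
COMMON_RELATIONSHIPS: Dict[Tuple[str, str], str] = {
    ("SNOMEDCT", "116680003"): "is_a",
    ("LOCAL", "subtype_of"): "is_a",
}

# (source vocabulary, child id, native relationship id, parent id). Ids are prefixed
# with their vocabulary so an edge may point from one vocabulary into another.
Edge = Tuple[str, str, str, str]


def normalize(edges: Iterable[Edge]) -> List[Tuple[str, str, str]]:
    """Rewrite edges using the shared relationship names.

    Edges whose native relationship has no agreed equivalent are dropped rather
    than guessed at, so the merged graph only holds relationships whose meaning
    everyone has signed up to.
    """
    merged: List[Tuple[str, str, str]] = []
    for source, child, native_rel, parent in edges:
        common = COMMON_RELATIONSHIPS.get((source, native_rel))
        if common is not None:
            merged.append((child, common, parent))
    return merged


if __name__ == "__main__":
    edges = [
        ("SNOMEDCT", "SNOMEDCT:A", "116680003", "SNOMEDCT:B"),
        ("LOCAL", "LOCAL:x", "subtype_of", "SNOMEDCT:A"),
        ("LOCAL", "LOCAL:x", "see_also", "SNOMEDCT:C"),  # no shared meaning: dropped
    ]
    for edge in normalize(edges):
        print(edge)
```

After normalization, a traversal that follows "is_a" can walk from the local concept into SNOMED CT without knowing either vocabulary's native labels.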

The US Federal government is responsible in some way for the maintenance, delivery or mandated use of some of these vocabularies (RXNORM, UMLS, the ICD-9 and ICD-10 variants used in the US, NDC, HCPCS and the Healthcare Provider Taxonomy), and yet almost all of them are distributed in different file formats.  It's what I've come to expect from my government, but I wish it would stop.  At least NLM's work (RXNORM and UMLS) shares a common file format: both use the Rich Release Format (RRF), which separates columns with the vertical bar (|).  In fact, it might even be worthwhile for a number of SDOs to get together and agree to use that format (or perhaps a modification of it) to deliver vocabulary information.
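Part of RRF's appeal is how little code it takes to read and write. The sketch below assumes, as in the NLM files, that every row ends with a trailing `|`; the function names and the three-column layout in the usage are hypothetical, since each real RRF file documents its own columns.

```python
from typing import Dict, Iterable, Iterator, List

DELIM = "|"


def parse_rrf_line(line: str, columns: List[str]) -> Dict[str, str]:
    """Split one row into a dict keyed by column name.

    Assumes every row ends with the delimiter, so exactly one trailing '|' is
    removed before splitting; empty fields elsewhere are preserved.
    """
    line = line.rstrip("\r\n")
    if not line.endswith(DELIM):
        raise ValueError(f"row does not end with {DELIM!r}: {line!r}")
    fields = line[:-1].split(DELIM)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {line!r}")
    return dict(zip(columns, fields))


def format_rrf_line(values: Iterable[str]) -> str:
    """Write one row in the same style: fields joined by '|' plus a trailing '|'."""
    values = list(values)
    for value in values:
        # There is no escaping mechanism, so a delimiter inside a value is an error.
        if DELIM in value or "\n" in value:
            raise ValueError(f"value cannot contain {DELIM!r} or a newline: {value!r}")
    return DELIM.join(values) + DELIM


def read_rrf(lines: Iterable[str], columns: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-blank row."""
    for line in lines:
        if line.strip():
            yield parse_rrf_line(line, columns)


if __name__ == "__main__":
    cols = ["ID", "LAT", "STR"]  # hypothetical column subset
    rows = list(read_rrf(["X1|ENG|example term|\n", "X2|ENG||\n"], cols))
    print(rows)
    print(format_rrf_line(rows[0].values()))
```

Because the writer is just as simple as the reader, an SDO adopting the format would not need special tooling to produce it.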

Some of the vocabularies I've mentioned are published in books with a lot of ancillary material that should also be part of the downloads.  For example, ICD-9-CM includes a rather large and incredibly valuable index, along with numerous inclusion and exclusion notes.  To really make good use of the vocabulary, you need the data from these additional parts incorporated into the downloads.
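One hypothetical way to ship that material is as extra delimited rows keyed by code, which a loader can fold back into one record per code. The `code|kind|text` layout, the kind names, the `CodeEntry` class and the placeholder codes below are all assumptions for illustration, not any published ICD-9-CM file format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CodeEntry:
    code: str
    title: str = ""
    includes: List[str] = field(default_factory=list)
    excludes: List[str] = field(default_factory=list)
    index_terms: List[str] = field(default_factory=list)


# Hypothetical note kinds and the CodeEntry list each one is appended to.
LIST_KINDS = {"INCLUDES": "includes", "EXCLUDES": "excludes", "INDEX": "index_terms"}


def load_entries(lines: Iterable[str]) -> Dict[str, CodeEntry]:
    """Group 'code|kind|text' rows by code; kind is TITLE or one of LIST_KINDS."""
    entries: Dict[str, CodeEntry] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        code, kind, text = line.split("|", 2)  # text may itself contain '|'
        entry = entries.setdefault(code, CodeEntry(code))
        if kind == "TITLE":
            entry.title = text
        elif kind in LIST_KINDS:
            getattr(entry, LIST_KINDS[kind]).append(text)
        else:
            raise ValueError(f"unknown note kind {kind!r} in {line!r}")
    return entries


def codes_for_index_term(entries: Dict[str, CodeEntry], term: str) -> List[str]:
    """Find codes whose index terms match `term`, ignoring case."""
    wanted = term.casefold()
    return sorted(e.code for e in entries.values()
                  if any(t.casefold() == wanted for t in e.index_terms))


if __name__ == "__main__":
    rows = [
        "CODE1|TITLE|Example condition",
        "CODE1|EXCLUDES|Example exclusion (see CODE2)",
        "CODE1|INDEX|Example term",
        "CODE2|TITLE|Other condition",
        "CODE2|INDEX|example term",
    ]
    entries = load_entries(rows)
    print(entries["CODE1"])
    print(codes_for_index_term(entries, "Example term"))
```

With the index and notes in the download, a lookup like `codes_for_index_term` can do in software what a coder otherwise does by paging through the book.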

Finally, I'd like to see the hierarchical relationships within these terminologies formally expressed.  LOINC, for example, contains numerous concepts describing clinical documents, but the LOINC data itself doesn't include some of the important relationships between the different document types.  For example, the Admission History and Physical Note (47039-3) isn't related in the document hierarchy to the Cardiology Hospital Admission Note (34094-7).  The same is true for the relationships between the various laboratory results.
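If those relationships were present, a simple transitive traversal could answer questions like "which documents are a kind of admission H&P?" The sketch below hypothetically records 34094-7 as a kind of 47039-3, the relationship the LOINC data does not currently carry; "EXAMPLE-NOTE" is a made-up parent, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_children(is_a_edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (child, parent) 'is a' edges by parent."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    return children


def descendants(concept: str, children: Dict[str, Set[str]]) -> Set[str]:
    """All concepts reachable by following 'is a' edges downward (transitive closure)."""
    found: Set[str] = set()
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:  # also guards against cycles
                found.add(child)
                stack.append(child)
    return found


def is_kind_of(concept: str, ancestor: str, children: Dict[str, Set[str]]) -> bool:
    return concept in descendants(ancestor, children)


# Hypothetical edges; the LOINC data does not contain them.
EDGES = [
    ("34094-7", "47039-3"),  # Cardiology Hospital Admission Note -> Admission H&P Note
    ("47039-3", "EXAMPLE-NOTE"),  # made-up higher-level document type
]

if __name__ == "__main__":
    kids = build_children(EDGES)
    print(sorted(descendants("EXAMPLE-NOTE", kids)))
    print(is_kind_of("34094-7", "47039-3", kids))
```

Without such edges, an application searching for admission H&P notes has no way to know that a cardiology admission note qualifies.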

As we in the US continue to talk about simplification and debate some of the really hard IT topics, this seems like a simple problem that could be solved with just a little of the right attention.