Monday, April 16, 2012

Developing a SQL Schema for QueryHealth

In my SQL Implementation for Query Health post I showed how one could transform an HQMF Formatted Query into a set of SQL statements that would evaluate to a result for a given query.  We reviewed that implementation in more detail last Thursday at the S&I Framework Face to Face.  One of the outcomes of that review was that we decided to document a target schema for a SQL implementation.

There are several pilots or potential pilots that are interested in implementing against a SQL database in the Query Health Project.  Each of them uses a different schema.  We could have picked one of those, or just chosen one to be suitable for the purposes of a reference implementation.   What I've done is gone through my SQL implementation code again, and documented a target schema that would be suitable for a rewrite of that against the new HQMF schema that Query Health created.

What follows is the basic description of a target database schema.  This schema is NOT intended to support a full EHR as it does NOT support all of the details such a system would need.  It is intended to support a representation of the data that Query Health could operate against.  So, we don't track which organization an order was placed with for example, or the current status of the order

Table Purpose
Demographics Demographic values (name, birth date, gender, race, ethnicity, address, et cetera).  
Problems A list of problems and diagnoses
Allergies A list of allergies associated with the patient.
Medications A list of the medications that the patient is or has been on. 
Immunizations A list of the immunizations that the patient has been given.
Procedures A list of procedures that have been performed
Encounters A list of encounters that have occurred
Results A list of lab test results the have been generated
VitalSigns Vital signs that have been recorded.
FamilyHistory Family history associated with the patient.
SocialHistory Social History observations associated with the patient.
Orders Non-medication orders that have been made.
Rx Medication orders that have been made.

Each of the values in the first column corresponds to one of the definitions associated with a data criterion in the HQMF.  In the HQMF to JavaScript translator, these become objects in a Green C32 JSON Object Model that is created based on CCD or CCR documents provided for patients.   In the HQMF to SQL translator, these will become tables in a view that enables execution of the SQL Query.

There are a couple of ways in which these tables could be "populated".  One could simply create views for each of the tables above in an EMR database.  Each view would generate the appropriate content.  Another mechanism would be to have an Export/Transform/Load process dump the data from the EMR database to tables in this format.

Each of the tables starts with the same set of common columns:

Column Type Description
PatientID char(36) The patient for whom this row is associated.
ID char(36) A unique ID for the row.
Code char(20) A code for this entry.
CodingSystem char(128) The coding system for this entry (in OID format).
TimeLow TS The low value of the effective time for this row.
TimeHigh TS The high value of the effective time for this row.
ProviderID char(36) A unique identifier for the provider responsible for this row.
EncounterID char(36) A unique identifier for the encounter in which this row was created.
Negation boolean A flag indicating that the documented act was NOT performed.

Of the tables (or views) listed above, only Encounter could never reasonably have a value for for encounter. That wasn't enough of an excuse to special case encounter.  While it seems unlikely that you would document NOT recording a demographic, obtaining a lab result, or blood pressure, or ordering a procedure or medication, it's certainly possible that might be done, especially in certain contexts associated with quality measurement.

Demographics, Results, Vital Signs, Family and Social History and Problems or Allergies have some common features.  They are stored in question/answer form, where code and coding system capture the code associated with the question (what is the patients: ethnicity, HgA1C, Pulse, FH of Heart Disease, or Smoking Status).  The answer part needs to be captured in a structure that could record a number, physical quantity, string or coded value.

The questions associated with problems are things like complaint, symptom, finding, or type of diagnosis, and provide a more detailed classification.  With allergies, you might classify it as an intolerance, a true allergy, or adverse reaction for which more details are needed, and you might also classify by type of substance (e.g., food, medication, or environmental substance).

For the case of Family History, the question would be expressed using a coding system for a disease, and the answer would be expressed as a (coded) family relation for whom the disease condition did (or if negated, did not) exist.

That structure is pretty straight-forward and appears below:
Column Type Description
Type char(10) Type of value recorded
Value real Numeric Quantity
Unit char(10) Units of Measure for Physical Quantities
StringOrCode char(36) String or code values
CodeSystem char(128) Code System associated with codes
Interpretation char(20) Interpretation Flag

The type field tells you whether the value of interest appears in value or StringOrCode, and whether Unit or CodeSystem columns are relevant. The Intepretation flag only shows up on some values and indicates whether the value is considered to be abnormal in some way (e.g., high, low, et cetera).

Any of the Medication, Immunization, Result, Procedure or Encounter rows could have been performed as the result of an order.  Ideally, there would be a way to track the relationships of these items back to the order.

Now, having done this much analysis, there are several ways to build a schema that meets the requirements expressed.  I could have a fact table, a value table, and an order relationship table, and add to the fact table the type of fact using the table column from the first table in this post.  Or, I could build the 13 tables described in that first table, including the columns in the second, and adding the value table and order relationship where necessary.  The former structure (a star schema) is closer to a data mining/data warehouse schema used by I2B2.  The latter is closer to what might be used by an EMR system.

The point of query health is to send the question to the data.  It could live in a data warehouse or in an individual EMR.  If it lives in a data warehouse, then there are clearly some skilled technology resources who can figure out the mapping from an EMR structure to a warehousing oriented structure, but the reverse is not true.  So, I would go with the second choice, creating an EMR-like schema an put all the necessary values for each table in a single row (for simplicity).  That makes it easier for an EMR to map to the schema, a little bit harder for a system like I2B2 (but for which there are more skilled resources available), so in general, a good balance.

Here's the final structure I decided to work with for the new HQMF to SQL translator.  Note that there are several other small optimizations included.  I don't bother to  associate demographics with a specific Encounter ID for example.   Since allergies and problems always answer with a code, you don't need type, value and unit, but since vital signs are always physical quantities, you never need to know the type or have a place to store coded results, for example.

Table char(36) char(36) char(36) char(128) TS TS char(10) real char(10) char(36) char(128) boolean char(10) char(36) char(36) char(36)
Demographics PatientID ID Code CodeSystem TimeLow TimeHigh


StringOrCode VCodeSystem ProviderID
Problems PatientID ID Code CodeSystem TimeLow TimeHigh


StringOrCode VCodeSystem Negation EncounterID ProviderID
Allergies PatientID ID Code CodeSystem TimeLow TimeHigh


StringOrCode VCodeSystem Negation EncounterID ProviderID
Medications PatientID ID Code CodeSystem TimeLow TimeHigh Negation EncounterID OrderID ProviderID
Immunizations PatientID ID Code CodeSystem TimeLow TimeHigh Negation EncounterID OrderID ProviderID
Procedures PatientID ID Code CodeSystem TimeLow TimeHigh Negation EncounterID OrderID ProviderID
Encounters PatientID ID Code CodeSystem TimeLow TimeHigh Negation OrderID ProviderID
Results PatientID ID Code CodeSystem TimeLow TimeHigh Type Value Unit StringOrCode VCodeSystem Interpretation EncounterID OrderID ProviderID
VitalSigns PatientID ID Code CodeSystem TimeLow TimeHigh
Value Unit Interpretation EncounterID ProviderID
FamilyHistory PatientID ID Code CodeSystem TimeLow TimeHigh StringOrCode VCodeSystem Negation EncounterID ProviderID
SocialHistory PatientID ID Code CodeSystem TimeLow TimeHigh StringOrCode VCodeSystem Negation EncounterID ProviderID
Orders PatientID ID Code CodeSystem TimeLow TimeHigh EncounterID ProviderID
Rx PatientID ID Code CodeSystem TimeLow TimeHigh EncounterID ProviderID


1 comment:

  1. Hi,
    Wondering whether we can use the same structure(with some additional fields) for storing the data received from a ccd document.
    In this case there are some missing structures as given below.....

    *Purpose
    Problems problems
    Procedures Procedures
    Family History FamilyHistory
    Social History SocialHistory
    *Payers
    *Advance Directives
    Alerts Allergies
    Medications Medications
    Immunizations Immunizations,Rx
    *Medical Equipment
    Vital Signs VitalSigns
    *Functional Status
    Results Results
    Encounters Encounters
    *Plan of Care

    I know the purpose of your tables are differenet .. but just a thought if we can have an Ehr based on the same....

    Thanks and Regards,

    SKT

    ReplyDelete