Here's a table showing my initial mappings. The first column indicates the I2B2 fact or dimension table. The second column indicates the field. The third is an XPath expression giving either the context for the table (for table heading rows), or the data element (relative to the table context element) that appears within the table field. XPath expressions using the cda: namespace identifier can be found in the CDA schema. Those with the rim: namespace identifier represent extensions defined by HL7 SDWG on behalf of HITSP to represent the field. The last column describes either the table or the field within the table, based on the I2B2 documentation.
To load a CDA document, one would iterate over each document, stopping at the table context points, and create a row of data using the field specifications. Then each fact or dimension table would be loaded from the unique rows produced. This is an overly simplified description of the algorithm (table load order is important for referential integrity), and there are a lot of other details I'll get into later. First, let's look at the (somewhat simplified) mapping:
Table | Field | CDA | I2B2 Definition
Observation | | cda:act|cda:observation|cda:substanceAdministration|cda:supply|cda:encounter|cda:procedure | In healthcare, a logical fact is an observation on a patient. It is important to note that an observation may not represent the onset or date of the condition or event being described, but instead is simply a recording or a notation of something. For example, the observation of ‘diabetes’ recorded in the database as a ‘fact’ at a particular time does not mean that the condition of diabetes began exactly at that time, only that a diagnosis was recorded at that time (there may be many diagnoses of diabetes for this patient over time).
Observation | Encounter ID | ancestor-or-self::cda:*[@classCode='ENC'][1]/cda:id | patient visit number
Observation | Patient ID | //cda:patientRole/cda:id | patient number
Observation | Concept Code | @classCode or cda:code | Code for observation of interest (i.e. diagnoses, procedures, medications, lab test)
Observation | Provider ID | ancestor-or-self::cda:*[@typeCode='AUT' or @typeCode='PRF'][1]/cda:*/cda:id | Practitioner id or provider id
Observation | Start/End Date Range | cda:effectiveTime | Starting and ending date-time of observation
Observation | Modifier | (computed) | Code for modifier of interest (i.e. “ROUTE”, “DOSE”), note that value columns are often used to hold the amounts such as “100” (mg) or “PO”
Observation | Instance ID | cda:id | Encoded instance number that allows more than one modifier to be provided for each concept_cd. Each row will have a different modifier_cd but a similar instance_num.
Observation | Value Type | cda:value/@xsi:type | Format of the concept: N = Numeric; T = Text (enums/short messages); B = Raw Text (notes/reports); NLP = NLP result text
Observation | Value | cda:value |
Observation | Location Code | ancestor-or-self::cda:*[@typeCode='LOC']/cda:*[@classCode='SDLOC']/cda:id | A location code, such as for a clinic
Patient | | //cda:patientRole | Each record in the patient_dimension table represents a patient in the database. The table includes demographics fields such as gender, age, race, etc. Most attributes of the patient dimension table are discrete (i.e. Male/Female, Zip code, etc.).
Patient | Patient ID | cda:id |
Patient | Vital Status | (computed) | Contains a code that represents the vital status (alive or dead) of the patient and the precision of the vital status data.
Patient | Birth Date | cda:patient/cda:birthTime |
Patient | Death Date | cda:patient/rim:deceasedTime |
Patient | Gender | cda:patient/cda:administrativeGenderCode |
Patient | Age | (computed) |
Patient | Language | cda:patient/cda:languageCommunication/cda:languageCode |
Patient | Race | cda:patient/cda:raceCode |
Patient | Marital Status | cda:patient/cda:maritalStatusCode |
Patient | Religion | cda:patient/cda:religiousAffiliationCode |
Patient | Zip Code | cda:addr/cda:zip |
Patient | StateCityZipCode | cda:addr/(cda:state|cda:city|cda:zip) |
Provider | | //(cda:author|cda:performer) | Each record in the provider_dimension table represents a physician or provider at an institution. The provider_path is the path that describes how the provider fits into the institutional hierarchy. Institution, department, provider name and a code may be included in the path.
Provider | Provider ID | cda:id |
Provider | Provider Name | cda:name |
Encounter | | //cda:*[@classCode='ENC'] | The visit_dimension table represents sessions where observations were made. Each row represents one session (also called a visit, event or encounter). This session can involve a patient directly, such as a visit to a doctor’s office, or it can involve the patient indirectly, as in when several tests are run on a tube of the patient’s blood. More than one observation can be made during a visit. All visits must have a start date/time associated with them, but they may or may not have an end date. The visit record also contains specifics about the location of the session, such as the hospital or clinic where the session occurred, and whether the patient was an inpatient or outpatient at the time of the visit.
Encounter | Encounter ID | cda:id |
Encounter | Patient ID | ancestor-or-self::cda:*[@typeCode='SBJ' or @typeCode='RCT']/cda:*/(cda:id|rim:id)[1] |
Encounter | Active Status | cda:statusCode |
Encounter | Start/End Date | cda:effectiveTime |
Encounter | Encounter Type Code | cda:code |
Encounter | Location Code | ancestor-or-self::cda:*[@typeCode='LOC']/cda:*[@classCode='SDLOC']/cda:id |
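The loading idea described before the table (stop at each table context, build a row from the field specifications, keep the unique rows) can be sketched in Python for the Patient table alone. This is only a rough illustration: the PATIENT_FIELDS dictionary, the column names, the helper functions and the sample document are assumptions made for the example, and because the standard library's ElementTree supports only a subset of XPath, the context path is written as `.//cda:patientRole`.

```python
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}  # CDA R2 namespace

# Hypothetical field specs for the Patient table: column name -> path
# relative to the table context element (cda:patientRole).
PATIENT_FIELDS = {
    "patient_id": "cda:id",
    "birth_date": "cda:patient/cda:birthTime",
    "gender": "cda:patient/cda:administrativeGenderCode",
}


def first_value(context, path):
    """Return the first match's extension, code, value or root attribute, else its text."""
    node = context.find(path, NS)
    if node is None:
        return None
    for attr in ("extension", "code", "value", "root"):
        if attr in node.attrib:
            return node.get(attr)
    return node.text


def extract_rows(doc_root, context_path, fields):
    """Build one row per table context element; the set drops duplicate rows."""
    rows = set()
    for context in doc_root.iterfind(context_path, NS):
        rows.add(tuple(first_value(context, path) for path in fields.values()))
    return rows


# Made-up sample document using the HL7 example OID branch.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget><patientRole>
    <id root="2.16.840.1.113883.19.5" extension="12345"/>
    <patient>
      <administrativeGenderCode code="F"/>
      <birthTime value="19600101"/>
    </patient>
  </patientRole></recordTarget>
</ClinicalDocument>"""

# Loading the same document twice still yields a single unique patient row.
rows = set()
for text in (SAMPLE, SAMPLE):
    rows |= extract_rows(ET.fromstring(text), ".//cda:patientRole", PATIENT_FIELDS)
print(rows)
```

A real loader would presumably repeat this per table and write the dimension rows before the facts that reference them, since load order matters for referential integrity.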
Now for some comments on it...
Concept Codes
You'll need to look at both the "act" classCode attribute, and maybe the code element within the act, and map that to the I2B2 ontology to figure out how to populate the concept code.
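One possible shape for that lookup is sketched below. The prefix scheme ("LOINC:...", "CDA_CLASS:...") and the function name are assumptions for illustration only; the actual concept codes depend on the I2B2 ontology you are loading into. The code system OIDs are the standard HL7-registered ones for LOINC, SNOMED CT, RxNorm and ICD-9-CM.

```python
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}

# Assumed prefixes; a real I2B2 ontology defines its own concept_cd scheme.
CODE_SYSTEM_PREFIX = {
    "2.16.840.1.113883.6.1": "LOINC",
    "2.16.840.1.113883.6.96": "SNOMED",
    "2.16.840.1.113883.6.88": "RXNORM",
    "2.16.840.1.113883.6.103": "ICD9",
}


def concept_code(act):
    """Prefer the act's cda:code; fall back to its classCode attribute."""
    code = act.find("cda:code", NS)
    if code is not None and code.get("code"):
        prefix = CODE_SYSTEM_PREFIX.get(code.get("codeSystem"))
        if prefix:
            return f"{prefix}:{code.get('code')}"
    class_code = act.get("classCode")
    return f"CDA_CLASS:{class_code}" if class_code else None


lab = ET.fromstring(
    '<observation xmlns="urn:hl7-org:v3" classCode="OBS" moodCode="EVN">'
    '<code code="2345-7" codeSystem="2.16.840.1.113883.6.1"/></observation>'
)
bare = ET.fromstring('<act xmlns="urn:hl7-org:v3" classCode="ACT" moodCode="EVN"/>')

lab_concept = concept_code(lab)    # uses cda:code
bare_concept = concept_code(bare)  # falls back to classCode
```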
Modifier Codes
In I2B2, a single fact can have multiple parts. The parts of a fact are tied together by a shared instance identifier, and each part (e.g., the medication itself, or its dose, route or frequency) is represented separately under its own modifier code. In CDA, the "fact" is represented by one of the basic "act" classes, and the properties of that class represent each of the fields. So, some acts (e.g., medications) will need to be represented as several facts, while others (e.g., a lab result) will just be represented as a single fact. This shouldn't be too hard to understand.
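As a rough sketch of that fan-out, the example below turns one cda:substanceAdministration into several fact rows that share an instance number. The MODIFIER_SOURCES table, the dictionary keys, the "@" value for the unmodified base row and the placeholder concept code are all assumptions for illustration, not a statement of how any particular I2B2 installation is configured.

```python
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}

# Hypothetical modifier codes mapped to (path, attribute) on the CDA act.
MODIFIER_SOURCES = {
    "ROUTE": ("cda:routeCode", "code"),
    "DOSE": ("cda:doseQuantity", "value"),
}


def medication_facts(sub_admin, concept_cd, instance_num):
    """Split one substanceAdministration into a base fact plus one fact per modifier."""
    # "@" as the base row's modifier_cd is an assumption about the target schema.
    facts = [{"concept_cd": concept_cd, "modifier_cd": "@",
              "instance_num": instance_num, "value": None}]
    for modifier_cd, (path, attr) in MODIFIER_SOURCES.items():
        node = sub_admin.find(path, NS)
        if node is not None and node.get(attr) is not None:
            facts.append({"concept_cd": concept_cd, "modifier_cd": modifier_cd,
                          "instance_num": instance_num, "value": node.get(attr)})
    return facts


MED = ET.fromstring(
    '<substanceAdministration xmlns="urn:hl7-org:v3" classCode="SBADM" moodCode="EVN">'
    '<routeCode code="PO"/><doseQuantity value="100" unit="mg"/>'
    '</substanceAdministration>'
)

# "MED:EXAMPLE" is a placeholder concept code, not a real ontology entry.
facts = medication_facts(MED, "MED:EXAMPLE", 1)
for fact in facts:
    print(fact)
```

Every row carries the same instance_num and a different modifier_cd, which is what lets the rows be reassembled into one medication fact later.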
Value and Value Type
I2B2 has four different basic value types. CDA has a few more that need to be mapped into the SQL tables. Also, I2B2 has different columns in which each value type is placed.
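A minimal sketch of that mapping follows. Which CDA data types land in which I2B2 value type is an assumption made for the example, as is the use of "E" (equals) as the operator stored alongside a numeric value; the column names follow the I2B2 observation_fact table as I understand it, so check them against your schema.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Assumed mapping from CDA data types to I2B2 value types.
VALTYPE_BY_CDA_TYPE = {
    "PQ": "N", "INT": "N", "REAL": "N",   # numeric
    "ST": "T", "CD": "T", "CE": "T",      # short text / coded values
    "ED": "B",                            # raw text such as notes
}


def value_columns(value):
    """Place a cda:value into the column that matches its value type."""
    cda_type = value.get(XSI_TYPE, "").split(":")[-1]  # drop any namespace prefix
    valtype = VALTYPE_BY_CDA_TYPE.get(cda_type)
    cols = {"valtype_cd": valtype, "tval_char": None, "nval_num": None,
            "units_cd": None, "observation_blob": None}
    if valtype == "N":
        cols["tval_char"] = "E"  # assumed operator meaning "equals"
        cols["nval_num"] = float(value.get("value"))
        cols["units_cd"] = value.get("unit")
    elif valtype == "T":
        cols["tval_char"] = value.get("code") or value.text
    elif valtype == "B":
        cols["observation_blob"] = value.text
    return cols


XSI = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
pq = ET.fromstring(f'<value xmlns="urn:hl7-org:v3" {XSI} xsi:type="PQ" value="100" unit="mg/dL"/>')
st = ET.fromstring(f'<value xmlns="urn:hl7-org:v3" {XSI} xsi:type="ST">positive</value>')

numeric_cols = value_columns(pq)
text_cols = value_columns(st)
```

CDA types not listed in the mapping come back with valtype_cd set to None, which makes the unmapped cases easy to find and handle explicitly.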
Location Codes [sic]
In the I2B2 schema, location codes really identify specific locations, and so are identifiers, not codes. Thus my mapping to cda:id for a specific location. Locations are set in the document context for each observation, and apply unless overridden later in the document (a rare occurrence).
Encounters
A CDA document is "documentation of" an "encompassing encounter". Usually, what is recorded in the document with respect to the encounter and its location applies to everything in the document (it's part of the context of the document). That could be overridden subsequently in the document, indicating that the fact was a component of a different encounter that had a different location participant, but again, that is usually not the case.
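The override behavior can be sketched as "use the nearest enclosing encounter, otherwise fall back to the document-level encompassing encounter". The function name and sample document below are assumptions for illustration. Since ElementTree has no ancestor axis, the sketch walks a parent map instead of evaluating the ancestor-or-self expression from the table directly.

```python
import xml.etree.ElementTree as ET

CDA = "{urn:hl7-org:v3}"


def encounter_id(doc_root, act):
    """Return the id of the nearest enclosing encounter, else the encompassing encounter's."""
    parents = {child: parent for parent in doc_root.iter() for child in parent}
    node = act
    while node is not None:
        if node.tag == CDA + "encounter" or node.get("classCode") == "ENC":
            id_el = node.find(CDA + "id")
            if id_el is not None:
                return id_el.get("extension") or id_el.get("root")
        node = parents.get(node)
    # No overriding encounter found: use the document context.
    id_el = doc_root.find(f"{CDA}componentOf/{CDA}encompassingEncounter/{CDA}id")
    return None if id_el is None else (id_el.get("extension") or id_el.get("root"))


# Made-up document: OBS-1 inherits the document encounter, OBS-2 is nested in its own.
DOC = ET.fromstring("""<ClinicalDocument xmlns="urn:hl7-org:v3">
  <componentOf><encompassingEncounter>
    <id root="2.16.840.1.113883.19" extension="ENC-1"/>
  </encompassingEncounter></componentOf>
  <component><structuredBody><component><section>
    <entry><observation classCode="OBS" moodCode="EVN">
      <id root="2.16.840.1.113883.19" extension="OBS-1"/>
    </observation></entry>
    <entry><encounter classCode="ENC" moodCode="EVN">
      <id root="2.16.840.1.113883.19" extension="ENC-2"/>
      <entryRelationship typeCode="COMP"><observation classCode="OBS" moodCode="EVN">
        <id root="2.16.840.1.113883.19" extension="OBS-2"/>
      </observation></entryRelationship>
    </encounter></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>""")

observations = list(DOC.iter(CDA + "observation"))
resolved = [encounter_id(DOC, obs) for obs in observations]
print(resolved)
```

The same walk-up-then-fall-back pattern would apply to the location and provider context, with a different test at each ancestor.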
Provider
Usually, the "author" of the document is also the performing provider, but again, that can be overridden with a performer participant in the encounter (there are several types of performers as well).
So, if you wanted to load a CCD document into an I2B2 data repository, this is enough to get you started.
My next task is to look at the NQF HQMF documents created by the Measure Authoring Tool, see what I interpreted incorrectly, see how well my transforms work against them, and comment on their structure. While HQMF may be the right standard to represent queries, we will need implementation guidance on how to represent queries in the Query Health environment. The IHE Quality Measure Definition (ftp to Word document) profile might be one source for that guidance, and I've been drafted to help on that profile. I'll certainly be taking what I learn from this project into that one.