Monday, July 14, 2008

Reporting Genetic Test Results

This is the second part of a three-part series that started with Understanding Genetics. In this article, I will identify and describe, at a high level, the standards that are needed to obtain genetic test results.

But first, a little segue. A colleague reminded me recently that what best helps us deliver is a real understanding of why our customer needs something. Let's see if we can create a little fantasy that might help.

Imagine that you are in Tier 3 technical support at your company. Assume for the time being that all of your company's computers (and those of companies like yours) are the same; it's just the software that's different. A new technique in computer diagnostics now allows technicians like you to actually read the stored programs inside the computer (work with me here). A few years ago, your company and hundreds of others like it came together to work on a major project. They took one of the computers at random, and cataloged every bit in its memory, all 750 megabytes of it. Within this vast amount of data were somewhere between 65 and 80 thousand little subprograms, each of them anywhere from 10 to 15 thousand microinstructions in length. We know only a little bit about how the processor works. We can understand start and stop instructions, can interpret some of the sequences of microinstructions that make up larger operational instructions, and have some basic understanding of some of these programs, but we are still learning more every day.

Your job, given a particular computer malfunction, is this: Based on a particular set of symptoms, and other random information that comes your way about where the computer has been, and what subsystems it was built from, you need to:

  1. obtain any additional necessary information you need to do a good job from your customer, then
  2. identify the particular memory region inside the computer to look at,
  3. locate the instructions at that location
  4. determine whether these instructions are correct, and if not,
  5. whether or not they could be the cause of the given symptoms.

Just to complicate matters, the computer that was selected at random is known to have a few wonky sub-programs installed on it that aren't quite right either. Also, addressing a particular memory location is not an exact science. It's more an art form, and the way that you access it is by looking for sequences that you know typically precede or follow the particular memory address you want. It's more like associative memory than RAM.

By the way, you have a budget to work with. You can read out vast sections of memory, but it is very expensive (like disassembly), or you can look for known problem causing sequences (like a virus scanner), which is faster and cheaper, but doesn't find everything.

Add to this that the information you have to work with and understand to identify a problem is not only growing at tremendous rates (see the second paragraph in Clinical Decision Support), but also being changed. What you knew yesterday might be different tomorrow. It may be that one of those wonky subprograms has now been replaced by a better sample.

This is just a small sample of the complexity that faces the clinical geneticist. Hopefully this little segue into an analogical fantasy might help you understand a little bit about how genetic testing works.

Now, back in the real world, we will start simple. A genetic test is, at its core, a laboratory test. This simplifies matters for us, because we can make use of the same standards used in ordering and reporting for laboratory tests. The most commonly used standard for ordering laboratory tests and reporting on results is HL7 Version 2. There are many different releases of HL7 Version 2 (we could call them variants, but that would just be too confusing), including 2.2, 2.3, 2.3.1, 2.4, 2.5, 2.5.1 and 2.6, and coming soon Version 2.7 (it isn't clear whether these would be alleles or mutations).

Various organizations have selected different releases of HL7 Version 2 messages for laboratory orders and results, including:

HL7 Version 2.4
Used in the original ELINCS implementation guide developed initially by the California Healthcare Foundation. This guide is now being completed by HL7 using HL7 Version 2.5.1.

HL7 Version 2.5
Used in the Laboratory Technical Framework from Integrating the Healthcare Enterprise.

HL7 Version 2.5.1
Selected by ANSI/HITSP, and recognized by Secretary Leavitt of Health and Human Services for use in the US for reporting laboratory results. ANSI/HITSP selected this version because it supports the conveyance of information required by CLIA regulations.

While HL7 Version 3 does support laboratory test orders and results, this is still a work in progress.

HL7 CDA Release 2.0 (this is another gene altogether) has also been selected by ANSI/HITSP and recognized by Secretary Leavitt for reporting laboratory results in a clinical document. ANSI/HITSP's selection of this standard is constrained by the IHE XD-LAB profile found in the IHE Laboratory Technical Framework. The XD-LAB integration profile also conforms to the HL7 Laboratory Claims Attachments Implementation Guide.

Finally, laboratory reports often use Logical Observation Identifiers Names and Codes, or LOINC®, to identify results (all of the examples above use LOINC).
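For the programmers in the audience, here is a minimal sketch of what a single result looks like on the wire in HL7 Version 2, and how the LOINC identifier is carried in it. It assumes the default delimiters ("|" between fields, "^" between components) and ignores escape sequences and repetitions, which a real parser must handle. The function name parse_obx and the example segment are my own illustrations; the code "00000-0" is a placeholder, not a real LOINC code.

```python
from dataclasses import dataclass


@dataclass
class CodedElement:
    code: str    # for example, a LOINC code
    text: str    # human-readable name
    system: str  # coding system identifier; "LN" denotes LOINC in HL7 Version 2


def parse_obx(segment: str) -> dict:
    """Pull a few fields out of a single pipe-delimited OBX segment."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    # Pad short segments so the indexes below are always valid.
    fields += [""] * (12 - len(fields))
    # OBX-3 uses "^" to separate code, text and coding system.
    code, text, system = (fields[3].split("^") + ["", "", ""])[:3]
    return {
        "value_type": fields[2],                         # OBX-2
        "identifier": CodedElement(code, text, system),  # OBX-3
        "value": fields[5],                              # OBX-5
        "status": fields[11],                            # OBX-11
    }


# Hypothetical segment: "00000-0" is a placeholder, not a real LOINC code.
EXAMPLE = "OBX|1|ST|00000-0^Example genetic result^LN||example value||||||F"
result = parse_obx(EXAMPLE)
print(result["identifier"].system, result["value"], result["status"])
```

The point of the sketch is only that the observation identifier travels as a code plus the name of its coding system, which is what lets a receiver recognize it as LOINC.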

Ordering the Test
While our focus is on the reporting of genetic test results, the process has to start first with the order, otherwise there would be nothing to report on. This is a fairly complex process, which can involve quite a bit of negotiation (this is a requirements gathering process).

Sequencing and genotyping tests are performed using different methods, have different sensitivity and specificity and different costs. More powerful tests are typically more expensive. A typical genetic test can cost hundreds or even several thousand dollars, and take weeks to obtain results, so selecting the right test is important. Reflex testing (running another test when an ordered test shows a particular kind of result) also needs to be considered. Sometimes it is sufficient to order the cheaper test first, and then, if necessary, perform the more expensive one.
Reason for the Test
Ordering the test starts with providing the reason for it. This helps the lab determine what type of test or tests need to be performed and what kind of results need to be returned. It isn't sufficient for the laboratory to just identify a gene sequence. The sequence has clinical ramifications that they need to report to the provider. Remember, understanding the why is important.

Sequencing, or re-sequencing, refers to reading off the nucleotides (A, C, G and T) of the gene sequence directly. This is usually more expensive, but also the most accurate way to obtain a gene sequence. Some researchers use the term re-sequencing because the human genome has been sequenced once already.

Genotyping uses a set of probes to identify clinically meaningful variants in the patient’s DNA sample. It is a less expensive technique to produce results. This technique may not have the same power as sequencing, but is often effective, especially when specific variants are well known.

Sequencing is used if identification of novel variants is warranted, because genotyping is limited to the set of variants identifiable by the genotyping probes.
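The difference between the two approaches can be sketched in a few lines of code. This is a toy model under heavy assumptions: the sequences are made up, positions are 0-based, and real probes and sequencers work nothing like string comparison. The names PROBES, genotype and sequence are hypothetical.

```python
# Made-up sequences for illustration only.
REFERENCE = "ATGGCCTTAGC"
PATIENT = "ATGACCTTCGC"

# Hypothetical probe set: (0-based position, variant base) for known variants.
PROBES = {(3, "A"), (6, "G")}


def genotype(patient, probes):
    """Report only the known variants that the probes can detect."""
    return sorted((pos, base) for pos, base in probes if patient[pos] == base)


def sequence(patient, reference):
    """Compare every position, so novel variants are found too."""
    return [(i, p) for i, (r, p) in enumerate(zip(reference, patient)) if r != p]


print(genotype(PATIENT, PROBES))     # known variants only
print(sequence(PATIENT, REFERENCE))  # every difference from the reference
```

In this toy example the patient has two differences from the reference, but only one of them is in the probe set, so genotyping misses the other. That is the virus scanner versus disassembly trade-off from the fantasy above.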

Please note: Genetic technology advances rapidly. Next year, the statements about cost may no longer be true. Search the web for "$1000 genome".

Clinical Question
So, in looking at why, what are the questions that are typically being asked? In genetic testing, there are six common clinical questions. Half of these are related to specific genetically related diseases, and the other half to medications used for treatment. The "question" being asked by the clinician needs to be described in the order.
Tests on Genetic Conditions
Tests that identify variants associated with genetic conditions can assist the provider in determining if a patient:

  • has a genetic condition,
  • is at (increased) risk of contracting a genetically related disease, or
  • carries a particular genetic variant and can potentially pass it onto their children.

Each of these "questions" needs to be described, and the disease being assessed needs to be identified in both the order and the result.

Tests on Medications
Pharmacogenomic tests can tell a provider:

  • Whether a particular medication will be effective or not in treatment,
  • How quickly a particular medication will be metabolized by the patient, or
  • How toxic a medication may be to the patient.

LOINC vocabulary terms have been proposed to represent each of these different kinds of test results in panels. The SNOMED CT and RxNorm terminologies have been proposed to represent disease conditions and medications respectively. Some experts have noted that SNOMED CT does not provide great coverage for family related disease (i.e. genetic conditions) but feel that it is more important to use a common reference vocabulary than it is to introduce vocabularies that are not yet used in healthcare. Use of these vocabularies will enable linkage of genetic data with other clinical data in the health record. I find myself in agreement with them.

Describing the Specimen
The specimen is the source of the DNA examined, as well as the eventual source of the variant identified. Genetic material in a tumor specimen can have somatic or germline variations. A somatic variation occurs after cells have been formed, for example from UV damage to skin cells after too much sun exposure. A majority of cancers occur due to somatic changes. A germline variation is one that is incorporated into every cell. The last classification is for specimens of fetal tissue (prenatal). Results are identified based on these three categories: a proposed classification system for specimens uses the terms somatic, germline, or prenatal to describe the specimen.

Reporting the results
When reporting on the results of genetic tests, it is important to include in the report the information that a healthcare provider needs to interpret them. The first step in reporting the results is to repeat everything in the order that was stated or clarified later during the ordering process. The reason for this is to allow subsequent reviewers of the result to understand the original intent of the provider ordering the test. Ordering a genetic test may be an iterative process. In reporting the test results it should be necessary only to report what was finally agreed upon.

Region of Interest
Once a genetic test is selected, the testing laboratory can specify more detail about the region of the gene that was examined. The human genome includes on the order of 3 billion base pairs (which fits into about 750 MB). At present, it isn't practical to sequence a single person's entire genome; it would take quite a long time (although that may change).
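The storage figure follows from simple arithmetic: with four possible letters, each base needs two bits, so four bases fit in a byte. The short sketch below works through the numbers and shows one way of packing bases into bytes. The 2-bit code assigned to each letter is an arbitrary choice of mine, not a standard encoding.

```python
BASE_PAIRS = 3_000_000_000  # order-of-magnitude size of the human genome
BITS_PER_BASE = 2           # four letters (A, C, G, T) need 2 bits each

total_bytes = BASE_PAIRS * BITS_PER_BASE // 8
megabytes = total_bytes / 1_000_000

# Arbitrary 2-bit code per letter, chosen for illustration only.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(bases: str) -> bytes:
    """Pack up to four bases into each byte, most significant bits first."""
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk within its byte.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


print(megabytes)            # size in megabytes
print(len(pack("ACGTAC")))  # six bases need two bytes
```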

So, the testing laboratory determines where on the genome the test will focus. This region of interest can be described by identifying the genomic or transcriptional reference sequence (to align the region with the genome), the starting and ending nucleotides in the sequence (using the numeric portion of HGVS nomenclature), and a specific gene using the HGNC nomenclature (remember that "associative memory access"? Here it is). Much of this information is, or can be, tied together in appropriate knowledge bases (and these are continuously being updated).
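As a data structure, the region of interest described above amounts to four pieces of information. The sketch below is one possible shape for it, not a representation defined by any of the standards mentioned. The class and field names are hypothetical, the coordinates are assumed to be 1-based and inclusive, and the example values are placeholders rather than a real reference sequence or gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionOfInterest:
    reference_sequence: str  # identifier of the genomic or transcript reference
    start: int               # first nucleotide examined (assumed 1-based, inclusive)
    end: int                 # last nucleotide examined (assumed 1-based, inclusive)
    gene_symbol: str         # gene symbol from the HGNC nomenclature

    def __post_init__(self):
        if self.start < 1 or self.end < self.start:
            raise ValueError("need 1 <= start <= end")

    def length(self) -> int:
        """Number of nucleotides in the region."""
        return self.end - self.start + 1

    def covers(self, position: int) -> bool:
        """True when the given position falls inside the region."""
        return self.start <= position <= self.end


# Placeholder values only; neither the identifier nor the symbol is real.
region = RegionOfInterest("NM_000000.0", 101, 250, "GENE1")
print(region.length(), region.covers(101), region.covers(251))
```

Keeping the reference sequence identifier alongside the coordinates matters, because the numbers mean nothing without knowing which sequence they count along.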

The next step is to report the interpretation. Each of the test types described in the previous sections will require different values to interpret the results. Vocabulary has also been proposed in LOINC using LOINC Answer Codes, but the LOINC documentation does not presently describe how to relate LOINC Answer codes to the supplied data. Certainly any set of values used for these interpretations will also need to be mapped into SNOMED CT, to allow for their eventual use in clinical decision support systems that rely on SNOMED CT. Note that some interpretations will remain "inconclusive" (have you ever finished a technical support call only to get no solution to your problem?).

Because sequencing and genotyping are so expensive, they shouldn't be repeated unnecessarily. That means that enough detailed information should be conveyed in the result that future re-interpretation is possible. The average gene contains from 10 to 15 thousand base pairs (think of these as the microinstructions), but this can vary dramatically, with some genes using millions of base pairs. This information is maintained by the testing laboratory and is absolutely essential in the initial interpretation of the results. When reported, these findings are summarized using the recommended standards. This will enable linkage of the genetic data to clinical genetic knowledgebases, so that interpretations can be maintained in a manner similar to other laboratory tests.

Different kinds of tests will require differing detailed results. A test that is attempting to identify a particular DNA marker or allele will need to describe what was found. Again, this identification can be performed using HGNC to describe the gene, NCBI Nucleotide Reference sequence identifiers, and HGVS nomenclature to describe the variations.

Details are not sufficient. The final component of the report should include interpretation of the results performed by a geneticist. Genetics is so complex that providers will need that expertise to understand the results. This analysis may include references to research, educational materials, suggested treatments or additional testing.

The Final Step
For most clinical uses, once the provider's question about the patient's genetics is answered, many more questions are asked by the provider and by the patient. Some of these involve how to communicate the information to the patient and/or their relatives; others involve what the next steps should be in management of the patient's health. These fall outside of the domain of Healthcare Standards, so I will not dwell upon them. However, the American Society of Clinical Oncology published a policy statement that addresses some of these issues. I recommend reading it.

American Society of Clinical Oncology Policy Statement Update: Genetic Testing for Cancer Susceptibility, Journal of Clinical Oncology, Vol 21, No 12 (Jun 15), 2003: pp 2397-2406 available on the web from
This is an excellent article that describes many of the issues surrounding the need for, and appropriate use of, genetic testing. While its principal audience is clinical oncologists, the explanations given for the ASCO positions are very clear, and address many issues that need to be considered with respect to genetic testing.

Thanks to Sandy Aronson, Director of IT, and Mollie Ullman-Cullere, both of HPCGG, for arranging a tour of their genetic testing laboratory and answering my many questions on genetic testing. Thanks also to Mollie, and to Scott Bolte of GE Healthcare, for their reviews of an early draft of this article. Accuracy is due to them; any errors are of course my own.

IHE Webinars

IHE North America Connectathon Webinar Series

The next two sessions in the Webinar series take place Tuesday, July 15 and Wednesday, July 16:

IT Infrastructure: Profiles for Health Information Exchange Tuesday, July 15, 9:00 - 11:00 AM (all times CDT)

  • XDS Affinity Domain Profiles
    (XDS.a, XDS.b, Merge, Web Services, XCA, XDR, XDM, NAV, PIX/PDQ, PIX/PDQ V3)
    - Bill Majurski, National Institute of Standards and Technology

Session 11: IT Infrastructure: Security and Privacy Tuesday, July 15, 1:00 - 3:00 PM (all times CDT)

  • Security and Privacy (ATNA, EUA, XUA, BPPC, DSG)
    - John Moehrke, GE Healthcare

Patient Care Coordination: Medical Document Content Profiles Wednesday, July 16 (all times CDT)
Part 1:
9:00 - 10:00 AM PCC Content Profiles for Health Information Exchange
- Keith Boone, GE Healthcare
10:00 - 11:00 PCC Integration Profiles for Care Management and Query
- Keith Boone, GE Healthcare and Laura Bright, Bell Canada
Part 2:
1:00 - 1:30 PM Content Profiles for Prenatal Care
- Tone Southerland, Greenway Medical Technologies
1:30 - 2:00 Content Profile for Cancer Registries Pathology Reporting
- Wendy Scharber, Registry Widgets
2:00 - 2:30 Content Profile for Immunization Content
- Alean Kirnak, Software Partners LLC
2:30 - 3:00 Content Profile for Functional Status Assessment
- Marcia Veenstra, CPM Resource Center and Audrey Dickerson, HIMSS

The entire Webinar series is free, but participants are required to register in advance. Further information and a link to registration (via Webex) are available at

Monday, July 7, 2008

Understanding Genetics

You can't do anything about your genetic background or your family history, but you can do something about the medicine you put in your mouth.
-- Dr. Andrew Glass of the Center for Health Research in Portland

My interest in genetics and family history started with the development of the Family History section of the HL7 Continuity of Care Document. It was recently expanded by the introduction of the Personalized Healthcare use case into the 2008 work cycle for ANSI/HITSP. We are all fairly well aware by now that increasing a provider's knowledge of a patient's genetics and family history will allow them to better select effective management. Over the course of the last six months, many of us have been getting a crash course in genetics, genomics and genetic testing, and the need for structured family histories. What follows below is what I have learned over the last six months about genetics. Given that this material is as new to me as it is to you, I've had this material reviewed by an expert for accuracy.

This is the first part of a three part series on Genetics and Family History, describing enough genetics for healthcare IT implementers that need to review and/or implement specifications produced by ANSI/HITSP for the Personalized Healthcare use case.

Part 1: Understanding Genetics follows below.

Part 2: Reporting Genetic Results will describe the standards needed to exchange the information described in Part 1.

Part 3: Family History and Risk Assessment will describe the necessary information to communicate in family histories, and the importance of this information in assessing risk, and determining the need for genetic testing or additional treatment.

Understanding Genetics
Genomic and family history data is an excellent source of information on health risks for a variety of conditions, both chronic and acute. By using family history or genetic testing to identify patients at high risk, the medical system is better able to predict the risk of disease, allowing patients and providers to make better care plans to address those risks, and ensure earlier detection and better preventative efforts.

Genetic information can also help predict how effective a medication will be, providing for better care by reducing side effects, avoiding toxicity and unnecessary therapies.

However, before we can begin to incorporate genetic testing data into EHR systems, we need to understand enough clinical genomics to correctly incorporate these results into healthcare IT systems.

I'm certain that most of you understand what DNA and chromosomes are, and that most humans have 22 pairs of chromosomes plus a pair of sex chromosomes (XX for females or XY for males). Half (23 chromosomes) come from each parent, for 46 altogether. However, there is a great deal of specialized vocabulary that goes beyond chromosomes that we all need to understand. I've translated this very specialized vocabulary into language that engineers can understand (recall that I consider myself to be in this category).

Most of you recognize a picture of a chromosome as a vaguely X shaped object. You can think of this as two identical lengths or strands of rope called chromatids (1). These strands are effectively tied in a knot together at the middle called the centromere (2). The strands at what is usually depicted in the top part of the chromosome are the short arms (3) and those at the bottom are the long arms (4).

Genes and Nucleotides
Along each arm are sequences of nucleotides, typically represented using the letters A, C, G and T, making up the DNA. DNA and RNA are known as nucleic acids because they are made up of nucleotides. A gene is a distinct DNA sequence that provides instructions for producing a single protein that in turn produces a single trait, such as eye color (actually, there are several genes controlling eye color, and a single protein can cause multiple effects, but let’s keep it simple for the moment). The set of genes belonging to a person makes up their genotype. A person normally has 2 full sets of genes, one from each set of chromosomes (thus one set from each parent).

Alleles and Genotypes
Genes can have variations, known as alleles. Two commonly known alleles are for eye color, the brown and blue alleles. Just because you have an allele for blue eye color in the gene controlling color doesn't mean your eyes are blue. You might also have the allele for brown eyes in your second copy of that gene. That means that your eyes will be brown, because the brown allele is dominant and the blue allele is recessive. A dominant allele will be expressed when there is only one copy, whereas a recessive allele will be expressed only when it is present in both copies of the gene controlling that trait. There are other variations besides the most commonly known dominant and recessive, but we probably don't need to go into that level of detail. In the previous example, your genotype would be Blue/Brown, indicating that you have the alleles for blue eyes and for brown eyes in the gene controlling eye color.

The fact that you have brown eyes is known as your phenotype (for eye color), and basically amounts to which alleles in your genotype are being expressed. Other alleles can affect how the body metabolizes (or fails to metabolize) a particular drug, or increases or decreases the risk of a particular disease. Having a particular allele doesn't necessarily mean that you will have a particular disease or react to a particular treatment in a certain way.
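For the engineers, the dominant and recessive rule can be written as a small function from genotype to phenotype. This is a sketch of the simplified single-gene model used in this article only. As noted earlier, real eye color involves several genes, and the names DOMINANT_ALLELES and phenotype are mine.

```python
# Simplified model from the text: Brown is dominant, Blue is recessive.
DOMINANT_ALLELES = {"Brown"}


def phenotype(genotype):
    """Return the expressed allele for a two-allele genotype."""
    first, second = genotype
    # A dominant allele is expressed even when only one copy is present.
    for allele in (first, second):
        if allele in DOMINANT_ALLELES:
            return allele
    # A recessive allele is expressed only when both copies carry it.
    if first == second:
        return first
    raise ValueError("this sketch cannot resolve two different recessive alleles")


print(phenotype(("Blue", "Brown")))
print(phenotype(("Blue", "Blue")))
```

The genotype is the input and the phenotype is the output. The same genotype always contains both alleles, even when only one of them shows.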

Sometimes, more than one gene in a group of closely linked genes tends to be inherited as a group, also known as a haplotype, or haplogroup.

HUGO and Gene Names
Most (if not all) genes relevant in genetic testing have already been identified as to their clinical significance. These genes will have a name and identifier issued by the Human Genome Organisation, otherwise known as HUGO. HUGO maintains a database of names for genes known as the HGNC (HUGO Gene Nomenclature Committee) Database. These identifiers are the codes in the ontology of human genes.

Each gene occurs in particular locations on the chromosome, known as its locus, or in some cases, the gene may have multiple loci. Geneticists have special methods to represent loci. This basically involves recording the distance up (or down) the strand of DNA in the chromosome.

Mutations and Polymorphism
Something that is polymorphic has more than one (poly) form (morph). Many genes are polymorphic. Each form of a gene is known as an allele, as described above. Alleles that are common in the population are known as polymorphisms.

A genetic mutation is a permanent alteration in the form of a gene. Some of these alterations are detrimental, and others are advantageous, but many have no significant impact on the organism. A polymorphism is a variation that occurs in more than 1% of the population that does not cause disease. We tend to think of mutations as detrimental.

Mutations can be caused in many different ways, including exposure to radiation or mutagenic chemicals, or simple accidents during replication. Mutations cannot be passed to offspring unless they occur in reproductive (sperm and egg) cells. It's unlikely that exposure to radiation or chemicals would ever cause a mutation like Spiderman or the Incredible Hulk, but it does make for fun reading.

Describing Genetic Variation
Just as you can "diff" two pieces of source code and build a script to turn one into another, you can also compare two gene sequences to each other, and explain how one differs from the other. In clinical genomics, these "edit scripts" also have a standard form, and can be used to describe a particular alteration that hasn't previously been identified. A nomenclature for describing these alterations has been recommended by the Human Genome Variation Society (HGVS) and can be found here: The use of a standardized nomenclature for describing gene alterations allows for subsequent review and analysis when new genetic research results become available.
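To make the "diff" analogy concrete, here is a minimal sketch that compares two equal-length sequences and emits an edit script, then applies that script to get back the observed sequence. The descriptions are written in the style of HGVS substitutions (position, reference base, ">", new base), but this is an assumption-laden toy: it handles single-base substitutions only, omits the reference sequence identifier that a complete description needs, and ignores insertions, deletions and everything else the real nomenclature covers. The function names are hypothetical.

```python
def describe_substitutions(reference, observed):
    """List single-base differences as strings such as 'g.4G>A' (1-based)."""
    if len(reference) != len(observed):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        f"g.{position}{ref_base}>{new_base}"
        for position, (ref_base, new_base)
        in enumerate(zip(reference, observed), start=1)
        if ref_base != new_base
    ]


def apply_substitutions(reference, edits):
    """Replay an edit script against the reference sequence."""
    bases = list(reference)
    for edit in edits:
        # 'g.4G>A' means: at position 4, G is replaced by A.
        position = int(edit[2:-3])
        expected, new_base = edit[-3], edit[-1]
        if bases[position - 1] != expected:
            raise ValueError(f"{edit} does not match the reference")
        bases[position - 1] = new_base
    return "".join(bases)


# Made-up sequences for illustration only.
edits = describe_substitutions("ATGGCC", "ATGACC")
print(edits)
print(apply_substitutions("ATGGCC", edits))
```

Storing the edit script rather than a bare interpretation is what makes later re-analysis possible when new research appears.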

Gene Sequences and DNA Markers
In order to identify genes and their alleles and polymorphisms we need sequences of nucleotides (A, C, G and T) to reference, known as DNA Markers. A collection of reference sequences has been put together in the GenBank® database maintained by the National Center for Biotechnology Information (NCBI).

Different kinds of polymorphisms have different names. The simplest is the change of a single nucleotide from one type to another, known as a SNP (for single nucleotide polymorphism), and pronounced "snip". These changes tend to occur in the DNA between genes, in areas that are not functional. They are useful as DNA markers, identifying individuals or related individuals. NCBI maintains a database of SNPs known as dbSNP.

Talking Glossary of Genetics, National Human Genome Research Institute, January 1999, available on the web at
This is an excellent resource containing simple definitions of genetic terms, and a number of freely available images, including the two used in this article. The chromosome image was modified for this article.

Thanks to Dr. Kevin Hughes of Massachusetts General Hospital for his review and comments on an early draft of this article, and to Scott Bolte (also of GE Healthcare) and Mollie Ullman-Cullere of Partners Healthcare for educating me and providing excellent reference material for this series.

Wednesday, July 2, 2008

Healthcare Revolutions

In honor of Independence Day, I have a few rebellious thoughts on our healthcare system, and some revolutions that we might encourage. I happen to like fireworks, so consider this my contribution to an interesting Fourth of July.

Wipe out ICD for billing
Why is it that experts in the field of healthcare standards routinely comment on the fact that billing codes are not suitable for providing data useful for clinical care, and yet we are required to report care provided using billing codes? If we really want to improve healthcare, would it not make sense to use the same measures on both the clinical and billing side? One of the principles of Six Sigma (and similar process improvement initiatives) is that you need to be able to accurately measure inputs and outputs of a process in order to improve it. Furthermore, having appropriately calibrated measurements is vital to the success of these efforts.
Why have we invested so much time and effort in the US National program1 to promote reference terminologies like SNOMED CT, and yet we require the use of a vocabulary originally designed for reporting mortality statistics, and an outdated version at that, so that providers can get paid? Wouldn't it make sense to require that billing be done using clinical codes? Why do we need to spend so much time dealing with two different coding systems? Why should providers be the ones who have to make the conversion from one to the other?

Here's a radical idea. Why don't we require that the values used for billing codes come from a clinical reference vocabulary like SNOMED CT? Furthermore, we could select a reasonable value set from SNOMED CT that would allow clinical users of that vocabulary to roll up their SNOMED CT codes into the billing value set automatically. If, for some arcane reason, you have an absolute need to be able to map to a vocabulary such as ICD-10, then when you create the billing value set, do so in such a way that the mapping to ICD-10 is also automatable.

One of the advantages of this revolution would be to accelerate the adoption of clinical reference vocabularies, as recommended by the NCVHS, the Consolidated Health Informatics Initiative, and ANSI/HITSP. Another potential advantage would be to increase the value of claims data to providers. A third rationale would be that the "instruments" used to measure the practice of care, and the cost of care, would be calibrated on the same scale.

Create a Healthcare Price Index
We note that consumers do not have a good way to understand the costs of healthcare, either direct or indirect. Yet we do have a way to compare fuel economy for different automobiles, and have had ways to compare the cost of living in two different cities for many years. Why can't we create a basket of healthcare goods and services that meet the needs of various healthcare constituencies, and use that as a standard measure?

Different healthcare providers could report their costs for each of the items in that basket of goods, and different insurers could also describe what the consumer's payroll deductions and out of pocket costs would be for goods and services. We would be able to easily determine which plans provided better value based on our own needs for items in that basket, and be able to compare the value given by various healthcare providers.
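The arithmetic behind such an index is no harder than for a consumer price index. Here is a minimal sketch, assuming a fixed basket weighted by how often each item is used in a year and a baseline price list that defines an index of 100. Every item, quantity and price below is invented for illustration; none of it is real data.

```python
# Hypothetical basket: item -> number of times a consumer uses it per year.
BASKET = {"office visit": 4, "generic prescription": 12, "lab panel": 2}

# Hypothetical baseline prices that define an index value of 100.
BASELINE_PRICES = {
    "office visit": 100.0,
    "generic prescription": 10.0,
    "lab panel": 50.0,
}


def basket_cost(prices):
    """Total yearly cost of the basket at the given prices."""
    return sum(quantity * prices[item] for item, quantity in BASKET.items())


def price_index(prices):
    """Cost of the basket relative to the baseline, scaled so baseline = 100."""
    return 100.0 * basket_cost(prices) / basket_cost(BASELINE_PRICES)


# Hypothetical price list reported by one provider.
provider_a = {"office visit": 90.0, "generic prescription": 12.0, "lab panel": 40.0}
print(round(price_index(provider_a), 1))
```

A consumer with different needs would simply substitute their own quantities for the basket and recompute.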

This may be radical, but it shouldn't be revolutionary. It just applies some of the same principles we've been applying to the economy to the economics of healthcare. I'll bet we could hire a blue ribbon panel to develop the basket of goods for the cost of say, two or three useful new terminology definitions.

Determine which jobs we need to eliminate
This breakdown shows that most of the costs in healthcare are labor. The most productive way to take costs out of the healthcare system would be to cut labor costs. Obviously it makes more sense to cut the most expensive labor costs before the lesser costs. So, the question to answer becomes, who are we going to get rid of, and what are we going to replace them with?

This question is so radical that I'd like to hear your own revolutionary thoughts...

1 You'll see me describe the ONC/AHIC/HITSP/CCHIT/NHIN/HISPC activities as the US National Program from time to time, usually when I've recently had to describe it to someone from outside the US.