Monday, July 7, 2008

Understanding Genetics

You can't do anything about your genetic background or your family history, but you can do something about the medicine you put in your mouth.
-- Dr. Andrew Glass of the Center for Health Research in Portland

My interest in genetics and family history started with the development of the Family History section of the HL7 Continuity of Care Document. It was recently expanded by the introduction of the Personalized Healthcare use case into the 2008 work cycle for ANSI/HITSP. We are all fairly well aware by now that increasing a provider's knowledge of a patient's genetics and family history will allow them to choose more effective management. Over the last six months, many of us have been getting a crash course in genetics, genomics and genetic testing, and in the need for structured family histories. What follows is what I have learned about genetics in that time. Given that this material is as new to me as it is to you, I've had it reviewed by an expert for accuracy.

This is the first part of a three-part series on Genetics and Family History, describing enough genetics for healthcare IT implementers who need to review and/or implement specifications produced by ANSI/HITSP for the Personalized Healthcare use case.

Part 1: Understanding Genetics follows below.

Part 2: Reporting Genetic Results will describe the standards needed to exchange the information described in Part 1.

Part 3: Family History and Risk Assessment will describe the necessary information to communicate in family histories, and the importance of this information in assessing risk, and determining the need for genetic testing or additional treatment.

Understanding Genetics
Genomic and family history data is an excellent source of information on health risks for a variety of conditions, both chronic and acute. By using family history or genetic testing to identify patients at high risk, the medical system is better able to predict the risk of disease, allowing patients and providers to make better care plans to address those risks, and ensure earlier detection and better preventative efforts.

Genetic information can also help predict how effective a medication will be, providing for better care by reducing side effects, avoiding toxicity and unnecessary therapies.

However, before we can begin to incorporate genetic testing data into EHR systems, we need to understand enough clinical genomics to correctly incorporate these results into healthcare IT systems.

I'm certain that most of you understand what DNA and chromosomes are, and that most humans have 22 pairs of chromosomes plus a pair of sex chromosomes (XX for females or XY for males). Half (23 chromosomes) come from each parent, for 46 altogether. However, there is a great deal of specialized vocabulary that goes beyond chromosomes that we all need to understand. I've translated this very specialized vocabulary into language that engineers can understand (recall that I consider myself to be in this category).

Most of you recognize a picture of a chromosome as a vaguely X shaped object. You can think of this as two identical lengths or strands of rope called chromatids (1). These strands are effectively tied in a knot together at the middle called the centromere (2). The strands at what is usually depicted in the top part of the chromosome are the short arms (3) and those at the bottom are the long arms (4).

Genes and Nucleotides
Along each arm are sequences of nucleotides, typically represented using the letters A, C, G and T, making up the DNA. DNA and RNA are known as nucleic acids because they are made up of nucleotides. A gene is a distinct DNA sequence that provides instructions for producing a single protein that in turn produces a single trait, such as eye color (actually, there are several genes controlling eye color, and a single protein can cause multiple effects, but let’s keep it simple for the moment). The set of genes belonging to a person makes up their genotype. A person normally has two full sets of genes, one from each set of chromosomes (and thus one set from each parent).

Alleles and Genotypes
Genes can have variations, known as alleles. Two commonly known alleles are for eye color: the brown allele and the blue allele. Just because you have an allele for blue eye color in the gene controlling color doesn't mean your eyes are blue. You might also have the allele for brown eyes in your second copy of that gene. That means that your eyes will be brown, because the brown allele is dominant and the blue allele is recessive. A dominant allele will be expressed when there is only one copy, whereas a recessive allele will be expressed only when it is present in both copies of the gene controlling that trait. There are other variations besides the most commonly known dominant and recessive, but we probably don't need to go into that level of detail. In the previous example, your genotype would be Blue/Brown, indicating that you have the alleles for blue eyes and for brown eyes in the gene controlling eye color.

The fact that you have brown eyes is known as your phenotype (for eye color), and basically amounts to which alleles in your genotype are being expressed. Other alleles can affect how the body metabolizes (or fails to metabolize) a particular drug, or increases or decreases the risk of a particular disease. Having a particular allele doesn't necessarily mean that you will have a particular disease or react to a particular treatment in a certain way.
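For the engineers in the audience, the eye color example can be sketched in a few lines of Python. This is only an illustration of the simplified one-gene model described above; the DOMINANT lookup and the phenotype function are names I made up for this sketch, and real eye color involves several genes.

```python
# Simplified single-gene model from the text: brown is dominant, blue is recessive.
# DOMINANT is a hypothetical lookup used only for this illustration.
DOMINANT = {"Brown"}


def phenotype(genotype: str) -> str:
    """Return the expressed allele for a genotype written like 'Blue/Brown'."""
    first, second = genotype.split("/")
    if first == second:
        # Both copies carry the same allele, so that allele is expressed.
        return first
    for allele in (first, second):
        if allele in DOMINANT:
            # One copy of a dominant allele is enough for it to be expressed.
            return allele
    raise ValueError(f"no dominance rule known for {genotype!r}")


for g in ("Blue/Brown", "Brown/Brown", "Blue/Blue"):
    print(g, "->", phenotype(g))
```

The genotype is the input and the phenotype is the output; the blue allele only shows up in the result when both copies are blue.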

Sometimes the alleles of several closely linked genes tend to be inherited together as a group, known as a haplotype. A group of related haplotypes that share a common ancestor is known as a haplogroup.

HUGO and Gene Names
Most (if not all) genes relevant in genetic testing have already been identified as to their clinical significance. These genes will have a name and identifier issued by the Human Genome Organization, otherwise known as HUGO. HUGO maintains a database of gene names, known as the HGNC Database after the HUGO Gene Nomenclature Committee that curates it. These identifiers are the codes in the ontology of human genes.

Each gene occurs at a particular location on the chromosome, known as its locus, or in some cases, the gene may have multiple loci. Geneticists have special methods to represent loci. This basically involves recording the distance up (or down) the strand of DNA in the chromosome.

Mutations and Polymorphism
Something that is polymorphic has more than one (poly) form (morph). Many genes are polymorphic. Each form of a gene is known as an allele, as I described above. Alleles that are common in the population are known as polymorphisms.

A genetic mutation is a permanent alteration in the form of a gene. We tend to think of mutations as detrimental, and some are, but others are advantageous and many have no significant impact on the organism. A polymorphism is a variation that occurs in more than 1% of the population and does not cause disease.

Mutations can be caused in many different ways, including exposure to radiation or mutagenic chemicals, or simple accidents during replication. Mutations cannot be passed to offspring unless they occur in reproductive (sperm and egg) cells. It's unlikely that exposure to radiation or chemicals would ever cause a mutation like Spiderman or the Incredible Hulk, but it does make for fun reading.

Describing Genetic Variation
Just as you can "diff" two pieces of source code and build a script to turn one into another, you can also compare two gene sequences to each other, and explain how one differs from the other. In clinical genomics, these "edit scripts" also have a standard form, and can be used to describe a particular alteration that hasn't previously been identified. A nomenclature for describing these alterations has been recommended by the Human Genome Variation Society (HGVS). The use of a standardized nomenclature for describing gene alterations allows for subsequent review and analysis when new genetic research results become available.
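To make the "diff" analogy concrete, here is a minimal sketch in Python that compares a sample sequence to a reference and reports each single-letter change in a style loosely modeled on HGVS substitution names (position, reference letter, ">", new letter). The function name, the "g." default prefix and the two short sequences are made up for illustration. Real HGVS names also identify the reference sequence and cover insertions, deletions and other changes that this sketch ignores.

```python
from typing import List


def substitutions(reference: str, sample: str, prefix: str = "g.") -> List[str]:
    """List single-nucleotide substitutions as strings such as 'g.4G>A'.

    Only handles sequences of equal length (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    changes = []
    # Positions are counted from 1, as is conventional for sequence positions.
    for position, (ref, alt) in enumerate(zip(reference, sample), start=1):
        if ref != alt:
            changes.append(f"{prefix}{position}{ref}>{alt}")
    return changes


# Made-up sequences, not taken from any real gene.
reference = "ATGGCCTTA"
sample = "ATGACCTTG"
print(substitutions(reference, sample))  # ['g.4G>A', 'g.9A>G']
```

The output is the "edit script": apply those two substitutions to the reference and you get the sample back.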

Gene Sequences and DNA Markers
In order to identify genes and their alleles and polymorphisms we need sequences of nucleotides (A, C, G and T) to reference, known as DNA Markers. A collection of reference sequences has been put together in the GenBank® database maintained by the National Center for Biotechnology Information (NCBI).

Different kinds of polymorphisms have different names. The simplest is the change of a single nucleotide from one type to another, known as a SNP (for single nucleotide polymorphism), and pronounced "snip". These changes tend to occur in the DNA between genes, in areas that are not functional. They are useful as DNA markers, identifying individuals or related individuals. NCBI maintains a database of SNPs known as dbSNP.

Talking Glossary of Genetics, National Human Genome Research Institute, January 1999.
This is an excellent resource containing simple definitions of genetic terms, and a number of freely available images, including the two used in this article. The chromosome image was modified for this article.

Thanks to Dr. Kevin Hughes of Massachusetts General Hospital for his review and comments on an early draft of this article, and to Scott Bolte (also of GE Healthcare) and Mollie Ullman-Cullere of Partners Healthcare for educating me and providing excellent reference material for this series.


  1. Hi Keith,

    Looking forward to part 2 of this article! My wife is a genetic counselor, and I deal with electronic health records, so it's definitely an area of interest. I've been curious as to how reporting standardized genetic testing information will pan out, especially with regards to family history, privacy, and the electronic health record. Thanks for taking the time to post, I've enjoyed reading your blog.

  2. I hope you enjoy part two now posted. There's a wee bit at the end about genetics counseling.