
Monday, May 9, 2011

Some thoughts on Canonical Pedigrees

One of my colleagues is working on the HL7 Canonical Pedigree Project.  The point of this project is to develop reference content that could be used to test various representations of the pedigree.

One of the interesting challenges in pedigree representation is being able to look at genetic information from the perspective of different probands. Viewing a genetic history from different perspectives makes a variety of analysis techniques possible.

In order to represent the family tree, the HL7 Pedigree model allows for two persons to be represented with a coded relationship between them.  The coded relationship comes from the HL7 Family Relationship Role Type vocabulary.  The essential model is that the patient is related to (at least) one other person, who could in turn be related to other persons, et cetera.
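To make the shape of that model concrete, here is a minimal Python sketch. It assumes each relationship is stored as a (person, code, relative) triple; the codes NMTH (natural mother), NFTH (natural father) and COUSN (cousin) are, as far as I know, from the HL7 family member vocabulary, but treat them as illustrative and check them against the current value set. The person names and the helpers `relatives_of` and `reachable_from` are hypothetical.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """One coded link: `relative` plays the role `code` with respect to `person`."""
    person: str
    code: str      # illustrative HL7 family member role code
    relative: str


# A small pedigree centered on the patient (the focal point).
PEDIGREE = [
    Relationship("patient", "NMTH", "Mary"),   # Mary is the patient's natural mother
    Relationship("patient", "NFTH", "John"),   # John is the patient's natural father
    Relationship("Mary", "NMTH", "Ruth"),      # Ruth is Mary's natural mother
    Relationship("patient", "COUSN", "Alex"),  # Alex is the patient's cousin
]


def relatives_of(person: str, pedigree: list[Relationship]) -> dict[str, str]:
    """Return {relative: code} for the links that start at `person`."""
    return {r.relative: r.code for r in pedigree if r.person == person}


def reachable_from(start: str, pedigree: list[Relationship]) -> set[str]:
    """Everyone reachable by following links outward from `start`."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for relative in relatives_of(current, pedigree):
            if relative not in seen:
                seen.add(relative)
                stack.append(relative)
    return seen
```

Following the links from the patient reaches Mary, John, Ruth and Alex, but every link points outward from someone toward a relative; nothing stored says what the patient is to Mary.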

There is no requirement for the resulting "pedigree" graph to use any specific relationships (e.g., parent, sibling, spouse), unlike what would be found in a traditional "historical pedigree." The vocabulary allows both exact (natural mother) and inexact (mother) relationships to be represented. When this sort of vocabulary is used to represent relationships, transforming the graph to change the viewpoint from one subject to another can be challenging.

I argued to my colleague that any canonical representation of a pedigree also needs a canonical representation of the relationships, and that the current vocabulary doesn't help at all, since it has a focal point that isn't really reversible. Changing the proband requires changing the direction of the relationships and the associated vocabulary.
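A small Python sketch shows why reversal is lossy. It assumes a hand-built table of possible inverses (hypothetical, not an HL7 artifact) and the codes COUSN, MCOUSN and PCOUSN, which I believe denote cousin, maternal cousin and paternal cousin; verify them against the vocabulary. If X is my maternal cousin, then from X's point of view I am connected through either of X's parents, and the original link does not say which.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    person: str
    code: str
    relative: str


# Hypothetical inverse table: for each code, the codes that could describe
# the same link seen from the other end. More than one candidate means the
# link cannot be reversed without facts it does not carry.
POSSIBLE_INVERSES: dict[str, set[str]] = {
    "COUSN": {"COUSN"},              # plain cousin is symmetric
    "MCOUSN": {"MCOUSN", "PCOUSN"},  # depends on which of the cousin's parents links us
    "PCOUSN": {"MCOUSN", "PCOUSN"},
}


class AmbiguousInverse(ValueError):
    """Raised when a link cannot be reversed from its code alone."""


def reverse(rel: Relationship) -> Relationship:
    """Re-express `rel` from the relative's point of view, if the code allows it."""
    candidates = POSSIBLE_INVERSES.get(rel.code)
    if candidates is None:
        raise KeyError(f"no inverse known for {rel.code}")
    if len(candidates) != 1:
        raise AmbiguousInverse(
            f"{rel.code} could reverse to any of {sorted(candidates)}")
    (inverse_code,) = candidates
    return Relationship(rel.relative, inverse_code, rel.person)
```

Reversing a COUSN link works, but reversing an MCOUSN link raises `AmbiguousInverse`: the more precise code is exactly the one that cannot be re-centered on the other person.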

23andMe has a great video describing the cousin relationship, which helped me work through some of this. If you want to know whether someone is your Nth cousin, and how many times removed they are, there's a simple answer. Cousins share a common grandparent (or great-grandparent, et cetera), but that common ancestor is not a parent of either of them (that would make them siblings, nephews, aunts, et cetera). To find the degree, count the number of greats between each party and their common grandparent. Take the smaller number and add one. If you and I share a common grandparent, then the number of greats for both of us is 0; adding one tells us we are first cousins. Now suppose the common ancestor is my grandparent but your great-grandparent: our counts are 0 and 1, the smaller is 0, so we are still first cousins. To find how many times removed we are, look at the difference in generations: simply take the difference between the two "great" counts. Here that is 1, so we are first cousins once removed.
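The cousin rule is just arithmetic on the two "great" counts, so it fits in a few lines of Python. This is a sketch with hypothetical function names; it assumes both people descend from the common ancestor through different children, and it takes the degree from the smaller of the two counts (my grandparent and your great-grandparent gives counts 0 and 1, which is first cousins once removed).

```python
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}
REMOVALS = {1: "once", 2: "twice"}


def cousin_relationship(greats_a: int, greats_b: int) -> tuple[int, int]:
    """Return (degree, times_removed) from each person's "great" count to the
    common ancestor (0 = grandparent, 1 = great-grandparent, ...)."""
    if greats_a < 0 or greats_b < 0:
        raise ValueError("great counts must be non-negative")
    degree = min(greats_a, greats_b) + 1   # smaller count plus one
    removed = abs(greats_a - greats_b)     # difference in generations
    return degree, removed


def describe(greats_a: int, greats_b: int) -> str:
    """Render the relationship in the usual English form."""
    degree, removed = cousin_relationship(greats_a, greats_b)
    # Crude fallback ("6th") for degrees beyond the lookup table.
    name = f"{ORDINALS.get(degree, f'{degree}th')} cousin"
    if removed:
        name += f" {REMOVALS.get(removed, f'{removed} times')} removed"
    return name
```

For example, `describe(0, 0)` gives "first cousin", `describe(0, 1)` gives "first cousin once removed", and `describe(1, 1)` gives "second cousin".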

The unifying principle that I worked out is this:
In order to represent relationships canonically, you must represent only those between ancestors and their descendants. You can take them in either direction, choosing "begat" or "was begotten by" as the preferred direction.

In the canonical form, then, you wouldn't need to represent the "cousin" relationship directly. Instead, you'd relate the two subjects to their common *-grandparent.
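Here is a minimal Python sketch of that idea, assuming the "was begotten by" direction is stored as a map from each child to its natural parents. All names and helpers are hypothetical. No cousin link is recorded; it is derived by finding the nearest shared ancestors.

```python
from collections import deque

# Canonical store: only "was begotten by" links, child -> natural parents.
# There are no cousin, aunt or sibling links here.
PARENTS: dict[str, tuple[str, ...]] = {
    "me": ("Bill", "Carol"),
    "X": ("Dave", "Erin"),
    "Bill": ("Y", "Z"),   # Bill and Dave are brothers, but only their parents are recorded
    "Dave": ("Y", "Z"),
}


def ancestors(person: str) -> dict[str, int]:
    """Map each known ancestor of `person` to its generation distance
    (1 = parent, 2 = grandparent, ...). Breadth-first search keeps the
    shortest distance if an ancestor is reachable along several lines."""
    found: dict[str, int] = {}
    queue = deque([(person, 0)])
    while queue:
        current, generation = queue.popleft()
        for parent in PARENTS.get(current, ()):
            if parent not in found:
                found[parent] = generation + 1
                queue.append((parent, generation + 1))
    return found


def nearest_common_ancestors(a: str, b: str) -> dict[str, tuple[int, int]]:
    """Shared ancestors of `a` and `b` with the smallest combined distance,
    mapped to (distance from a, distance from b)."""
    from_a, from_b = ancestors(a), ancestors(b)
    shared = {anc: (from_a[anc], from_b[anc]) for anc in from_a.keys() & from_b.keys()}
    if not shared:
        return {}
    closest = min(sum(d) for d in shared.values())
    return {anc: d for anc, d in shared.items() if sum(d) == closest}
```

For "me" and "X" this finds Y and Z, each two generations away from both people (zero "greats"), which makes us first cousins even though that relationship was never stored.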

We can construct a vocabulary to do that.  Just use F to represent natural father, and M to represent natural mother.  FF becomes my father's father, and FM my father's mother.  The vocabulary can be modified to be more concise.  Whenever F or M repeats, just put the repeat count following it.  FF could be represented as F2.
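The shortened form is ordinary run-length encoding. The Python sketch below uses hypothetical helper names, reads paths left to right starting from the subject (so "FFM" is my father's father's mother), and accepts only F and M.

```python
import re
from itertools import groupby


def compress(path: str) -> str:
    """Run-length encode an uncompressed ancestry path, e.g. "FFM" -> "F2M"."""
    if not re.fullmatch(r"[FM]*", path):
        raise ValueError(f"expected only F and M, got {path!r}")
    parts = []
    for letter, run in groupby(path):
        count = len(list(run))
        # A single step is written bare; repeats get a trailing count.
        parts.append(letter if count == 1 else f"{letter}{count}")
    return "".join(parts)


def expand(code: str) -> str:
    """Reverse compress(), e.g. "F2M" -> "FFM"."""
    if not re.fullmatch(r"(?:[FM](?:[1-9]\d*)?)*", code):
        raise ValueError(f"not a compressed path: {code!r}")
    return "".join(
        letter * (int(count) if count else 1)
        for letter, count in re.findall(r"([FM])([1-9]\d*)?", code)
    )
```

`compress("FF")` returns "F2", and `expand(compress(p))` returns `p` for any F/M path.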

To deal with ambiguity in the ancestry (if X is my first cousin, is it through my father's side or my mother's?), we could simply use P to represent parent. So, if X is my first cousin, we have a common grandparent, Y. X is related to Y by P2, and I'm related to Y by P2. If you don't know how far back the relationship goes, you could use the + operator. My cousin (first, second, third or more) and I would be related to a common ancestor Y using P+.
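One way to give P and + a precise meaning is to translate a relation into a regular expression over fully known F/M paths. This Python sketch is an assumption about the semantics, not a defined syntax: P matches either parent, a number means exactly that many generations, and + means one or more. The helper names are hypothetical.

```python
import re

# One token: a letter, then an optional repeat count or "+".
TOKEN = re.compile(r"([FMP])(\d+|\+)?")


def to_regex(relation: str) -> str:
    """Translate a relation such as "P2", "F2M" or "P+" into a regex over F/M paths."""
    pos, parts = 0, []
    while pos < len(relation):
        m = TOKEN.match(relation, pos)
        if not m:
            raise ValueError(f"bad relation {relation!r} at position {pos}")
        letter, repeat = m.groups()
        atom = "[FM]" if letter == "P" else letter   # P = either parent
        if repeat is None:
            parts.append(atom)
        elif repeat == "+":
            parts.append(atom + "+")                  # one or more generations
        else:
            parts.append(f"{atom}{{{repeat}}}")       # exactly N generations
        pos = m.end()
    return "".join(parts)


def satisfies(path: str, relation: str) -> bool:
    """Does a fully known, uncompressed F/M path (e.g. "FM") satisfy `relation`?"""
    return re.fullmatch(to_regex(relation), path) is not None
```

With this reading, "FM" (my father's mother) satisfies P2 but not F2, and any non-empty path satisfies P+. Note that P+ as sketched here also matches a single parent link.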

The nice thing about using this form for a canonical relationship is that it doesn't matter who the proband is: the set of relationships in the pedigree doesn't have to be modified when the subject changes.
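To illustrate, here is a Python sketch that stores each fact as (person, relation, ancestor) in the F/M/P vocabulary and builds the view for any proband by reading that set, never rewriting it. The names, the `Link`/`View` types and `view_from` are hypothetical, and it assumes at most one link per (person, ancestor) pair.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """Canonical fact: `person` descends from `ancestor` along `relation`."""
    person: str
    relation: str   # F/M/P vocabulary, e.g. "P2", "F2M", "P+"
    ancestor: str


# Immutable pedigree; there is no "cousin" fact anywhere in it.
PEDIGREE = frozenset({
    Link("me", "P2", "Y"),
    Link("X", "P2", "Y"),
    Link("W", "P3", "Y"),
})


class View(NamedTuple):
    relative: str
    ancestor: str
    proband_relation: str
    relative_relation: str


def view_from(proband: str, pedigree: frozenset[Link]) -> set[View]:
    """Describe each relative of `proband` by how both reach a shared ancestor.

    The pedigree is only read, so any member can serve as the proband.
    """
    mine = {link.ancestor: link.relation for link in pedigree if link.person == proband}
    return {
        View(link.person, link.ancestor, mine[link.ancestor], link.relation)
        for link in pedigree
        if link.person != proband and link.ancestor in mine
    }
```

Calling `view_from("me", PEDIGREE)` and `view_from("X", PEDIGREE)` gives mirror-image views of the same three facts; switching the proband changes the query, not the data.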

While this would seem to capture almost everything needed in a pedigree, there's one genetic relationship that isn't expressed. See if you can discover which one. The answer is in the comments below.