Sunday, April 7, 2013

أنا كيث بون

For those who cannot read Arabic, a semi-literal translation is "I am Keith Boone."  The Arabic script reads right to left, rather than left to right, and while there are no capital letters, each character in the script takes on a slightly different appearance depending on whether it appears at the beginning, middle or end of the word.

Arabic script is an abjad: essentially an alphabet of consonants, with most vowels left unwritten (because those sounds are inferred by the rules of the language).  Japanese writing, by contrast, uses syllabaries in which each character carries its vowel sound.  From that we have the ひらがな and カタカナ writing systems, whose names are commonly transcribed in romaji as Hiragana and Katakana respectively.

In healthcare, one of the challenges with matching names is that the best matching algorithm varies depending upon the cultural patterns associated with the name.  And so, in a culture where the name patterns are Arabic, the name matching algorithm needs to account for that as a source of possible variation.  One of the most popular names in the world is that of مُحَمَّد, also the name of the prophet of Islam.  There's really only one way to spell it in Arabic, but at least 9 different spelling variations using the Roman alphabet.
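To make the romanization problem concrete, here is a minimal sketch of one crude way to collapse spelling variants onto a single lookup key: drop the vowels and merge doubled consonants, much as the abjad itself leaves vowels unwritten.  The function name `consonant_skeleton` and the list of variants are my own illustrations (a few commonly seen spellings, not an authoritative list), and a real matcher would need something far more careful, since this key will also over-match unrelated names.

```python
VOWELS = set("aeiou")

def consonant_skeleton(name: str) -> str:
    """Hypothetical helper: lowercase, drop vowels, collapse repeated consonants."""
    consonants = [c for c in name.lower() if c.isalpha() and c not in VOWELS]
    collapsed = []
    for c in consonants:
        # Skip a consonant that merely repeats the previous one ("mm" -> "m").
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    return "".join(collapsed)

# Illustrative romanizations only; real data will contain others.
variants = ["Muhammad", "Mohammed", "Mohamed", "Mohammad", "Muhammed", "Mohamad"]
keys = {consonant_skeleton(v) for v in variants}
print(keys)  # {'mhmd'}
```

All six spellings land on the same key, which is the point, but so would any other name with the same consonant skeleton.  That trade-off is why knowing the name's origin matters before applying a rule like this.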

In countries where Arabic is the official language it is still challenging because most software still doesn't deal well with the Arabic writing system, especially given the script changes necessary for rendering at the beginning, middle and end of the word.  It wasn't until the advent of smart fonts in the late 1990s and early 2000s that software could really deal with the script changes necessary to properly render Arabic script on many computers, and those capabilities weren't well integrated into operating systems (some would argue that they still aren't) until a decade or more later.  And so, Arabic names are still often romanized when entered into that software.
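The positional forms show up in the data as well as on screen.  Unicode carries legacy "presentation form" code points for the isolated, initial, medial and final shapes of each letter, and older systems sometimes store those instead of the base letter.  The sketch below, using only Python's standard `unicodedata` module, prints the four forms of the letter beh and shows that NFKC normalization folds them back to the same base letter.  The helper name `to_base_letters` is my own, and whether NFKC is the right normalization for a given patient index is an assumption that would need checking.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the base letter

def to_base_letters(text: str) -> str:
    """Hypothetical helper: fold presentation forms back to base letters."""
    return unicodedata.normalize("NFKC", text)

# Isolated, initial, medial and final presentation forms of beh.
for cp in (0xFE8F, 0xFE91, 0xFE92, 0xFE90):
    ch = chr(cp)
    base = to_base_letters(ch)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")
```

Two names that look identical on screen can still compare as different strings if one was stored with presentation forms, so a step like this probably belongs ahead of any matching.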

But knowing that the name was a romanized form of an Arabic name is a crucial hint that name matching software can use to better determine how to look up appropriate matches.

One might be tempted to offer this hint to the name matching software, for example using the IHE PDQ Version 3 Query's matching algorithm parameter.  In fact, I even suggested this idea myself.  But the more I think about it, the more I suspect I was wrong to do so.

If I were designing a search algorithm to locate names in a master patient index, I'd probably leave the algorithm tuning to the computer.  Here's how I would go about it:

In the context of looking a patient up by name and potentially other demographics, there are many other features present that may provide this and many other hints about how to tune the name matching algorithm.  This includes location (e.g., Riyadh, where I am now), nationality, place of birth, et cetera, and these values may also be provided in the query.

These features, along with the character sequences in the name (and other fields), become a set of features which can be used to compare two names for a proximity match.  In the fields of machine learning and linguistics, there are well-established ways to describe features and feature distances, and to combine these into matching algorithms.  It's a Big Data problem, and the computer will be better at determining which features are helpful in deciding on the matching algorithm.  I'd let it do that work, and it would do a much better job than my simple hint.
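As a rough sketch of what combining feature distances could look like, the code below scores a candidate record against a query by mixing a string similarity on the name with exact-match checks on other demographics.  Everything here is an assumption for illustration: the field names, the use of `difflib.SequenceMatcher` as the name similarity, and above all the weights, which are hand-picked placeholders.  The whole argument above is that a real system should learn those weights from data rather than have me guess them.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def exact(a, b) -> float:
    """1.0 when both values are present and equal, otherwise 0.0."""
    return 1.0 if a is not None and b is not None and a == b else 0.0

# Placeholder weights; a real system would learn these from matched records.
WEIGHTS = {"name": 0.6, "city": 0.2, "birth_year": 0.2}

def match_score(query: dict, candidate: dict) -> float:
    """Weighted combination of per-feature similarities."""
    scores = {
        "name": name_similarity(query["name"], candidate["name"]),
        "city": exact(query.get("city"), candidate.get("city")),
        "birth_year": exact(query.get("birth_year"), candidate.get("birth_year")),
    }
    return sum(WEIGHTS[f] * scores[f] for f in WEIGHTS)

query = {"name": "Mohammed Ali", "city": "Riyadh", "birth_year": 1980}
in_riyadh = {"name": "Muhammad Ali", "city": "Riyadh", "birth_year": 1980}
in_boston = {"name": "Muhammad Ali", "city": "Boston", "birth_year": 1980}

# The record sharing the query's city ranks higher, all else being equal.
print(match_score(query, in_riyadh) > match_score(query, in_boston))  # True
```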

What might be the most important hint that could be provided to a name matching algorithm is something that the receiving system doesn't necessarily know, and cannot deduce or induce from the data that it is provided.  What information is that?  It might well be the keyboard layout or input device used to enter the name.  If the device is the tiny little keypad on my smartphone, the kinds of human input errors I make will be far different from those I make on my keyboard.

This is where human interface design intersects with what many would consider to be a largely compute-bound, human-interface-free context.  But it's not.  There are spelling correction mistakes I make on my phone (or that my phone makes for me) that I would never make on a real keyboard.
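One way the input-device hint could feed into matching is an edit distance in which hitting a neighboring key costs less than an arbitrary substitution.  The sketch below is a standard Levenshtein distance with that one change.  The QWERTY grid, the adjacency rule (which ignores the stagger between rows), and the 0.5 cost are all my own simplifying assumptions; a phone keypad or an Arabic layout would need its own table.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Map each key to a (row, column) position on an idealized, unstaggered grid.
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}

def sub_cost(a: str, b: str) -> float:
    """Substitution cost: free if equal, cheap if the keys are neighbors."""
    if a == b:
        return 0.0
    if a in POS and b in POS:
        (r1, c1), (r2, c2) = POS[a], POS[b]
        if abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
            return 0.5  # assumed cost for an adjacent-key slip
    return 1.0

def keyboard_distance(s: str, t: str) -> float:
    """Levenshtein distance with keyboard-aware substitution costs."""
    s, t = s.lower(), t.lower()
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                 # delete a
                           cur[j - 1] + 1.0,              # insert b
                           prev[j - 1] + sub_cost(a, b))) # substitute
        prev = cur
    return prev[-1]

print(keyboard_distance("keith", "keirh"))  # 0.5, since r sits next to t
print(keyboard_distance("keith", "keiph"))  # 1.0, since p is far from t
```

Without knowing which device produced the query, the matcher has no way to choose the right adjacency table, which is exactly why this hint has to come from the sender.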


  -- Keith