Monday, December 7, 2015

What is a Standard?

In my spare time [yes, such a thing does exist], I've been thinking about how my "PubMed for Standards" capstone project should index standards.  Which leads to the question in the title of this posting.

It's not the usual form of this question, but rather, what I'm trying to figure out is what is the "unit of indexing" in the database.  Let's take some examples:  IHE Cross Enterprise Document Sharing (XDS), Web Access to DICOM Objects (WADO), DICOM Key Object Selection Document, HL7 Messaging Standard Version 2.3.1 (HL7 V2.3.1), HL7 V2.3.1 Patient Administration Message, HL7 Version 3, HL7 Clinical Document Architecture Release 2, HL7 Version 3 Person Registry, the IHE PCC Technical Framework, HL7 Consolidated CDA, HL7 CCD Version 1.1, and FHIR.

Now, how would you like to see these indexed in a retrieval system?  Each of these "publications" works in different ways.  XDS lives in the ITI Technical Framework, and is described in Section (Chapter) 10 of Volume 1, and several sections in Volumes 2a, 2b, and nearly all of 2x.  Web Access to DICOM Objects is the last of 18 sections in the DICOM Standard.  A Key Object Selection Document is a DICOM Information Object found in section A.35.4.1 of Part 3 of the DICOM standard.  HL7 Version 2.3.1 is a 1026-page document with 12 chapters and 5 appendices; Patient Administration Messages make up one of these chapters (chapter 3).  HL7 Version 3 is an aggregation of several standards published on the web by HL7 under the Version 3 title.  Clinical Document Architecture is one of those several standards published under the Universal Domains section of the previous publication, where Person Registry is a topic area under the Patient Administration domain of the Universal Domains section in the same.  The IHE PCC Technical Framework is a collection of profiles developed by the Patient Care Coordination Domain.  HL7 Consolidated CDA is an implementation guide, and HL7 CCD Version 1.1 is a type of document described in it.  FHIR is a standards framework: a collection of data types, resources, and protocols used for the development of health IT interfaces and systems.

Every single one of these may be relevant in a query for information about relevant standards.  As a system designer I may want to learn about all of FHIR, or just a single resource, all of CDA, or just a single template, all of DICOM or just a single part or part of a part, et cetera.

The challenge this creates for developing an appropriate index is understanding the granularity.  Publication unit is just one level of granularity.  What determines the unit size is perhaps best understood as the smallest invariant unit for provenance of the information: what is the smallest unit for which you'd provide author information, or even better, what is the smallest unit that requires a brief explanation of the thing being described as a whole.

What I've finally worked out is that an indexable unit is something for which I can identify some form of abstract: anything with a subsection labeled abstract, description, introduction, purpose, scope or some similar heading is what I'm going to call a "Standard" -- for the purpose of indexing them.  That's the level at which they can be used or reused.

Where I won't go is in indexing templates beyond those of a document in CDA, or publication units below Implementation Guide in FHIR, because the degree of proliferation once I break that level of granularity becomes unreasonable ... at least for the scope of my capstone project.

So, if I can reasonably find an abstract, and it isn't a fine-grained data object, I would index it.  Where does that leave FHIR resources or OpenEHR templates?  Honestly, I don't know.  Both are as fine-grained as CCDA or IHE PCC templates, and yet, they are also considered by some to be primary standards with some level of separately maintained provenance.  I think what I want to do is leave them out for now.
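
As a rough sketch, that rule could be written as a small Python predicate.  The Unit class, the "kind" labels, and the list of fine-grained kinds are all assumptions made up for illustration; only the abstract-like headings come from the rule itself, and the "similar heading" case is not handled.

```python
from dataclasses import dataclass, field

# Headings that count as "some form of abstract".
ABSTRACT_HEADINGS = {"abstract", "description", "introduction", "purpose", "scope"}

# Hypothetical labels for units that are too fine-grained (or out of scope)
# to index for now. These names are assumptions, not an established taxonomy.
FINE_GRAINED_KINDS = {
    "section template",
    "entry template",
    "resource",
    "value set",
    "vocabulary",
}


@dataclass
class Unit:
    """A candidate unit of indexing (hypothetical structure)."""
    title: str
    kind: str
    headings: list = field(default_factory=list)


def has_abstract(unit):
    """True if any subsection heading looks like an abstract."""
    return any(h.strip().lower() in ABSTRACT_HEADINGS for h in unit.headings)


def is_indexable(unit):
    """Index a unit only if it has an abstract and is not fine-grained."""
    return has_abstract(unit) and unit.kind.lower() not in FINE_GRAINED_KINDS


candidates = [
    Unit("HL7 CCD Version 1.1", "document template", ["Description"]),
    Unit("A FHIR resource", "resource", ["Scope"]),
    Unit("Some untitled fragment", "profile", ["Notes"]),
]
indexed = [u.title for u in candidates if is_indexable(u)]
```

Here only the CCD document template ends up in the index: the resource has an abstract-like heading but is excluded as fine-grained, and the fragment has no abstract-like heading at all.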

The next area for me to address is vocabularies and value sets, but the real answer for that is that it is out of scope.  I'm trying to fill a gap, and for value sets and vocabulary, UMLS, VSAC, and PHINVADS all address the location problem for these resources.  I don't need to do something to address that non-gap.

   Keith
