Those of you who have been reading this blog for years are already familiar with what I've written in the past about Query Health. For those of you who haven't, check the link.
Recently, I've been looking into All of Us, a precision medicine research program that takes the ideas of Query Health to the next level. The original thinking on Query Health was about taking the question to the data. All of Us has a similar approach, but instead of querying data in possibly thousands of information systems, it collects raw data sourced from those systems into a research repository, and uses a cloud-based infrastructure to support research access to the curated data prepared from it. The best detailed description I've found to date is in the All of Us Research Operational Protocol.
There's a lot to be learned from Query Health, and the first thing that any group putting together a large repository of curated and anonymized data has to address is certainly going to be security and confidentiality. Anonymization itself is a difficult process, and given the large data sets being considered, there's no real way to make the data fully anonymous.
Numerous studies and articles have shown that you don't need much to identify a single individual from a large collection of data collected over time. A single physician may see 3,000 to 6,000 patients in a year. Put data from two of them together and the overlap is going to be smaller. Add other data elements, and pretty soon you get down to a very small group of people, perhaps a group of one, which combined with other data can pretty easily get you to the identity of a patient.
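The narrowing effect is easy to see with a toy example. The sketch below uses entirely made-up records and hypothetical field names (`zip3`, `birth_year`, `sex`); it is only meant to show how quickly a candidate pool shrinks as each additional data element is applied, not to model any real data set.

```python
from typing import Any, Dict, List, Tuple

# Synthetic, invented records; field names are illustrative assumptions.
RECORDS: List[Dict[str, Any]] = [
    {"id": 1, "zip3": "021", "birth_year": 1970, "sex": "F"},
    {"id": 2, "zip3": "021", "birth_year": 1970, "sex": "M"},
    {"id": 3, "zip3": "021", "birth_year": 1982, "sex": "F"},
    {"id": 4, "zip3": "100", "birth_year": 1970, "sex": "F"},
    {"id": 5, "zip3": "021", "birth_year": 1991, "sex": "M"},
    {"id": 6, "zip3": "100", "birth_year": 1982, "sex": "M"},
]


def candidate_counts(
    records: List[Dict[str, Any]],
    filters: List[Tuple[str, Any]],
) -> List[int]:
    """Return the size of the candidate pool before and after each filter."""
    remaining = list(records)
    sizes = [len(remaining)]
    for field, value in filters:
        # Keep only the records that match this additional data element.
        remaining = [r for r in remaining if r[field] == value]
        sizes.append(len(remaining))
    return sizes


if __name__ == "__main__":
    known_facts = [("zip3", "021"), ("birth_year", 1970), ("sex", "F")]
    print(candidate_counts(RECORDS, known_facts))
```

With only three quasi-identifiers, the six-person pool in this toy data collapses to a single record. Real populations are far larger, but so is the number of data elements available to combine.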
For Query Health, we discussed this in depth, and regarded counts and categories smaller than 5 as needing special attention (e.g., masking of results for small counts). There was a whole lot of other discussion, but unfortunately my memory of that part of the project (over 8 years old now) is rather limited, especially since it wasn't my primary focus.
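As a rough illustration of small-count masking, here is a minimal sketch. The function name, the use of `None` as the masked value, and the decision to mask every count below the threshold (including zero) are assumptions made for this example; they are not taken from the Query Health specifications or from All of Us.

```python
from typing import Dict, Optional

# Threshold from the discussion above: counts smaller than 5 get masked.
SMALL_COUNT_THRESHOLD = 5


def mask_small_counts(
    counts: Dict[str, int],
    threshold: int = SMALL_COUNT_THRESHOLD,
) -> Dict[str, Optional[int]]:
    """Replace any category count below the threshold with None (masked)."""
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical aggregate query result: category -> patient count.
    result = {"age 40-49": 112, "age 50-59": 3, "age 60-69": 5}
    print(mask_small_counts(result))
```

Masking alone is not sufficient when totals are also reported, since a masked cell can sometimes be recovered by subtraction. That is one reason small counts need "special attention" rather than a single mechanical rule.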
Another area of interest is patient consent, and how that might relate to "authorization" to access data via APIs from other external sources. A lot of this can be automated today using technologies like OAuth2, OpenID Connect, and for EHR data, SMART on FHIR. But as you look at the variety of health information data repositories that might be connected to All of Us through APIs, you wind up with a lot of proprietary APIs with a variety of OAuth2 implementations. That's another interesting standards challenge, probably not on the near-term horizon for All of Us, considering their present focus.
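To give a feel for that variety, the sketch below builds an OAuth2 authorization-code request URL for two hypothetical data sources. The provider names, endpoints, client ID, and the wearable's scopes are all invented for illustration. The `aud` parameter and the `launch/patient` and `patient/*.read` scopes follow SMART on FHIR conventions, but nothing here reflects how All of Us actually connects to any source.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
import secrets

# Hypothetical per-provider settings. Each source differs in endpoint,
# scopes, and extra parameters, which is the integration burden at issue.
PROVIDERS: Dict[str, Dict] = {
    "example_ehr": {
        "authorize_url": "https://ehr.example.org/oauth2/authorize",
        "scope": "launch/patient patient/*.read openid fhirUser",
        # SMART on FHIR expects the FHIR base URL in the "aud" parameter.
        "extra": {"aud": "https://ehr.example.org/fhir"},
    },
    "example_wearable": {
        "authorize_url": "https://wearable.example.com/auth",
        "scope": "activity heartrate",
        "extra": {},
    },
}


def build_authorization_url(
    provider: str,
    client_id: str,
    redirect_uri: str,
    state: Optional[str] = None,
) -> str:
    """Build the URL a participant visits to authorize access to their data."""
    cfg = PROVIDERS[provider]
    params = {
        "response_type": "code",  # OAuth2 authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": cfg["scope"],
        # Random state value guards against cross-site request forgery.
        "state": state or secrets.token_urlsafe(16),
    }
    params.update(cfg["extra"])
    return f"{cfg['authorize_url']}?{urlencode(params)}"


if __name__ == "__main__":
    for name in PROVIDERS:
        print(build_authorization_url(name, "research-app", "https://app.example.org/callback"))
```

Even in this small sketch, the per-provider table is where the work accumulates: every proprietary API adds another entry with its own quirks, which is why a common profile matters.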
It's interesting how everything comes back eventually, only different. One of my ongoing roles seems to be "standards historian", something I never actually thought about. I'm guessing if you hang around long enough, that becomes one of the roles you wind up adopting by default.