Medical records can be tricky to access because of confidentiality and variability, but data-sharing efforts are helping to overcome these hurdles — without compromising patient privacy.
For the gastrointestinal condition known as ulcerative colitis, some physicians recommend using a particular drug twice a day; others, three times. But which protocol is the best way to help people with the condition to avoid surgery? Instead of launching a clinical trial, Peter Higgins, a gastroenterologist at the University of Michigan at Ann Arbor, examined existing patient data.
Many health systems in the United States export clinical data from electronic health records (EHRs) into repositories known as health data warehouses for institutional use by researchers, Higgins says. Working with the University of Michigan’s health informaticians, he identified and compared people on the two protocols. The scientists found that giving people the drug three times a day seemed to result in fewer operations (J. A. Berinstein et al. Clin. Gastroenterol. Hepatol. 19, 2112–2120; 2021).
Such searches are complex because the underlying records are so variable, Higgins says. “It’s a little bit of a needle in a haystack hunt,” he explains, because the data are not standardized.
The variations in data formats, combined with regulations to protect patient privacy, make working with data warehouses challenging. Access to a repository is usually restricted to people within an institution, and international data protections can prove even more daunting. “The data are just truly not interoperable across health systems,” says Melissa Haendel, a data scientist at the University of Colorado Anschutz Medical Campus in Aurora.
Even for those trained in health informatics, learning how to work with such data is not trivial. “A lot of good research that could be done on the EHR is dropped because there’s a huge learning curve to using these systems,” says Charisse Madlock-Brown, a health-informatics researcher at the University of Tennessee Health Science Center in Memphis. Small institutions also often lack a health-informatics team that can assist biologists wanting to use these repositories, she says.
Spurred by the COVID-19 pandemic, researchers have begun to aggregate data from individual institutions in national repositories that are more accessible. In the United States, the National COVID Cohort Collaborative (N3C) is the largest patient-privacy-limited data set in the country’s history, says Haendel, who co-leads the effort. Supported by the US National Institutes of Health, N3C encompasses data from more than 70 institutions and holds patient-level information for 13 million individuals. The data include EHRs, imaging scans and genomic sequences of viral variants, all of which are described using a common data model.