How anonymized data helps fight against disease
Data has always been a vital tool in understanding and fighting disease, from Florence Nightingale’s hand-drawn illustrations in the 1800s, which showed how poor sanitation contributed to preventable diseases, to the first open-source repository of data developed in response to the 2014 Ebola crisis in West Africa.
When the first cases of COVID-19 were reported in Wuhan, data again became one of the most critical tools to combat the pandemic. A group of researchers who documented the initial outbreak quickly joined forces and started collecting data that could help epidemiologists around the world model the trajectory of the novel coronavirus outbreak. The researchers came from the University of Oxford, Tsinghua University, Northeastern University and Boston Children’s Hospital, among others.
However, their initial workflow was not designed for the exponential rise in cases, so the researchers turned to Google.org for help. As part of Google’s $100 million contribution to COVID relief, Google.org granted $1.25 million in funding and provided a team of 10 full-time Google.org Fellows and 7 part-time Google volunteers to assist with the project.
Google volunteers worked with the researchers to create Global.health, a scalable and open-access platform that pulls together millions of anonymized COVID-19 cases from over 100 countries. This platform helps epidemiologists around the world model the trajectory of COVID-19 and track its variants, as well as future infectious diseases.
The need for trusted and anonymized case data
When an outbreak occurs, timely access to organized, trustworthy and anonymized data is critical for public health leaders to inform early policy decisions, medical interventions, and allocations of resources, all of which can slow disease spread and save lives.
The insights derived from “line-list” data (that is, anonymized case-level information), as opposed to aggregated data such as case counts, are essential for epidemiologists to perform more detailed statistical analyses and model the effectiveness of interventions. Volunteers at the University of Oxford started manually curating this data, but it was spread over hundreds of websites, in dozens of formats and in multiple languages. The HealthMap team at Boston Children’s Hospital also identified early reports of COVID-19 through automated indexing of news sites and official sources.
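To make the distinction between line-list and aggregated data concrete, here is a minimal Python sketch. The field names (`country`, `age_range`, `confirmed`) and the three rows are invented for illustration; they are assumptions, not the Global.health schema or real case data.

```python
from collections import Counter

# Hypothetical anonymized line list: one row per case, with no names or
# other identifiers. Fields and values are invented for illustration.
line_list = [
    {"country": "A", "age_range": "20-29", "confirmed": "2020-01-20"},
    {"country": "A", "age_range": "60-69", "confirmed": "2020-01-21"},
    {"country": "B", "age_range": "60-69", "confirmed": "2020-01-21"},
]


def aggregate_by(records, field):
    """Collapse case-level rows into counts keyed by a single field."""
    return Counter(record[field] for record in records)


# The same line list can be aggregated in more than one way.
cases_per_country = aggregate_by(line_list, "country")
cases_per_age = aggregate_by(line_list, "age_range")
```

Aggregated counts like these can always be derived from a line list, but a line list cannot be rebuilt from the counts. That asymmetry is why case-level data supports more detailed analysis.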
The Oxford and HealthMap teams joined forces, shared the data, and published peer-reviewed findings to create a trusted resource for the global community.
Enter the Google.org Fellowship
To help the global community of researchers in this meaningful endeavour, Google.org decided to offer the support of 10 Google.org Fellows who spent 6 months working full-time on Global.health, in addition to $1.25 million in grant funding. Working hand in hand with the University of Oxford and Boston Children’s Hospital, the Google.org team spoke to researchers and public health officials working on the frontline to understand the real-life challenges they faced when finding and using high-quality trusted data, a tedious and manual process that often takes hours.
Upholding data privacy is key to the platform’s design. The anonymized data used at Global.health comes from open-access authoritative public health sources, and a panel of data experts rigorously checks it to make sure it meets strict anonymity requirements. The Google.org Fellows helped the Global.health team design the data ingestion flow and implement best practices for data verification and quality checks, so that no personal data made its way into the platform. (All line-list data added to the platform is stored and hosted in Boston Children’s Hospital’s secure data infrastructure, not Google’s.)
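The article does not describe how the ingestion checks are implemented. As a rough illustration of the idea only, the Python sketch below rejects records that carry personal fields or lack required ones. The field lists, `validate_record` and `ingest` are all hypothetical and do not reflect Global.health’s actual schema or code.

```python
# Hypothetical field lists, invented for this sketch. They are assumptions,
# not the checks Global.health actually runs.
DISALLOWED_FIELDS = {"name", "address", "phone", "email", "national_id"}
REQUIRED_FIELDS = {"country", "confirmed", "source_url"}


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Any disallowed key counts as a problem, even if its value is empty.
    for field in sorted(DISALLOWED_FIELDS & set(record)):
        problems.append(f"personal field present: {field}")
    # Required fields must be present and non-empty.
    filled = {key for key, value in record.items() if value not in (None, "")}
    for field in sorted(REQUIRED_FIELDS - filled):
        problems.append(f"missing required field: {field}")
    return problems


def ingest(records):
    """Split records into accepted ones and rejected ones with reasons."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

An automated check like this could only be a first filter. As described above, a panel of data experts also reviews the data against the anonymity requirements.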
Looking to the future
With the support of Google.org and The Rockefeller Foundation, Global.health has grown into an international consortium of researchers at leading universities curating the most comprehensive line-list COVID-19 database in the world. It includes millions of anonymized records from trusted sources spanning over 100 countries, including India.
Today, Global.health helps researchers across the globe access data in a matter of minutes, with a series of clicks.
The flexibility of the Global.health platform means that it can be adapted to any infectious disease data and local context as new outbreaks occur. Global.health lays a foundation for researchers and public health officials to access this data no matter their location, be it New York, São Paulo, Munich, Kyoto or Nairobi.