Making data useful for public health

Sep 17, 2020

Katherine Chou

Director of Product Management, Google Health

Researchers around the world have used modelling techniques to find patterns in data and map the spread of COVID-19, in order to combat the disease. Modelling a complex global event is challenging, particularly when there are many variables—human behavior, evolving science and policy, and socio-economic issues—as well as unknowns about the virus itself. Teams across Google are contributing tools and resources to the broader scientific community of epidemiologists, analysts and researchers who are working to address the health and economic impacts of the virus.

Organizing the world’s data for epidemiological researchers

Lack of access to useful high-quality data has posed a significant challenge, and much of the publicly available data is scattered, incomplete, or compiled in many different formats. To help researchers spend more of their time understanding the disease instead of wrangling data, we've developed a set of tools and processes to make it simpler for researchers to discover and work with normalized high-quality public datasets.

With the help of Google Cloud, we developed the COVID-19 Open Data repository, a comprehensive, open-source resource of COVID-19 epidemiological data and related variables such as economic indicators and population statistics from over 50 countries. Each data source includes information on its origin and how it is processed, so that researchers can confirm its validity and reliability. The repository can also be used with Data Commons, BigQuery datasets, and other initiatives that aggregate regional datasets.

This repository also includes two Google datasets developed to help researchers study the impact of the disease in a privacy-preserving manner. In April, we began publishing the COVID-19 Community Mobility Reports, which provide anonymized insights into movement trends to help researchers understand the impact of policies like shelter in place. These reports have been downloaded over 16 million times and are now updated three times a week in 64 languages, with localized insights covering 12,000 regions, cities and counties for 135 countries. Groups including the OECD, World Bank and Bruegel have used these reports in their research, and the insights inform strategies such as how public health authorities could safely unwind social distancing policies.

The latest addition to the repository is the Search Trends symptoms dataset, which aggregates anonymized search trends for over 400 symptoms. This will help researchers better understand the spread of COVID-19 and its potential secondary health impacts.

Tools for managing complex prediction modeling

The data that models rely upon may be imperfect due to a range of factors, including a lack of widespread testing or inconsistent reporting. That’s why COVID-19 models need to account for uncertainty in order for their predictions to be reliable and useful. To help address this challenge, we’re providing researchers with examples of how to implement bespoke epidemiological models using TensorFlow Probability (TFP), a library for building probabilistic models that can measure confidence in their own predictions. With TFP, researchers can use a range of data sources with different granularities, properties, or confidence levels, and factor that uncertainty into the overall prediction models. This could be especially useful in fine-tuning the increasingly complex models that epidemiologists are using to understand the spread of COVID-19, particularly in gaining city- or county-level insights when only state- or national-level datasets exist.
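To make the idea of factoring uncertainty into a prediction concrete, here is a minimal sketch in plain Python. It is not TFP code and not one of the examples mentioned above; it only illustrates Monte Carlo propagation of an uncertain input into a prediction interval. The function name, the uniform prior on the reporting rate, and all of the numbers are illustrative assumptions.

```python
import random
import statistics


def simulate_true_infections(reported_cases, rate_low, rate_high,
                             n_draws=10_000, seed=0):
    """Estimate true infections when the reporting rate is uncertain.

    Assumption (illustrative only): the fraction of infections that end up
    reported is equally likely to lie anywhere in [rate_low, rate_high].
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    draws = []
    for _ in range(n_draws):
        rate = rng.uniform(rate_low, rate_high)
        # One plausible value of the true infection count for this draw.
        draws.append(reported_cases / rate)
    draws.sort()
    return {
        "median": statistics.median(draws),
        # Empirical 5th and 95th percentiles give a 90% interval.
        "low": draws[int(0.05 * n_draws)],
        "high": draws[int(0.95 * n_draws) - 1],
    }


# Hypothetical input: 1,000 reported cases, with 10% to 50% of infections reported.
summary = simulate_true_infections(reported_cases=1_000,
                                   rate_low=0.1, rate_high=0.5)
print(summary)
```

The output is a range of plausible values instead of a single number, so a wider interval signals that the input data supports less confident conclusions. A probabilistic programming library such as TFP lets researchers express richer versions of this idea, with several uncertain inputs at once.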

While models can help predict what happens next, researchers and policymakers are also turning to simulations to better understand the potential impact of their interventions. Simulating these "what if" scenarios involves calculating highly variable social interactions at a massive scale. Simulators can help trial different social distancing techniques and gauge how changes to the movement of people may impact the spread of disease.

Google researchers have developed an open-source agent-based simulator that uses real-world data to simulate populations, helping public health organizations fine-tune their exposure notification parameters. For example, the simulator can consider different disease and transmission characteristics, the number of places people visit, and the time spent in those locations. We also contributed to Oxford’s agent-based simulator by factoring in real-world mobility and representative models of interactions within different workplace sectors to understand the effect of an exposure notification app on the COVID-19 pandemic.
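For readers unfamiliar with the approach, the sketch below shows the basic shape of an agent-based simulation in plain Python. It is a toy model written for illustration, not the Google or Oxford simulator: every name, parameter and default value is an assumption, agents pick places uniformly at random, and neither time spent at a location nor exposure notification is modeled.

```python
import random


def run_simulation(n_agents=200, n_places=20, visits_per_day=2,
                   p_transmit=0.05, days_infectious=5, n_days=30, seed=0):
    """Toy agent-based epidemic: agents visit random places each day.

    Returns a list of (susceptible, infectious, recovered) counts per day.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    state = ["S"] * n_agents           # "S", "I" or "R" for each agent
    days_left = [0] * n_agents         # remaining infectious days
    state[0], days_left[0] = "I", days_infectious  # one initial case
    history = []

    for _day in range(n_days):
        # 1. Movement: record which agents visit each place today.
        occupants = {}
        for agent in range(n_agents):
            for _visit in range(visits_per_day):
                place = rng.randrange(n_places)
                occupants.setdefault(place, []).append(agent)

        # 2. Transmission: susceptible visitors may be infected by
        #    infectious agents who visited the same place.
        newly_infected = set()
        for agents_here in occupants.values():
            n_infectious = sum(1 for a in agents_here if state[a] == "I")
            if n_infectious == 0:
                continue
            # Chance of infection from at least one infectious co-visitor.
            p_infect = 1 - (1 - p_transmit) ** n_infectious
            for a in agents_here:
                if state[a] == "S" and rng.random() < p_infect:
                    newly_infected.add(a)

        # 3. Progression: infectious agents move one day closer to recovery.
        for a in range(n_agents):
            if state[a] == "I":
                days_left[a] -= 1
                if days_left[a] == 0:
                    state[a] = "R"
        for a in newly_infected:
            state[a], days_left[a] = "I", days_infectious

        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


history = run_simulation()
print(history[-1])
```

Rerunning the function with a different visits_per_day or p_transmit is the simplest form of the "what if" question described above: the same population is simulated under a changed assumption, and the resulting case curves are compared.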

The scientific and developer communities are doing important work to understand and manage the pandemic. Whether it’s by contributing to open-source initiatives or by funding data science projects and providing Google.org Fellows, we’re committed to collaborating with researchers on efforts to build a more equitable and resilient future.
