Data Commons is using AI to make the world’s public data more accessible and helpful
Every moment, all around the world, governments, organizations, and many others are generating data on topics as widely varied as temperature, trade or rates of disease. It’s data that could be extraordinarily useful for understanding and addressing major societal challenges like climate change, hunger or epidemics. Fortunately, much of this data is publicly available, with more to come. Unfortunately, being publicly available is not the same as being easy to access and use. This is the gap that Data Commons, an initiative from Google, is working to bridge.
Data is often fragmented by state and country borders, collected and published by different agencies, research institutions and other non-governmental organizations, and shared in different formats on varying timelines. It can be difficult, time-consuming, and cost-prohibitive to make these public data sets work together in a way that’s useful to policymakers, researchers, nonprofit organizations, journalists, students and members of the general public trying to better understand societal issues and find solutions. Data Commons’ long-term vision is to do for publicly available data what Google Search does for the internet or Google Maps does for navigation – organize it and make it accessible and useful.
Our goal is to make data, and the insights from it, more available to those seeking to understand and work on society’s most pressing challenges and opportunities. Two innovations are powering that goal, with more to come.
First, since 2017, the Data Commons team has sought to standardize and process thousands of data sets from publicly available, reliable sources ranging from the United Nations’ Intergovernmental Panel on Climate Change to the Brazilian Institute of Geography and Statistics to the United States Department of Commerce. This required innovation to make it possible to bring together data in widely varying formats, schemas and access methods, and to create a Knowledge Graph with a single API and schema, creating one unified view. This unified view makes it possible for data-experienced users to accomplish in hours what would normally take weeks, if not longer. While having this data standardized and accessible was a huge step forward, making use of it via APIs and visualization tools still required a significant investment of time – and often coding skills – for someone to understand and use the data effectively.
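To make the idea of "one unified view" concrete, here is a minimal sketch of normalizing records from two differently shaped sources into a single schema that one query function can serve. Everything in it is an assumption made for illustration: the `Observation` fields, the two source formats, the function names and the placeholder values are hypothetical, and none of it reflects the actual Data Commons schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical unified record; not the real Data Commons schema."""
    place: str
    variable: str
    year: int
    value: float
    source: str


def from_agency_a(row: dict) -> Observation:
    # Hypothetical source A publishes dicts with string-typed fields.
    return Observation(
        place=row["geo"],
        variable=row["indicator"],
        year=int(row["yr"]),
        value=float(row["val"]),
        source="agency_a",
    )


def from_agency_b(row: tuple) -> Observation:
    # Hypothetical source B publishes positional tuples:
    # (place, year, variable, value).
    place, year, variable, value = row
    return Observation(place, variable, int(year), float(value), "agency_b")


def query(observations, place: str, variable: str) -> list:
    """Return matching observations from any source, ordered by year."""
    matches = [
        o for o in observations
        if o.place == place and o.variable == variable
    ]
    return sorted(matches, key=lambda o: o.year)
```

Once every source has an adapter like `from_agency_a`, a caller only needs `query` and never has to know which format a given value originally arrived in.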
Second, to address this issue and to make Data Commons even more usable, Data Commons is now harnessing the power of AI, specifically large language models (LLMs), to create a natural language interface that allows users to ask questions like: Which states in India have the highest poverty levels per capita? How do literacy rates compare to poverty there? How much has infant mortality changed over time in these states?
AI makes it possible to ask questions like "Which countries in Africa have had the greatest increase in electricity access?" and "How does income correlate with diabetes in US counties?", or to offer prompts like "Compare greenhouse gas emissions from agriculture in Europe vs their GDP."
LLMs are used to understand the query, while the results come straight from Data Commons and include a link to the original data source; the output itself is not generated by the LLM. This approach allows Data Commons to avoid some of the known limitations of current LLMs around factuality.
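The separation described above, where a language model interprets the question and the answer is looked up instead of generated, can be sketched as follows. This is an illustration under stated assumptions, not the Data Commons implementation: the LLM step is replaced by simple keyword matching so the example runs on its own, and the store contents, names, value and URL are all hypothetical placeholders.

```python
from typing import NamedTuple, Optional


class ParsedQuery(NamedTuple):
    variable: str
    place: str


class Answer(NamedTuple):
    value: float
    source: str  # link back to where the number was published


# Hypothetical data store with a placeholder value and URL.
STORE = {
    ("electricity_access", "exampleland"): Answer(42.0, "https://example.org/source"),
}

# Hypothetical vocabularies the interpreter can recognize.
KNOWN_VARIABLES = {"electricity access": "electricity_access"}
KNOWN_PLACES = {"exampleland": "exampleland"}


def interpret(question: str) -> Optional[ParsedQuery]:
    """Stand-in for the LLM step: map free text to a structured query.

    Uses keyword matching only; a real system would call a model here.
    """
    text = question.lower()
    variable = next((v for k, v in KNOWN_VARIABLES.items() if k in text), None)
    place = next((p for k, p in KNOWN_PLACES.items() if k in text), None)
    if variable is None or place is None:
        return None
    return ParsedQuery(variable, place)


def answer(question: str) -> Optional[Answer]:
    """Look the value up in the store; never generate a value."""
    parsed = interpret(question)
    if parsed is None:
        return None
    return STORE.get((parsed.variable, parsed.place))
```

Because `answer` can only return what is already in the store, a question the interpreter cannot map, or a value the store does not hold, yields `None` instead of a made-up number.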
Data Commons does not collect or own any data. Instead, it draws on publicly available data from over 200 sources, covering thousands of data sets including demographics, economics, education, housing, public health, climate, sustainability, and biomedicine. There’s data from 194 countries, in some countries down to the state or county level. However, the data accessible so far is neither evenly distributed nor complete. Unfortunately, data availability reflects many of the same equity challenges the world faces on other issues, so we currently have more data for the US, India, and OECD countries than for countries in Africa, South America, and parts of Asia. Ongoing work is needed to make additional and up-to-date data available. We hope more public data will be published to help fill the gaps, and we seek to add more categories of data that help people better understand the world and enable those working to tackle pressing societal challenges. We are actively looking for additional data and partners to help fill in some of these gaps.
Data Commons is open source, open process and accessible to all. In addition to the Data Commons site, a subset of data points from Data Commons is used in responses to queries in Google Search. We are also partnering with organizations that are using Data Commons to tackle society’s challenges. The result is a growing ecosystem that allows groups like Resources for the Future, Feeding America, IIT Madras’ Robert Bosch Centre for Data Science and Artificial Intelligence, Stanford Doerr School of Sustainability, and Harvard University’s Institute for Quantitative Social Science to have their own versions of Data Commons, giving each organization a unified view of its own data alongside all the public data already accessible via Data Commons.
Marnie Webb, Chief Community Impact Officer for TechSoup, a longtime Google partner, shared how Data Commons can also be helpful to the smaller nonprofits her organization works with: “Data Commons gives grassroots organizations access to the data they need. It gives them the tools to ask questions about the needs in their community in the language they would use to ask a colleague a question, and to get reliable information in return, as if they had data scientists and data engineers on staff. What we're talking about is the democratization of information for better decision making, so that organizations can take smart risks to better serve their communities. We're talking about putting the power of data into the hands of those who know their communities best.”
With funding from Google.org, TechSoup is helping nonprofits harness the power of Data Commons to assess and address societal challenges. For example, Cemefi is highlighting the intersections between hunger and gender in Mexico, and Makaia is tracking economic and social growth in Colombia. TechSoup itself is illustrating the relationship between food security, farming, and climate change by bringing together data from sources like the USDA and Feeding America.
Data Commons is a work in progress. Though the team has been working on it since 2017, in some ways we’re just getting started – and we need others to continue to join us in this work. To make more data more accessible, we need partners helping to identify and fill data gaps. And we need organizations like TechSoup, Resources for the Future, Feeding America, and many more to put this data to work as they try to address some of the world’s biggest challenges. There’s still so much more to do, together.
Learn more about how to make data accessible via Data Commons.