Discovering millions of datasets on the web
Across the web, there are millions of datasets about nearly any subject that interests you. If you’re looking to buy a puppy, you could find datasets compiling complaints of puppy buyers or studies on puppy cognition. Or if you like skiing, you could find data on revenue of ski resorts or injury rates and participation numbers. Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and find links to where the data is. Over the past year, people have tried it out and provided feedback, and now Dataset Search is officially out of beta.
What's new in Dataset Search?
Based on what we’ve learned from the early adopters of Dataset Search, we’ve added new features. You can now filter the results by the type of dataset that you want (e.g., tables, images, text), or by whether the dataset is available for free from the provider. If a dataset is about a geographic area, you can see a map. Plus, the product is now available on mobile and we’ve significantly improved the quality of dataset descriptions. One thing hasn't changed, however: anybody who publishes data can make their datasets discoverable in Dataset Search by using an open standard (schema.org) to describe the properties of their dataset on their own web page.
We have also learned how many different types of people look for data. There are academic researchers looking for data to develop their hypotheses (e.g., try oxytocin), students looking for free data in a tabular format that covers the topic of their senior thesis (e.g., try incarceration rates with the corresponding filters), business analysts and data scientists looking for information on mobile apps or fast food establishments, and so on. There is data on all of that! And what do our users ask? The most common queries include "education," "weather," "cancer," "crime," "soccer," and, yes, "dogs."
What datasets can you find in Dataset Search?
Dataset Search also gives us a snapshot of the data out there on the web. Here are a few highlights. The most common topics that the datasets cover are geosciences, biology, and agriculture. The majority of governments in the world publish their data and describe it with schema.org. The United States leads in the number of open government datasets available, with more than 2 million. And the most popular data format? Tables: you can find more than 6 million of them on Dataset Search.
The number of datasets that you can find in Dataset Search continues to grow. If you have a dataset on your site and you describe it using schema.org, an open standard, others can find it in Dataset Search. If you know that a dataset exists, but you can't find it in Dataset Search, ask the provider to add the schema.org descriptions and others will be able to learn about their dataset as well.
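To make that concrete, here is a minimal sketch of what a schema.org description of a dataset might look like. It assumes the description is embedded in the dataset's web page as JSON-LD, one of the encodings schema.org supports. The helper names (build_dataset_jsonld, to_script_tag) and every example value, including the example.org URLs, are hypothetical and only for illustration. The property names (name, description, url, license, isAccessibleForFree, distribution) come from the schema.org Dataset vocabulary.

```python
import json


def build_dataset_jsonld(name, description, url, license_url,
                         is_free, download_url, encoding_format):
    """Return a schema.org Dataset description as a plain dict.

    Hypothetical helper: the property names come from schema.org,
    but which ones a given publisher should fill in is up to them.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # Whether the dataset is available for free from the provider.
        "isAccessibleForFree": is_free,
        # Where and in what format the data can be downloaded.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


def to_script_tag(metadata):
    """Wrap the description in a JSON-LD script tag for an HTML page."""
    body = json.dumps(metadata, indent=2)
    # Escape "</" so a value can never close the script tag early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # All values below are made up for illustration.
    metadata = build_dataset_jsonld(
        name="Ski resort injury rates (example)",
        description="Yearly injury counts and visitor numbers per resort.",
        url="https://example.org/datasets/ski-injuries",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        is_free=True,
        download_url="https://example.org/datasets/ski-injuries.csv",
        encoding_format="text/csv",
    )
    print(to_script_tag(metadata))
```

The printed script tag would go into the HTML of the page that describes the dataset, so the description lives alongside the data it describes.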
Dataset Search is out of beta, but we will continue to improve the product, whether or not it has "beta" next to its name. If you haven't already, take Dataset Search for a spin, and tell us what you think.