The Keyword


A breakthrough to better represent human genetic diversity

A rendering of a DNA double helix, just above a schematic diagram showing the bases that compose it.

Today, a group of researchers reached a breakthrough in our understanding and representation of human genomics, helping create more inclusive and equitable genetic testing and treatment. A consortium of 119 scientists from 60 institutions — including engineers from Google Research — announced the first draft human pangenome reference in a Nature paper.

This new human pangenome — “pan” from the Greek word for “involving all members” — combines assembled genomes from 47 people from diverse ancestries around the world. Unlike the current human reference genome, which represents data from just one person at each point along the DNA, the pangenome reference includes data from many individuals at each position. This creates a new resource that better represents human genetic diversity, allowing scientists and doctors to more accurately diagnose and treat diseases and develop new therapeutics.

“It is essential to the future of precision medicine that all people, regardless of ancestry, are able to get accurate genomic testing when needed. The pangenome will be a key ingredient in making that possible.”
Dr. Benedict Paten, Pangenome Project Co-Lead and Associate Director, UCSC Genomics Institute

To contribute to the consortium’s efforts, Google engineers helped develop and apply deep learning approaches to solve genomics challenges. Engineers adapted their open-source tool DeepVariant, which uses convolutional neural networks to identify genetic variants. The consortium then used the adapted methods to improve pangenome analysis techniques and eliminate sequencing errors from the long, particularly hard-to-decode stretches of the human genome.

Google’s DeepConsensus, which uses transformers to correct errors in sequencing instrument data, helped to improve the accuracy of the data used to construct the pangenome. A reference pangenome must be highly accurate so that it does not itself become a source of error in genome analysis. Using DeepConsensus data, the consortium was able to develop a long-read assembler that achieved a final accuracy of more than 99.999%. You can learn more about these deep learning approaches on our Google Research blog.
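To put that accuracy figure in perspective, here is a minimal sketch of the arithmetic. It assumes the 99.999% figure can be read as a per-base accuracy, which is our interpretation for illustration rather than something the paper's methods are quoted on here. The function names are hypothetical, and the Phred scale used is the standard convention for expressing sequencing error rates.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy (between 0 and 1) to a Phred-scaled quality score."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy: float, num_bases: int) -> float:
    """Expected number of erroneous bases in a stretch of num_bases at the given accuracy."""
    return (1.0 - accuracy) * num_bases


# Assumption: treat "more than 99.999%" as a per-base accuracy of 0.99999.
accuracy = 0.99999
print(round(phred_quality(accuracy)))             # 50
print(round(expected_errors(accuracy, 100_000)))  # 1
```

Under that assumption, 99.999% accuracy works out to roughly one error per 100,000 bases, or about Q50 on the Phred scale.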

This breakthrough was only made possible through the collaboration of an international community of experts, including geneticists, engineers and ethicists. Like the pangenome itself, the project shows how much progress diverse contributions can make.
