
Gemma 2 is now available to researchers and developers

AI has the potential to address some of humanity's most pressing problems — but only if everyone has the tools to build with it. That's why earlier this year we introduced Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. We’ve continued to grow the Gemma family with CodeGemma, RecurrentGemma and PaliGemma — each offering unique capabilities for different AI tasks and easily accessible through integrations with partners like Hugging Face, NVIDIA and Ollama.

Now we’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in. In fact, at 27B, it offers competitive alternatives to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.

A new open model standard for efficiency and performance

We built Gemma 2 on a redesigned architecture, engineered for both exceptional performance and inference efficiency. Here’s what makes it stand out:

  • Outsized performance: At 27B, Gemma 2 delivers the best performance for its size class, and even offers competitive alternatives to models more than twice its size. The 9B Gemma 2 model also delivers class-leading performance, outperforming Llama 3 8B and other open models in its size category. For detailed performance breakdowns, check out the technical report.
  • Unmatched efficiency and cost savings: The 27B Gemma 2 model is designed to run inference efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU, significantly reducing costs while maintaining high performance. This allows for more accessible and budget-friendly AI deployments.
  • Blazing fast inference across hardware: Gemma 2 is optimized to run at incredible speed across a range of hardware, from powerful gaming laptops and high-end desktops to cloud-based setups. Try Gemma 2 at full precision in Google AI Studio, unlock local performance with the quantized version via Gemma.cpp on your CPU, or try it on your home computer with an NVIDIA RTX or GeForce RTX GPU via Hugging Face Transformers (a minimal sketch follows the chart below).
A chart showing Gemma 2 performance benchmarks
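
As an illustration of the Transformers path mentioned above, here is a minimal local-inference sketch. The checkpoint name (google/gemma-2-9b-it) and the dtype and device settings are assumptions for illustration, not official guidance:

```python
# Minimal local inference with Hugging Face Transformers (illustrative sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed Hub identifier for the 9B instruct model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half the memory of float32
    device_map="auto",           # place weights on the available GPU(s)
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```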

Built for developers and researchers

Gemma 2 is not only more powerful, it's also designed to integrate more easily into your workflows:

  • Open and accessible: Just like the original Gemma models, Gemma 2 is available under our commercially-friendly Gemma license, giving developers and researchers the ability to share and commercialize their innovations.
  • Broad framework compatibility: Easily use Gemma 2 with your preferred tools and workflows thanks to its compatibility with major AI frameworks, including Hugging Face Transformers; JAX, PyTorch and TensorFlow via native Keras 3.0; vLLM; Gemma.cpp; Llama.cpp; and Ollama. In addition, Gemma 2 is optimized with NVIDIA TensorRT-LLM to run on NVIDIA-accelerated infrastructure or as an NVIDIA NIM inference microservice, with optimization for NVIDIA NeMo to come. You can fine-tune today with Keras and Hugging Face (see the sketch after this list), and we are actively working to enable additional parameter-efficient fine-tuning options.1
  • Effortless deployment: Starting next month, Google Cloud customers will be able to easily deploy and manage Gemma 2 on Vertex AI.
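
As one example of the Hugging Face fine-tuning path, here is a hedged sketch using LoRA via the PEFT library. The model id, adapter rank and target modules are illustrative assumptions, and the attention setting follows the note at the end of this post:

```python
# Parameter-efficient fine-tuning (LoRA) with Transformers + PEFT (illustrative sketch).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",           # assumed Hub identifier for the base model
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",   # soft-capping needs eager attention for training (see note 1)
)

lora_config = LoraConfig(
    r=8,                           # adapter rank (assumed hyperparameter)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights will train
# From here, train with transformers.Trainer or TRL's SFTTrainer as usual.
```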

Explore the new Gemma Cookbook, a collection of practical examples and recipes to guide you through building your own applications and fine-tuning Gemma 2 models for specific tasks. Discover how to easily use Gemma with your tooling of choice, including for common tasks like retrieval-augmented generation.
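
As a taste of the retrieval-augmented generation pattern the Cookbook covers, here is a toy, self-contained sketch. The hash-based embedding and the documents are purely illustrative stand-ins for a real embedding model and vector store:

```python
# Toy retrieval-augmented generation loop (illustrative only).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: hash words into a fixed-size bag-of-words vector.
    vec = np.zeros(512)
    for word in text.lower().split():
        vec[hash(word) % 512] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

docs = [
    "Gemma 2 ships in 9B and 27B parameter sizes.",
    "The 27B model runs inference on a single H100 GPU or TPU host.",
]
doc_vecs = np.stack([embed(d) for d in docs])

query = "What sizes does Gemma 2 come in?"
scores = doc_vecs @ embed(query)        # cosine similarity (vectors are normalized)
context = docs[int(np.argmax(scores))]  # retrieve the best-matching document

prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
# The prompt would then be passed to Gemma 2, e.g. via the generate() call shown earlier.
```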

Responsible AI development

We're committed to providing developers and researchers with the resources they need to build and deploy AI responsibly, including through our Responsible Generative AI Toolkit. The recently open-sourced LLM Comparator helps developers and researchers with in-depth evaluation of language models. Starting today, you can use the companion Python library to run comparative evaluations with your model and data, and visualize the results in the app. Additionally, we're actively working on open-sourcing our text watermarking technology, SynthID, for Gemma models.

When training Gemma 2, we followed our robust internal safety processes, filtering pre-training data and performing rigorous testing and evaluation against a comprehensive set of metrics to identify and mitigate potential biases and risks. We publish our results on a large set of public benchmarks related to safety and representational harms.

A chart showing Gemma 2 safety evaluations

Projects built with Gemma

Our first Gemma launch led to more than 10 million downloads and countless inspiring projects. The Navarasa team, for instance, used Gemma to create a model rooted in India's linguistic diversity.

Video (10:25): Developing for Indic languages: Gemma and Navarasa

Now, Gemma 2 will help developers get even more ambitious projects off the ground, unlocking new levels of performance and potential in their AI creations. We'll continue to explore new architectures and develop specialized Gemma variants to tackle a wider range of AI tasks and challenges. This includes an upcoming 2.6B parameter Gemma 2 model, designed to further bridge the gap between lightweight accessibility and powerful performance. You can learn more about this upcoming release in the technical report.

Getting started

Gemma 2 is now available in Google AI Studio, so you can test out its full performance capabilities at 27B without hardware requirements. You can also download Gemma 2’s model weights from Kaggle and Hugging Face Models, with Vertex AI Model Garden coming soon.

To enable access for research and development, Gemma 2 is also available free of charge through Kaggle or via a free tier for Colab notebooks. First-time Google Cloud customers may be eligible for $300 in credits. Academic researchers can apply for the Gemma 2 Academic Research Program to receive Google Cloud credits to accelerate their research with Gemma 2. Applications are open now through August 9.


More Information


1. Note for Hugging Face Transformers users: Fine-tuning Gemma 2 requires attention soft-capping, which is only supported by the eager attention implementation. For inference, faster implementations (FlashAttention 2 / SDPA) that don't support soft-capping can be safely used: the impact on quality is expected to be minimal while delivering significant efficiency gains.
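
A minimal sketch of that guidance, assuming a recent Transformers release that exposes the attn_implementation argument and an assumed checkpoint name:

```python
# Eager attention for fine-tuning (supports soft-capping), SDPA for inference.
from transformers import AutoModelForCausalLM

train_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",          # assumed Hub identifier
    attn_implementation="eager",  # required when fine-tuning
)
infer_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",
    attn_implementation="sdpa",   # faster; minimal expected quality impact
)
```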

