3 things to know about Ironwood, our latest TPU
Today's most advanced AI models, like those capable of complex reasoning and calculation, demand speed and efficiency from the hardware that runs them. That's why at Cloud Next in April, we unveiled Ironwood, our seventh-generation Tensor Processing Unit (TPU).
Ironwood is our most powerful, capable, and energy-efficient TPU yet, designed to power thinking, inferential AI models at scale.
As a massively parallel processor, Ironwood excels at the huge matrix calculations at the heart of AI while minimizing the time data spends shuttling across the chip. The result: complex models run faster and more smoothly across our cloud.
And now, Ironwood is here for Cloud customers.
Here are three things to know about it.
1. It’s purpose-built for the age of inference
As the industry’s focus shifts from training frontier models to powering useful, responsive interactions with them, Ironwood provides the essential hardware. It’s custom-built for high-volume, low-latency AI inference and model serving, and it offers more than 4X better performance per chip for both training and inference workloads than our last generation, making Ironwood our most powerful and energy-efficient custom silicon to date.
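To make the serving focus concrete, here is a minimal JAX sketch of a low-latency inference path: a forward pass compiled once with jax.jit so every subsequent request executes as a single optimized program on the accelerator. The toy MLP, its shapes, and the parameter names are illustrative placeholders, not part of any Google serving stack.

```python
# A hedged, minimal sketch of low-latency model serving with JAX.
# The two-layer MLP below is a stand-in for a real model; all shapes
# and parameter names are illustrative.
import jax
import jax.numpy as jnp

def forward(params, x):
    # Toy forward pass: one hidden layer with ReLU, then a projection.
    h = jax.nn.relu(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

# jit compiles the function once via XLA; after the first (warm-up) call,
# each request runs as a single fused program on the accelerator.
serve = jax.jit(forward)

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {
    "w1": jax.random.normal(k1, (1024, 4096)) * 0.02,
    "b1": jnp.zeros((4096,)),
    "w2": jax.random.normal(k2, (4096, 1024)) * 0.02,
    "b2": jnp.zeros((1024,)),
}

batch = jnp.ones((8, 1024))    # a small batch of incoming requests
logits = serve(params, batch)  # compiled execution after warm-up
print(logits.shape)            # (8, 1024)
```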
2. It’s a giant network of power
TPUs are a key component of AI Hypercomputer, our integrated supercomputing system designed to boost system-level performance and efficiency across compute, networking, storage and software. At its core, the system groups individual TPUs into interconnected units called pods. With Ironwood, we can scale up to 9,216 chips in a superpod. These chips are linked via a breakthrough Inter-Chip Interconnect (ICI) network operating at 9.6 Tb/s.
Pictured: part of an Ironwood superpod, directly connecting 9,216 Ironwood TPUs in a single domain.
This massive connectivity lets thousands of chips communicate rapidly and access a combined 1.77 petabytes of shared High Bandwidth Memory (HBM), overcoming data bottlenecks for even the most demanding models. That efficiency significantly reduces the compute-hours and energy needed to train and run cutting-edge AI services.
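For a sense of how that pooled memory looks to software, here is a hedged JAX sketch: every chip visible to a job appears as a device, and one large array can be sharded across all of their HBM, with any cross-chip transfers handled over the interconnect by the compiler. The mesh shape, axis name, and array sizes are assumptions for illustration; the same code runs on a single device, just without useful sharding.

```python
# A hedged sketch of sharding one large array across a pod's chips.
# Axis names and sizes are illustrative; jax.devices() returns every
# chip visible to the job (possibly just one on a laptop).
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices())           # all chips in the job
mesh = Mesh(devices, axis_names=("data",))  # a 1-D mesh over those chips

# Place a large matrix row-sharded across every chip's HBM.
sharding = NamedSharding(mesh, PartitionSpec("data", None))
x = jax.device_put(jnp.ones((len(devices) * 1024, 4096)), sharding)

# One jitted program runs over the sharded array; the XLA compiler
# inserts any chip-to-chip transfers over the interconnect as needed.
w = jnp.ones((4096, 4096))
y = jax.jit(jnp.dot)(x, w)
print(y.shape, y.sharding)
```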
3. It’s designed for AI with AI
Ironwood is the result of a continuous loop at Google in which researchers influence hardware design and hardware accelerates research. While competitors rely on external vendors, Google DeepMind works directly with its TPU engineering counterparts when a model like Gemini needs a specific architectural advancement. As a result, our models are trained on the newest TPU generations, often seeing significant speedups over previous hardware. Our researchers even use AI to design the next chip generation: a method called AlphaChip has used reinforcement learning to generate superior layouts for the last three TPU generations, including Ironwood.