Today, Google DeepMind announced a new family of Gemini models designed for robotics. The first, Gemini Robotics, is a vision-language-action (VLA) model that takes natural language and images as input and outputs actions, allowing robots to physically move and perform tasks. The second, Gemini Robotics-ER, is a reasoning model that enhances spatial skills such as identifying objects and their parts in 3D space.
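To make the input/output flow of a VLA model concrete, here is a minimal, purely hypothetical sketch in Python. It is not the Gemini Robotics API; the `Action` type and `vla_step` function are illustrative placeholders showing how a natural-language instruction and a camera frame map to robot commands.

```python
# Hypothetical sketch of a vision-language-action (VLA) interface.
# This is NOT the Gemini Robotics API; names and types are illustrative only.
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    """A single low-level robot command (illustrative placeholder)."""
    joint_targets: List[float]  # target joint positions, e.g. in radians


def vla_step(instruction: str, image: bytes) -> List[Action]:
    """Map a natural-language instruction plus a camera frame to actions.

    A real VLA model would run a learned policy here; this stub only
    shows the shape of the interface described above.
    """
    # Placeholder: return a single no-op action for a 7-joint arm.
    return [Action(joint_targets=[0.0] * 7)]


if __name__ == "__main__":
    actions = vla_step("fold the paper into an origami crane", image=b"")
    print(f"Planned {len(actions)} action(s)")
```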
Take a look at what robots can do using these Gemini models, from folding origami to packing lunches to spelling words with Scrabble tiles.