I tried 8 of our newest AI products and updates
At I/O, we don’t just announce a bunch of news, like new Gemini models, AI agents and Android updates — we let developers, reporters and partners experience some of that newness in action for the very first time through product demos.
This year, I was lucky enough to spend the day at Shoreline Amphitheatre, where I/O happens, and dive into all of the demos. Here’s the inside scoop on a few of them.
For my first demo of the day, I watched Gemini Advanced parse a 20-plus page property lease full of complicated legal phrasing and gotchas. I could then ask questions about the lease, like whether my landlord would allow me to have a pet dog, or whether I’d have to pay any extra fees. (I’m personally looking forward to using the feature to decipher my next convoluted lease when my apartment comes up for renewal.)
The next demo took things up a notch: Two Googlers fed Gemini the PDF of an entire economics textbook that was hundreds of pages long. It would have taken me hours to read the book, but Gemini was able to summarize it and highlight important topics to study in seconds. It also made a multiple-choice quiz to help me prepare for a hypothetical upcoming test, creating not just the single correct choice but also three incorrect answers to try to trip me up.
Googlers Sid Lall (left) and Adam Kurzrok (right) demonstrate how Gemini Advanced can now summarize a hefty economics textbook or thousands of pages of documents.
Both of these demos took advantage of Gemini 1.5 Pro, which debuted earlier this year with the longest context window of any large-scale foundation model. We’re opening up early Gemini 1.5 Pro access to Gemini Advanced subscribers and giving them the ability to upload documents to the tool right from Drive, so they can use Gemini to summarize or analyze documents up to 1,500 pages long.
Gemini 1.5 Pro is also coming to the side panel of Workspace apps like Gmail, Docs, Sheets, Slides and Drive. To see this in action, I used Gemini in Gmail to summarize a sample email of a weekly school report and pull out specific details, like which activities were for 7th-grade students or what was on the packing list for an overnight trip.
Gemini’s side panel can help you answer key questions about your content in Gmail, Drive and more.
The improved long context window can even pull information from multiple documents when responding to a single prompt. In the side panel in Docs, I asked for help writing a sample letter to a potential job candidate, linking in my prompt to the job description document and the applicant’s PDF portfolio (both of which were in my Drive), and instantly received an email draft that factored in relevant details from both documents.
Gemini 1.5 Pro isn’t our only shiny new model, though: I also got to try the freshly announced Imagen 3, our highest-quality text-to-image model yet. One new capability I was excited about was its ability to generate decorative text and letters, so I put it through its paces. I started by asking for a stylized alphabet, like letters spelled out in jam on toast or formed from silver balloons floating in the sky. Imagen 3 generated a full alphabet of letters, which I could then use to type out my own (delicious) menus.
After my Imagen 3 interlude, I continued with more Gemini demos. In one of them, I could pull up Gemini’s overlay on an Android phone and ask questions about anything on the screen. This really showed how we’re not only expanding what you can ask Gemini, but also making Gemini context-aware, so it can anticipate your needs and provide helpful suggestions.
The use case here was a lengthy oven manual. Whether it's a demo or real life, that's not something I'd be excited about reading. Instead of skimming through the document, I pulled up Gemini and immediately got an "Ask this PDF" suggestion. I tested questions like "how do I update the clock?" and quickly got accurate answers. It worked just as well with YouTube videos: instead of watching a 20-minute workout video, I asked a quick question about how to modify planks, got an answer and was on my way to the next demo. There I tested Gemini Live, a new conversation mode that lets you talk with Gemini in the app, no typing required.
Speaking with Gemini was a different experience from the traditional chatbot interface: Gemini’s answers are a lot more conversational than the paragraphs of text and bullet-pointed lists you might usually get. In my demo, I learned you can even cut Gemini off in the middle of an answer. After asking for a list of kids’ activities for a summer vacation, I was able to interrupt the stream of suggestions to dive deeper into what materials I’d need to tie-dye a shirt.
The demo of Project Astra, or “advanced seeing and talking responsive agent,” took things a step further to show the cutting edge of where our conversational AI projects are heading.
Our AI Sandbox, where developers and attendees tried out demos like Project Astra and other creative AI experiments, like MusicFX’s DJ Mode.
Instead of working only with what’s on your screen or the information you’ve typed into a chat box, Astra uses its multimodal capabilities to understand conversational speech prompts and live video feeds at the same time, unlocking new kinds of AI experiences.
Astra’s alliteration demo started out simply: I showed the camera — an overhead camera in this setup, but Astra can also use a phone camera or a camera on a wearable device — an object, like a banana or a piece of bread, and Gemini riffed on it with an alliterative sentence. I added more objects, and Gemini kept the conversation going, from “Bright bananas bask beautifully on the board” with a single fruit to “Culinary creations can catch the eye” when presented with a whole buffet board.
Astra alliterates with bananas, baguettes… and anything else you can show it.
Another Astra demo let me play Pictionary with Gemini: a seemingly simple interaction, but one that required the agent to understand images, remember what had been drawn in each round of the game and use general knowledge to actually guess what I was drawing. In one round, Astra knew that a circle wasn’t enough to base a guess on, but as I added lines underneath it, it quickly went from ID-ing a stick figure to recognizing that a person holding up a skull was Hamlet.
Astra is undefeated at Pictionary.
Moving through the AI Sandbox and other demo stations felt like a glimpse into tomorrow. It was also humbling: Astra beat me at Pictionary in multiple rounds!