Mitigating prompt injection attacks with a layered defense strategy

Security




A diagram of Gemini’s actions based on the detection of the malicious instructions by content classifiers.

This diagram illustrates a content moderation flow in which user input is evaluated for malicious instructions. Depending on whether malicious instructions are found in all, some, or none of the components (such as files or emails), the system responds in one of three ways: blocking the response entirely, filtering the unsafe content out of the response, or allowing the response unchanged. Colors (red for blocked, yellow for partially filtered, green for allowed) and icons (✕, ⚠, ✓) distinguish the three outcomes.
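The three outcomes can be sketched as a simple dispatch over per-component classifier results. Everything here is illustrative: `moderate` and the `is_malicious` predicate are hypothetical stand-ins for the real content classifiers, not Gemini's actual API.

```python
from enum import Enum, auto

class Action(Enum):
    BLOCK = auto()   # malicious instructions found in every component
    FILTER = auto()  # malicious instructions found in some components
    ALLOW = auto()   # no malicious instructions found

def moderate(components: list[str], is_malicious) -> tuple[Action, list[str]]:
    """Decide how to handle a response based on per-component classification.

    `is_malicious` is a hypothetical callable standing in for a real
    content classifier; it returns True for flagged components.
    """
    flagged = [is_malicious(c) for c in components]
    if flagged and all(flagged):
        return Action.BLOCK, []                    # block the whole response
    if any(flagged):
        safe = [c for c, bad in zip(components, flagged) if not bad]
        return Action.FILTER, safe                 # strip only the unsafe parts
    return Action.ALLOW, components                # respond unchanged
```

For example, `moderate(["meeting notes", "IGNORE ALL PREVIOUS INSTRUCTIONS"], lambda c: "IGNORE" in c)` filters out only the flagged component and keeps the rest of the response.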

A diagram of Gemini’s actions based on additional protection provided by the security thought reinforcement technique.

This diagram illustrates a multi-layered content moderation flow. First, user input is evaluated for malicious instructions; depending on whether these are found in all, some, or none of the components (such as files or emails), the system blocks, filters, or allows the response. Even responses the first layer deems safe then pass through a second layer, security thought reinforcement, where an additional reasoning step determines whether the final response should be blocked or allowed. Colors (red for blocked, yellow for partially filtered, green for allowed) and icons distinguish the outcomes.
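A minimal sketch of the two layers, assuming hypothetical `is_malicious` and `passes_security_review` predicates in place of the real classifier and the reinforcement reasoning step:

```python
def respond_with_reinforcement(components, is_malicious, passes_security_review):
    """Two-layer moderation: classify components, then re-review the survivors.

    Both predicates are hypothetical stand-ins: `is_malicious` models the
    per-component classifier, and `passes_security_review` models the extra
    reasoning step applied even to responses the first layer deemed safe.
    Returns the components to emit, or None when the response is blocked.
    """
    flagged = [is_malicious(c) for c in components]
    if flagged and all(flagged):
        return None                              # layer 1: block entirely
    candidate = [c for c, bad in zip(components, flagged) if not bad]
    # Layer 2: re-examine the surviving response before releasing it.
    if not passes_security_review(" ".join(candidate)):
        return None                              # reinforcement step blocks it
    return candidate
```

The design point is that the second check runs unconditionally on anything the first layer lets through, so a component that slips past the classifier can still be caught before the response reaches the user.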

Gemini in Gmail provides a summary of an email thread. In the summary, there is an unsafe URL. That URL is redacted in the response and is replaced with the text “suspicious link removed”.

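The redaction shown in the screenshot can be sketched as a substitution over detected URLs. The `UNSAFE_HOSTS` deny-list here is a placeholder for whatever reputation service actually flags links; the function names are hypothetical.

```python
import re

# Placeholder deny-list; a real system would consult live link-reputation
# data rather than a static set of hostnames.
UNSAFE_HOSTS = {"evil.example"}

URL_RE = re.compile(r"https?://([^/\s]+)\S*")

def redact_unsafe_links(text: str) -> str:
    """Replace URLs whose host is flagged as unsafe with a placeholder."""
    def sub(match: re.Match) -> str:
        host = match.group(1).lower()
        if host in UNSAFE_HOSTS:
            return "[suspicious link removed]"   # redact, keep surrounding text
        return match.group(0)                    # safe links pass through
    return URL_RE.sub(sub, text)
```

Redacting only the link, rather than dropping the whole summary, preserves the useful parts of the response while removing the phishing vector.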

The Gemini app with instructions to delete all events on Saturday. Gemini responds with the events found on Google Calendar and asks the user to confirm this action.

The image captures a confirmation step: the user asked to delete all events on a specific Saturday, and Gemini identified the three matching events and displayed their details so the user could confirm before the action was performed.
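A confirmation gate like this one can be sketched as follows; `Event`, `delete_saturday_events`, and the `confirm` callback are hypothetical names for illustration, not Gemini's actual calendar API.

```python
from dataclasses import dataclass

@dataclass
class Event:
    title: str
    start: str  # e.g. "Sat 10:00"

def delete_saturday_events(events: list[Event], confirm) -> list[Event]:
    """Require explicit user confirmation before a destructive action.

    `confirm` is a hypothetical callable that is shown the matched events
    and returns True only if the user approves. Returns the events that
    were actually deleted (empty if the user declined or nothing matched).
    """
    matches = [e for e in events if e.start.startswith("Sat")]
    if not matches:
        return []
    if not confirm(matches):   # surface details; wait for explicit approval
        return []              # user declined: nothing is deleted
    # ... the real deletion would happen here ...
    return matches
```

Requiring a human in the loop for destructive operations means an injected instruction ("delete all my events") cannot complete without the user seeing exactly what would be removed.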

Gemini in Docs with instructions to provide a summary of a file. Suspicious content was detected and a response was not provided. There is a yellow security notification banner for the user and a statement that Gemini’s response has been removed, with a “Learn more” link to a relevant Help Center article.

The image shows an interaction in which a user asked Gemini to summarize a file, but Gemini detected unsafe content (suspicious prompts or links) and blocked the response, displaying an explicit security warning instead. The rest of the interface shows the standard controls for sending new prompts.