|
| 1 | +--- |
| 2 | +title: "Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation" |
| 3 | +date: 2024-02-22T22:00:06-08:00 |
| 4 | +draft: true |
| 5 | +tags: [ |
| 6 | + "aiml", "machine learning","ai injections","ttp" |
| 7 | + ] |
| 8 | +twitter: |
| 9 | + card: "summary_large_image" |
| 10 | + site: "@wunderwuzzi23" |
| 11 | + creator: "@wunderwuzzi23" |
| 12 | + title: "Google Gemini (now Gemini) - Planting Instructions For Delayed Automatic Tool Invocation" |
| 13 | + description: "Attackers can pollute the prompt context of large language model applications and invoke tools, which otherwise might not be accessible." |
| 14 | + image: "https://embracethered.com/blog/images/2024/llm-planting-instructions.png" |
| 15 | +--- |
| 16 | + |
| 17 | +Last November, while testing `Google Bard` (now called `Gemini`) for vulnerabilities, I had a couple of interesting observations when it comes to automatic tool invocation. |
| 18 | + |
| 19 | +## Confused Deputy - Automatic Tool Invocation |
| 20 | + |
| 21 | +First, what do I mean by this... "automatic tool invocation"... |
| 22 | + |
| 23 | +Consider the following scenario: An attacker sends a malicious email to a user containing instructions to call an external tool. Google named these tools `Extensions`. |
| 24 | + |
| 25 | +When the user analyzes the email with an LLM, it interprets the instructions and calls the external tool, leading to a kind of `request forgery` or maybe better called **automatic tool invocation**. |
| 26 | + |
| 27 | +It's the `Confused Deputy` problem that we first discussed and demonstrated with [ChatGPT and Plugins (and now AI Actions)](/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/), here with [Zapier Plugin](https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./) and with [Chat with Code](https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/). |
| 28 | + |
| 29 | +The same attack idea applies here with Google Gemini! |
| 30 | + |
| 31 | +## Google's Mitigations |
| 32 | + |
| 33 | +What I discovered is that it's possible to invoke Extensions, such as `Flights`, `Hotels` and also the `YouTube Extension` during an attack (for example when analyzing an image). |
| 34 | + |
| 35 | +**BUT, these Extensions do not give access to personal information, such as emails or documents.** |
| 36 | + |
| 37 | +To gain access to the user's data the `Workspace Extension` is used. However, the `Workspace Extension` is not automatically invoked. |
| 38 | + |
| 39 | +Take a look, you can see how it invokes four extensions in one shot: |
| 40 | + |
| 41 | +[](/blog/images/2023/bard-many-extensions.png) |
| 42 | + |
| 43 | + |
| 44 | +Observe how the Flights, Hotels, and other Extensions got invoked, but the `Workspace Extension` was not called. |
| 45 | + |
| 46 | +**This means Google implemented a special kind of mitigation. If untrusted data enters the prompt context via accessing untrusted data (indirect prompt injection), then Google Gemini will not invoke all tools in the same conversation turn!** |
| 47 | + |
| 48 | +### Threat Model |
| 49 | + |
| 50 | +It seems Google identified this threat and prevents an adversary issuing commands and bringing the user's `Google Docs` or `Gmail` into the chat context during an attack. The `Workspace Extension` is not invoked in that situation. |
| 51 | + |
| 52 | +The reason this would be a security issue is because: |
| 53 | + |
| 54 | +1. Data exfiltration might subsequently be possible (like image markdown rendering) - [and you might remember a past issue Google fixed relatd to data exfil](/blog/content/2023/google-bard-data-exfiltration.md) |
| 55 | +2. Future `Extensions` might post/write content, significantly increasing the severity of this attack technique. |
| 56 | + |
| 57 | +The concept of `Human in the Loop` is an important mitigation for anything that automatically takes action on behalf of the user. Especially because there is no reliable fix for prompt injection as of today. |
| 58 | + |
| 59 | +So, while hacking along I had an interesting idea... |
| 60 | + |
| 61 | +## Planting Instructions To Invoke Tools |
| 62 | + |
| 63 | +What if the adversary "pollutes" the chat context during the prompt injection attack and plants instructions that will trigger the invocation of the `Workspace Extension` at a later stage? |
| 64 | + |
| 65 | +The idea to plant instructions or special trigger commands is not new, and a rather obvious attack technique. However, what is new here was the question if such planted instructions could lead to an automatic (and unwanted) invocation of the `Workspace Extension` at a **later point without the user explicitly and knowingly authorizing the action.** |
| 66 | + |
| 67 | +## Attack Overview |
| 68 | + |
| 69 | +1. Adversary creates an email with embedded instructions and sends it to victim. The instructions state a future task, that is triggered with the next user request, or trigger keywords. |
| 70 | +2. The victim interacts/summarizes the email via Google Gemini. |
| 71 | +3. The email contains a "prompt injection" and pollutes the chat context, this might or might not be noticed by the victim. |
| 72 | +4. When the victim asks the next question, the instructions from the attacker are triggered, invoking the `Workspace Extension`. |
| 73 | + |
| 74 | +Here is the example email I used, notice the highlighted instructions: |
| 75 | + |
| 76 | +[](/blog/images/2023/google-bard-context-pollution-email.png) |
| 77 | + |
| 78 | +The goal was to retrieve the text from this Google Doc: |
| 79 | + |
| 80 | +[](/blog/images/2023/google-bard-context-pollution-document2.png) |
| 81 | + |
| 82 | +Now, when using Google Gemini to retrieve the email the question was what would happen... |
| 83 | + |
| 84 | +{{< youtube qYMt9QJFzmI >}} |
| 85 | + |
| 86 | +<br> |
| 87 | +<br> |
| 88 | +<br> |
| 89 | + |
| 90 | +**Note:** Even though this proof-of-concept worked, the exploit was a bit flaky, but as more capabable models got released I think it improved. |
| 91 | + |
| 92 | +## Mitigation Ideas |
| 93 | + |
| 94 | +Google's thought process around not invoking certain tools when untrusted data entered the chat context is an interesting approach that other vendors have not yet adopted. However, it can be bypassed as shown in this post and because of the lack of fixes for indirect prompt injection we have to stay alert. |
| 95 | + |
| 96 | +Currently the security impact for Gemini is limited, since Google fixed the [Data Exfiltration angle via Image Markdown rendering](/blog/content/2023/google-bard-data-exfiltration.md). I did report this behavior and bypass to Google last November to raise awareness. It's still unclear how Google plans to address it, but they confirmed tracking the issue. |
| 97 | + |
| 98 | +If there is a new data exfiltration vulnerability lurking somewhere, or a new Extension with some kind of "write" capabilities, then the impact combined with this technique would be high. |
| 99 | + |
| 100 | +## Conclusion |
| 101 | + |
| 102 | +Polluting the chat context and planting instructions is a powerful way for an adversary to persist. |
| 103 | + |
| 104 | +The interesting new discovery here is that an LLM application might prevent the automatic invocation of a tool during the conversation turn the untrusted data entered the chat. But the LLM application might happily invoke such tools in subsequent conversations turns. This can be exploited by an adversary by planting instructions that execute at a later time. |
| 105 | + |
| 106 | +Hope this was interesting and helpful. |
| 107 | + |
| 108 | +Cheers. |
| 109 | + |
| 110 | +## References |
| 111 | + |
| 112 | +* [Google Gemini: Prompt Injection to Data Exfiltration](/blog/content/2023/google-bard-data-exfiltration.md) |
| 113 | +* [ChatGPT and Plugins (and now AI Actions)](/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/) |
| 114 | +* [ChatGPT Prompt Injection via Email To Data Exfil via Zapier Plugin](https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./) |
| 115 | +* [Chat with Code Plugin Vulnerability](https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/) |
| 116 | + |
0 commit comments