Skip to content

Commit e00adbd

Browse files
committed
prompt pollution instruction planting
1 parent e169e85 commit e00adbd

30 files changed

+887
-43
lines changed
Lines changed: 116 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,116 @@
1+
---
2+
title: "Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation"
3+
date: 2024-02-22T22:00:06-08:00
4+
draft: true
5+
tags: [
6+
"aiml", "machine learning","ai injections","ttp"
7+
]
8+
twitter:
9+
card: "summary_large_image"
10+
site: "@wunderwuzzi23"
11+
creator: "@wunderwuzzi23"
12+
title: "Google Gemini (now Gemini) - Planting Instructions For Delayed Automatic Tool Invocation"
13+
description: "Attackers can pollute the prompt context of large language model applications and invoke tools, which otherwise might not be accessible."
14+
image: "https://embracethered.com/blog/images/2024/llm-planting-instructions.png"
15+
---
16+
17+
Last November, while testing `Google Bard` (now called `Gemini`) for vulnerabilities, I had a couple of interesting observations when it comes to automatic tool invocation.
18+
19+
## Confused Deputy - Automatic Tool Invocation
20+
21+
First, what do I mean by this... "automatic tool invocation"...
22+
23+
Consider the following scenario: An attacker sends a malicious email to a user containing instructions to call an external tool. Google named these tools `Extensions`.
24+
25+
When the user analyzes the email with an LLM, it interprets the instructions and calls the external tool, leading to a kind of `request forgery` or maybe better called **automatic tool invocation**.
26+
27+
It's the `Confused Deputy` problem that we first discussed and demonstrated with [ChatGPT and Plugins (and now AI Actions)](/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/), here with [Zapier Plugin](https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./) and with [Chat with Code](https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/).
28+
29+
The same attack idea applies here with Google Gemini!
30+
31+
## Google's Mitigations
32+
33+
What I discovered is that it's possible to invoke Extensions, such as `Flights`, `Hotels` and also the `YouTube Extension` during an attack (for example when analyzing an image).
34+
35+
**BUT, these Extensions do not give access to personal information, such as emails or documents.**
36+
37+
To gain access to the user's data the `Workspace Extension` is used. However, the `Workspace Extension` is not automatically invoked.
38+
39+
Take a look, you can see how it invokes four extensions in one shot:
40+
41+
[![many extenions invoked by Gemini](/blog/images/2023/bard-many-extensions.png)](/blog/images/2023/bard-many-extensions.png)
42+
43+
44+
Observe how the Flights, Hotels, and other Extensions got invoked, but the `Workspace Extension` was not called.
45+
46+
**This means Google implemented a special kind of mitigation. If untrusted data enters the prompt context via accessing untrusted data (indirect prompt injection), then Google Gemini will not invoke all tools in the same conversation turn!**
47+
48+
### Threat Model
49+
50+
It seems Google identified this threat and prevents an adversary issuing commands and bringing the user's `Google Docs` or `Gmail` into the chat context during an attack. The `Workspace Extension` is not invoked in that situation.
51+
52+
The reason this would be a security issue is because:
53+
54+
1. Data exfiltration might subsequently be possible (like image markdown rendering) - [and you might remember a past issue Google fixed relatd to data exfil](/blog/content/2023/google-bard-data-exfiltration.md)
55+
2. Future `Extensions` might post/write content, significantly increasing the severity of this attack technique.
56+
57+
The concept of `Human in the Loop` is an important mitigation for anything that automatically takes action on behalf of the user. Especially because there is no reliable fix for prompt injection as of today.
58+
59+
So, while hacking along I had an interesting idea...
60+
61+
## Planting Instructions To Invoke Tools
62+
63+
What if the adversary "pollutes" the chat context during the prompt injection attack and plants instructions that will trigger the invocation of the `Workspace Extension` at a later stage?
64+
65+
The idea to plant instructions or special trigger commands is not new, and a rather obvious attack technique. However, what is new here was the question if such planted instructions could lead to an automatic (and unwanted) invocation of the `Workspace Extension` at a **later point without the user explicitly and knowingly authorizing the action.**
66+
67+
## Attack Overview
68+
69+
1. Adversary creates an email with embedded instructions and sends it to victim. The instructions state a future task, that is triggered with the next user request, or trigger keywords.
70+
2. The victim interacts/summarizes the email via Google Gemini.
71+
3. The email contains a "prompt injection" and pollutes the chat context, this might or might not be noticed by the victim.
72+
4. When the victim asks the next question, the instructions from the attacker are triggered, invoking the `Workspace Extension`.
73+
74+
Here is the example email I used, notice the highlighted instructions:
75+
76+
[![Email that pollutes the context with instructions](/blog/images/2023/google-bard-context-pollution-email.png)](/blog/images/2023/google-bard-context-pollution-email.png)
77+
78+
The goal was to retrieve the text from this Google Doc:
79+
80+
[![cats](/blog/images/2023/google-bard-context-pollution-document2.png)](/blog/images/2023/google-bard-context-pollution-document2.png)
81+
82+
Now, when using Google Gemini to retrieve the email the question was what would happen...
83+
84+
{{< youtube qYMt9QJFzmI >}}
85+
86+
<br>󠁎󠁩󠁣󠁥
87+
<br>
88+
<br>
89+
90+
**Note:** Even though this proof-of-concept worked, the exploit was a bit flaky, but as more capabable models got released I think it improved.
91+
92+
## Mitigation Ideas
93+
94+
Google's thought process around not invoking certain tools when untrusted data entered the chat context is an interesting approach that other vendors have not yet adopted. However, it can be bypassed as shown in this post and because of the lack of fixes for indirect prompt injection we have to stay alert.
95+
96+
Currently the security impact for Gemini is limited, since Google fixed the [Data Exfiltration angle via Image Markdown rendering](/blog/content/2023/google-bard-data-exfiltration.md). I did report this behavior and bypass to Google last November to raise awareness. It's still unclear how Google plans to address it, but they confirmed tracking the issue.
97+
98+
If there is a new data exfiltration vulnerability lurking somewhere, or a new Extension with some kind of "write" capabilities, then the impact combined with this technique would be high.
99+
100+
## Conclusion
101+
102+
Polluting the chat context and planting instructions is a powerful way for an adversary to persist.
103+
104+
The interesting new discovery here is that an LLM application might prevent the automatic invocation of a tool during the conversation turn the untrusted data entered the chat. But the LLM application might happily invoke such tools in subsequent conversations turns. This can be exploited by an adversary by planting instructions that execute at a later time.
105+
106+
Hope this was interesting and helpful.
107+
108+
Cheers.
109+
110+
## References
111+
112+
* [Google Gemini: Prompt Injection to Data Exfiltration](/blog/content/2023/google-bard-data-exfiltration.md)
113+
* [ChatGPT and Plugins (and now AI Actions)](/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/)
114+
* [ChatGPT Prompt Injection via Email To Data Exfil via Zapier Plugin](https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./)
115+
* [Chat with Code Plugin Vulnerability](https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/)
116+
Loading
Loading
1.02 MB
Loading
1.49 MB
Loading

docs/images/2024/whoamipic.png

929 KB
Loading

docs/index.html

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,9 @@ <h2>2024</h2>
103103

104104
<ul>
105105
<li>
106+
<time datetime="2024-02-22 22:00:06 PST">Feb 22</time>
107+
<a href="/blog/posts/2024/llm-context-pollution-and-delayed-automated-tool-invocation/">Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation</a>
108+
</li><li>
106109
<time datetime="2024-02-14 03:30:17 PST">Feb 14</time>
107110
<a href="/blog/posts/2024/lack-of-isolation-gpts-code-interpreter/">ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs</a>
108111
</li><li>

docs/index.xml

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,19 @@
77
<generator>Hugo -- gohugo.io</generator>
88
<language>en-us</language>
99
<copyright>(c) WUNDERWUZZI 2018-2024</copyright>
10-
<lastBuildDate>Wed, 14 Feb 2024 03:30:17 -0800</lastBuildDate><atom:link href="https://embracethered.com/blog/index.xml" rel="self" type="application/rss+xml" />
10+
<lastBuildDate>Thu, 22 Feb 2024 22:00:06 -0800</lastBuildDate><atom:link href="https://embracethered.com/blog/index.xml" rel="self" type="application/rss+xml" />
11+
<item>
12+
<title>Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation</title>
13+
<link>https://embracethered.com/blog/posts/2024/llm-context-pollution-and-delayed-automated-tool-invocation/</link>
14+
<pubDate>Thu, 22 Feb 2024 22:00:06 -0800</pubDate>
15+
16+
<guid>https://embracethered.com/blog/posts/2024/llm-context-pollution-and-delayed-automated-tool-invocation/</guid>
17+
<description>Last November, while testing Google Bard (now called Gemini) for vulnerabilities, I had a couple of interesting observations when it comes to automatic tool invocation.
18+
Confused Deputy - Automatic Tool Invocation First, what do I mean by this&amp;hellip; &amp;ldquo;automatic tool invocation&amp;rdquo;&amp;hellip;
19+
Consider the following scenario: An attacker sends a malicious email to a user containing instructions to call an external tool. Google named these tools Extensions.
20+
When the user analyzes the email with an LLM, it interprets the instructions and calls the external tool, leading to a kind of request forgery or maybe better called automatic tool invocation.</description>
21+
</item>
22+
1123
<item>
1224
<title>ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs</title>
1325
<link>https://embracethered.com/blog/posts/2024/lack-of-isolation-gpts-code-interpreter/</link>

docs/page/2/index.html

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,9 @@ <h2>2024</h2>
103103

104104
<ul>
105105
<li>
106+
<time datetime="2024-02-22 22:00:06 PST">Feb 22</time>
107+
<a href="/blog/posts/2024/llm-context-pollution-and-delayed-automated-tool-invocation/">Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation</a>
108+
</li><li>
106109
<time datetime="2024-02-14 03:30:17 PST">Feb 14</time>
107110
<a href="/blog/posts/2024/lack-of-isolation-gpts-code-interpreter/">ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs</a>
108111
</li><li>

docs/posts/2024/lack-of-isolation-gpts-code-interpreter/index.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -208,7 +208,7 @@ <h2 id="references">References</h2>
208208

209209
<ul class="pager">
210210

211-
<li class="next disabled"><a href="#">Newer <span aria-hidden="true">&rarr;</span></a></li>
211+
<li class="next"><a href="https://embracethered.com/blog/posts/2024/llm-context-pollution-and-delayed-automated-tool-invocation/">Newer <span aria-hidden="true">&rarr;</span></a></li>
212212

213213

214214
<li class="author-contact">

0 commit comments

Comments
 (0)