From 2f9c9abe9ed0f4cd579b5b41b2ec2e0c73740db0 Mon Sep 17 00:00:00 2001
From: Nikola Jovanovic
Date: Mon, 24 Jun 2024 17:39:45 +0200
Subject: [PATCH] Website update

---
 .gitignore |  4 +++-
 index.html | 24 +++++++++++------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/.gitignore b/.gitignore
index c138f46e..454445d9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,3 @@
-.history/
\ No newline at end of file
+.history/
+.jekyll-cache/
+_site/
\ No newline at end of file
diff --git a/index.html b/index.html
index a252b233..706e271d 100644
--- a/index.html
+++ b/index.html
@@ -74,7 +74,7 @@

- Paper
+ Paper (ICML 2024)
@@ -211,8 +211,8 @@

What are spoofing attacks?< aligned to refuse any harmful prompts. We show some examples below.
  • In our experiments - we additionally demonstrate similar success across several other schemes, study how our attack scales - with query cost, and show success in the setting where the attacker paraphrases existing (non-watermarked) text.

    @@ -230,8 +230,8 @@

    What are scrubbing attacks? Scrubbing
    - Our stealing attacker can also strip the watermark from LLM outputs even in challenging settings (85%
    - success rate, 1% before our work), concealing misuse such as plagiarism.
    + Our attacker can also strip the watermark from LLM outputs even in challenging settings (>80%
    + success, below 25% before our work), concealing misuse such as plagiarism.

      @@ -246,10 +246,10 @@

      What are scrubbing attacks?
    • We show that this is not the case under the threat of watermark stealing. Our attacker can apply its partial knowledge of the watermark rules () to significantly boost the success rate of scrubbing - on long texts with no need for additional queries to the server. Notably, we boost scrubbing success - from 1% to 85% for the KGW2-SelfHash scheme. Similar results are obtained for several other schemes, as we - show in our experimental evaluation in from 1% to 85% for the KGW2-SelfHash scheme. The best baseline we are aware of achieves below 25%. + Similar results are obtained for several other schemes, as we show in our experimental evaluation in the paper. Below, we also show several examples.
    • Our results challenge the common belief that robustness to spoofing @@ -522,10 +522,8 @@

      Citation

      @article{jovanovic2024watermarkstealing,
         title = {Watermark Stealing in Large Language Models},
         author = {Jovanović, Nikola and Staab, Robin and Vechev, Martin},
      -  year = {2024},
      -  eprint={2402.19361},
      -  archivePrefix={arXiv},
      -  primaryClass={cs.LG}
      +  journal = {{ICML}},
      +  year = {2024}
       }