
Testing Performance #32

Open
wants to merge 5 commits into main

Conversation

@FloofCat commented Jan 19, 2025

Hi,

We are actively testing the performance of our method on the RAID dataset and would like to check its effectiveness on the test set. However, please do not merge this into the main branch yet.

Our algorithm sets the scores of AI-generated text to 0 and human text to 1. Please let me know if anything needs to change here.

Please have the bot run its evaluation on the latest commit only. Thank you!

@liamdugan
Owner

Sure, I can run the evaluation for you, but you should probably change this:

> Our algorithm set the scores of AI-generated text to 0 and human text to 1

Our evaluation script expects AI-generated text to be 1 and human-written text to be 0. I'll make this clearer in the documentation (and also add a bit more information about the run_evaluation function).
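To make the convention concrete, here is a minimal sketch of the flip, assuming scores are probabilities in [0, 1]; `my_detector` is a hypothetical stand-in, not part of the RAID codebase:

```python
# A minimal sketch of the expected convention, not RAID's actual API:
# if scores are probabilities in [0, 1], the fix is simply 1 - score.

def my_detector(texts: list[str]) -> list[float]:
    # Hypothetical stand-in for the submission's detector, which
    # originally returned 1 for human text and 0 for AI-generated text.
    return [1.0 for _ in texts]

def flipped_detector(texts: list[str]) -> list[float]:
    # The evaluation expects AI-generated = 1 and human-written = 0,
    # so invert each probability.
    return [1.0 - s for s in my_detector(texts)]

print(flipped_detector(["some example text"]))  # [0.0], i.e. "human"
```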

@FloofCat
Author

FloofCat commented Jan 20, 2025

Labels have been flipped.

Please run the evaluation, and thank you for adding the documentation!


github-actions bot commented Jan 21, 2025

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

Interpret-Detector

Release date: 2025-01-19

I've committed detailed results of this detector's performance on the test set to this PR.

Warning

No aggregate score across all settings is reported here, as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.

Warning

No aggregate score across all non-adversarial settings is reported here, as some domains/generator models/decoding strategies/repetition penalties were not included in the submission.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!

@liamdugan
Owner

Done. Please pull the results.json file and inspect it to see the evaluation results for the individual splits of the RAID dataset.
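For anyone reading along, a minimal sketch of inspecting that file; the exact schema of results.json is an assumption here, so check the committed file itself:

```python
import json

# Load the per-split evaluation results committed to this PR.
with open("results.json") as f:
    results = json.load(f)

# Walk whatever top-level structure is present; the actual layout
# (e.g. scores keyed by domain, model, or attack) may differ.
if isinstance(results, dict):
    for split, metrics in results.items():
        print(split, metrics)
else:
    print(results)
```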
