Testing Performance #32
base: main
Conversation
Sure, I can run the evaluation for you, but you should probably change this:
Our evaluation script expects AI-generated text to be scored 1 and human-written text to be scored 0. I'll make this clearer in the documentation (and also add a bit more information about the
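In the meantime, flipping an existing set of scores to match this convention is a one-line inversion. A minimal sketch, assuming the predictions are stored as a JSON mapping from sample id to score; the file names and structure here are placeholders, not the repository's actual submission format:

```python
import json

# Illustrative only: assumes predictions.json maps each sample id to a
# detector score (not necessarily the actual RAID submission layout).
with open("predictions.json") as f:
    predictions = json.load(f)

# The original convention scored human-written text as 1 and AI-generated
# text as 0; the evaluation script expects the opposite, so invert each score.
flipped = {sample_id: 1.0 - float(score) for sample_id, score in predictions.items()}

with open("predictions_flipped.json", "w") as f:
    json.dump(flipped, f, indent=2)
```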
The labels have been flipped. Please go ahead with the evaluation, and thank you for adding the documentation!
Eval run succeeded! Link to run: link

Here are the results of the submission(s):

Interpret-Detector
Release date: 2025-01-19

I've committed detailed results of this detector's performance on the test set to this PR.

Warning: No aggregate score across all settings is reported here, as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.

Warning: No aggregate score across all non-adversarial settings is reported here, as some domains/generator models/decoding strategies/repetition penalties were not included in the submission.
Done, please pull the
Hi,
We are actively testing the performance of our method on the RAID dataset and would like to check its effectiveness on the test set. However, please do not merge this into the main branch yet.
Our algorithm sets the score of AI-generated text to 0 and human-written text to 1. Please let me know if anything needs to change here.
Please have the bot run its evaluation on the latest commit only. Thank you!