
Testing Performance #32

Open
wants to merge 5 commits into main

Conversation

@FloofCat commented Jan 19, 2025

Hi,

We are actively testing the performance of our method on the RAID dataset and would like to check its effectiveness on the test set. However, please do not merge this into the main branch yet.

Our algorithm sets the scores of AI-generated text to 0 and human text to 1. Please let me know if anything needs to change here.

Please have the bot run its evaluation on the latest commit only. Thank you!

@liamdugan
Owner

Sure, I can run the evaluation for you, but you should probably change this:

> Our algorithm set the scores of AI-generated text to 0 and human text to 1

Our evaluation script expects AI-generated text to be 1 and human-written text to be 0. I'll make this clearer in the documentation (and also add a bit more information about the run_evaluation function).
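To make the convention concrete, here is a minimal sketch of the flip, assuming scores are probabilities in [0, 1]; `my_detector` is a hypothetical stand-in, not part of the RAID codebase:

```python
# A minimal sketch of the expected convention, not RAID's actual API:
# if scores are probabilities in [0, 1], the fix is simply 1 - score.

def my_detector(texts: list[str]) -> list[float]:
    # Hypothetical stand-in for the submission's detector, which
    # originally returned 1 for human text and 0 for AI-generated text.
    return [1.0 for _ in texts]

def flipped_detector(texts: list[str]) -> list[float]:
    # The evaluation expects AI-generated = 1 and human-written = 0,
    # so invert each probability.
    return [1.0 - s for s in my_detector(texts)]

print(flipped_detector(["some example text"]))  # [0.0], i.e. "human"
```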

@FloofCat
Author

FloofCat commented Jan 20, 2025

Labels have been flipped.

Please run the evaluation, and thank you for adding the documentation!


github-actions bot commented Jan 21, 2025

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

Interpret-Detector

Release date: 2025-01-19

I've committed detailed results of this detector's performance on the test set to this PR.

Warning

No aggregate score across all settings is reported here, as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.

Warning

No aggregate score across all non-adversarial settings is reported here, as some domains/generator models/decoding strategies/repetition penalties were not included in the submission.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!

@liamdugan
Owner

Done. Please pull the results.json file and inspect it to see the evaluation results for the individual splits of the RAID dataset.
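For anyone reading along, a minimal sketch of inspecting that file; the exact schema of results.json is an assumption here, so check the committed file itself:

```python
import json

# Load the per-split evaluation results committed to this PR.
with open("results.json") as f:
    results = json.load(f)

# Walk whatever top-level structure is present; the actual layout
# (e.g. scores keyed by domain, model, or attack) may differ.
if isinstance(results, dict):
    for split, metrics in results.items():
        print(split, metrics)
else:
    print(results)
```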
