HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?
This is the official implementation of our final submission to SemEval-2024 Task 8. The paper is available on arXiv.
git clone
Go to the project directory
cd SemEval24-task8
Install dependencies
conda env create -f environment.yml
conda activate sem24_task8
Download data
gdown -O ./data --folder
Run the trainer
python src/ --exp_name=EXP_NAME
Hyperparameters
'accumulate_grad_batches': 16,
'batch_size': 2,
'cls_dropout': 0.6,
'encoder_type': 'sen',
'loss_weight_con': 0.7,
'loss_weight_gen_text': 0.1,
'loss_weight_text': 0.8,
'lr': 1e-05,
'max_doc_len': 64,
'max_epochs': -1,
'max_sen_len': 4096,
'model_name': 'jpwahle/longformer-base-plagiarism-detection',
'seed': 42,
'validate_every': 0.04,
'weight_decay': 0.0
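
As a rough illustration of how these values interact, the sketch below collects them in a plain Python dict and derives the effective batch size per optimizer step. It assumes `accumulate_grad_batches` has the usual gradient-accumulation meaning (as in the PyTorch Lightning `Trainer` argument of the same name, where `max_epochs=-1` also means no epoch limit). The helpers `effective_batch_size` and `combined_loss` are hypothetical names introduced here, and the weighted-sum form of the loss is an assumption about how the three `loss_weight_*` values are used, not a description of the repository's code.

```python
# Hyperparameters copied from the list above.
HPARAMS = {
    "accumulate_grad_batches": 16,
    "batch_size": 2,
    "cls_dropout": 0.6,
    "encoder_type": "sen",
    "loss_weight_con": 0.7,
    "loss_weight_gen_text": 0.1,
    "loss_weight_text": 0.8,
    "lr": 1e-05,
    "max_doc_len": 64,
    "max_epochs": -1,
    "max_sen_len": 4096,
    "model_name": "jpwahle/longformer-base-plagiarism-detection",
    "seed": 42,
    "validate_every": 0.04,
    "weight_decay": 0.0,
}


def effective_batch_size(hparams: dict) -> int:
    """Samples contributing to one optimizer step on a single device.

    Assumes standard gradient accumulation: gradients from
    `accumulate_grad_batches` mini-batches are summed before each step.
    """
    return hparams["batch_size"] * hparams["accumulate_grad_batches"]


def combined_loss(loss_con: float, loss_gen_text: float, loss_text: float,
                  hparams: dict) -> float:
    """Hypothetical weighted sum of the three loss terms.

    ASSUMPTION: the weights scale a contrastive term and two
    classification terms that are then added together.
    """
    return (hparams["loss_weight_con"] * loss_con
            + hparams["loss_weight_gen_text"] * loss_gen_text
            + hparams["loss_weight_text"] * loss_text)


if __name__ == "__main__":
    print(effective_batch_size(HPARAMS))  # 2 * 16 = 32
```

Under this reading, the small per-step `batch_size` of 2 keeps the 4096-token inputs within memory, while accumulation over 16 batches gives each optimizer update 32 samples.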