HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?
This is the official implementation of our final submission to SemEval-2024 Task 8. The paper is available on arXiv.
Clone the repository
git clone https://github.com/dipta007/SemEval24-task8
Go to the project directory
cd SemEval24-task8
Install dependencies
conda env create -f environment.yml
conda activate sem24_task8
Download data
gdown https://drive.google.com/drive/folders/1FrhMQ5QvMgaeSgcBmZbk7l_GbU-ga99P -O ./data --folder
Run the trainer
python src/train.py --exp_name=EXP_NAME
Hyperparameters
'accumulate_grad_batches': 16,
'batch_size': 2,
'cls_dropout': 0.6,
'encoder_type': 'sen',
'loss_weight_con': 0.7,
'loss_weight_gen_text': 0.1,
'loss_weight_text': 0.8,
'lr': 1e-05,
'max_doc_len': 64,
'max_epochs': -1,
'max_sen_len': 4096,
'model_name': 'jpwahle/longformer-base-plagiarism-detection',
'seed': 42,
'validate_every': 0.04,
'weight_decay': 0.0
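To make a few of the values above concrete, here is a minimal sketch of how they might interact. It assumes a single device, so that each optimizer step sees `batch_size * accumulate_grad_batches` examples. It also assumes the three `loss_weight_*` values scale a contrastive loss and two classification losses that are summed into one objective. The helper names `effective_batch_size` and `combine_losses` are hypothetical and do not come from the repository; the actual combination in `src/train.py` may differ.

```python
# Subset of the hyperparameters listed above, copied verbatim.
config = {
    "accumulate_grad_batches": 16,
    "batch_size": 2,
    "loss_weight_con": 0.7,
    "loss_weight_gen_text": 0.1,
    "loss_weight_text": 0.8,
}


def effective_batch_size(cfg):
    """Examples contributing to one optimizer step (single-device assumption)."""
    return cfg["batch_size"] * cfg["accumulate_grad_batches"]


def combine_losses(cfg, loss_con, loss_text, loss_gen_text):
    """Hypothetical weighted sum of the three loss terms (an assumption,
    not necessarily what the repository's training code does)."""
    return (
        cfg["loss_weight_con"] * loss_con
        + cfg["loss_weight_text"] * loss_text
        + cfg["loss_weight_gen_text"] * loss_gen_text
    )


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(config))
    # Placeholder loss values chosen only for illustration.
    print("combined loss:", combine_losses(config, 0.5, 0.3, 0.9))
```

With a per-step batch of 2 and 16 accumulation steps, gradients are averaged over 32 examples before each update, which is one common way to fit long 4096-token inputs into limited GPU memory.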