Baselines #75

rbroc · 2024-10-14T12:00:07Z

On top of the feature-based classifiers, we want to have the following baselines:

MinaAlmasi · 2024-10-14T12:02:52Z

Sentence embedding-based (sentence transformers)

MinaAlmasi · 2024-10-30T08:51:25Z

Sentence Embedding-based -> Sentence Transformers

The embeddings so far were computed with a fairly large Nvidia model (see the description of #80), but we can scale down, e.g., to an e5 model.

Notes for that:

Need to double-check, but pretty sure the e5 models require a "passage: " prefix in front of every row of text, which the current Nvidia model does not.
We might also not need to cast down to FP16 (with model.half()) if we use a smaller model.
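
A rough sketch of what the two notes above could look like in code, not tested against our pipeline. The checkpoint name `intfloat/e5-base-v2` is only an assumed example (no specific e5 model has been picked yet), and `add_e5_prefix` / `embed_with_e5` are hypothetical helper names. The prefix requirement should still be verified against the model card.

```python
def add_e5_prefix(texts, prefix="passage: "):
    """Prepend the e5 prefix to each text, skipping rows that already have it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def embed_with_e5(texts, model_name="intfloat/e5-base-v2", use_fp16=False):
    """Embed texts with an e5 checkpoint (model_name is an assumed example)."""
    # Imported lazily so the prefix helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    if use_fp16:
        # Cast weights to FP16; probably unnecessary for a smaller model.
        model.half()
    # Prefix every row before encoding, as the e5 models expect.
    return model.encode(add_e5_prefix(texts), normalize_embeddings=True)


if __name__ == "__main__":
    rows = ["first example text", "passage: already prefixed"]
    print(add_e5_prefix(rows))
```

Keeping the prefix step in its own function means it can be dropped again if we stay with the current Nvidia model, which does not need it.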

MinaAlmasi · 2024-10-30T12:43:07Z

For LLM detector (bookkeeping)

Want to use a "small" (3b-ish max) model that we will not be using for generation
Ruling out phi-3.5-mini since it has been trained on a lot of synthetic data

MinaAlmasi self-assigned this Oct 14, 2024

MinaAlmasi mentioned this issue Oct 28, 2024

Baselines: add results for embeddings #80

Merged