models roberta large openai detector
Description: RoBERTa Large OpenAI Detector is a fine-tuned, transformer-based language model developed by OpenAI to detect text generated by GPT-2 models. It is a sequence classifier based on RoBERTa Large, fine-tuned on outputs of the 1.5B GPT-2 model. The model detects text generated by the 1.5B GPT-2 model with approximately 95% accuracy, but the developers note that accuracy may decrease as model sizes increase. It is evaluated on test data consisting of 5,000 samples from the WebText dataset and 5,000 samples generated by a GPT-2 model. The model should not be used to intentionally harm others or to support efforts to evade detection; it can be used in research related to synthetic text generation. The model has limitations, including disturbing stereotypes and harmful biases, which are discussed further in the associated paper.

> The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.
### Inference samples

Inference type|Python sample (Notebook)|CLI with YAML
|--|--|--|
Real time|text-classification-online-endpoint.ipynb|text-classification-online-endpoint.sh
Batch|entailment-contradiction-batch.ipynb|coming soon

### Model Evaluation

Task|Use case|Dataset|Python sample (Notebook)|CLI with YAML
|--|--|--|--|--|
Text Classification|Detecting GPT2 Output|GPT2-Outputs|evaluate-model-text-classification.ipynb|evaluate-model-text-classification.yml

### Finetuning samples

Task|Use case|Dataset|Python sample (Notebook)|CLI with YAML
|--|--|--|--|--|
Text Classification|Emotion Detection|Emotion|emotion-detection.ipynb|emotion-detection.sh
Token Classification|Named Entity Recognition|Conll2003|named-entity-recognition.ipynb|named-entity-recognition.sh
Question Answering|Extractive Q&A|SQUAD (Wikipedia)|extractive-qa.ipynb|extractive-qa.sh

### Sample inputs and outputs (for real-time inference)

#### Sample input

    {
        "inputs": {
            "input_string": ["Today was an amazing day!", "It was an unfortunate series of events."]
        }
    }

#### Sample output

    [
        {
            "0": "LABEL_0"
        },
        {
            "0": "LABEL_0"
        }
    ]
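The request and response shapes above can be handled with a few lines of Python. This is a minimal sketch, not the catalog's own client code: the helper names `build_request` and `parse_response` are hypothetical, and the response string is simply the sample output from this card rather than a live endpoint reply. The card does not say what `LABEL_0` means, so the sketch returns the labels without interpreting them.

```python
import json

def build_request(texts):
    """Wrap raw strings in the payload shape shown in the sample input."""
    return json.dumps({"inputs": {"input_string": list(texts)}})

def parse_response(body):
    """Extract one label per input from a list of {"0": label} objects."""
    return [item["0"] for item in json.loads(body)]

# Inputs copied from the sample input on this card.
payload = build_request([
    "Today was an amazing day!",
    "It was an unfortunate series of events.",
])

# Sample output copied from this card (stand-in for a real endpoint response).
sample_response = '[{"0": "LABEL_0"}, {"0": "LABEL_0"}]'
labels = parse_response(sample_response)

print(payload)
print(labels)
```

The parser assumes every response element carries its label under the key `"0"`, as in the sample output; a deployment that returns a different shape would need a different accessor.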
Version: 7
Preview
computes_allow_list : ['Standard_NV12s_v3', 'Standard_NV24s_v3', 'Standard_NV48s_v3', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24rs_v3', 'Standard_NC6s_v2', 'Standard_NC12s_v2', 'Standard_NC24s_v2', 'Standard_NC24rs_v2', 'Standard_NC4as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_ND6s', 'Standard_ND12s', 'Standard_ND24s', 'Standard_ND24rs', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4']
license : mit
task : text-classification
View in Studio: https://ml.azure.com/registries/azureml/models/roberta-large-openai-detector/version/7
SHA: 5002d695ecf610d8bbfb1fa0d14f1575185b4915
datasets: bookcorpus, wikipedia
evaluation-min-sku-spec: 2|0|7|14
evaluation-recommended-sku: Standard_DS2_v2
finetune-min-sku-spec: 4|1|28|176
finetune-recommended-sku: Standard_NC24rs_v3
finetuning-tasks: text-classification, token-classification, question-answering
inference-min-sku-spec: 2|0|7|14
inference-recommended-sku: Standard_DS2_v2
languages: en
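
The `*-min-sku-spec` values above are pipe-separated integers. The sketch below parses them under one loudly stated assumption: that the four fields are CPU cores, GPUs, memory in GB, and storage in GB, in that order. This card does not define the field order, so treat the field names as hypothetical. `SkuSpec`, `parse_sku_spec`, and `meets_minimum` are illustrative helpers, not part of any Azure SDK.

```python
from typing import NamedTuple

class SkuSpec(NamedTuple):
    # ASSUMPTION: field order is cpus|gpus|memory_gb|storage_gb.
    cpus: int
    gpus: int
    memory_gb: int
    storage_gb: int

def parse_sku_spec(spec: str) -> SkuSpec:
    """Parse a 'a|b|c|d' min-sku-spec string into four integers."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}: {spec!r}")
    return SkuSpec(*(int(p) for p in parts))

def meets_minimum(candidate: SkuSpec, minimum: SkuSpec) -> bool:
    """True if the candidate is at least as large as the minimum in every field."""
    return all(c >= m for c, m in zip(candidate, minimum))

# Values copied from the metadata on this card.
inference_min = parse_sku_spec("2|0|7|14")
finetune_min = parse_sku_spec("4|1|28|176")

print(inference_min)
print(finetune_min)
print(meets_minimum(inference_min, finetune_min))
```

Under that assumed field order, a machine sized only to the inference minimum would not satisfy the fine-tuning minimum, which asks for more of every resource, including one GPU.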