Skip to content

This repository hosts a Hugging Face Space that provides an API for submitting models to the Tox21 Leaderboard on Huggingface.

License

Notifications You must be signed in to change notification settings

ml-jku/tox21_gpt-oss_classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tox21 GPT-OSS Classifier

This repository provides the evaluation code for the GPT-OSS model in the Tox21 Leaderboard.

Here a GPT-OSS model is evaluated on the Tox21 dataset.

Repository Structure

  • full_evaluation_run.py - description
  • eval_results.ipynb - description

Installation

To run the GPT-OSS classifier, clone the repository and install dependencies:

git clone https://github.com/ml-jku/tox21_gpt-oss_classifier.git
cd tox21_gpt-oss_classifier
pip install -r requirements.txt

Evaluation procedure

The pipeline consists of two components:

  • full_evaluation_run.py — generates predictions from the model.
  • eval_result.ipynb — inspects and analyzes the generated prediction file.

Running the evaluator (full_evaluation_run.py)

The evaluator loads the Tox21 test set, builds prompts for every target, runs the model via vLLM, and writes all predictions to a CSV file.

Example usage:

python full_evaluation_run.py \
    --test_csv path/to/tox21_test.csv \
    --model_name openai/gpt-oss-120b \
    --train_csv path/to/tox21_train.csv \
    --n_shots 0 \
    --output_dir results \
    --n_rollouts 1

What the script does

  • Load data
    • Loads the test set (test_csv).
    • Loads the training set if few-shot examples are enabled.
  • Construct prompts For each molecule–target pair, a chat prompt is created containing:
    • a fixed system instruction,
    • the SMILES string,
    • a description of the target,
    • optionally few-shot examples (sampled from the training data). Prompts are formatted through the model’s chat template (tokenizer.apply_chat_template).
  • Run vLLM inference
    • Generates model outputs for every prompt.
    • Supports multiple rollouts per sample (--n_rollouts).
    • Configurable temperature, reasoning effort, and max tokens.
  • Extract answer and type conversion
    • direct float parsing when possible,
    • otherwise the last valid number between 0 and 1 is used,
    • percentage formats are handled,
    • invalid outputs default to 0.
  • Save predictions

The final prediction file is written to:

results/<model_name>/<shot_setting>_<reasoning_effort>/predictions.csv

Inspecting results (eval_result.ipynb)

The notebook eval_result.ipynb is intended for:

  • loading and inspecting the generated predictions.csv,
  • computing performance metrics.

About

This repository hosts a Hugging Face Space that provides an API for submitting models to the Tox21 Leaderboard on Huggingface.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •