
LLM Annotator

Overview

LLM Annotator is a framework for automated data annotation using state-of-the-art Large Language Models (LLMs). It lets users leverage Claude 3.7 Sonnet, GPT-4o, and DeepSeek to label and process textual data efficiently.

Features

  • Supports multiple LLMs for flexible annotation: Claude 3.7 Sonnet and GPT-4o.
  • Modular pipeline architecture, allowing customizable annotation workflows.
  • Batch processing of annotation requests.

Setup

0. Create a conda runtime environment

conda create -n llm_annotator python=3.10
conda activate llm_annotator

1. Clone the Repository

git clone https://github.com/KingArthur0205/LLM-Annotator.git
cd LLM-Annotator

2. Install Dependencies

Install the package in editable mode:

pip install -e .

3. Set Up API Keys

LLM Annotator requires API keys for the LLMs. Create a .env file in the repository root (e.g. run touch .env inside LLM-Annotator) and add entries in the following format:

OPENAI_API_KEY=your-chatgpt-4o-key
ANTHROPIC_API_KEY=your-claude-3-7-sonnet-key

Replace your-chatgpt-4o-key and your-claude-3-7-sonnet-key with your OpenAI and Anthropic API keys, respectively.
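Before launching a batch, it can be useful to confirm that both keys are actually present. The sketch below is a minimal stdlib-only illustration of how a .env file in this format can be parsed; the package itself may load it differently (e.g. via python-dotenv):

```python
import os
import tempfile

def load_env(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Demonstrate with a dummy .env mirroring the format above
path = os.path.join(tempfile.mkdtemp(), ".env")
with open(path, "w") as f:
    f.write("OPENAI_API_KEY=your-chatgpt-4o-key\n")
    f.write("ANTHROPIC_API_KEY=your-claude-3-7-sonnet-key\n")

env = load_env(path)
missing = [k for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY") if not env.get(k)]
assert not missing, f"Missing API keys: {missing}"
```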

4. Handling Data Sources

  • Google Colab: Provide the Google Sheet ID for data loading.
  • Local Execution: Place the required files inside the data folder.
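Since the same source argument can be either a local path or a Google Sheet ID (see transcript_source and sheet_source below), a simple heuristic can distinguish the two. This helper is hypothetical, not part of the package:

```python
import os

def resolve_source(source: str) -> str:
    """Classify a data source as a local file or a Google Sheet ID.

    Heuristic sketch: treat the string as a local path if it exists on
    disk or looks like a file path; otherwise assume it is a Sheet ID.
    """
    if os.path.exists(source) or os.sep in source or source.endswith((".csv", ".xlsx")):
        return "local"
    return "google_sheet"

assert resolve_source("data/transcript_data.csv") == "local"
assert resolve_source("1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms") == "google_sheet"
```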

Usage

1. Running the Annotation Pipeline

Once set up, you can use the annotator in Python:

from llm_annotator.main import annotate

annotate(
    model_list=["gpt-4o", "claude-3-7"],
    obs_list=["146", "170"],  # Use "all" to annotate all utterances
    feature="Mathcompetent",
    transcript_source="xxxxx",  # Either a local path or a Google Sheet ID
    sheet_source="xxxxx",  # Either a local path or a Google Sheet ID
    prompt_path="data/prompts/prompt.txt",  # Path to the user prompt
    system_prompt_path="data/prompts/system_prompt.txt",
    n_uttr=30,  # Number of utterances to include in one LLM request
    if_wait=True,  # Block until the annotations are generated
    mode="CoT",  # Annotate with advanced reasoning (chain-of-thought)
    save_dir="/content/Drive/result",  # Directory to save the results into
)
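The n_uttr parameter controls how many utterances are packed into a single LLM request. Conceptually, the batching behaves like this sketch (an illustration, not the package's internal code):

```python
def chunk_utterances(utterances, n_uttr=30):
    """Split a list of utterances into request-sized batches."""
    return [utterances[i:i + n_uttr] for i in range(0, len(utterances), n_uttr)]

# 75 utterances with n_uttr=30 yield batches of 30, 30, and 15
batches = chunk_utterances([f"uttr_{i}" for i in range(75)], n_uttr=30)
assert [len(b) for b in batches] == [30, 30, 15]
```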

A tutorial Python notebook is also provided with the repository.

2. Fetch

To retrieve the results and generate the annotation file, use the fetch() function:

from llm_annotator.main import fetch

feature = "Mathcompetent"
save_dir = "/content/Drive/result"
time_stamp = "2025-05-06_13:38:36"

fetch(feature=feature, timestamp=time_stamp, save_dir=save_dir)

Note: If the batch_dir parameter is omitted, the results of the most recent batch request are fetched.

3. Structure

The expected directory structure is shown below.

llm_annotator
|--data
|   |--prompts
|   |   |--prompt1.txt, prompt2.txt, ...
|   |--Role Features.xlsx
|   |--transcript_data.csv
|--result
|   |--feature1
|   |   |--date_time1
|   |   |   |--claude-3-7.json # Metadata of the Claude batch request
|   |   |   |--gpt-4o.json # Metadata of the GPT-4o batch request
|   |   |   |--metadata.json # Runtime metadata
|   |   |   |--atn_df.csv # Annotated results
|   |--feature2
...
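Given this layout, the most recent run directory for a feature can be located programmatically, since timestamps of the form date_time sort lexicographically. A sketch assuming the structure above (directory names here are examples):

```python
import os
import tempfile

def latest_result_dir(save_dir: str, feature: str) -> str:
    """Return the most recent timestamped run directory for a feature.

    Timestamps like 2025-05-06_13:38:36 sort lexicographically,
    so the last entry is the newest run.
    """
    feature_dir = os.path.join(save_dir, feature)
    runs = sorted(d for d in os.listdir(feature_dir)
                  if os.path.isdir(os.path.join(feature_dir, d)))
    return os.path.join(feature_dir, runs[-1])

# Demonstrate with a dummy layout mirroring the structure above
root = tempfile.mkdtemp()
for ts in ("2025-05-06_13:38:36", "2025-05-07_09:12:01"):
    os.makedirs(os.path.join(root, "Mathcompetent", ts))

latest = latest_result_dir(root, "Mathcompetent")
assert latest.endswith("2025-05-07_09:12:01")
```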
