Dataset and analysis code for BEA2025 paper @ ACL: "Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring" (Almasi & Kristensen-McLachlan, 2025)


INTERACT-LLM/alignment-drift-llms


🚀 Overview

This repository contains the dataset and analysis from Almasi & Kristensen-McLachlan (2025):

| Item | Location | Documentation |
|---|---|---|
| 📦 Text Dataset (v3.0) | data/v3.0_dataset.csv | data/README.md |
| 📦 Metrics Dataset (v3.0) | metrics/*.csv | metrics/README.md |
| 🧪 Analysis | src/ | src/README.md |
| 📊 Plots & Results | plots/ & results/ | |
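The CSV files listed above can be inspected with Python's standard library alone. Below is a minimal sketch that prints each file's header row and row count without assuming any particular columns. It assumes you run it from the repository root. `summarize_csv` and `summarize_repo_data` are hypothetical helper names and are not part of this repository. For the actual column definitions, see data/README.md and metrics/README.md.

```python
import csv
import glob
import os


def summarize_csv(path: str) -> tuple[list[str], int]:
    """Return the header row and the number of data records in a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # csv.reader yields one item per record, so quoted multi-line fields count once.
        n_rows = sum(1 for _ in reader)
    return header, n_rows


def summarize_repo_data(root: str = ".") -> dict[str, tuple[list[str], int]]:
    """Summarize the text dataset and every metrics CSV found under `root`."""
    # Paths follow the layout in the table above (hypothetical helper, not repo code).
    paths = [os.path.join(root, "data", "v3.0_dataset.csv")]
    paths += sorted(glob.glob(os.path.join(root, "metrics", "*.csv")))
    return {p: summarize_csv(p) for p in paths if os.path.exists(p)}


if __name__ == "__main__":
    for path, (header, n_rows) in summarize_repo_data().items():
        print(f"{path}: {n_rows} rows, columns = {header}")
```

Missing files are skipped silently, so the script can run on a partial checkout.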

Teacher-student dialogue simulations were performed in a separate repository:

| Item | Location | Documentation |
|---|---|---|
| 🛠️ Generation of Dialogues | Interact-LLM repo (src/scripts/alignment-drift) | README.md |

Note: The prefix v3.0 for the data refers to the prompt version used to simulate the dialogues. See the prompts in the Interact-LLM repo.

🛠️ Technical Requirements

The code was run with Python 3.12.3 on both macOS (15.3.1) and Ubuntu (24.04). The project also requires:

| Tool | Installation |
|---|---|
| make | Installed via Homebrew |
| uv | Installed through this project's Makefile (see Usage) |
| R 4.4.3 + R Markdown | Installed separately: R via CRAN, and Posit's RStudio (or an IDE of your choice) for running R Markdown |
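Before running the Makefile, you can check that these tools are available with the sketch below. It is a hypothetical helper and is not part of the repository.

Its assumptions:
- R's command-line front end, `Rscript`, stands in for "R is installed".
- Python 3.12 is treated as the minimum only because 3.12.3 is the version the code was run on. This is not a stated requirement.

It does not check whether the R Markdown package is installed. Also, `uv` will legitimately show as missing if you plan to let `make run-project` install it.

```python
import shutil
import sys

# Command-line tools from the requirements table (Rscript ships with R).
REQUIRED_TOOLS = ("make", "uv", "Rscript")


def check_environment(min_python: tuple[int, int] = (3, 12)) -> dict[str, bool]:
    """Report whether the interpreter meets `min_python` and each tool is on PATH."""
    # The 3.12 default is an assumption based on the version the code was run with.
    status = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for tool in REQUIRED_TOOLS:
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```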

⚙️ Usage

You can run the code using the makefile by entering the following command in the terminal:

make run-project

This command installs uv on macOS/Linux, sets up a virtual environment with the required Python dependencies, and finally runs the code.

If you prefer to use your own installation of uv (or already have it installed), you can run the code directly:

make run-code

Note: This does not execute stats.rmd. It must be run separately (requires R and R Markdown; see Technical Requirements).
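If you would rather not open RStudio, an R Markdown file can usually be knitted from the terminal with `Rscript -e "rmarkdown::render('<file>')"`. The sketch below wraps that call in Python.

Some caveats:
- `render_command` and `render_stats` are hypothetical helpers, not part of the repository.
- The location of stats.rmd is not stated here, so the path is passed in explicitly.
- It assumes the `rmarkdown` R package is installed and that Pandoc is available. RStudio bundles a copy of Pandoc; a plain R install may not.

```python
import shutil
import subprocess


def render_command(rmd_path: str) -> list[str]:
    """Build an Rscript call that knits an R Markdown file via rmarkdown::render()."""
    # Escape backslashes and single quotes so the path is a valid R string literal.
    escaped = rmd_path.replace("\\", "\\\\").replace("'", "\\'")
    return ["Rscript", "-e", f"rmarkdown::render('{escaped}')"]


def render_stats(rmd_path: str) -> int:
    """Run the render step and return Rscript's exit code (0 on success)."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH; install R first")
    return subprocess.run(render_command(rmd_path), check=False).returncode
```

Usage: call `render_stats("path/to/stats.rmd")` with the actual location of stats.rmd in your checkout.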

📝 Citation

If you use our work, please cite:

@article{almasi2025alignmentdriftcefrpromptedllms,
  title={Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring}, 
  author={Mina Almasi and Ross Deans Kristensen-McLachlan},
  journal={arXiv preprint arXiv:2505.08351},
  year={2025},
  url={https://arxiv.org/abs/2505.08351},
  note={cs.CL}
}

Note: This paper has been accepted to the ACL workshop BEA2025 (20th Workshop on Innovative Use of NLP for Building Educational Applications). The final version, appearing in the ACL Anthology, is forthcoming.

✨ Acknowledgements

This work was made possible thanks to the following open-source resources:

See also metrics/README.md.
