amazon-science/idioms-incontext-mt

Idioms in Context Dataset

This repository contains the "Idioms in Context" dataset used in our ACL 2024 paper: The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities.

Description

The dataset consists of idiomatic expressions in context, paired with human-written translations. It covers two language pairs (English-German and English-Russian) in three translation directions:

  1. English → German
  2. German → English
  3. Russian → English

The dataset is designed to evaluate how well large language models and machine translation systems handle idiomatic expressions, which are difficult to translate because their meaning is non-literal.
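As a rough illustration of the coverage described above, the three directions can be written as (source, target) tuples. This is only a sketch: the ISO 639-1 codes and the names `DIRECTIONS` and `language_pairs` are assumptions made for the example, not identifiers taken from the dataset files.

```python
# Hypothetical encoding of the three translation directions listed above.
# The language codes and names here are illustrative assumptions.
DIRECTIONS = [("en", "de"), ("de", "en"), ("ru", "en")]


def language_pairs(directions):
    """Collapse directed (source, target) tuples into unordered language pairs."""
    return {frozenset(direction) for direction in directions}


pairs = language_pairs(DIRECTIONS)
```

Note that English-German appears in both directions, while English-Russian appears only as Russian → English, which is why two language pairs yield three directions.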

Usage

If you use this dataset in your work, please cite our paper:

@misc{stap2024-idioms,
      title={The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities},
      author={David Stap and Eva Hasler and Bill Byrne and Christof Monz and Ke Tran},
      year={2024},
      eprint={2405.20089},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.20089},
}

Security

See CONTRIBUTING for more information.

License

This dataset is licensed under the CC-BY-NC-4.0 License.