This is the official repository for the code and model checkpoints of our paper "Tokenization Falling Short: On Subword Robustness in Large Language Models" (Findings of EMNLP 2024).
- 21 September, 2024: 🎉 Our work has been accepted to EMNLP 2024 (Findings)! ⭐
Coming soon 🎠