- This repository aims to collect and categorize GEC (Grammatical Error Correction) papers.
- Unlike NLP-progress, GEC-Info does not consider performance on benchmarks.
- Authors and conferences are also not considered.
- For now, the list is limited to refereed papers from international conferences.
- Survey papers are exempt from this restriction.
- Pull requests that add papers are welcome. Please keep each commit limited to the lines that add papers, and watch out for unrelated changes introduced by auto-formatting.
- You can also request the addition of papers by opening an issue.
This list can also be viewed on GitHub Pages.
- Surveys
- Shared Tasks
- Libraries
- Datasets
- Performance Measures
- Quality Estimation
- Models
- Ensembles / Post-processing
- Strategies
- Data Augmentation
- Analyses
- Other Tools
- Spoken Domain
- Applications
- Projects
- Other Materials
- Related Tasks
- Other Languages
| Title | Year | Paper | Note |
|---|---|---|---|
| "Automated Grammatical Error Correction: A Comprehensive Review" | 2017 | [paper] | |
| "A Comprehensive Survey of Grammar Error Correction" | 2020 | [paper] | |
| "Recent Trends in the Use of Deep Learning Models for Grammar Error Handling" | 2020 | [paper] | |
| "Grammatical Error Correction: A Survey of the State of the Art" | 2022 | [paper] | |
| Name | Year | Paper | Note |
|---|---|---|---|
| HOO 2011 | 2011 | [paper] | [website] |
| HOO 2012 | 2012 | [paper] | [website] |
| CoNLL-2013 | 2013 | [paper] | [website] |
| CoNLL-2014 | 2014 | [paper] | [website] [system outputs] |
| BEA-2019 | 2019 | [paper] | [website] [system outputs] |
| Name | Year | Paper | Note |
|---|---|---|---|
| UnifiedGEC | 2025 | UnifiedGEC: Integrating Grammatical Error Correction Approaches for Multi-languages with a Unified Framework | [code] |
| gec-metrics | 2025 | gec-metrics: A Unified Library for Grammatical Error Correction Evaluation | [code] |
| Name | Year | Paper | Note |
|---|---|---|---|
| PIE-synthetic | 2019 | [Parallel Iterative Edit Models for Local Sequence Transduction] | [download] |
| OmniGEC | 2025 | Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction | [HF datasets], [code]. Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Slovene, Swedish, and Ukrainian |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Re-rank the CoNLL14 systems by human evaluation | 2015 | Human Evaluation of Grammatical Error Correction Systems | [code] |
| Reassess M^2, I-measure, and GLEU by comparing them with human evaluation | 2018 | [A Reassessment of Reference-Based Grammatical Error Correction Metrics] | [code] |
| MAEGE | 2018 | Automatic Metric Validation for Grammatical Error Correction | [code] |
| SEEDA | 2024 | Revisiting Meta-evaluation for Grammatical Error Correction | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2022 | Proficiency Matters Quality Estimation in Grammatical Error Correction | |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| LSTM tagger for word choice task | 2019 | [Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems] | [code] |
| PIE | 2019 | [Parallel Iterative Edit Models for Local Sequence Transduction] | [code] |
| LaserTagger | 2019 | [Encode, Tag, Realize: High-Precision Text Editing] | [code] |
| GECToR | 2020 | [GECToR – Grammatical Error Correction: Tag, Not Rewrite] | [code] |
| Seq2Edits | 2020 | [Seq2Edits: Sequence Transduction Using Span-level Edit Operations] | [code] |
| GAN-like sequence labeling | 2021 | [Grammatical Error Correction as GAN-like Sequence Labeling] | |
| GECToR Large | 2022 | Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction | [code] [Author's Master Thesis] |
| | 2021 | Efficient Grammatical Error Correction with Hierarchical Error Detections and Correction | [code] |
| | 2022 | Type-Driven Multi-Turn Corrections for Grammatical Error Correction | [code] |
| | 2023 | An Extended Sequence Tagging Vocabulary for Grammatical Error Correction | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Use MEMT | 2014 | System Combination for Grammatical Error Correction | |
| | 2016 | Grammatical Error Correction: Machine Translation and Classifiers | |
| | 2019 | [Learning to combine Grammatical Error Corrections] | [code] |
| Diversity-Driven Combination (DDC) | 2021 | [Diversity-Driven Combination for Grammatical Error Correction] | [code] |
| Select a system for each error type with IP | 2021 | [System Combination for Grammatical Error Correction Based on Integer Programming] | [code] |
| | 2022 | Frustratingly Easy System Combination for Grammatical Error Correction | [code] |
| EditScorer | 2022 | Improved grammatical error correction by ranking elementary edits | [code] |
| GRECO | 2023 | System Combination via Quality Estimation for Grammatical Error Correction | [code] |
| | 2024 | Improving Grammatical Error Correction by Correction Acceptability Discrimination | |
This includes methods such as decoding techniques and approaches that modify the loss function while keeping the model architecture unchanged.
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| A Self-Refinement Strategy for Noise Reduction | 2020 | [A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction] | |
| cLang8 (Cleaned Lang-8) | 2021 | [A Simple Recipe for Multilingual Grammatical Error Correction] | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2019 | Automatic Grammatical Error Detection of Non-native Spoken Learner English | |
| | 2020 | Grammatical error detection in transcriptions of spoken English | |
| Disfluency detection (DD) model | 2020 | Spoken Language ‘Grammatical Error Correction’ | |
| | 2022 | On Assessing and Developing Spoken ‘Grammatical Error Correction’ Systems | |
| Name | Year | Paper | Note |
|---|---|---|---|
| GECko+ | | [GECko+: a Grammatical and Discourse Error Correction Tool] | [website] [code] An English writing assistance tool that corrects grammatical errors and reorders sentences automatically. |
| MiSS | 2021 | [MiSS: An Assistant for Multi-Style Simultaneous Translation] | [website] [demo video] |
| ALLECS | 2023 | ALLECS: A Lightweight Language Error Correction System | [website] [code] |
| | 2023 | Doolittle: Benchmarks and Corpora for Academic Writing Formalization | [code] |
| | 2025 | Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore | [code] |
| Name | Website |
|---|---|
| GramFormer | [GitHub] |
| Name | Code | Note |
|---|---|---|
| Lang8-NAIST-extractor | [code] | Scripts for extracting error-correct pairs from the Lang-8 Corpus. |
| M2Converter | [code] | Scripts for converting an M2 file into a source file and a target file. |
| EFCamDat-Preprocess | [code] | |
| Name | Paper | Note |
|---|---|---|
| NLP-progress | | [website] Performance rankings on several datasets. |
| A Crash Course in Automatic Grammatical Error Correction | [paper] | [materials] A tutorial on GEC at COLING 2020. |
| Chunngai/gec-papers | | [github] The papers appear to have been compiled around 2019-2020. |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2014 | [Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages] | |
| English grammar checker with feedback in Japanese | 2018 | [Grammatical Error Checker for Japanese Learners of English] | This is not research on feedback comment generation as such, but it is classified here for now. |
| | 2019 | [Toward a Task of Feedback Comment Generation for Writing Learning] | |
| | 2020 | [Creating Corpora for Research in Feedback Comment Generation] | |
| | 2021 | [Shared Task on Feedback Comment Generation for Language Learners] | |
| | 2023 | Template-guided Grammatical Error Feedback Comment Generation | |
- Studies that explain the reasons for and intentions behind error corrections.
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| EXPECT | 2023 | Enhancing Grammatical Error Correction Systems with Explanations | [code] |
| XGEC dataset | 2024 | Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction | [data] |
| GEE | 2024 | GEE! Grammar Error Explanation with Large Language Models | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| TETRA | 2024 | Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Arabic Learner Corpus | 2013 | [Arabic Learner Corpus v1: A New Resource for Arabic Language Research] | [website] |
| QALB | 2014 | [Large Scale Arabic Error Annotation: Guidelines and Framework] | [QALB Project Website] |
| QALB 2014 Shared Task | 2014 | [The First QALB Shared Task on Automatic Text Correction for Arabic] | [website] |
| QALB 2015 Shared Task | 2015 | [The Second QALB Shared Task on Automatic Text Correction for Arabic] | |
| ARETA | 2021 | [Automatic Error Type Annotation for Arabic] | [code] |
| | 2023 | Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation | [code] |
| | 2023 | Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction | [code] |
| | 2025 | ARWI: Arabic Write and Improve | [website] |
| | 2025 | Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| AKCES-GEC dataset | 2019 | [Grammatical Error Correction in Low-Resource Scenarios] | [data] |
| Grammar Error Correction Corpus for Czech (GECCC) | 2022 | Czech Grammar Error Correction with a Large and Diverse Corpus | [data] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2025 | Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian | [data] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2024 | Correcting Challenging Finnish Learner Texts With Claude, GPT-3.5 and GPT-4 Large Language Models | |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Greek Learner Corpus | 2018 | [Stand-off annotation in learner corpora: compiling the Greek Learner Corpus (GLC)] | |
| ELERRANT | 2021 | [ELERRANT: Automatic Grammatical Error Type Classification for Greek] | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Falko-MERLIN dataset | 2018 | [Using Wikipedia Edits in Low Resource Grammatical Error Correction] | [data] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2014 | [Detection and correction of non word spelling errors in Hindi language] | |
| HiWikiEd dataset | 2020 | [Generating Inflectional Errors for Grammatical Error Correction in Hindi] | [data] |
| Hi-GEC | 2025 | Hi-GEC: Hindi Grammar Error Correction in Low Resource Scenario | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Byte-level approach | 2023 | Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| Character-level RNN-based seq2seq | 2018 | [Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation] | |
| Constructing retrieval system for Japanese GEC | 2019 | [Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language] | |
| TMU Evaluation Corpus for Japanese Learners | 2020 | [Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language] | [data: Fill this form] |
| Non-Autoregressive approach | 2020 | [Non-Autoregressive Grammatical Error Correction Toward a Writing Support System] | |
| | 2022 | Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction | |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2022 | Towards Lithuanian grammatical error correction | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2020 | [Neural Grammatical Error Correction for Romanian] | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| RULEC-GEC dataset | 2019 | [Grammar Error Correction in Morphologically Rich Languages: The Case of Russian] | [data] |
| RU-Lang8 dataset | 2021 | [New Dataset and Strong Baselines for the Grammatical Error Correction of Russian] | [data] |
| Additional annotations for RULEC and RU-Lang8 | 2024 | Multi-Reference Benchmarks for Russian Grammatical Error Correction | [RULEC] [RU-Lang8] |
| | 2024 | Universal Dependencies for Learner Russian | [code] |
| | 2025 | Grammatical Error Correction via Sequence Tagging for Russian | [code] |
| LORuGEC | 2025 | LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection | [data] [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| COWS-L2H | 2020 | [Developing NLP Tools with a New Corpus of Learner Spanish] | [data] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| | 2024 | Evaluation of Really Good Grammatical Error Correction | [code] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| ERRANT-TR | 2023 | Towards Automatic Grammatical Error Type Classification for Turkish | [code] |
| GECTurk WEB | 2025 | GECTurk WEB: An Explainable Online Platform for Turkish Grammatical Error Detection and Correction | [website] |
| Keywords / Overview | Year | Paper | Note |
|---|---|---|---|
| UA-GEC | 2023 | [UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language] | [data] |
| UNLP 2023 Shared Task | 2023 | The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian | |
| | 2023 | Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction | UNLP-2023: Pravopysnyk |
| | 2023 | A Low-Resource Approach to the Grammatical Error Correction of Ukrainian | UNLP-2023: QC-NLP |
| | 2023 | RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans | UNLP-2023: WebSpellChecker |