MILPaC: A Novel Benchmark for Evaluating Translation of Legal Text to Indian Languages

The MILPaC Dataset

MILPaC (Multilingual Indian Legal Parallel Corpus) is the first parallel corpus of legal text in Indian languages. It consists of three high-quality datasets (MILPaC-IP, MILPaC-CCI-FAQ, and MILPaC-Acts) compiled from reliable sources of legal information in India. It includes parallel text units in English and nine Indian languages, covering both Indo-Aryan languages (Hindi, Bengali, Marathi, Punjabi, Gujarati, and Oriya) and Dravidian languages (Tamil, Telugu, and Malayalam), many of which are low-resource. MILPaC is designed to evaluate how well machine translation models translate legal text from English to Indian languages and between Indian languages.
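As a rough illustration of the evaluation setting described above, the sketch below enumerates the translation directions implied by "English to Indian languages or between Indian languages". The ISO 639-1 language codes and the function name `translation_directions` are assumptions made for this example; they are not taken from the dataset files, and the individual datasets may not cover every direction listed.

```python
from itertools import permutations

# Language families as listed in the description above.
# The two-letter ISO 639-1 codes are an assumption for illustration;
# the dataset files may use different identifiers.
INDO_ARYAN = {
    "hi": "Hindi",
    "bn": "Bengali",
    "mr": "Marathi",
    "pa": "Punjabi",
    "gu": "Gujarati",
    "or": "Oriya",
}
DRAVIDIAN = {
    "ta": "Tamil",
    "te": "Telugu",
    "ml": "Malayalam",
}
INDIAN = {**INDO_ARYAN, **DRAVIDIAN}


def translation_directions():
    """Return (source, target) code pairs for the two stated settings:
    English -> Indian language, and Indian language -> Indian language."""
    english_to_indian = [("en", tgt) for tgt in INDIAN]
    # permutations yields ordered pairs, so hi->bn and bn->hi are distinct.
    indian_to_indian = list(permutations(INDIAN, 2))
    return english_to_indian + indian_to_indian


if __name__ == "__main__":
    directions = translation_directions()
    print(f"{len(directions)} candidate translation directions")
    for src, tgt in directions[:3]:
        print(f"{src} -> {tgt}")
```

This only lists candidate directions; which of them can actually be evaluated depends on the language pairs each of the three datasets provides.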

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.

Contact

For inquiries, feedback, or collaboration opportunities, please contact debtanudatta04 [at] gmail [dot] com.
