SimplifyUR

This repository contains the code, dataset, and models for Urdu text simplification described in the paper SimplifyUR: Unsupervised Lexical Text Simplification for Urdu.

Requirement(s)

The source is provided as a Jupyter notebook that runs on a Python 3 kernel. See requirements.txt for the required packages.

Model(s)

Pre-trained models, including Word2Vec, a Parts of Speech (PoS) tagger, and a Language Model (LM), are available for download. Download and extract them into the repository root directory, SimplifyUR.

Dataset

A parallel corpus of complex-simplified Urdu sentence pairs is in the Data folder.
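
For readers who want to inspect the corpus outside the notebook, the following is a minimal sketch of reading sentence pairs in Python. The on-disk layout is not documented here, so the sketch assumes a UTF-8 text file with one pair per line, the complex sentence first and the simplified sentence second, separated by a tab. The function name `load_sentence_pairs` and that layout are assumptions for illustration, not part of this repository.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def load_sentence_pairs(path: Path, delimiter: str = "\t") -> List[Tuple[str, str]]:
    """Read (complex, simplified) sentence pairs, one pair per line.

    Assumed layout (hypothetical): complex<delimiter>simplified, UTF-8 encoded.
    """
    pairs: List[Tuple[str, str]] = []
    # UTF-8 is required for Urdu text; newline="" is the mode the csv module recommends.
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank lines and lines without a second column
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs
```

If the files in the Data folder use a different separator or column order, adjust the `delimiter` argument or the column indices accordingly.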

Reference(s)

If you use this tool in your work, please cite the paper below.

SimplifyUR: Unsupervised Lexical Text Simplification for Urdu

@InProceedings{qasmi-EtAl:2020:LREC,
  author    = {Qasmi, Namoos Hayat and Zia, Haris Bin and Athar, Awais and Raza, Agha Ali},
  title     = {SimplifyUR: Unsupervised Lexical Text Simplification for Urdu},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {3484--3489},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.428}
}

License(s)

Code is licensed under the MIT License: http://opensource.org/licenses/MIT. Data is licensed under CC-BY 4.0: https://creativecommons.org/licenses/by/4.0/
