Text-Normalization-Of-Code-Mix

May 2017
  • Designed a system to efficiently preprocess noisy code-mixed text collected from social media.
  • Performed data cleaning and preprocessing of text.
  • Identified net lingo (e.g. abbreviations, slang, and intentionally misspelt words) and converted it to standard forms using a dictionary-based approach and regular expressions.
  • Designed an algorithm for transliterating Romanized Hindi words to Devanagari script using syllabification.
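
The last two steps above can be illustrated with a minimal Python sketch. It is not the project's actual code: the `NET_LINGO` table, the character maps, and the function names `normalize`, `syllabify`, and `transliterate` are all hypothetical, the lexicon is deliberately tiny, and the syllabification rule (an optional consonant followed by an optional vowel, with a halant joining consonant clusters) is a simplifying assumption that ignores nasalization, retroflex consonants, and spelling ambiguity.

```python
import re

# Hypothetical lookup table; a real system would load a much larger lexicon.
NET_LINGO = {"u": "you", "b4": "before", "plz": "please", "gr8": "great", "thx": "thanks"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
ELONGATION = re.compile(r"(.)\1{2,}")  # three or more repeats of one character

def normalize(text: str) -> str:
    """Lowercase, drop URLs and mentions, collapse elongations, expand net lingo."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    text = ELONGATION.sub(r"\1", text)  # assumption: collapse to a single character
    return " ".join(NET_LINGO.get(tok, tok) for tok in text.split())

# Illustrative (incomplete) Roman-to-Devanagari maps.
CONSONANTS = {
    "kh": "ख", "gh": "घ", "chh": "छ", "ch": "च", "jh": "झ", "th": "थ",
    "dh": "ध", "ph": "फ", "bh": "भ", "sh": "श", "k": "क", "g": "ग",
    "j": "ज", "t": "त", "d": "द", "n": "न", "p": "प", "b": "ब", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व", "w": "व", "s": "स", "h": "ह",
}
# Dependent vowel signs (matras); the inherent "a" needs no sign.
MATRAS = {"aa": "ा", "a": "", "ee": "ी", "i": "ि", "oo": "ू", "u": "ु",
          "e": "े", "ai": "ै", "o": "ो", "au": "ौ"}
# Independent vowel letters, used when a syllable has no leading consonant.
INDEPENDENT = {"aa": "आ", "a": "अ", "ee": "ई", "i": "इ", "oo": "ऊ", "u": "उ",
               "e": "ए", "ai": "ऐ", "o": "ओ", "au": "औ"}
HALANT = "्"

def _alt(keys):
    # Longest keys first so that "kh" is tried before "k", "aa" before "a".
    return "|".join(sorted(keys, key=len, reverse=True))

SYLLABLE = re.compile(
    rf"(?P<c>{_alt(CONSONANTS)})(?P<v>{_alt(MATRAS)})?|(?P<iv>{_alt(INDEPENDENT)})"
)

def syllabify(word: str):
    """Split a Romanized word into (consonant, vowel) units; None if unparseable."""
    units, pos = [], 0
    while pos < len(word):
        m = SYLLABLE.match(word, pos)
        if m is None:
            return None
        units.append((m.group("c"), m.group("v") or m.group("iv")))
        pos = m.end()
    return units

def transliterate(word: str) -> str:
    """Map each syllable unit to Devanagari; return the word unchanged on failure."""
    units = syllabify(word.lower())
    if units is None:
        return word
    out = []
    for i, (c, v) in enumerate(units):
        if c is None:
            out.append(INDEPENDENT[v])           # vowel-only syllable
        elif v is not None:
            out.append(CONSONANTS[c] + MATRAS[v])
        elif i < len(units) - 1:
            out.append(CONSONANTS[c] + HALANT)   # cluster: suppress inherent vowel
        else:
            out.append(CONSONANTS[c])            # word-final consonant, no halant
    return "".join(out)

print(normalize("Plzzz call me b4 u go http://t.co/x"))
print(transliterate("namaste"), transliterate("kitaab"))
```

Under these assumptions, `normalize` turns the sample message into `please call me before you go`, and `transliterate` maps `namaste` to `नमस्ते`. Because Romanized spelling is ambiguous (`kitab` versus `kitaab`, for example), a practical system would likely pair such rules with a word list or a ranking step.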