Neurotech-HQ/common-swahili-slangs-typos

common-swahili-slangs-typos

The common Swahili stopwords, slangs and typos dataset is the output of the research paper *Enhancing text pre-processing for Swahili language: Datasets for common Swahili stop-words, slangs and typos with equivalent proper words* by Bernard Masua and Noel Masasi of the University of Dar-es-Salaam.

Dataset

* **Stopwords**: `stopwords.csv`
* **Slangs**: `slangs.csv`
* **Typos**: `typos.csv`

Usage

The Swahili stopwords, slangs and typos are all available in CSV format. The slangs and typos are useful for text pre-processing: replacing them with their equivalent proper words reduces the number of distinct word forms produced during vectorization.
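The pre-processing step described above could look roughly like the sketch below. It is a minimal illustration, not the authors' pipeline. The column names (`slang`, `word`) and the sample rows are assumptions made up for this example; check the header row of each CSV file and adjust accordingly.

```python
import csv
import io


def load_mapping(fileobj, key_col, value_col):
    """Read a CSV with a header row into a dict of key_col -> value_col.

    key_col and value_col are hypothetical column names; replace them
    with the real headers found in slangs.csv or typos.csv.
    """
    reader = csv.DictReader(fileobj)
    return {
        row[key_col].strip().lower(): row[value_col].strip().lower()
        for row in reader
    }


def normalize(text, replacements, stopwords):
    """Lowercase, replace slangs/typos with proper words, drop stopwords."""
    tokens = text.lower().split()
    tokens = [replacements.get(tok, tok) for tok in tokens]
    return [tok for tok in tokens if tok not in stopwords]


# In-memory stand-in for slangs.csv (illustrative row, not from the dataset).
sample_slangs = io.StringIO("slang,word\npoa,nzuri\n")
replacements = load_mapping(sample_slangs, "slang", "word")

# Illustrative stopwords, not taken from stopwords.csv.
stopwords = {"na", "ni"}

print(normalize("Chakula ni poa na kitamu", replacements, stopwords))
```

To run this against the real files, open them with `open("slangs.csv", newline="", encoding="utf-8")` and pass the file object to `load_mapping`. Whitespace tokenization is used here only to keep the sketch short; a real pipeline would likely also strip punctuation.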
