This repo preprocesses Swahili Masakhane NER data to support training in the spaCy pipeline.
The script in this repository can also be applied to preprocess Masakhane NER data for languages other than Swahili.
There are a few things you should be familiar with before using these scripts.
For now, you might need to manually change the path from which the raw data is loaded, and the path where the preprocessed data is stored, in the clean_ner_data.py file.
When you're done, run the script and it will preprocess the data for you:
python3 clean_ner_data.py
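
To give a rough idea of the kind of preprocessing involved, here is a minimal sketch. It assumes the raw file is in a CoNLL-style layout (one token and its tag per line, separated by whitespace, with a blank line between sentences). The function name `read_conll` and the sample text are hypothetical and are not taken from clean_ner_data.py, which may work differently.

```python
from typing import List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text into sentences of (token, tag) pairs.

    Hypothetical helper for illustration only; assumes the token is the
    first column and the NER tag is the last column on each line.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            # Skip malformed lines that lack a tag column.
            continue
        current.append((parts[0], parts[-1]))
    # Flush the last sentence when the text does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Made-up sample in the assumed layout.
sample = "Rais O\nSamia B-PER\nSuluhu I-PER\n\nDodoma B-LOC\n"
parsed = read_conll(sample)
```

Each parsed sentence could then be turned into whatever structure the training step expects. That conversion is left out here because it depends on the spaCy version and on how the script stores its output.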
If you experience any issue with the script or the NER data, please raise an issue so that we can fix it quickly.
Contributions are very welcome, from typo fixes to code, documentation, and examples. JUST FORK IT.
Did you find this repo useful? Give it a star so that more people can find it.
All the Credits to