- Install requirements
- python3
- python packages: pandas, sklearn, fasttext, sqlalchemy, ...
- Configure collector
- Edit hiit_collector.py.example and save it as hiit_collector.py
- Configure PostgreSQL
- Edit postgre_keys.py.example and save it as postgre_keys.py
- Get the data
- FastText model for Finnish trained by Facebook using Finnish Wikipedia: Facebook's trained models
usage:
`collector.py [-h] [--user USER] [--password PASSWORD] [--hostname HOSTNAME] [--outdir OUTDIR] [--startdate STARTDATE] [--enddate ENDDATE]
optional arguments: -h, --help show this help message and exit --user USER Username --password PASSWORD Password --hostname HOSTNAME Hostname --outdir OUTDIR Directory to store data --startdate STARTDATE Startdate as YYYY-MM-DD --enddate ENDDATE Enddate as YYYY-MM-DD`
Example:
./collector.py --startdate 2017-03-01 --enddate 2017-03-15
Example:
./predict.py --inputdir data/incoming --outdir data/output/ --featurename bow --featurefile data/models/feature_extractor_bow.pkl --predictor data/models/fasttext_svm.pkl
Example:
./predict.py --inputdir data/incoming --outdir data/output/ --featurename bow --featurefile data/models/feature_extractor_bow.pkl --predictor data/models/bow_svm.pkl
Example:
./sync.py --inputdir data/output/
- CNN on Embedding Matrix (c.f Willi)
- Stemmings, stop words for BoW
- Study SVM factors (with BoW)
- Mezadona ? To Models
- Plot TSNE manifolds for wikipedia model and twitter model
- Highlight hatewords
- Try Naive Bayes-classifier with BoW
- Naive Bayes (Gaussian) did perform comparable to RF, but worse than SVM
- With FastText it performed poorly