Automatic hate speech detection

Setup

Install requirements

Python 3
Python packages: pandas, scikit-learn (imported as sklearn), fasttext, sqlalchemy, ...

Configure collector

Edit hiit_collector.py.example and save it as hiit_collector.py

Configure PostgreSQL

Edit postgre_keys.py.example and save it as postgre_keys.py

Get the data

Download the FastText model for Finnish, trained by Facebook on Finnish Wikipedia, from Facebook's trained models page.
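If the text-format vectors (`.vec`) are downloaded rather than the binary model, they can be read without extra dependencies. The following is a minimal sketch, assuming the standard fastText text layout (a header line with vocabulary size and dimension, then one word and its vector per line). The function name `load_vec` and the `limit` parameter are illustrative and not part of this repository.

```python
import io


def load_vec(handle, limit=None):
    """Read fastText text-format vectors from an open text file.

    Assumes the first line is "<vocab_size> <dim>" and each following
    line is a word followed by <dim> space-separated floats.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # stop early when only the first `limit` words are wanted
    return dim, vectors


# Tiny in-memory sample standing in for a real .vec file.
sample = io.StringIO("2 3\nkissa 0.1 0.2 0.3\nkoira 0.4 0.5 0.6\n")
dim, vectors = load_vec(sample)
```

For a real file, pass `open(path, encoding="utf-8")` instead of the in-memory sample; `limit` keeps memory use down when only the most frequent words are needed.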

Usage:

Collect new data

usage: collector.py [-h] [--user USER] [--password PASSWORD] [--hostname HOSTNAME] [--outdir OUTDIR] [--startdate STARTDATE] [--enddate ENDDATE]

optional arguments:
  -h, --help             show this help message and exit
  --user USER            Username
  --password PASSWORD    Password
  --hostname HOSTNAME    Hostname
  --outdir OUTDIR        Directory to store data
  --startdate STARTDATE  Startdate as YYYY-MM-DD
  --enddate ENDDATE      Enddate as YYYY-MM-DD

Example:

./collector.py --startdate 2017-03-01 --enddate 2017-03-15
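The command-line interface above can be reproduced with `argparse`. This is a sketch of how such a parser might look, not the actual contents of `collector.py`: the flag names and help strings come from the usage text, while the `parse_date` helper and the choice to convert dates into `datetime.date` objects are assumptions (the real script may keep them as plain strings).

```python
import argparse
from datetime import date


def parse_date(value):
    """Parse a YYYY-MM-DD string into a date; argparse reports bad input."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(prog="collector.py")
    parser.add_argument("--user", help="Username")
    parser.add_argument("--password", help="Password")
    parser.add_argument("--hostname", help="Hostname")
    parser.add_argument("--outdir", help="Directory to store data")
    parser.add_argument("--startdate", type=parse_date,
                        help="Startdate as YYYY-MM-DD")
    parser.add_argument("--enddate", type=parse_date,
                        help="Enddate as YYYY-MM-DD")
    return parser


# Same arguments as the example invocation above.
args = build_parser().parse_args(
    ["--startdate", "2017-03-01", "--enddate", "2017-03-15"]
)
days = (args.enddate - args.startdate).days
```

Converting the dates at parse time means a malformed value is rejected with a usage error before any data is collected.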

Train predictor

Example:

./predict.py --inputdir data/incoming --outdir data/output/ --featurename bow --featurefile data/models/feature_extractor_bow.pkl --predictor data/models/fasttext_svm.pkl

Predict hate speech

Example:

./predict.py --inputdir data/incoming --outdir data/output/ --featurename bow --featurefile data/models/feature_extractor_bow.pkl --predictor data/models/bow_svm.pkl
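The `--featurename bow` option selects bag-of-words features. As an illustration of what such a feature extractor computes (not the code stored in `feature_extractor_bow.pkl`, which this README does not describe), here is a minimal dependency-free sketch. The names `tokenize`, `fit_vocabulary`, and `transform` are hypothetical, and the tokenizer is a simple lowercase word split.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(documents, vocabulary):
    """Return one count vector per document; unknown tokens are ignored."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        row = [0] * len(vocabulary)
        for tok, n in counts.items():
            if tok in vocabulary:
                row[vocabulary[tok]] = n
        rows.append(row)
    return rows


# Toy Finnish training texts; the vocabulary is learned from these only.
train_docs = ["hyvä päivä", "huono päivä päivä"]
vocab = fit_vocabulary(train_docs)
features = transform(["päivä päivä uusi"], vocab)
```

The resulting count vectors are the kind of input a classifier such as an SVM would be trained on; words not seen when the vocabulary was fitted (here `uusi`) contribute nothing.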

Sync data

Example:

./sync.py --inputdir data/output/

TODO

CNN on embedding matrix (cf. Willi)
Stemming and stop words for BoW
Study SVM factors (with BoW)
Mezadona ? To Models
Plot t-SNE manifolds for the Wikipedia model and the Twitter model

Highlight hate words

DONE:

Try Naive Bayes classifier with BoW

Gaussian Naive Bayes performed comparably to RF but worse than SVM
With FastText features it performed poorly

LINK

https://ieeexplore.ieee.org/document/9835776

