Credits:
- Yudistira Dwi Cahya
- Wulan Akhsah
- Kamal Muftie Yafi
- Rachel Thyffani Margaretha S
- Vesya Padmadewi
- Dataset: Hoax dataset obtained from MAFINDO (Masyarakat Anti Fitnah Indonesia)
- Slang: Modified Kamus Alay, based on Kamus Alay (Colloquial Indonesian Lexicon)
- Feature Extraction: Bag-of-Words, TF-IDF
- Classifier: Naive Bayes, SVM, Logistic Regression, Decision Tree, kNN, ANN
- Cross-Validation: GridSearch, RandomSearch
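The slang entry above refers to a lookup table that maps non-standard (alay) spellings to standard Indonesian. Below is a minimal sketch of cleaning a post and then replacing non-standard words, assuming a plain token-by-token dictionary lookup. The lexicon excerpt, the regular expressions, and the function names `clean_text` and `normalize_slang` are hypothetical and not taken from this project's code.

```python
import re

# Hypothetical excerpt of a slang-to-formal lexicon in the spirit of Kamus Alay;
# the lexicon used in this project is far larger and loaded from a file.
SLANG_LEXICON = {
    "gak": "tidak",
    "tdk": "tidak",
    "yg": "yang",
    "udh": "sudah",
}


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and mentions, keep only ASCII letters and spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs first
    text = re.sub(r"@\w+", " ", text)                    # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)                # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


def normalize_slang(text: str, lexicon: dict[str, str]) -> str:
    """Replace each token found in the lexicon; unknown tokens pass through."""
    return " ".join(lexicon.get(tok, tok) for tok in text.split())


raw = "Berita ini GAK benar!! yg share udh banyak https://example.com @admin"
cleaned = clean_text(raw)
print(cleaned)                                   # berita ini gak benar yg share udh banyak
print(normalize_slang(cleaned, SLANG_LEXICON))   # berita ini tidak benar yang share sudah banyak
```

Because the lookup works on single tokens, multi-word slang expressions and words missing from the lexicon are left unchanged.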
The dataset contains two labels: "1" for hoax and "0" for not hoax. It has 4,701 samples in total, and the classes are imbalanced: 3,850 hoax samples and 851 not-hoax samples.
| Label | Hoax (1) | Not Hoax (0) | Total |
|---|---|---|---|
| Samples | 3,850 | 851 | 4,701 |
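With about 82% of the samples labeled hoax, a classifier that always predicts "hoax" already reaches roughly 0.82 accuracy, so per-class metrics such as macro F1 say more than accuracy alone. The README does not state whether the classes were reweighted; the sketch below only works through the arithmetic of one common option, the "balanced" class weights computed by scikit-learn's `class_weight="balanced"` (n_samples / (n_classes × count per class)), using the counts from the table above.

```python
# Class counts from the table above (1 = hoax, 0 = not hoax).
counts = {1: 3850, 0: 851}
n_samples = sum(counts.values())  # 4701
n_classes = len(counts)           # 2

# "Balanced" weighting: rarer classes get proportionally larger weights.
weights = {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Accuracy of a trivial model that always predicts the majority class (hoax).
majority_share = counts[1] / n_samples

for label, w in sorted(weights.items()):
    print(f"label {label}: weight {w:.3f}")
print(f"majority-class baseline accuracy: {majority_share:.3f}")
# label 0: weight 2.762
# label 1: weight 0.611
# majority-class baseline accuracy: 0.819
```

Each not-hoax sample then counts about 4.5 times as much as a hoax sample in the training loss, which offsets the 3,850 to 851 imbalance.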
- Text cleaning/preprocessing
- Non-standard word replacement (slang normalization with the modified Kamus Alay)
- Feature extraction: Bag-of-Words (BoW), TF-IDF
- Classification: Naive Bayes, SVM, Logistic Regression, Decision Tree, kNN, ANN
- Hyperparameter tuning with cross-validation: Grid Search, Random Search
- Post-analysis: topicwizard, Voyant Tools, WordCloud
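The feature extraction, classification, and search steps above can be chained in one scikit-learn pipeline. This is a sketch under several assumptions: the README does not say which library was used, the texts below are placeholders rather than MAFINDO data (assumed to be already cleaned and slang-normalized), and the parameter grids are illustrative. Logistic Regression stands in for the listed classifiers; `MultinomialNB`, `SVC`, `DecisionTreeClassifier`, `KNeighborsClassifier`, or `MLPClassifier` can be placed in the `clf` step instead, and swapping `TfidfVectorizer` for `CountVectorizer` gives Bag-of-Words features.

```python
from scipy.stats import loguniform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Placeholder texts (NOT the MAFINDO dataset); 1 = hoax, 0 = not hoax.
texts = [
    "vaksin mengandung chip pelacak", "minum air garam sembuhkan virus",
    "klik tautan ini dapat uang gratis", "bawang putih cegah semua penyakit",
    "sinyal ponsel sebarkan virus", "pesan berantai hapus akun dalam sehari",
    "kementerian umumkan jadwal vaksinasi", "bmkg rilis prakiraan cuaca pekan ini",
    "pemerintah resmikan jalan tol baru", "bank sentral umumkan suku bunga",
    "dinas kesehatan buka layanan imunisasi", "kpu tetapkan jadwal pemilu",
]
labels = [1] * 6 + [0] * 6

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),  # use CountVectorizer here for Bag-of-Words
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Stratified folds keep the hoax / not-hoax ratio in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Grid Search: evaluates every combination in the grid (2 x 3 = 6 candidates).
grid = GridSearchCV(
    pipe,
    param_grid={"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]},
    scoring="f1_macro",
    cv=cv,
)
grid.fit(texts, labels)

# Random Search: samples n_iter candidates from lists and distributions.
rand = RandomizedSearchCV(
    pipe,
    param_distributions={
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": loguniform(1e-2, 1e2),
    },
    n_iter=5,
    scoring="f1_macro",
    cv=cv,
    random_state=42,
)
rand.fit(texts, labels)

print("grid:", grid.best_params_, round(grid.best_score_, 3))
print("random:", rand.best_params_, round(rand.best_score_, 3))
```

Grid search cost grows with the product of the grid sizes, while randomized search keeps a fixed budget of `n_iter` fits per fold, which is why it is often preferred for large or continuous search spaces. With placeholder data this small, the printed scores say nothing about real performance.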