Functionality & Accuracy Levels
The browser takes the text content of selected HTML tags and sends it to the detectors, which output scores for that text. The scores are displayed in the right pane of the browser, and content on the current webpage is highlighted according to the settings for each detector's scores. You can change highlight colors and settings under "Highlight Options" for each detector.
LitrlBrowser.exe extracts text from the HTML of the current page -> sends it to a detector Python process (clickbait/satire/falsification) -> the detector sends classification scores back to LitrlBrowser.exe
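The browser-to-detector round trip can be sketched as a simple line-based protocol between two processes. The stub below is illustrative only: the scoring logic and message format are placeholders, not the actual LiT.RL interface, and the real detectors run an SVM rather than the toy length heuristic used here.

```python
import subprocess
import sys

# Hypothetical detector-side loop (roughly what an nvs*.py script might do):
# read one line of page text from stdin, write a score back on stdout.
DETECTOR_STUB = r"""
import sys
for line in sys.stdin:
    text = line.strip()
    # placeholder scoring: the real detectors run an SVM over text features
    score = min(1.0, len(text) / 100.0)
    print(f"{score:.2f}", flush=True)
"""

# Browser side: spawn the detector process and exchange one request/response.
proc = subprocess.Popen(
    [sys.executable, "-c", DETECTOR_STUB],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
proc.stdin.write("You won't believe what happened next\n")
proc.stdin.flush()
score = float(proc.stdout.readline())
proc.stdin.close()
proc.wait()
print(score)
```

Keeping the detector in a long-lived child process avoids paying the model-deserialization cost on every page load.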
The clickbait detector achieves 94% accuracy on a test set of approximately 5,670 clickbait and non-clickbait headlines. The train/test set combines two well-known clickbait datasets, Potthast et al. (2018) and Chakraborty et al. (2016) (references in README.md and the licenses folder).
From one example run:
- SVM Validation Test set score: 0.94
- clickbait identified as clickbait (+, good): 5221 / 5670 (92.08%)
- legitimate headlines identified as clickbait (-, bad): 240 / 5671 (4.23%)
- clickbait identified as legitimate headlines (-, bad): 449 / 5670 (7.92%)
- legitimate headlines identified as legitimate headlines (+, good): 5431 / 5671 (95.77%)
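The four counts above form a confusion matrix, and the 94% figure follows directly from them. A quick check (the true/false positive/negative labels are the standard confusion-matrix terms, not names from the repo):

```python
# Counts from the example run above
tp = 5221   # clickbait identified as clickbait
fn = 449    # clickbait identified as legitimate
fp = 240    # legitimate identified as clickbait
tn = 5431   # legitimate identified as legitimate

clickbait_total = tp + fn        # 5670
legit_total = fp + tn            # 5671

recall = tp / clickbait_total              # ~0.9208, matches the run
false_positive_rate = fp / legit_total     # ~0.0423
accuracy = (tp + tn) / (clickbait_total + legit_total)
print(f"accuracy = {accuracy:.4f}")        # rounds to the reported 0.94
```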
Note that this claim of 94% accuracy applies strictly to the test set of 5,670 headlines described here. In real-world internet use the detector will not perform at this level, and we make no claim that it does.
Key files in the "clickbaitdetector" folder include:
- clickbaitml.py (main code for the detector)
- nvsclickbait.py (the script the browser uses for I/O between the detector and the user interface)
- clickbait.py (run this to train an SVM-based model and serialize it as a .dill file into the "/pickles/" folder)
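The train-then-serialize flow of clickbait.py can be sketched in miniature. This stub swaps the repo's SVM for a trivial keyword scorer and uses the stdlib pickle module in place of dill (dill is needed in the real code because it can serialize a wider range of Python objects); everything here is an illustrative stand-in, not the actual training code.

```python
import io
import pickle

class KeywordDetector:
    """Picklable toy stand-in for the trained SVM model object."""
    def __init__(self, vocab):
        self.vocab = vocab
    def score(self, headline):
        # Fraction of the headline's words that look clickbait-ish.
        words = set(headline.lower().split())
        return len(words & self.vocab) / max(len(words), 1)

# "Train" (here: just fix a vocabulary) and serialize, as clickbait.py
# does with a real SVM and dill into the /pickles/ folder.
buf = io.BytesIO()
pickle.dump(KeywordDetector({"shocking", "won't", "believe", "this"}), buf)

# Later, the browser-facing script deserializes the model and scores text.
buf.seek(0)
detector = pickle.load(buf)
print(detector.score("You won't believe this"))  # 3 of 4 words match -> 0.75
```

Serializing the fitted model once and reloading it at browser startup is what lets the detectors respond quickly instead of retraining on every launch.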
The Satire Detector achieves approximately 84% accuracy on our test set, consistent with the original version described in Rubin, Conroy, Chen and Cornwell's paper (http://www.aclweb.org/anthology/W16-0802). We make no claim about its accuracy in real-world internet use.
The training data is based on a set collected by LiT.RL and is available at http://victoriarubin.fims.uwo.ca/news-verification/data-to-go/.
The Satire Detector is the oldest of the three detectors, and a previous version has already seen frequent use. Originally, it was hosted by FIMS, Western University on a webpage that people could submit text to. The Satire Detector in the Litrl Browser is essentially the same, with a couple of features missing: the code that detects humor in text is disabled because it took too long to compute, and features that depended on an external service were rewritten in Python to remove the dependency.
From one example run: (80/20 split)
- Satire Stories for Training: 380
- Satire Stories for Testing: 95
- Correct classifications: 82
- Total classifications: 95
- SVM Test set score: 0.8631578947368421
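The counts and the score in this run are mutually consistent, which is a quick sanity check worth doing on any reported split:

```python
# Counts from the example satire run above
train_stories, test_stories = 380, 95
correct = 82

total_stories = train_stories + test_stories   # 475 satire stories
# An 80/20 split of 475 stories gives the 380/95 counts above.
print(round(total_stories * 0.8))              # 380

score = correct / test_stories
print(f"{score:.4f}")                          # 0.8632, the SVM test set score
```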
Key files in the "satiredetector" folder include:
- satireml.py (main code for the detector)
- nvssatire.py (the script the browser uses for I/O between the detector and the user interface)
- satire.py (run this to train an SVM-based model and serialize it as a .dill file into the "/pickles/" folder)
The Falsification Detector is based on work by Rubin, Conroy and Asubiaro. It is the newest of the three detectors and also the least accurate, at 71% on our test set. We make no claim about its accuracy in real-world internet use.
The training data is based on a set collected by LiT.RL and is available at http://victoriarubin.fims.uwo.ca/news-verification/data-to-go/.
From one example run: (80/20 split)
- Train set Legit Stories Amount: 109
- Train set False Stories Amount: 109
- Test set Legit Stories Amount: 28
- Test set False Stories Amount: 28
- Test set score: 0.71
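The split sizes above follow from an 80/20 split of 137 legitimate and 137 false stories (per-class totals inferred from the counts in the run), and the 0.71 score on the 56-item test set corresponds to roughly 40 correct classifications:

```python
# Counts from the example falsification run above
legit_total = 109 + 28    # train + test legitimate stories = 137
false_total = 109 + 28    # train + test false stories = 137

# An 80/20 split of 137 stories per class gives the 109/28 counts.
print(int(legit_total * 0.8))        # 109

test_size = 28 + 28
# A score of 0.71 on 56 balanced items is ~40 correct classifications,
# against a 0.50 chance baseline for this two-class balanced set.
print(round(0.71 * test_size))       # ~40
```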
Key files in the "falsificationdetector" folder include:
- falsificationml.py (main code for the detector)
- nvsfalsification.py (the script the browser uses for I/O between the detector and the user interface)
- falsification.py (run this to train an SVM-based model and serialize it as a .dill file into the "/pickles/" folder)