Development & build process
This page contains information about the files in the project, how to build it, and how to create an installer package.
Litrl Browser was written in C# (User Interface) and Python 2.7 (Detectors).
Tools needed: Visual Studio Community 2017, PyScripter (IDLE or any Python 2.7 IDE should work as well), and a pip install of the libraries used (see the Install.cmd batch file).
1. Clone the repository; in Visual Studio Community 2017, set the project build type to Debug x64 or Release x64.
2. Build and run the software (press "Start") to restore the NuGet packages. You should get a System.ComponentModel.Win32Exception, because step 3 below has not been done yet.
3. Open the build output folder (... "x64/Debug/" or "x64/Release/") and run the "install.cmd" script to set up the Python virtual environment that the browser depends on. This is done with a batch script and is slow; it will take a few minutes.
4. Build and run, or press "Start" again in Visual Studio; the browser and the detectors should work.
The user interface of the project was built using Visual Studio 2017 Community. There is a .sln solution file in the LITRL folder. The detectors (clickbait, satire, and falsifications) are separate from this Visual Studio project since PyScripter was used to work on them individually. There are PyScripter project files in each detector folder.
After you open the Visual Studio solution, set the project build type to Debug|x64 if you are making frequent changes, or Release|x64 if you are planning to build the installer. Press "Start"; the software will build properly but throw an exception, because the Python virtual environment (created with virtualenv) has not yet been set up in the build location. Open the build location (.\litrl\vs_solution\LITRL\Decoy\bin\x64\Debug) and run the Install.cmd batch file to create the Python environment that the browser depends on. The browser also depends on the VC++ 2017 redistributable (https://aka.ms/vs/15/release/vc_redist.x64.exe) and the .NET 4.6.2 (minimum) runtime (https://dotnet.microsoft.com/download/thank-you/net462). You may not need to install these separately, because Visual Studio Community will probably install compatible versions.
Press "Start" in Visual Studio again and the browser should launch correctly.
- Branch off master when working towards a new version of the software.
- Tag all completed releases starting with "exp-" (experimental) and then the version number.
Litrl version numbers can be interpreted as follows: StableIteration, ExperimentalIteration, Revision, Patch. A stable version of the software is never expected to be released so StableIteration will always be 0. If ExperimentalIteration is incremented, you can assume that important changes were made and that it may not create databases in the same format as previous versions.
Example: 0.10.0.1 means: Unstable, Experimental Iteration 10, no revisions, one patch.
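The version scheme above can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the names `LitrlVersion`, `parse_version`, `database_format_may_differ`, and `release_tag` are hypothetical, and the exact tag format ("exp-" immediately followed by the version number) is an assumption based on the tagging rule above.

```python
from collections import namedtuple

# Hypothetical container for the four fields described above.
LitrlVersion = namedtuple(
    "LitrlVersion",
    ["stable_iteration", "experimental_iteration", "revision", "patch"],
)


def parse_version(text):
    """Split a version string such as '0.10.0.1' into its four fields."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields, got %r" % text)
    return LitrlVersion(*(int(part) for part in parts))


def database_format_may_differ(old, new):
    """True when ExperimentalIteration changed between two versions,
    which is the case where database formats may not be compatible."""
    return (parse_version(old).experimental_iteration
            != parse_version(new).experimental_iteration)


def release_tag(version):
    """Build a release tag; 'exp-' plus the version is an assumed format."""
    parse_version(version)  # validate before building the tag
    return "exp-" + version
```

For example, `parse_version("0.10.0.1")` yields stable iteration 0, experimental iteration 10, revision 0, and patch 1, matching the interpretation given above.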
NSIS was used to create the installer, with Zip2Exe. Zip2Exe was used with a custom Modern.nsh config file found under the "working_installer_info" folder in our repository. Copy that into the config folder for Zip2Exe (normally C:\Program Files (x86)\NSIS\Contrib\zip2exe).
To build a new installer, set the build configuration in Visual Studio to Debug x64 or Release x64, build the project, run "clean.cmd" in the build output folder to remove junk files, and then ZIP everything in the output folder (make sure you EXCLUDE the PyDeps folder). Open Zip2Exe, apply the settings documented in the "working_installer_info" folder in the repository, and browse to the ZIP you just made of the build files. Automating this process further is an open issue.
The installer name should always be: "Litrl Browser Experimental"
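As a possible starting point for automating the ZIP step, the sketch below archives a build output folder while skipping PyDeps. It is not part of the repository; the function name `zip_build_output` and its parameters are hypothetical, and it assumes clean.cmd has already been run. As a simplification, it skips any folder named PyDeps at any depth, not only the top-level one.

```python
import os
import zipfile


def zip_build_output(build_dir, zip_path, excluded_dirs=("PyDeps",)):
    """Zip everything under build_dir except the excluded folders.

    Returns the list of archived paths, relative to build_dir.
    """
    build_dir = os.path.abspath(build_dir)
    zip_path = os.path.abspath(zip_path)
    excluded = set(name.lower() for name in excluded_dirs)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(build_dir):
            # Prune excluded folders in place so os.walk never enters them.
            dirs[:] = [d for d in dirs if d.lower() not in excluded]
            for name in files:
                full_path = os.path.join(root, name)
                if full_path == zip_path:
                    continue  # do not archive the ZIP into itself
                relative = os.path.relpath(full_path, build_dir)
                archive.write(full_path, relative)
                archived.append(relative)
    return archived
```

Calling `zip_build_output` with the x64 Release output folder and a destination ZIP path would produce the archive that is then opened in Zip2Exe; the Zip2Exe step itself is still manual in this sketch.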
Folders in the repository:
- "clickbaitdetector" - the python code/training data for the clickbait detector
- "falsificationsdetector" - the python code/training data for the falsifications detector
- "satiredetector" - the python code/training data for the satire detector
- "vs_lib_licenses" - the licenses of all the libraries used in the project
- "vs_solution" - the visual studio project and C# code for the user interface of the browser
- "working_installer_info" - the script for NSIS and a screenshot of the correct settings for Zip2exe in NSIS
Files in "clickbaitdetector":
- clickbait.py - Run this script to train the clickbait detector and produce a new .dill file, which is a serialized copy of the clickbaitDetector class. The .dill file is used to avoid training the detector each time the browser starts.
- clickbaitml.py - The features and training code for the clickbait detector.
- featureunittester.py - This was an attempt at having unit tests to verify each feature in the detector, but it fell out of use early in development. It would be good to update this again.
- nvsclickbait.py - nvs means "News Verification Suite" - this script is started by the browser C# user interface as a new Python process and it loads the clickbait_detector.dill file generated by clickbait.py
- traininghandler.py - An admittedly messy approach to working with the training sets which are in separate formats - one of them was changed frequently as the clickbait detector was written and this file was used to quickly read in the headlines.
Training data in "clickbaitdetector":
- CB_Headlines.sta - a very short list of clickbait headlines from Yimin Chen and a qualitative study
- ganguly-stop-clickbait - from [CLICKBAIT DATASET 1] Abhijnan Chakraborty, Bhargavi Paranjape, Sourya Kakarla, and Niloy Ganguly. "Stop Clickbait: Detecting and Preventing Clickbaits in Online News Media". In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, US, August 2016. (URL: https://github.com/bhargaviparanjape/clickbait)
- .jsonl files - from [CLICKBAIT DATASET 2] Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen, and Benno Stein. Crowdsourcing a Large Corpus of Clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pages 1498–1507, August 2018. The COLING 2018 Organizing Committee. (URL: https://webis.de/data/webis-clickbait-17.html)
Files in "falsificationsdetector":
- falsifications.py - Run this script to train the falsifications detector and produce a new .dill file, which is a serialized copy of the falsificationsDetector class. The .dill file is used to avoid training the detector each time the browser starts.
- falsificationsml.py - The features and training code for the falsifications detector.
- nvsfalsifications.py - nvs means "News Verification Suite" - this script is started by the browser C# user interface as a new Python process and it loads the falsifications_detector.dill file generated by falsifications.py
Data in "falsificationsdetector":
- Fake - from Tolu Asubiaro's falsified news dataset
- Legit - from Tolu Asubiaro's falsified news dataset
- swearing.txt - from
Files in "satiredetector":
- satire.py - Run this script to train the satire detector and produce a new .dill file, which is a serialized copy of the satireDetector class. The .dill file is used to avoid training the detector each time the browser starts.
- satiresml.py - The features and training code for the satire detector.
- nvssatire.py - nvs means "News Verification Suite" - this script is started by the browser C# user interface as a new Python process and it loads the satire_detector.dill file generated by satire.py
Contents of "vs_solution":
- This is the user interface of the browser.
Files:
- CefFrame.cs - the Blink component that is used in the browser
- clean.cmd - removes junk files that are included in the build with the Blink component; run this after building the project if you want to create a new installer
- ClickbaitFrame.cs - the table that displays all of the clickbait entries in the browser
- FrmAbout.cs - the about box for the browser
- FrmCBEntryUserScore.cs - this is a dialog that lets users of the browser program type in their scores for clickbait headlines
- FrmColorSettings.cs - this lets you change the highlight colors for clickbait, satire, and falsifications
- FrmErrors.cs - This is a window that appears to display error output of any of the detectors, in case anything goes wrong
- FrmMain.cs - The main browser window.
- FrmNewDataset.cs - This lets you create a new SQLite database. The button to open this is on the "Statistics" window. In the Analysis tab on the main window, you can save detector results to a selected SQLite database.
- FrmSelectHtmlTags.cs - This dialog allows you to select the inner text of HTML tags that will be processed by the detectors
- FrmSelectWebsites.cs - This is a small window that is part of the FrmStatistics window which lets you pick the website you want to see detector results from
- FrmStatistics.cs - A window that allows you to perform basic statistics on the results from a dataset and website of your choosing (through FrmSelectWebsites.cs)
- homepage.path - a file that contains the URL of your homepage
- install.cmd - the batch script executed by the NSIS installer to build the Python virtual environment
- PopupHandler.cs - This gives you a "Yes"/"No" popup if you open a link that is meant to be in a new tab or window, since the browser does not support those features.
- py.path - the path of the bundled Python 2.7 install, normally C:/Python27. This path may differ if you install on a network drive; if so, change the Python path in this file.
The original set by the LiT.RL lab is found under "data".
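The detector scripts listed earlier (clickbait.py, falsifications.py, satire.py) each train a detector once and serialize it to a .dill file, which the matching nvs script loads when the browser starts. The sketch below illustrates that train-once, load-later pattern. It is not the project's code: `ToyDetector`, `train_and_save`, and `load_detector` are hypothetical names, the scoring logic is a toy, and the standard-library `pickle` module is used here as a stand-in for dill, whose `dump` and `load` functions accept the same arguments.

```python
import pickle  # stand-in for dill, which the project actually uses


class ToyDetector(object):
    """Hypothetical stand-in for a class such as clickbaitDetector."""

    def __init__(self):
        self.trigger_words = set()

    def train(self, headlines):
        # Toy "training": remember every word seen in the examples.
        for headline in headlines:
            self.trigger_words.update(headline.lower().split())

    def score(self, headline):
        # Fraction of words in the headline that were seen in training.
        words = headline.lower().split()
        if not words:
            return 0.0
        hits = sum(1 for word in words if word in self.trigger_words)
        return hits / float(len(words))


def train_and_save(headlines, path):
    """Training side: train once and write the detector to disk."""
    detector = ToyDetector()
    detector.train(headlines)
    with open(path, "wb") as handle:
        pickle.dump(detector, handle)
    return detector


def load_detector(path):
    """Startup side: load the saved detector instead of retraining."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

The design point is that training runs only when a developer reruns the training script, while every browser start pays only the cost of reading the serialized file.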