A Machine Learning powered GitHub App built with Probot to jam the spam PRs on your repo and keep maintainers stress-free (even in Hacktober 🎃)
- We listed links to PRs labelled ⚠ `SPAM` or `INVALID` ⚠ on some popular repositories, especially those that faced a flood of spam pull requests during the recently concluded Hacktoberfest 🎃, in a `.csv` file.
- Similarly, we listed links to ✅ `MERGED` PRs on those repositories in a separate `.csv` file for Ham (not Spam) features.
- We used Octokit, an API framework by GitHub, to extract pull request information from the PR links and save the desired features locally to build our model.
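The link-collection step can be sketched in Python as follows. This is only an illustration, not the project's actual script: the one-link-per-row CSV layout, the function names `parse_pr_link` and `read_pr_links`, and the 1 = spam / 0 = ham labels are all assumptions. It turns each PR link into the owner, repository and PR number that an API client would need to fetch the PR.

```python
import csv
import io
import re

# Matches links such as https://github.com/<owner>/<repo>/pull/<number>
PR_LINK = re.compile(r"^https?://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)")


def parse_pr_link(link):
    """Return (owner, repo, number) for a PR link, or None if it does not match."""
    match = PR_LINK.match(link.strip())
    if match is None:
        return None
    owner, repo, number = match.groups()
    return owner, repo, int(number)


def read_pr_links(csv_file, label):
    """Read one link per row (first column) and attach a label.

    The label convention (1 = spam, 0 = ham) is an assumption for this sketch.
    """
    rows = []
    for record in csv.reader(csv_file):
        if not record:
            continue  # skip blank lines
        parsed = parse_pr_link(record[0])
        if parsed is not None:  # header rows and malformed links are skipped
            rows.append((*parsed, label))
    return rows


# Hypothetical file content, used here instead of a real .csv on disk
sample = io.StringIO("link\nhttps://github.com/octocat/hello-world/pull/42\n")
print(read_pr_links(sample, label=1))
# → [('octocat', 'hello-world', 42, 1)]
```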
We chose the standard PR attributes and some derived features to train our model:
- Standard
  - Number of Commits
  - Number of Files Changed
  - Number of Changes (Additions + Deletions)
- Derived
  - Number of Files Changed of Documentation Type
    File extensions considered to be of doc-type: `['md', 'txt', 'rst', '']`
  - Occurrences of spam hit-words in the text corpus of the PR
    The text corpus of a pull request includes the PR title, body, commit messages and diffs.
    All text is pre-processed with regex to exclude any symbols.
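As a rough sketch of how these five features could be computed for one pull request (this is not the project's code): the hit-word set, the feature order, the helper names and the exact regex below are assumptions made for illustration. Only the doc-type extension list comes from the description above.

```python
import re

DOC_EXTENSIONS = {"md", "txt", "rst", ""}  # from the list above; "" means no extension
SPAM_HIT_WORDS = {"awesome", "typo", "update"}  # hypothetical words, for illustration only


def file_extension(path):
    """Return the lowercase extension without the dot, or "" if there is none."""
    name = path.rsplit("/", 1)[-1]
    if "." not in name.lstrip("."):  # e.g. LICENSE or .gitignore
        return ""
    return name.rsplit(".", 1)[-1].lower()


def clean_text(text):
    """Lowercase the text and replace every symbol with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def build_features(commits, additions, deletions, filenames, corpus):
    """Return the five features in an assumed order."""
    words = clean_text(corpus).split()
    return [
        commits,
        len(filenames),
        additions + deletions,
        sum(1 for f in filenames if file_extension(f) in DOC_EXTENSIONS),
        sum(1 for w in words if w in SPAM_HIT_WORDS),
    ]


features = build_features(
    commits=1,
    additions=2,
    deletions=0,
    filenames=["README.md", "src/app.js", "LICENSE"],
    corpus="Update README.md: awesome project!!!",
)
print(features)
# → [1, 3, 2, 2, 2]
```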
We are using Keras to build our baseline model. It is essentially a (5-16-16-1) Sequential neural network whose first three layers use ReLU activation and whose final output layer uses a sigmoid activation.
The model is trained for 500 epochs with a batch size of 1.
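To make the layer shapes concrete, here is a dependency-free forward pass written in plain Python rather than Keras. It is a sketch under assumptions: we read "(5-16-16-1)" as four dense layers with 5, 16, 16 and 1 units that take five input features, and the weights are random placeholders, not the trained ones. Names such as `init_layers` and `forward` are hypothetical.

```python
import math
import random

LAYER_UNITS = [5, 16, 16, 1]  # assumed reading of "(5-16-16-1)"
INPUT_FEATURES = 5  # one input per feature


def init_layers(seed=0):
    """Create (weights, biases) per layer with random placeholder weights."""
    rng = random.Random(seed)
    layers, fan_in = [], INPUT_FEATURES
    for units in LAYER_UNITS:
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)] for _ in range(units)]
        biases = [0.0] * units
        layers.append((weights, biases))
        fan_in = units
    return layers


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(layers, features):
    """ReLU on the first three layers, sigmoid on the output layer."""
    activations = list(features)
    for index, (weights, biases) in enumerate(layers):
        activate = sigmoid if index == len(layers) - 1 else relu
        activations = [
            activate(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]  # a single value between 0 and 1


def count_parameters(layers):
    """Total number of weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


layers = init_layers()
probability = forward(layers, [1, 3, 2, 2, 2])
print(count_parameters(layers))  # → 415
```

Under this reading the network has 30 + 96 + 272 + 17 = 415 trainable parameters, which is small enough that training with a batch size of 1 stays practical.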
The model is exported from Python using `tensorflowjs`, which creates a `model.json` and a `.bin` file to store the model structure, variables and associated weights.
The model is imported seamlessly into Node.js using `@tensorflow/tfjs-node` so that predictions can be made for incoming PRs.
- For setup instructions to train and export the model, visit jam-spam-ml/README.md
- For setup instructions to build the bot and get the GitHub App running, head to jam-spam-app/README.md
If you have suggestions for how JamSpam could be improved, or want to report a bug, open an issue! We'd love any and all contributions.
For more, check out the Contributing Guide.
- If you are a Collaborator, Contributor, Member, or Owner of the repository, your pull request will never be flagged.
- If you are a First Timer, Mannequin or First Time Contributor, your pull requests will be checked.
  - If the pull request is legit, it is not flagged.
  - If the pull request is suspected to be spam, it is marked as spam and closed.
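The rules above can be expressed as a small decision function. This is a sketch, not the app's implementation: it assumes the author's role arrives as one of the GitHub API's `author_association` strings, and the 0.5 threshold, the action names and the `decide` function are hypothetical. Roles not mentioned above (for example `NONE`) are left untouched here only because the rules do not cover them.

```python
TRUSTED = {"COLLABORATOR", "CONTRIBUTOR", "MEMBER", "OWNER"}
CHECKED = {"FIRST_TIMER", "MANNEQUIN", "FIRST_TIME_CONTRIBUTOR"}


def decide(author_association, spam_probability, threshold=0.5):
    """Return the action to take for a pull request.

    threshold is an assumed cut-off on the model's sigmoid output.
    """
    association = author_association.upper()
    if association in TRUSTED:
        return "skip"  # never flagged, the model is not consulted
    if association in CHECKED and spam_probability >= threshold:
        return "mark_spam_and_close"
    return "no_action"  # legit PRs, and roles the rules do not mention


print(decide("OWNER", 0.99))        # → skip
print(decide("FIRST_TIMER", 0.91))  # → mark_spam_and_close
print(decide("FIRST_TIMER", 0.10))  # → no_action
```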
MIT © 2020 MLH Fellowship
Made with ❤️ by Ajwad Shaikh & Vrushti Mody during Sprint 3 of the MLH Fellowship Explorer Batch, Fall 2020.