JamSpam

A Machine Learning powered GitHub App built with Probot to jam the spam PRs on your repo and keep maintainers stress-free (even in Hacktober 🎃)

Summary

Building Dataset

We collected links to PRs labelled as ⚠ SPAM or INVALID ⚠ on some popular repositories, especially those that faced a flood of spam pull requests during the recently concluded Hacktoberfest 🎃, in a .csv file.
Similarly, we listed links to ✅ MERGED PRs on those repositories in a separate .csv file to serve as Ham (not Spam) examples.
We used Octokit, GitHub's official API client, to extract pull request information from the PR links and save the desired features locally to build our model.
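
As a rough illustration of the first step, the sketch below reads PR links from a .csv file and splits each one into the owner, repository and PR number that an API client needs. It assumes one link per row in the first column; the regular expression, the `parse_pr_links` helper and the sample rows are hypothetical and not taken from the project.

```python
import csv
import io
import re

# Matches links such as https://github.com/<owner>/<repo>/pull/<number>
PR_LINK = re.compile(r"^https://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)/?$")


def parse_pr_links(csv_file):
    """Yield (owner, repo, number) for each valid PR link in the first CSV column."""
    for row in csv.reader(csv_file):
        if not row:
            continue
        match = PR_LINK.match(row[0].strip())
        if match:  # header rows and malformed links are skipped
            owner, repo, number = match.groups()
            yield owner, repo, int(number)


# Hypothetical sample standing in for one of the .csv files
sample = io.StringIO(
    "link\n"
    "https://github.com/octocat/hello-world/pull/42\n"
    "not-a-link\n"
)
parsed = list(parse_pr_links(sample))
print(parsed)
```

Each resulting tuple could then be passed to whichever API client fetches the pull request details.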

Feature Extraction

We chose standard PR attributes and some derived features to train our model.

Standard
- Number of Commits
- Number of Files Changed
- Number of Changes (Additions + Deletions)
Derived
- Number of Files Changed of Documentation Type
```
# File Extensions considered to be of Doc-Type 
['md', 'txt', 'rst', '']
```
- Occurrences of spam hit-words in the text corpus of the PR
  
  Text Corpus of a Pull Request includes the PR Title, Body, Commit Messages and Diffs.
  
  All text is pre-processed with regex to exclude any symbols.
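
The five features listed above could be computed along these lines. This is a minimal sketch, not the project's code: the shape of the `pr` dictionary, the `SPAM_HIT_WORDS` set and the exact regular expression are assumptions made for illustration; only the doc-type extensions come from the list above.

```python
import re

# Doc-type extensions from the list above; "" means the file has no extension
DOC_EXTENSIONS = {"md", "txt", "rst", ""}
# Hypothetical hit-words; the real list is not given in this README
SPAM_HIT_WORDS = {"hacktoberfest", "awesome", "improve"}


def file_extension(filename):
    """Return the lower-cased extension of a path, or "" if it has none."""
    name = filename.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[1].lower() if "." in name else ""


def clean_text(text):
    """Lower-case the text and replace every symbol with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def extract_features(pr):
    """Build the 5-value feature vector for one pull request."""
    corpus = " ".join([pr["title"], pr["body"], *pr["commit_messages"], *pr["diffs"]])
    words = clean_text(corpus).split()
    return [
        pr["commits"],                                    # number of commits
        len(pr["files"]),                                 # number of files changed
        pr["additions"] + pr["deletions"],                # number of changes
        sum(1 for f in pr["files"] if file_extension(f) in DOC_EXTENSIONS),
        sum(1 for w in words if w in SPAM_HIT_WORDS),     # spam hit-word occurrences
    ]


# Hypothetical pull request used only to exercise the function
example_pr = {
    "title": "Update README.md",
    "body": "Improve docs for #hacktoberfest!",
    "commit_messages": ["update readme"],
    "diffs": ["+ awesome project"],
    "commits": 1,
    "files": ["README.md", "LICENSE"],
    "additions": 1,
    "deletions": 0,
}
features = extract_features(example_pr)
print(features)  # [1, 2, 1, 2, 3]
```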

Model Design

We use Keras to build our baseline model. It is essentially a (5-16-16-1) Sequential neural network in which the first three layers use ReLU activation and the final output layer uses a sigmoid activation.

The model is trained for 500 epochs with a batch size of 1.
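
To make the layer layout concrete, here is a plain-Python sketch of a forward pass through a network of that shape. It is not the Keras model itself: it reads (5-16-16-1) as four dense layers applied to the 5-value feature vector (an assumption), and the weights are random and untrained, so the score is meaningless beyond showing the data flow. All names are hypothetical.

```python
import math
import random

N_FEATURES = 5                # one input per extracted feature
LAYER_UNITS = [5, 16, 16, 1]  # assumed reading of (5-16-16-1)


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(seed=0):
    """Return a list of (weights, biases) pairs, one per dense layer."""
    rng = random.Random(seed)
    sizes = [N_FEATURES] + LAYER_UNITS
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def predict(network, features):
    """Run one forward pass: ReLU on the first three layers, sigmoid on the last."""
    activations = [float(x) for x in features]
    for index, (weights, biases) in enumerate(network):
        activation = sigmoid if index == len(network) - 1 else relu
        activations = [
            activation(sum(w * x for w, x in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


network = build_network()
score = predict(network, [1, 2, 1, 2, 3])
print(f"spam score (untrained weights): {score:.3f}")
```

The sigmoid output always lies strictly between 0 and 1, which is what lets it be read as a spam probability once the weights are trained.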

Transfer Model to Bot

The model is exported from Python using tensorflowjs, which creates a model.json file and a .bin file to store the model structure, variables and associated weights.

The model is then loaded in Node.js using @tensorflow/tfjs-node so that predictions can be made for incoming PRs.

Getting Started

For setup instructions to train and export the model, visit jam-spam-ml/README.md.
For setup instructions to build the bot and get the GitHub App running, head to jam-spam-app/README.md.

Contributing

If you have suggestions for how JamSpam could be improved, or want to report a bug, open an issue! We'd love any and all contributions.

For more, check out the Contributing Guide.

Screenshots

If you are a Collaborator, Contributor, Member, or Owner of the repository, your pull requests will never be flagged.
If you are a First Timer, Mannequin, or First Time Contributor, your pull requests will be checked.
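
The rule above maps naturally onto the `author_association` field that the GitHub API reports for a pull request. The sketch below shows one way to express it; the `should_check` helper is hypothetical, and treating any other value (such as `NONE`) as "checked" is an assumption, since the README does not say how those authors are handled.

```python
# Associations whose pull requests are never flagged
TRUSTED = frozenset({"COLLABORATOR", "CONTRIBUTOR", "MEMBER", "OWNER"})
# Associations whose pull requests are run through the model
CHECKED = frozenset({"FIRST_TIMER", "MANNEQUIN", "FIRST_TIME_CONTRIBUTOR"})


def should_check(author_association):
    """Return True if a PR from this kind of author should be sent to the model."""
    association = author_association.upper()
    if association in TRUSTED:
        return False
    # Assumption: anything outside TRUSTED, including values not listed
    # in CHECKED such as NONE, is checked rather than skipped.
    return True


print(should_check("OWNER"))                   # False
print(should_check("FIRST_TIME_CONTRIBUTOR"))  # True
```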

If the pull request is legitimate, it is not flagged.

If the pull request is suspected to be spam, it is marked as spam and closed.

License

Made with ❤️ by Ajwad Shaikh & Vrushti Mody during Sprint 3 of the MLH Fellowship Explorer Batch, Fall 2020.
