Note: The core of this program originally comes from this Medium article's tutorial on getting started with GRU RNNs in NLP.
Using a dataset of GitHub issue titles, bodies and URLs, a sequence-to-sequence model is built with GRUs to summarize the body of a GitHub issue into a title. The machine-generated title is often better and more compact than the title the user wrote.
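To make the GRU building block concrete, here is a minimal sketch of a single GRU time step and a toy encoder loop in plain Python. It is an illustration only, not the code used in this project: the names `gru_step`, `encode`, and the parameter dictionary layout (`W_*`, `U_*`, `b_*`) are assumptions made for this example, and the state update follows one common convention, `h = z * h_prev + (1 - z) * candidate`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gru_step(x, h_prev, p):
    """One GRU time step.

    p is a hypothetical parameter dict with input weights W_z, W_r, W_h,
    hidden weights U_z, U_r, U_h and biases b_z, b_r, b_h.
    """
    def affine(gate, h):
        wx = matvec(p["W_" + gate], x)
        uh = matvec(p["U_" + gate], h)
        return [a + b + c for a, b, c in zip(wx, uh, p["b_" + gate])]

    z = [sigmoid(v) for v in affine("z", h_prev)]      # update gate
    r = [sigmoid(v) for v in affine("r", h_prev)]      # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]     # reset-scaled state
    cand = [math.tanh(v) for v in affine("h", gated)]  # candidate state
    # Blend the previous state and the candidate using the update gate.
    return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


def encode(inputs, h0, p):
    # Run the GRU over a sequence of input vectors; the final state
    # plays the role of the encoder's summary of the issue body.
    h = h0
    for x in inputs:
        h = gru_step(x, h, p)
    return h
```

In a sequence-to-sequence setup, an encoder of this kind reads the issue body and a second GRU (the decoder) starts from the encoder's final state to generate the title one token at a time.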
Using approximate nearest neighbor search, the program also finds the most closely related GitHub issues by Euclidean distance. Spotify's Annoy package is used for this purpose.
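As a reference point for what the approximate search is approximating, the sketch below does an exact, brute-force Euclidean nearest-neighbor lookup in plain Python. It is not the Annoy-based code used here: the function name `nearest_issues` and the toy vectors are assumptions for illustration, and a real run would use the encoder's issue embeddings. Annoy trades a little accuracy for speed by avoiding a scan over every vector.

```python
import math


def nearest_issues(query_id, vectors, k=2):
    """Return the ids of the k issues closest to query_id.

    vectors is a hypothetical dict mapping issue id -> embedding (list of floats).
    This scans every vector, which is exact but slow on large datasets.
    """
    query = vectors[query_id]
    scored = [
        (math.dist(query, vec), issue_id)  # Euclidean distance
        for issue_id, vec in vectors.items()
        if issue_id != query_id
    ]
    scored.sort()
    return [issue_id for _, issue_id in scored[:k]]


# Toy embeddings standing in for encoded issue bodies.
toy_vectors = {
    "issue-a": [0.0, 0.0],
    "issue-b": [1.0, 0.0],
    "issue-c": [5.0, 5.0],
    "issue-d": [0.0, 2.0],
}
print(nearest_issues("issue-a", toy_vectors, k=2))
```

With over 8M issues, a full scan per query becomes expensive, which is the motivation for an approximate index.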
The model's BLEU score is also obtained.
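For readers unfamiliar with BLEU, the following is a simplified sketch of a sentence-level score against a single reference: clipped n-gram precisions for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It applies no smoothing, so any n-gram order with zero matches gives a score of 0. The function names and example titles are assumptions for illustration; the score reported for this model may have been computed with a library implementation and different settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference (token lists)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c_len, r_len = len(candidate), len(reference)
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(sum(log_precisions) / max_n)


reference_title = "fix crash when saving file".split()
generated_title = "fix crash when saving".split()
print(sentence_bleu(reference_title, generated_title))
```

Because titles are short, unsmoothed 4-gram BLEU is often 0 for individual examples, so corpus-level or smoothed variants are usually more informative in practice.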
Note: Training this model is computationally expensive because the dataset is large, with over 8M entries.
GitHub issues are known to be excessively long and complicated. It would be a great help to the community if issues could be summarized into a precise single-line description using Natural Language Processing.
The dataset can be found here