Natural Language Processing (CSE4022) Text Summarisation Project

sanjitk7/textSummarisationGithubIssues

Text Summarization of GitHub Issues

Note: The core program is adapted from this Medium article's tutorial on getting started with GRU RNNs in NLP.

What is it?

Using a dataset of GitHub issue titles, bodies and URLs, a sequence-to-sequence model built from GRUs is trained to summarize each issue body into a title. The aim is a machine-generated title that is clearer and more compact than the user-written one.
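To make the recurrent unit concrete, here is a minimal sketch of a single GRU step and a toy encoder loop in plain Python. It is not the code from this repository: the names `gru_step`, `encode` and the `params` layout are hypothetical, and the update convention (`z * h + (1 - z) * h_tilde`) is one of two common ones. A real seq2seq model would use a deep learning framework and learned weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU update.

    params maps 'z' (update gate), 'r' (reset gate) and 'h' (candidate)
    to a (W, U, b) triple; this layout is an assumption for illustration.
    """
    def affine(name, hidden):
        W, U, b = params[name]
        return [wx + uh + bias
                for wx, uh, bias in zip(matvec(W, x), matvec(U, hidden), b)]

    z = [sigmoid(a) for a in affine("z", h)]
    r = [sigmoid(a) for a in affine("r", h)]
    # The reset gate scales the previous state before the candidate is computed.
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a) for a in affine("h", gated_h)]
    # Blend the previous state and the candidate with the update gate.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, h_tilde)]


def encode(sequence, params, hidden_size):
    # Run the GRU over a sequence of input vectors and return the final state,
    # which a decoder would use as its starting context.
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(x, h, params)
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    input_size, hidden_size = 3, 2

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {
        name: (rand_matrix(hidden_size, input_size),
               rand_matrix(hidden_size, hidden_size),
               [0.0] * hidden_size)
        for name in ("z", "r", "h")
    }
    # Three made-up token embeddings standing in for an issue body.
    body = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
    print(encode(body, params, hidden_size))
```

In a full encoder-decoder setup, the state returned by `encode` would initialise a second GRU that emits the title one token at a time.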

It also finds the most closely related GitHub issues using approximate nearest-neighbor search over Euclidean distance, implemented with Spotify's ANNOY package.
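As an illustration of what the search returns, the sketch below does an exact, brute-force Euclidean nearest-neighbor lookup in plain Python. This is a stand-in, not the repository's code: ANNOY approximates the same ranking with a forest of trees so that it scales to millions of vectors. With the `annoy` package, the analogous calls would be `AnnoyIndex(f, 'euclidean')`, `add_item`, `build` and `get_nns_by_vector`. The function name `nearest_issues` and the toy vectors are hypothetical.

```python
import math


def nearest_issues(query, vectors, k=3):
    """Return (index, distance) pairs for the k vectors closest to query."""
    distances = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


if __name__ == "__main__":
    # Made-up 2-D embeddings; real issue embeddings would have many more dimensions.
    issue_vectors = [
        [0.0, 0.0],
        [1.0, 0.0],
        [0.0, 3.0],
        [5.0, 5.0],
    ]
    for index, distance in nearest_issues([0.9, 0.1], issue_vectors, k=2):
        print(index, round(distance, 3))
```

Brute force costs one distance computation per stored issue per query, which is why an approximate index is the practical choice for a dataset of this size.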

The model's BLEU score is also computed as an evaluation metric.
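For readers unfamiliar with BLEU, the sketch below computes a simplified sentence-level score in plain Python: clipped n-gram precision, a geometric mean over n-gram orders, and a brevity penalty. It assumes a single reference, no smoothing, and n-grams up to order 2 by default (standard BLEU uses order 4 and is usually computed at corpus level). The name `simple_bleu` and the example titles are hypothetical; this is not necessarily how the repository computes its score.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    # Count each candidate n-gram at most as often as it appears in the reference.
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    # Without smoothing, any zero precision makes the whole score zero.
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Penalise candidates that are shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    generated = "fix crash on startup".split()
    user_title = "fix crash on app startup".split()
    print(simple_bleu(generated, user_title))
```

A score of 1.0 means the generated title matches the reference exactly. Lower scores reflect missing n-grams or a title shorter than the reference.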

Note: Training this model is computationally expensive because the dataset is large, with over 8M entries.

Architecture

Overall

Layers

Text Summarisation of GitHub Issues with NLP

GitHub issues are often long and complicated. It would be a great help to the community if each issue could be summarized into a precise single-line description using Natural Language Processing.

Dataset

The dataset can be found here
