Issue and Project Recommendation System for a GitHub Newcomer

The project is a content-based filtering approach for suggesting tasks and projects to GitHub newcomers.

2019.12.03 Update - Preliminary Results

Update cosine similarity (6.1 and 8.5) to calculate recommendation scores, using known commit authors for validation.

The commits used for validation seem off. TODO: recalculate.

2019.12.02 Update

Add 6.1. Cosine Similarity (including VSM) to recommend issues for users.
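As a rough illustration of how cosine similarity over a vector space model can rank issues for a user, here is a minimal sketch. It is not the code in 6.1: the function names, the toy user profile, and the issue texts are all hypothetical, and raw term counts stand in for whatever term weights the real profiles use.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(user_vec, issue_vecs, top_n=3):
    """Return the top_n (issue_id, score) pairs, highest score first."""
    scored = [(issue_id, cosine_similarity(user_vec, vec))
              for issue_id, vec in issue_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy data: term counts stand in for real profile weights.
user_profile = Counter("cache bug fix cache routing".split())
issues = {
    "issue-1": Counter("fix cache invalidation bug".split()),
    "issue-2": Counter("update translation docs".split()),
    "issue-3": Counter("routing bug in console".split()),
}

ranking = recommend(user_profile, issues)
for issue_id, score in ranking:
    print(f"{issue_id}: {score:.3f}")
```

Issues that share more vocabulary with the user's history score higher, and an issue with no shared terms scores 0.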

2019.11.26 Update

Add 1.4. User_Textual_Data_Extraction.py to extract users' textual history records.

Add 5.3. User_TF-IDF.py to apply TF-IDF for users. See results.

Add 5.4.1 and 5.4.2 to build profiles for users and issues. See results.

2019.11.24 Update

Fix preprocessing for issues_text: escape characters were being removed. Fixed in commit

TF-IDF results for title, body, and title-body. See results. These were calculated separately for weighting purposes.

TODO: incorporate commit documents.
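One way separately computed title and body TF-IDF scores might be combined with weights is sketched below. This is an assumption about how the weighting could work, not the project's code: the `tf_idf` and `combine` helpers, the 2.0/1.0 weights, and the sample issues are hypothetical, and the TF-IDF variant used (relative term frequency times `log(N / df)`) is only one common choice.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / len(doc) and idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def combine(title_vec, body_vec, title_weight=2.0, body_weight=1.0):
    """Weighted sum of two sparse vectors (weights are hypothetical)."""
    combined = Counter()
    for term, weight in title_vec.items():
        combined[term] += title_weight * weight
    for term, weight in body_vec.items():
        combined[term] += body_weight * weight
    return dict(combined)


# Hypothetical issues; real input would be the preprocessed issue text.
issues = [
    {"title": "cache bug", "body": "the cache returns stale data after a clear"},
    {"title": "routing error", "body": "the router fails on a trailing slash"},
]

title_vecs = tf_idf([issue["title"].split() for issue in issues])
body_vecs = tf_idf([issue["body"].split() for issue in issues])
profiles = [combine(t, b) for t, b in zip(title_vecs, body_vecs)]
```

Keeping the two fields separate until the last step means the title weight can be tuned without recomputing any TF-IDF scores.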

2019.11.18 Update

Simplified the code.

For 1.1., we now also collect "referenced commits" along with issues.

Update K-Means, Decision Tree and Random Forest.

Add TF-IDF analysis for issues.

| Data | Description |
| --- | --- |
| `all_issues_REPO-NAME.json` | List of repository issues (including their PRs and commits) that are bug fixes and/or "easy pick" |
| `users_REPO-NAME.json` | List of users from issues that are bug fixes and/or "easy pick" (this file may be too large to open) |
| `users_REPO-NAME_filtered.json` | Filtered user JSON file; includes the same user at different ages |
| `data_users_REPO-NAME_ready_to_analysis.csv` | CSV version of `users_REPO-NAME_filtered.json` |
| `data_users_cluster_with_results.csv` | K-Means result table |
| `issue_text_REPO-NAME.json` | Textual content of each issue |

2019.11.17 Update

Simplified the code.

For 1.1., now only consider users who submitted PRs and commits related to issues with the "Easy Pick" label.

For 1.1., now collect all issue data and its related PR and commit data to save time in future use.

For 1.2., now collect each user's complete data to save time in future use.

For 1.3., simplified the user data extraction process.

For 2.1. and 2.1.2., modified the code to fit the latest version of the data files.

2019.11.12 Update

For 1.1., modify the process logic in order to reduce the time needed.

For 2.1., add a "newcomer" column to verify whether a user is a newcomer.

For 3.1., modify KMeans to get more accurate clustering results.

For 3.2., add "Silhouette Analysis" to determine the number of clusters.

For 4.1., move Decision Tree to this file.

For 4.2., move Random Forest to this file.
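The silhouette analysis mentioned for 3.2 can be sketched roughly as follows. This is a minimal example, not the project's code: it assumes scikit-learn and NumPy are available, and the synthetic two-dimensional data and the candidate range of 2 to 6 clusters are hypothetical stand-ins for the real user features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated groups of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit K-Means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest mean silhouette score is the suggested choice.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The silhouette score ranges from -1 to 1, and higher values mean that points sit closer to their own cluster than to neighboring ones.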

Next:

Finalize how many clusters to use.

Start issue classification.

2019.11.05 Update

Parameter/model outputs from Nov 4: see RESULTS.md

2019.11.04 Update

Rewrite data extraction and user extraction to get more data and increase prediction precision.

Added "User Classification" file to predict newcomers.

Saved "User Decision Tree Model" and "User Random Forest Model" files for future usage.

Symfony dataset (MSR 2014): https://github.com/symfony/symfony
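Saving trained user models for later use, as described in this update, could look roughly like the sketch below. It assumes scikit-learn and uses the standard-library `pickle` module; the feature columns, the training rows, and the file names are hypothetical and are not taken from the repository.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per user: [public_repos, followers, account_age_days]
X = [[1, 0, 30], [2, 1, 60], [40, 120, 2000],
     [55, 300, 3000], [0, 0, 10], [35, 80, 1500]]
y = [1, 1, 0, 0, 1, 0]  # 1 = newcomer, 0 = experienced (hypothetical labels)

models = {
    "user_decision_tree": DecisionTreeClassifier(random_state=0).fit(X, y),
    "user_random_forest": RandomForestClassifier(
        n_estimators=50, random_state=0).fit(X, y),
}

# Persist each fitted model so it can be reused without retraining.
out_dir = Path(tempfile.mkdtemp())
for name, model in models.items():
    with open(out_dir / f"{name}.pkl", "wb") as fh:
        pickle.dump(model, fh)

# Later: load a saved model and classify a new user.
with open(out_dir / "user_decision_tree.pkl", "rb") as fh:
    restored = pickle.load(fh)
prediction = restored.predict([[3, 2, 45]])
print(prediction)
```

Pickled models should only be loaded from trusted files, and ideally with the same scikit-learn version that saved them.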

2019.10.31 Update

Rewrite data extraction (in order to get more data)

Next:

Get user data and train a user model to determine which characteristics newcomers should have.

2019.10.25 Update

Create a Python 3.7 environment for data analysis and processing.

Filter the features which may be useful.

Dataset: MSR 2014

The IDE I use: PyCharm

lancepokaiwang/GitHub-Recommender-System
