First remeber to install github modue which can be installed easily using below command:
pip install PyGithub
To get data I used Github repositories. For this I used Github api, to get data you need to run bellow command:
python data_collector.py
It will collects python repostiroes in last 3 days which have size less that 5MB. You can change it as you wish.
By the way, remember that you need to get your own token. For this got to this link. After getting your token make a token.txt file next to data_collector and put your token in that file.
Preprocess stage includes tokenizing and creating the datafile. Use the files in preprocess folder to do the preprocess.
Use trainer.py in model directory to train your model. This can be easily done using this command:
python trainer,py
After this the model will be save in GPyT directory for next stages.
An example of model check is available in model directory.
Here is an example output:
Made By Amirhossein Abaskohi
