The crawler performs a GitHub search and returns all the links from the search results.
The input is a JSON object containing:
- the search keywords
- a list of proxies
- the search type
Example:
```json
{
    "keywords": [
        "openstack",
        "nova",
        "css"
    ],
    "proxies": [
        "194.126.37.94:8080",
        "13.78.125.167:8080"
    ],
    "type": "Repositories"
}
```
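As a rough sketch, the crawler's entry point might load this input and issue a proxied search request along the lines below. The URL layout matches GitHub's public search endpoint; the helper name and the pick-one-proxy-at-random strategy are assumptions, not the project's actual code:

```python
import json
import random

import requests


def build_search_response(input_path):
    """Load the input JSON and issue one GitHub search request through a random proxy."""
    with open(input_path) as f:
        config = json.load(f)

    # GitHub's search endpoint takes the query and the result type as parameters.
    query = "+".join(config["keywords"])
    url = f"https://github.com/search?q={query}&type={config['type']}"

    # Illustrative strategy: pick one proxy at random from the supplied pool.
    proxy = random.choice(config["proxies"])
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}

    return requests.get(url, proxies=proxies, timeout=10)
```

Retrying across the rest of the proxy pool when a request fails would be a natural extension of this sketch.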
The output is a JSON array containing, for each result:
- the link (url)
- author and language statistics for repositories (extra)
Example:
```json
[
    {
        "url": "https://github.com/atuldjadhav/DropBox-Cloud-Storage",
        "extra": {
            "owner": "atuldjadhav",
            "language_stats": {
                "CSS": 52,
                "JavaScript": 47.2,
                "HTML": 0.8
            }
        }
    }
]
```
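The extra fields could, for instance, be scraped from each repository page roughly as below. The CSS selector for the language bar is an assumption (GitHub's markup changes over time), so the project's actual parsing may differ:

```python
import requests
from bs4 import BeautifulSoup


def fetch_repo_extra(repo_url):
    """Scrape owner and language percentages from a repository page (illustrative selectors)."""
    html = requests.get(repo_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # The owner is the path segment just before the repository name.
    owner = repo_url.rstrip("/").split("/")[-2]

    # Hypothetical parsing of the language bar: each entry renders as a
    # language name followed by a percentage, e.g. "CSS 52.0%".
    language_stats = {}
    for item in soup.select(".Layout-sidebar .d-inline-flex"):
        parts = item.get_text(" ", strip=True).split()
        if len(parts) >= 2 and parts[-1].endswith("%"):
            language_stats[" ".join(parts[:-1])] = float(parts[-1].rstrip("%"))

    return {"owner": owner, "language_stats": language_stats}
```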
Requirements:
- Python 3
- requests
- beautifulsoup4
- pytest
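The third-party packages can be installed with pip:

```
pip install requests beautifulsoup4 pytest
```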
Download the repository and, if needed, add more input data. Currently there is one sample input file, test_input.JSON, for testing purposes.
Type in the command prompt:

```
python main.py [input] [output]
```

where input is the name of the JSON file (just the name) containing the search terms you want links returned for, and output is the name of the JSON file the resulting links will be saved in.
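For example, using the provided test file (the output name here is arbitrary, and it is assumed file names are passed with their extensions):

```
python main.py test_input.JSON output.json
```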
There is a pytest suite in the tests folder. Change the current working directory to it and type:

```
pytest -q
```

to run all tests, or

```
pytest -q test_module.py
```

where test_module.py is the specific test module you want to run.