SearchSECO-Crawler

The Crawler is the component of the SearchSECO project that identifies projects for further processing. It explores code repositories and returns their URLs, which the Spider then processes. The Crawler also retrieves metadata for each project, such as the project owner's name and email.

Functionality

The primary responsibilities of the Crawler are:

Identifying Repositories: The Crawler queries GitHub for repositories, sorted by star count and excluding forks. It collects a specified number of repositories per page and repeats this for a specified number of pages.
Crawling Repositories: For each repository found, the Crawler fetches its URL, its importance (currently determined by the star count), and a unique ID. It also fetches the programming languages used in the repository, which helps when categorizing and processing projects.
Fetching Project Metadata: For each repository, the Crawler extracts the project's ID, last updated time, latest commit hash, license, name, URL, owner's username, owner's email (if publicly available), and default branch.
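
The steps above can be sketched roughly as follows. This is a minimal illustration, not the Crawler's actual code: the names `buildSearchUrl`, `toCrawlEntry`, `GitHubRepoItem`, and `CrawlEntry` are hypothetical, and the sketch assumes the public GitHub REST repository search endpoint, whose results omit forks unless the query asks for them. The latest commit hash, owner email, and language breakdown are not part of a search result and would need separate requests, so they are left out here.

```typescript
// Hypothetical sketch; names and structure are assumptions, not the Crawler's source.

// Subset of the fields GitHub returns for each repository in a search result.
interface GitHubRepoItem {
  id: number;
  name: string;
  html_url: string;
  stargazers_count: number;
  default_branch: string;
  updated_at: string;
  owner: { login: string };
  license: { spdx_id: string | null } | null;
}

// Hypothetical record combining the crawl data and part of the metadata.
interface CrawlEntry {
  id: number;
  name: string;
  url: string;
  importance: number; // currently the star count
  ownerUsername: string;
  defaultBranch: string;
  lastUpdated: string;
  license: string | null;
}

// Build the search URL for one page of results, sorted by stars.
// Repository search leaves out forks by default, so no fork qualifier is added.
function buildSearchUrl(page: number, perPage: number, minStars = 0): string {
  const query = encodeURIComponent(`stars:>=${minStars}`);
  return (
    "https://api.github.com/search/repositories" +
    `?q=${query}&sort=stars&order=desc&per_page=${perPage}&page=${page}`
  );
}

// Map one search result item to the fields the Crawler is described as collecting.
function toCrawlEntry(repo: GitHubRepoItem): CrawlEntry {
  return {
    id: repo.id,
    name: repo.name,
    url: repo.html_url,
    importance: repo.stargazers_count,
    ownerUsername: repo.owner.login,
    defaultBranch: repo.default_branch,
    lastUpdated: repo.updated_at,
    license: repo.license ? repo.license.spdx_id : null,
  };
}
```

Calling `buildSearchUrl(p, perPage)` for `p = 1 .. pageCount` yields one URL per page; each item in the response's `items` array can then be passed through `toCrawlEntry`.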

Usage

The Crawler primarily serves as a submodule within the Miner. For detailed instructions on integrating and utilizing the Crawler in your system, please refer to the SearchSECO Miner documentation.

License

This project is licensed under the MIT license. See LICENSE for more info.

This program has been developed by students of the Computer Science bachelor's programme at Utrecht University as part of the Software Project course. © Copyright Utrecht University (Department of Information and Computing Sciences)

