Python web crawler

A recursive web crawler: feed it one URL and it returns all the websites reachable from that address by following links, up to a chosen depth.
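The core idea can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual crawler code: `crawl`, `LinkExtractor` and `fetch_html` are hypothetical names, and the sketch walks links breadth-first, one depth level at a time, so `depth=0` fetches only the start page and `depth=1` also fetches every page it links to.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def fetch_html(url: str) -> str:
    """Default fetcher: download a page over HTTP(S) and decode it as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, depth: int,
          fetch: Callable[[str], str] = fetch_html) -> dict[str, set[str]]:
    """Visit every page within `depth` link hops of `start_url`.

    depth=0 fetches only the start page. Returns {page: links found on it}.
    """
    graph: dict[str, set[str]] = {}
    frontier = [start_url]
    for level in range(depth + 1):
        next_frontier = []
        for url in frontier:
            if url in graph:
                continue  # already fetched via another path
            try:
                html = fetch(url)
            except (OSError, ValueError):
                html = ""  # unreachable or malformed URL: record it with no links
            parser = LinkExtractor(url)
            parser.feed(html)
            graph[url] = parser.links
            if level < depth:
                # Only links from pages above the depth limit are followed.
                next_frontier.extend(link for link in parser.links if link not in graph)
        frontier = next_frontier
    return graph
```

Processing one level at a time guarantees that every page within `depth` hops is fetched exactly once, even when several paths lead to it. The `fetch` parameter is injectable so the logic can be exercised against fake pages without touching the network. The returned mapping of page to outgoing links is roughly the node/edge shape you would later browse as a graph.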

Kubernetes Crawler

The K8s flavor of the web crawler is custom built to run as a native Kubernetes app, which provides scalability, stability and high performance.

Installation prerequisites

Apt packages: acl (provides setfacl), git
kubectl CLI
Helm CLI
Single- or multi-node K8s cluster (tested on kubeadm)
Root access (Duh :))
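Before running the installer, you can check the command-line prerequisites with a short Python script. This is a sketch, not part of the repo: `missing_tools`, `is_root` and `REQUIRED_TOOLS` are hypothetical names, the tool list is taken from the list above, and it does not verify that a K8s cluster is actually reachable (running `kubectl cluster-info` is one way to check that).

```python
import os
import shutil

# Command-line tools listed as prerequisites above.
# On Debian/Ubuntu, setfacl is installed by the apt package "acl".
REQUIRED_TOOLS = ("setfacl", "git", "kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_root() -> bool:
    """True when the effective user is root (POSIX only; False elsewhere)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All CLI prerequisites found.")
    if not is_root():
        print("Warning: not running as root.")
```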

How to install

Clone this repo:

git clone https://github.com/bluedotiya/web_crawler.git

Change into the cloned directory:

cd web_crawler

Run the installer and wait for the installation to complete:

bash installer.sh -o install

Once the installation is complete, you should be able to access your Neo4j DB.

Example: Deployment Done you can connect Neo4j Browser on: http://<YOUR_K8S_NODE_IP_HERE>:30074
Example: Database Port is: 30087
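To confirm the deployment is reachable before opening a browser, a plain TCP connection check against the two ports is enough. A minimal sketch: `is_port_open` is a hypothetical helper, and the ports 30074 (Neo4j Browser) and 30087 (database) are the ones from the example lines above; use whatever values your own deployment reports.

```python
import socket

# Ports from the example lines above; adjust to your deployment.
NEO4J_BROWSER_PORT = 30074
NEO4J_DB_PORT = 30087


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, with your node IP substituted, `is_port_open("<YOUR_K8S_NODE_IP_HERE>", NEO4J_DB_PORT)` should return `True` once the database is listening. A successful connection only shows that something accepts TCP on that port, not that Neo4j is fully ready.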

How to use

To start a crawl, send the following request (replace the url and depth values with your own):

curl -X POST http://<YOUR_K8S_NODE_IP_HERE>:30080 -H 'Content-Type: application/json' -d '{"url":"https://www.google.com","depth":2}'
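If you prefer to trigger a crawl from Python, the curl call above can be reproduced with the standard library. This is a sketch: `build_crawl_request` and `start_crawl` are hypothetical helpers, and they assume the service on port 30080 accepts exactly the JSON body shown in the curl example. The response format is not documented here, so it is returned as raw text.

```python
import json
import urllib.request


def build_crawl_request(node_ip: str, url: str, depth: int) -> urllib.request.Request:
    """Build the same POST request as the curl example (JSON body, port 30080)."""
    body = json.dumps({"url": url, "depth": depth}).encode("utf-8")
    return urllib.request.Request(
        f"http://{node_ip}:30080",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_crawl(node_ip: str, url: str, depth: int, timeout: float = 10.0) -> str:
    """Send the crawl request and return the response body as text."""
    req = build_crawl_request(node_ip, url, depth)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `start_crawl(node_ip, "https://www.google.com", 2)` with your node IP in `node_ip` sends the same request as the curl command. Separating request construction from sending makes it easy to inspect the request before it goes out.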

You can now browse your data in the Neo4j Browser or in your favorite Neo4j viewer app.

Recommendation

Use the Neo4j Desktop app alongside GraphXR for the best graph viewing and search experience.

[Image: GraphXR visualization]
