Rent House Crawler

[Architecture diagram]

Description

This is a distributed web crawler built with Scrapy, Redis, and Selenium. It is designed to handle various types of websites, including static, AJAX-driven, and dynamic pages. By leveraging a distributed setup with Docker Compose, the system can be deployed across multiple machines to increase crawling speed and efficiency.

Note:

  • ddroom -> AJAX

  • housefun -> dynamic (rendered with Selenium)

  • rakuya -> static
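
For dynamic sites such as housefun, the pages are rendered with Selenium before parsing. The sketch below shows one way to do that with Scrapy's selectors; the URL and CSS selector are hypothetical placeholders, and the actual spider in this repository may wire Selenium in differently (for example through a downloader middleware).

  from scrapy import Selector
  from selenium import webdriver
  from selenium.webdriver.chrome.options import Options

  # Run Chrome headless so the crawler can run on servers without a display.
  options = Options()
  options.add_argument("--headless")
  driver = webdriver.Chrome(options=options)

  # Let the browser execute the page's JavaScript, then parse the rendered HTML
  # with Scrapy's normal selectors.
  driver.get("https://example.com/rent/list")               # placeholder listing URL
  selector = Selector(text=driver.page_source)
  titles = selector.css("a.listing-title::text").getall()   # hypothetical selector
  driver.quit()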

Architecture Overview:

The architecture features a central queue managed by Redis, which distributes tasks to multiple Scrapy crawlers. The crawlers process the tasks and store the collected data in MongoDB.
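
A common way to wire this up is the scrapy-redis package, which replaces Scrapy's scheduler and duplicate filter with Redis-backed ones so every crawler instance pulls work from the same queue. The sketch below shows what the relevant settings could look like; the pipeline path, Redis URL, and priority value are assumptions and may differ from this repository's actual settings.py.

  # settings.py (sketch, assuming the scrapy-redis package)
  SCHEDULER = "scrapy_redis.scheduler.Scheduler"              # requests come from Redis
  DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # dedup set shared by all crawlers
  SCHEDULER_PERSIST = True                                    # keep the queue between runs
  REDIS_URL = "redis://redis:6379"                            # "redis" service name is assumed

  # Collected items are handed to a MongoDB pipeline (hypothetical path).
  ITEM_PIPELINES = {
      "rent_house_crawler.pipelines.MongoPipeline": 300,
  }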

Prerequisites

There is no MongoDB container in the docker-compose file, so you need to either add one or set up MongoDB locally.

  1. Set up MongoDB locally or modify the docker-compose file to include it.
  2. Adjust the environment variables so the project can find your MongoDB database (see the sketch below).
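
As a rough illustration of step 2, the item pipeline could read the connection details from environment variables as sketched below. MONGO_URI and MONGO_DATABASE are hypothetical names; check the project's settings and docker-compose file for the variables it actually expects.

  import os

  import pymongo

  class MongoPipeline:
      """Stores scraped items in MongoDB; connection details come from the environment."""

      def open_spider(self, spider):
          # MONGO_URI / MONGO_DATABASE are assumed variable names.
          self.client = pymongo.MongoClient(
              os.environ.get("MONGO_URI", "mongodb://localhost:27017"))
          self.db = self.client[os.environ.get("MONGO_DATABASE", "rent_house")]

      def close_spider(self, spider):
          self.client.close()

      def process_item(self, item, spider):
          # One collection per spider (ddroom / housefun / rakuya).
          self.db[spider.name].insert_one(dict(item))
          return item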

Install

There are two ways to set up the crawler.

  1. Local setup

    1. Install the Python dependencies
      pip install -r requirements.txt
    2. Push the start URLs to Redis (see the sketch after this list)
    3. Run a spider
      scrapy crawl [ddroom/housefun/rakuya]
  2. Use docker compose

    1. Build the docker image
      docker build -t scrapy_rent_crawler .

    2. Run the docker compose
      docker compose up -d

      If you want to debug, remove the -d flag.
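
For step 2 of the local setup ("Push the start URLs to Redis"), the URLs can be pushed with redis-cli or a short script like the one below. The key format <spider_name>:start_urls is the scrapy-redis default; the listing URL is a placeholder, and both the key names and the Redis host depend on how the project is configured.

  import redis

  # Connect to the Redis instance the crawlers read from (adjust host/port as needed).
  r = redis.Redis(host="localhost", port=6379)

  # scrapy-redis spiders pop start URLs from "<spider_name>:start_urls" by default.
  r.lpush("rakuya:start_urls", "https://example.com/rent?page=1")  # placeholder URL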
