This is a distributed web crawler built with Scrapy, Redis, and Selenium. It handles several kinds of sites, including static, AJAX, and dynamic pages. Because the setup is distributed via Docker Compose, the system can be deployed across multiple machines to increase crawling throughput.
Note:
- ddroom -> AJAX
- housefun -> dynamic (with Selenium)
- rakuya -> static
The architecture features a central queue managed by Redis, which distributes tasks to multiple Scrapy crawlers. The crawlers process the tasks and store the collected data in MongoDB.
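This queue-sharing pattern is what scrapy-redis provides. As a rough sketch only (not necessarily this repo's exact settings, and the Redis hostname is an assumption), the wiring in settings.py typically looks like this:

```python
# settings.py — minimal scrapy-redis sketch, not this repo's exact configuration.

# Schedule requests through a shared Redis queue so multiple
# crawler instances can pull from the same task list.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all instances with a shared
# fingerprint set stored in Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue in Redis when a spider closes, so other
# instances (or a restart) can keep draining it.
SCHEDULER_PERSIST = True

# Assumed address: "redis" is a typical service name in docker-compose;
# use "redis://localhost:6379" for a local Redis instead.
REDIS_URL = "redis://redis:6379"
```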
There is no MongoDB container in the docker-compose file, so you need to provide MongoDB yourself:
- Set up MongoDB locally, or modify the docker-compose file to add it (a sketch of the latter appears at the end of the Docker Compose section).
- Adjust the environment variables so the project can find your MongoDB database.
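For the local case, this usually means exporting a connection string before running the crawler. The variable names below are assumptions, not confirmed from this repo; check the project's settings.py or docker-compose file for the names it actually reads:

```sh
# Hypothetical variable names — confirm against the project's settings.
export MONGO_URI="mongodb://localhost:27017"
export MONGO_DATABASE="rent_crawler"
```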
There are two ways to set up the project.
- Local setup
  - Install the dependencies:
    pip install -r requirements.txt
  - Push a start URL to Redis (see the sketch after this list).
  - Run one of the spiders:
    scrapy crawl [ddroom/housefun/rakuya]
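By default, scrapy-redis spiders read their start URLs from a Redis list named `<spider>:start_urls`. Assuming this project keeps that default (and using a placeholder URL), pushing a task looks like this:

```sh
# Placeholder URL — replace with a real listing page.
# The key name assumes scrapy-redis's default "<spider>:start_urls" convention.
redis-cli lpush rakuya:start_urls "https://example.com/rent/listing"
```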
- Docker Compose setup
  - Build the Docker image:
    docker build -t scrapy_rent_crawler .
  - Start the services (drop the -d flag if you want to watch the logs for debugging):
    docker compose up -d
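If you go the compose route instead of a local MongoDB, one option is to add a mongo service and point the crawler at it. A minimal sketch, assuming the crawler reads a MONGO_URI variable (a hypothetical name — match it to whatever the project actually reads):

```yaml
# Sketch of a docker-compose addition — service and variable names are assumptions.
services:
  mongo:
    image: mongo:6
    ports:
      - "27017:27017"
  crawler:
    image: scrapy_rent_crawler
    environment:
      # Hypothetical variable; "mongo" resolves to the service defined above.
      - MONGO_URI=mongodb://mongo:27017
    depends_on:
      - mongo
```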