flatcrawl-crawlers

This repository is part of my flatcrawl project. It contains a Rust implementation of crawlers/scrapers for different real estate websites. The crawlers scan those websites in scheduled cycles and extract information about newly listed flats. Each new flat is then parsed into a consistent layout and sent off for further processing.
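The consistent layout is not spelled out in this README; as a rough sketch, the unified flat record that every crawler emits could look something like the struct below (the type and field names are illustrative assumptions, not taken from the actual code):

use serde::{Deserialize, Serialize};

// Hypothetical sketch of the unified flat record produced by every
// crawler, regardless of which real estate site it was scraped from.
// All field names are illustrative assumptions.
#[derive(Debug, Serialize, Deserialize)]
pub struct Flat {
    pub source: String,            // portal the flat was found on
    pub title: String,
    pub address: Option<String>,
    pub rent: Option<f64>,         // monthly rent in EUR
    pub squaremeters: Option<f64>,
    pub rooms: Option<f64>,
    pub url: String,               // link back to the original listing
}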

I chose Rust for this project because I wanted to learn the language, and it also seemed like a good fit thanks to its speed and thread safety.

The flatcrawl project

The purpose of the project is to collect flats from different rental sites and expose them in a consistent shape. Eventually it lets users define custom searches and provides them with instant updates on new matching flats.

Clarification: flats are not stored on the server. The purpose is not to create a competing portal, but to extend usability and help users find the right flat quickly by receiving updates from several sites without the hassle of setting up and maintaining separate searches.

Infrastructure

Flats that are found by this tool and its set of crawlers are transmitted via AMQP to a message broker (in my case RabbitMQ), where they are picked up by different processors. Those processors live in their own repository and can be anything from email notifications to instant messaging bots. Currently there is only a Telegram bot, but you could imagine all kinds of services listening to the queue and pushing new flats to interested users.
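This README does not name the AMQP client used in the code; as a hedged sketch, publishing one flat to the broker could look roughly like this with the lapin crate (the connection URI, the queue name "flats" and the Flat type sketched above are assumptions for illustration):

use lapin::{options::BasicPublishOptions, BasicProperties, Connection, ConnectionProperties};

// Hypothetical sketch: serialize one flat and push it onto a RabbitMQ
// queue so that downstream processors (e.g. the Telegram bot) can pick
// it up. URI, queue name and the Flat type are illustrative assumptions.
async fn publish_flat(flat: &Flat) -> Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::connect(
        "amqp://user:password@broker.example.com:5672/%2f",
        ConnectionProperties::default(),
    )
    .await?;
    let channel = conn.create_channel().await?;

    let payload = serde_json::to_vec(flat)?;
    let _confirm = channel
        .basic_publish(
            "",                                // default exchange
            "flats",                           // routing key = queue name
            BasicPublishOptions::default(),
            &payload,
            BasicProperties::default(),
        )
        .await?;
    Ok(())
}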

Setup & Requirements

The application can be set up easily: all you have to do is copy config.sample.toml to a file called config.toml and edit the settings in that file. The thread_count setting specifies how many threads are used for the different crawlers and, indirectly, how many TCP connections are opened in parallel. The amqp section defines the endpoint where the message broker can be found. I simply ran an existing Docker image on my domain with PLAIN authentication.
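A config.toml along these lines should give you the idea; only thread_count and the amqp section are mentioned above, so the individual keys inside that section are assumptions:

# Sketch of a possible config.toml; the keys inside [amqp] are assumptions.
thread_count = 4

[amqp]
host = "broker.example.com"
port = 5672
username = "crawler"
password = "secret"
queue = "flats"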

To actually run the application on your machine, you will need to compile it first. Installing Rust is quite easy; you can find the instructions on the official Rust website.

Run

Once Rust is installed and the program is configured via the config.toml, you can start it up via

cargo run

On the first run, it will also download and compile all the dependencies, which might take a few minutes.
