Bot which searches real estate websites for mentions of "garage".
It uses a real estate agency network as a starting point for searching approximately 200 real estate agencies in the Berlin-Brandenburg area of Germany.
Use it to find real estate agencies that offer garages in their portfolio, or actual listings of houses and apartments that include a garage. It is useful if you want to buy, rent, or sell a garage, or if you are looking for housing that includes one. It can also serve as a lead aggregator if this information is of interest to your business.
The bot is designed to go through the subpages of each agency where offers or descriptions of the agency's portfolio can be expected.
It then looks for the keyword garage on these pages.
The essential information you get is the target_url entries, since these are the pages where the garage keyword was found.
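The keyword check described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from garagecrawler.py: the names KEYWORD, mentions_keyword and collect_target_urls are hypothetical, and the case-insensitive substring match is an assumption about how the search might behave.

```python
KEYWORD = "garage"  # hypothetical constant; swap in another keyword or phrase


def mentions_keyword(page_text, keyword=KEYWORD):
    """Return True if the keyword occurs anywhere in the text (case-insensitive, assumed)."""
    return keyword.lower() in page_text.lower()


def collect_target_urls(pages, keyword=KEYWORD):
    """Given a mapping of URL -> page text, return the URLs whose text mentions the keyword."""
    return [url for url, text in pages.items() if mentions_keyword(text, keyword)]


if __name__ == "__main__":
    # Made-up example pages, for illustration only.
    pages = {
        "https://agency.example/offers": "Apartment with Garage and garden",
        "https://agency.example/about": "Our team",
    }
    print(collect_target_urls(pages))
```

Because the keyword is a parameter, the same sketch works for any other search term.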
The following data is written to a csv spreadsheet:
- crawling depth
- referer url
- referred url (this is the target_url)
- domain of referred url
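To illustrate the four columns listed above, here is a small sketch of how one result row might be assembled and written as CSV. The field names in FIELDNAMES and the helpers make_row and rows_to_csv are hypothetical and not taken from the bot. The domain is derived here with urllib.parse as a simple stand-in; the bot itself relies on tldextract, which may return a different form of the domain.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical column names, in the order the README lists them.
FIELDNAMES = ["depth", "referer_url", "target_url", "domain"]


def make_row(depth, referer_url, target_url):
    """Build one result row; the domain is the hostname of the referred (target) URL."""
    domain = urlparse(target_url).hostname or ""  # stand-in for tldextract
    return {
        "depth": depth,
        "referer_url": referer_url,
        "target_url": target_url,
        "domain": domain,
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    row = make_row(2, "https://hub.example/agencies", "https://www.agency.example/offers")
    print(rows_to_csv([row]))
```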
Generally, the bot can easily be modified for other searches: replace the search keyword or phrase and the start hub aggregator with ones that suit your aim.
After the development setup has been established (see below), go to the spiders directory and run:
scrapy runspider garagecrawler.py
The result will be saved as garagecrawl-result.csv.
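Once the result file exists, it can be post-processed with a few lines of Python. The sketch below groups the found target URLs by domain. It assumes the columns appear in the order depth, referer url, referred url, domain, and it skips any row whose first cell is not a number (for example a header line). The function name group_targets_by_domain is hypothetical, and the actual file layout may differ.

```python
import csv
from collections import defaultdict


def group_targets_by_domain(csv_lines):
    """Group unique target URLs by domain from an iterable of CSV lines."""
    grouped = defaultdict(list)
    for row in csv.reader(csv_lines):
        # Skip header lines and malformed rows (assumed layout: 4 columns, depth first).
        if len(row) < 4 or not row[0].strip().isdigit():
            continue
        _depth, _referer_url, target_url, domain = row[:4]
        if target_url not in grouped[domain]:
            grouped[domain].append(target_url)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up sample lines standing in for the contents of the result file.
    sample = [
        "1,https://hub.example,https://a.example/offers,a.example",
        "2,https://a.example/offers,https://a.example/garage,a.example",
    ]
    print(group_targets_by_domain(sample))
```

To use it on the real output, pass a file handle opened with open("garagecrawl-result.csv", newline="") instead of the sample list.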
Requirements:
- Scrapy: https://github.com/scrapy/scrapy
- tldextract: https://github.com/john-kurkowski/tldextract
Install both with pip:
pip install scrapy
pip install tldextract
Author: Jonas Dossmann
Distributed under the AGPL-3.0 license.