This is a Ruby script that scrapes products from https://www.coldwellbankerhomes.com.
It collects state, region, and product data and saves it as CSV files in the output/ directory.
- Static HTML (DOM) parsing for links/general info
- Recognition of semantic annotations in the product/residence microformat, used to parse estate-specific data embedded in the product pages
- Service Object pattern: each service provides a single public method, #call
- Ruby executable script
- All required gems installed with Bundler
- curl support via Curb for fetching page HTML
- Nokogiri for HTML parsing with XPath and CSS selector support
- CSV export via CSV Ruby class
- Logging via Logger Ruby class
- Code style checked via RuboCop
- Code quality reports via RubyCritic
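As an illustration of the Service Object pattern combined with the CSV and Logger classes listed above, here is a minimal sketch. The class name `ExportRowsToCsv`, its constructor arguments, and the `name`/`url` columns are hypothetical and not taken from the scraper's source; only the single public `#call` method reflects what this README describes.

```ruby
require 'csv'
require 'logger'
require 'stringio'

# Hypothetical service object: the class name, arguments and CSV columns
# are assumptions made for illustration only.
class ExportRowsToCsv
  HEADERS = %w[name url].freeze

  def initialize(rows:, io:, logger: Logger.new($stdout))
    @rows = rows
    @io = io
    @logger = logger
  end

  # The only public method, as the Service Object pattern prescribes.
  # Writes a header line plus one line per row and returns the row count.
  def call
    @io << CSV.generate_line(HEADERS)
    @rows.each { |row| @io << CSV.generate_line(row.values_at(*HEADERS)) }
    @logger.info("Exported #{@rows.size} rows")
    @rows.size
  end
end

# Usage with an in-memory IO object instead of a file in output/.
io = StringIO.new
ExportRowsToCsv.new(rows: [{ 'name' => 'Alabama', 'url' => '/al' }], io: io).call
puts io.string
```

Keeping a single entry point makes each step (fetching, parsing, exporting) easy to call from the executable script and to replace independently.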
- System: Linux, Mac
- Git
- Ruby version manager (rbenv or RVM)
- Ruby 2.5.0
- Bundler
- Gems installed via Bundler Gemfile
Clone with SSH:
$ git clone git@github.com:alex-petr/coldwell_banker_scraper.git
Or clone with HTTPS:
$ git clone https://github.com/alex-petr/coldwell_banker_scraper.git
$ cd coldwell_banker_scraper/ && brew install rbenv
$ rbenv install 2.5.0
$ gem install bundler && bundle
No test suite is available. To verify that the scraper works, run it and check
the terminal output and the output/ directory for CSV files.
$ bin/scraper
After it runs, the script generates a set of CSV files inside the output/ directory.
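Since there is no test suite, a quick way to sanity-check a run is to count the rows in each generated file. The sketch below is not part of the repository. The helper name `csv_row_counts` is hypothetical, and it assumes every CSV file in output/ starts with a header row, which this README does not state.

```ruby
require 'csv'

# Returns a hash of CSV file name => number of data rows.
# Assumption: the first line of each file is a header and is not counted.
def csv_row_counts(dir = 'output')
  Dir.glob(File.join(dir, '*.csv')).sort.map do |path|
    [File.basename(path), CSV.read(path, headers: true).size]
  end.to_h
end

# Prints one line per CSV file; prints nothing if output/ has no CSV files.
csv_row_counts.each { |name, count| puts "#{name}: #{count} rows" }
```

A file with zero rows, or a missing file, would suggest that the corresponding scraping step did not find the expected markup.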