A library for simple web scraping of a search engine's results page for data analysis tasks. (Note: For personal, non-commercial use only. Follow all web scraping guidelines before getting started. Be kind to servers.)
Requires Python version 3.6 or greater.
This library is intended for personal use only to get search results from a search engine for downstream analysis.
git clone https://github.com/meads2/scraper.git
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pwd
python scraper 'my favorite team'
You can pass additional flags for extra functionality; defaults are used for any flag you omit.
terms - String value of search terms to pass to scraper engine. (ex. 'Python Tips and Tricks')
--selfie - If present, Selenium takes a screenshot of the returned browser search window.
--dest (FUTURE) - If specified, saves results to the given location.
--showme (FUTURE) - If present, the browser window opens at runtime so you can watch execution; useful for debugging.
--engine (FUTURE) - If specified, uses that search engine; defaults to Google. Options: 'Bing' (Microsoft Bing), 'duck' (DuckDuckGo), 'google' (Google), 'Yahoo' (Yahoo).
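The flags listed above map naturally onto Python's standard `argparse` module. The following is only a sketch of how such a command line could be declared, not the project's actual code: the function name `build_parser`, the help strings, and the choice to validate `--engine` against a fixed list are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional argument and flags described in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="scraper",
        description="Scrape a search engine's results page.",
    )
    # Required positional argument: the search terms.
    parser.add_argument("terms", help="search terms, e.g. 'Python Tips and Tricks'")
    # Boolean switches: False unless the flag is present.
    parser.add_argument("--selfie", action="store_true",
                        help="take a screenshot of the results window")
    parser.add_argument("--showme", action="store_true",
                        help="open a visible browser window (FUTURE)")
    # Optional values with defaults.
    parser.add_argument("--dest", default=None,
                        help="where to save results (FUTURE)")
    parser.add_argument("--engine", default="google",
                        choices=["Bing", "duck", "google", "Yahoo"],
                        help="search engine to use (FUTURE)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["daily news near me", "--selfie"])
print(args.terms, args.selfie, args.engine)
```

Using `action="store_true"` keeps `--selfie` and `--showme` as plain on/off switches, while `choices` makes `argparse` reject an unknown engine name before any scraping starts.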
python scraper 'daily news near me'
### ... running and scraping quietly
### Check your downloads for a surprise!
python scraper 'daily news near me' --selfie
### ... running and scraping quietly
### Check your downloads for a surprise!
python scraper 'daily news near me' --showme
### ... running and scraping right before your eyes
### Check your downloads for a surprise
python scraper 'daily news near me' --dest '../some/location/'
### ... running and scraping quietly to your defined location
### Check your downloads for a surprise!
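For readers curious how the `--selfie` example might work internally, here is a rough sketch under stated assumptions. It relies on two Selenium WebDriver methods that do exist, `get()` and `save_screenshot()`. The helper names `build_search_url` and `take_selfie`, the file name `selfie.png`, and the Google URL pattern are hypothetical and are not taken from this project.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_search_url(terms: str) -> str:
    """Build a Google results URL for the given terms (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": terms})


def take_selfie(driver, terms: str, out_dir: Path) -> Path:
    """Load the results page and save a screenshot; return the image path.

    `driver` is expected to be a Selenium WebDriver; only its get() and
    save_screenshot() methods are used here.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "selfie.png"  # hypothetical file name
    driver.get(build_search_url(terms))
    driver.save_screenshot(str(target))
    return target
```

With Selenium installed, a real driver such as `webdriver.Chrome()` could be passed as `driver`, for example `take_selfie(driver, 'daily news near me', Path.home() / 'Downloads')`, followed by `driver.quit()` to close the browser.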