Scrape This Site - Sandbox

A collection of projects that we'll use to learn web scraping.

Countries of the World: A Simple Example

A single page that lists information about all the countries in the world.

The following information can be scraped:
- Country Name
- Country Capital
- Country Population
- Country Area
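
The four fields above can be pulled out of the page markup with a small parser. The following is a minimal sketch using only the standard library, not necessarily how `countries.py` does it. The CSS class names (`country`, `country-name`, `country-capital`, `country-population`, `country-area`) and the sample markup are assumptions about the page structure, so check them against the live page first.

```python
from html.parser import HTMLParser

# Assumed class names mapped to output keys; verify against the real page.
FIELDS = {
    "country-name": "name",
    "country-capital": "capital",
    "country-population": "population",
    "country-area": "area",
}


class CountryParser(HTMLParser):
    """Collect one dict per country block found in the markup."""

    def __init__(self):
        super().__init__()
        self.countries = []
        self._field = None  # field whose text we are waiting for, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "country" in classes:
            self.countries.append({})  # a new country block begins
        for cls in classes:
            if cls in FIELDS and self.countries:
                self._field = FIELDS[cls]

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            # First non-empty text after a field tag is that field's value.
            self.countries[-1][self._field] = text
            self._field = None


# Hand-written sample markup (an assumption, not copied from the site).
SAMPLE = """
<div class="col-md-4 country">
  <h3 class="country-name"><i class="flag-icon"></i> Andorra</h3>
  <div class="country-info">
    <strong>Capital:</strong> <span class="country-capital">Andorra la Vella</span><br>
    <strong>Population:</strong> <span class="country-population">84000</span><br>
    <strong>Area:</strong> <span class="country-area">468.0</span><br>
  </div>
</div>
"""

parser = CountryParser()
parser.feed(SAMPLE)
parser.close()
print(parser.countries)
```

In a real run, the string passed to `feed` would be the downloaded page HTML instead of `SAMPLE`.
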
Hockey Teams: Forms, Searching and Pagination

Browse through a database of NHL team stats since 1990 and build a scraper that handles common website interface components.

The following information can be scraped:
- Team Name
- Year
- Wins
- Losses
- OT-Losses
- Win %
- Goals For (GF)
- Goals Against (GA)
- Difference (+ / -)
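
Searching and pagination usually come down to building the right query string and requesting pages until one comes back empty. The sketch below illustrates that loop. The base URL and the parameter names (`page_num`, `per_page`, `q`) are assumptions about the site, `fetch_rows` is a hypothetical callable that downloads and parses one page, and the rows in `fake_site` are made-up stand-ins so the example runs offline.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm it in the browser's address bar.
BASE_URL = "https://www.scrapethissite.com/pages/forms/"


def page_url(page_num, per_page=25, query=""):
    """Build the URL for one page of results, optionally filtered by a search term."""
    params = {"page_num": page_num, "per_page": per_page}
    if query:
        params["q"] = query
    return f"{BASE_URL}?{urlencode(params)}"


def scrape_all(fetch_rows, query="", max_pages=100):
    """Request pages in order and stop at the first page with no rows.

    fetch_rows is a hypothetical callable: it takes a URL and returns the
    list of table rows parsed from that page.
    """
    rows = []
    for page_num in range(1, max_pages + 1):
        page_rows = fetch_rows(page_url(page_num, query=query))
        if not page_rows:
            break
        rows.extend(page_rows)
    return rows


# Stand-in for real HTTP fetching (illustrative rows only).
fake_site = {
    page_url(1, query="bruins"): [{"Team Name": "Boston Bruins", "Year": "1990"}],
    page_url(2, query="bruins"): [{"Team Name": "Boston Bruins", "Year": "1991"}],
}
rows = scrape_all(lambda url: fake_site.get(url, []), query="bruins")
print(page_url(1, query="bruins"))
print(len(rows))
```

The `max_pages` cap is a safety net so that a page which never returns an empty result cannot loop forever.
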
Oscar Winning Films: AJAX and Javascript

Click through a bunch of great films. Learn how content is added to the page asynchronously with JavaScript and how you can scrape it.

The following information can be scraped:
- Title
- Nominations
- Awards
- Best Picture
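
Content added asynchronously typically arrives through a separate request that returns JSON, which you can spot in the browser's network tab and parse directly instead of the rendered HTML. The sketch below shows the parsing half only. The payload shape (a JSON array with `title`, `nominations`, `awards`, and an optional `best_picture` key) is an assumption, and the film titles are placeholders rather than real data.

```python
import json

# Assumed response body; the real endpoint may use different keys.
SAMPLE_RESPONSE = """
[
  {"title": "Example Film A", "year": 2015, "awards": 2, "nominations": 6, "best_picture": true},
  {"title": "Example Film B", "year": 2015, "awards": 6, "nominations": 10}
]
"""


def parse_films(payload):
    """Turn the JSON payload into one dict per film with the fields listed above."""
    films = []
    for item in json.loads(payload):
        films.append({
            "Title": item["title"],
            "Nominations": item["nominations"],
            "Awards": item["awards"],
            # Treat a missing key as "not the Best Picture winner".
            "Best Picture": bool(item.get("best_picture", False)),
        })
    return films


films = parse_films(SAMPLE_RESPONSE)
for film in films:
    print(film)
```

Using `item.get` for the optional key keeps the parser from failing on films that omit it.
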
Turtles All the Way Down: Frames & iFrames

Some older sites might still use frames to break up their pages. Modern ones might be using iFrames to expose data. Learn about turtles as you scrape content inside frames.

The following information can be scraped:
- Species Name
- Description
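
Content inside a frame lives in a separate document, so a scraper first has to find the frame's `src` and then request that URL. A minimal sketch of the first step follows, using only the standard library. The page URL and the sample markup are assumptions, not copied from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect absolute URLs of every <iframe> and <frame> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "frame"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page that embeds the frame.
                self.frame_urls.append(urljoin(self.base_url, src))


PAGE_URL = "https://www.scrapethissite.com/pages/frames/"  # assumed URL
SAMPLE = '<div><iframe id="iframe" src="/pages/frames/?frames=1"></iframe></div>'

finder = FrameFinder(PAGE_URL)
finder.feed(SAMPLE)
finder.close()
print(finder.frame_urls)
```

Each URL in `frame_urls` can then be fetched and parsed like any other page.
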
Spoofing Headers

Sometimes your web scraper needs to make its HTTP requests look like they come from a browser so that the web server returns the same data you see in your browser.
- On success, the returned HTML contains "Headers properly spoofed, request appears to be coming from a browser :)".
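
As a rough illustration, the sketch below builds a request that carries browser-like headers and checks a response body for the success message. It uses the standard library's `urllib.request` and may differ from what `spoofing-headers.py` does. The target URL and the exact header values are assumptions, and the request is only constructed here, not sent.

```python
from urllib.request import Request

# Example browser-like headers (assumed values; any current browser string works).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
}

SUCCESS_TEXT = "Headers properly spoofed"


def browser_request(url):
    """Return a Request object that carries the browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)


def is_spoofed(html):
    """Report whether the response body contains the success message."""
    return SUCCESS_TEXT in html


# Assumed URL for the exercise page.
req = browser_request("https://www.scrapethissite.com/pages/advanced/?gotcha=headers")
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would send it, and the decoded body can then be handed to `is_spoofed`.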

How to Run?

Clone this repository.

git clone https://github.com/VIIVIIIIX/scrape-this-site-sandbox.git

Create a virtual environment.

cd scrape-this-site-sandbox
python3 -m venv .venv

Activate the virtual environment and install necessary libraries.

source .venv/bin/activate
pip install -r requirements.txt

Countries of the World

cd countries-of-the-world
python3 countries.py

Hockey Teams

cd hockey-teams
python3 hockey-teams.py

Oscar Winning Films

cd oscar-winning-films
python3 oscar.py

Spoofing Headers

cd spoofing-headers
python3 spoofing-headers.py
