Skip to content

Latest commit

 

History

History
142 lines (100 loc) · 5.83 KB

README.md

File metadata and controls

142 lines (100 loc) · 5.83 KB

Find the Hotel's Average Room Price in Osaka

Showcase visualizations about the hotel's Average Room Price in Osaka.

Average Nightly Room Price for one adult, one room.

Price in USD.

Find the Hotel's Average Room Price in Japan

Showcase visualizations about the hotel's Average Room Price for all prefectures in Japan.

Average Nightly Room Price for one adult, one room.

Price in USD.

Built on top of Find the Hotel's Average Room Price in Osaka project.

Status

CodeQL

Scraper Test

Scrape

Visualizations

Average Room Price in Osaka:

Click here for visualizations of this project.

Average Room Price for all Prefectures in Japan:

Project Details

Find the Hotel's Average Room Price in Osaka Project

  • Collect Osaka hotel property data from Booking.com

  • Data collecting period for Year 2025: 4 Sep 2024—Present

  • Consists of room price from 4 Sep 2024—Present

  • Data was collected daily using GitHub action.

  • Consists of Basic GraphQL and Whole-Month GraphQL scraper.

  • These scrapers can also be used to scrape data from other cities in Japan.

Find the Hotel's Average Room Price in Japan Project

  • Collect Japan hotel property data for all Prefectures from Booking.com

  • Data collecting dates: 23 Aug 2024.

  • Use Japan GraphQL scraper to scrape data.

Collected Data Archive

Click here to access the collected hotel data archive.

How to Scrape Hotel Data

Setup Project

Setup a Database

  • Download Docker Desktop
  • Ensure that Docker Desktop is running.
  • Run: export POSTGRES_DATA_PATH='<your_container_volume_path>' to set the container volume to the directory path of your choice.
  • Run: docker compose up -d

Find the Necessary Headers

  • Run: python get_auth_headers.py
    • It will write the headers to an .env file.

General Guidelines for Using the Scraper

  • To scrape only hotel properties, use --scrape_only_hotel argument.
  • Ensure that Docker Desktop and Postgres container are running.

To scrape using Whole-Month GraphQL Scraper:

  • Example usage, with only required arguments for Whole-Month GraphQL Scraper:
    python main.py --whole_mth --year=2024 --month=12 --city=Osaka
  • Scrape data start from the given day of the month to the end of the same month.
    • Default start day is 1.
    • Start day can be set with --start_day argument.

To scrape using Basic GraphQL Scraper:

  • Example usage, with only required arguments for Basic GraphQL Scraper:
    python main.py --city=Osaka --check_in=2024-12-25 --check_out=2024-12-26 --scraper              

To scrape using Japan GraphQL Scraper:

  • Example usage, with only required arguments for Japan GraphQL Scraper:
    python main.py --japan_hotel
  • Prefecture to scrape can be specified with --prefecture argument, for example:
    • python main.py --japan_hotel --prefecture Tokyo
    • If --prefecture argument is not specified, all prefectures will be scraped.
    • Multiple prefectures can be specified.
      • python main.py --japan_hotel --prefecture Tokyo Osaka
    • You can use the prefecture name on Booking.com as a reference.

If the not match error happened (SystemExit exception), please try running the scraper again.

Scraper's Arguments

Click here for Scraper's arguments details.

Find the missing dates in the database using Missing Date Checker

To ensure that all dates of the month were scraped, a function in check_missing_dates.py will check in the database to find the missing dates.

Made only for the Find the Hotel's Average Room Price in Osaka project which saves scraped data in HotelPrice table.

  • To check in the database, use the following command line as an example, only include required argument:
    python check_missing_dates.py --city=Osaka
  • If there are missing dates, a Basic Scraper will automatically start to scrape those dates.
    • Missing Date Checker shares arguments with Basic Scraper.
    • Arguments parsed to Missing Date Checker should be the same as used with Basic Scraper.
  • Only check the missing dates of the data that was scraped today in UTC time.
  • Only check the months that were scraped and loaded to the database.
  • Year of dates can be specified with --year
    • Default is the current year.

If the not match error happened (SystemExit exception), please try running the Missing Date Checker again.