kevin840720/google-map-spot-crawler

google-map-spot-crawler

A crawler designed to extract detailed information about tourist attractions from Google Maps, including ratings, reviews, business hours, and addresses.

Initialize the Database

Set up the necessary .env files in the project directory as described below:

Step 1: Configure Settings

Rename each of the example files below by removing ".example" from the filename, then edit its contents:

  • ./database/.env.example
    • Set PostgreSQL configuration here; this .env is for Docker Compose.
  • ./database/docker-compose.yaml.example
    • Replace {{project_absolute_path}} with your project's absolute path.
  • ./.env.example
    • Set PostgreSQL configuration here; this .env is for Python scripts.
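As a hypothetical sketch, a ./database/.env might look like the following. The variable names are assumptions based on what the official postgres Docker image reads (POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB); check the example files for the names this project actually expects:

```shell
# Hypothetical values -- adjust to your setup.
POSTGRES_USER=crawler
POSTGRES_PASSWORD=change-me
POSTGRES_DB=spots
POSTGRES_PORT=5432
```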

Step 2: Start the PostgreSQL Server

docker compose -f './database/docker-compose.yaml' up
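Once the container is up, a quick way to confirm the server is accepting connections is a plain TCP check against the configured host and port. This is a minimal sketch, not part of the project, and the POSTGRES_HOST/POSTGRES_PORT variable names are assumptions about your .env contents:

```python
# Sketch: verify the PostgreSQL container is reachable before running the crawler.
import socket


def load_env(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines, comments, and lines without '=' are skipped.
    """
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


def postgres_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (assumed variable names):
#   env = load_env("./database/.env")
#   postgres_reachable(env.get("POSTGRES_HOST", "localhost"),
#                      int(env.get("POSTGRES_PORT", "5432")))
```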

Run the Application

Step 1: Update locations.yaml

Specify the locations for the crawler to target by updating locations.yaml. Three kinds of identifiers can be used to specify a location:
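The schema of locations.yaml is not shown in this README, so the key names and structure below are assumptions; a hypothetical sketch using the three identifier kinds described in the following subsections:

```yaml
# Hypothetical sketch -- key names and structure are assumptions,
# not the project's actual schema. Values are placeholders.
locations:
  - name: Example Spot A
    shared_code: c7Z7qpVrx7UujUFp8   # final segment of a share URL
  - name: Example Spot B
    cid: "123456789"                 # value of the cid URL parameter
  - name: Example Spot C
    data_id: "0x0:0x0"               # value of the data_id URL parameter
```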

shared_code

Obtain this code from the Google Maps "Share" button. Clicking the button produces a URL in the format https://maps.app.goo.gl/c7Z7qpVrx7UujUFp8; copy the final path segment (c7Z7qpVrx7UujUFp8 in this example) into locations.yaml.

cid

Refers to the customer ID, passed as a query parameter in a Google Maps URL. If the URL contains a cid parameter, use its value as the identifier.

data_id

Another URL parameter that specifies the location. Use this identifier if it is present in your URL.
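The first two identifiers can be pulled out of a Google Maps URL with the standard library. The helpers below are illustrative, not part of this project; data_id is omitted because its exact encoding inside the longer desktop URL varies:

```python
# Illustrative helpers (not part of this project) for extracting location
# identifiers from Google Maps URLs, using only the standard library.
from urllib.parse import parse_qs, urlparse


def shared_code_from_url(url):
    """Return the final path segment of a share-button URL,
    e.g. 'c7Z7qpVrx7UujUFp8' from https://maps.app.goo.gl/c7Z7qpVrx7UujUFp8.
    """
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def cid_from_url(url):
    """Return the value of the cid query parameter, or None if absent."""
    return parse_qs(urlparse(url).query).get("cid", [None])[0]
```

For example, shared_code_from_url("https://maps.app.goo.gl/c7Z7qpVrx7UujUFp8") yields the identifier to place in locations.yaml.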

Step 2: Install Environment

Set up your environment and install dependencies using pip for a standard Python environment or pipenv for managing project-specific dependencies.

Using pip:

pip install -r requirements.txt

Or, using pipenv:

pipenv install

Step 3: Run Application

Start the crawler by executing the main script. Make sure to activate the correct environment depending on your installation method.

python src/main.py

Or, using pipenv:

pipenv run python src/main.py

Project Organization

.
├── .env.example        - Environment settings for scripts.
├── CHANGELOG.md        - Log of all notable changes made to the project.
├── conftest.py         - Configuration for pytest.
├── data
│   └──                 - Directory for output data and logs.
├── database
│   ├── .env.example    - Environment settings for Docker Compose.
│   └── docker-compose.yaml.example - Sample Docker Compose configuration.
├── locations.yaml      - Configurations for locations to be crawled.
├── Pipfile             - Pipenv file for managing project dependencies.
├── Pipfile.lock        - Lock file for dependencies, ensuring consistency.
├── requirements.txt    - Python package requirements for environments not using Pipenv.
├── README.md           - The README file for the project.
├── src
│   ├── errors.py       - Custom defined errors.
│   ├── export.py       - Handles data export to PostgreSQL.
│   ├── main.py         - Main script; run this to start the application.
│   ├── objects.py      - Definitions of custom objects used in the project.
│   ├── util.py         - Utility functions for general use across the project.
│   ├── web_action.py   - Manages Selenium actions on websites.
│   └── web_parser.py   - Parses content or objects crawled by Selenium.
├── tests
│   ├── data
│   │   ├── Taichung Intercontinental Baseball Stadium - Overview Page.html
│   │   ├── Taichung Intercontinental Baseball Stadium - Review Page.html
│   │   ├── Toyama Town Hotel - Overview Page.html
│   │   └── Toyama Town Hotel - Review Page.html
│   ├── test_parser.py  - Tests for the parser functionalities.
│   └── test_web_action.py - Tests for web action functionalities.
└── version.py          - Contains the version information of the project.
