A crawler designed to extract detailed information about tourist attractions from Google Maps, including ratings, reviews, business hours, and addresses.
Set up the necessary .env files in the project directory as described below.
Rename these example files by removing ".example" from the filenames, then modify their contents:
- ./database/.env.example: Set PostgreSQL configuration here; this .env is for Docker Compose.
- ./database/docker-compose.yaml.example: Replace {{project_absolute_path}} with your project's absolute path.
- ./.env.example: Set PostgreSQL configuration here; this .env is for Python scripts.
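The renaming step above can be sketched as a small helper script. This is a minimal sketch, not part of the project: the function name `prepare_config` is hypothetical, and it copies the example files rather than renaming them so the originals stay in place. It only fills in the `{{project_absolute_path}}` placeholder; the PostgreSQL settings still have to be edited by hand.

```python
from pathlib import Path

# Example files named in this README, relative to the project root.
EXAMPLE_FILES = [
    "database/.env.example",
    "database/docker-compose.yaml.example",
    ".env.example",
]
PLACEHOLDER = "{{project_absolute_path}}"


def prepare_config(project_root: Path) -> list:
    """Copy each *.example file to its final name, filling in the path placeholder.

    Hypothetical helper (requires Python 3.9+ for str.removesuffix).
    Returns the list of files that were created.
    """
    root = project_root.resolve()
    created = []
    for rel in EXAMPLE_FILES:
        source = root / rel
        target = source.with_name(source.name.removesuffix(".example"))
        if not source.exists() or target.exists():
            # Skip missing examples and never overwrite existing configuration.
            continue
        text = source.read_text(encoding="utf-8")
        target.write_text(text.replace(PLACEHOLDER, str(root)), encoding="utf-8")
        created.append(target)
    return created


if __name__ == "__main__":
    # Assumes the script is run from the project root.
    for path in prepare_config(Path(".")):
        print(f"created {path}")
```

Running it a second time creates nothing, because existing target files are left untouched.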
Start the database with Docker Compose:

    docker compose -f './database/docker-compose.yaml' up

Specify the locations for the crawler to target by updating locations.yaml. There are three kinds of location identifiers used to specify locations:
Obtain this code from the Google Maps "share" button. Clicking the button gives you a URL of the form https://maps.app.goo.gl/c7Z7qpVrx7UujUFp8. Copy the last segment of this URL (c7Z7qpVrx7UujUFp8 in this example) and add it to locations.yaml.
Refers to the customer ID, which appears as the cid argument in a Google Maps URL. If your URL contains the cid argument, use its value as the identifier.
Also an argument in your URL that specifies the location. Include this identifier if available in your URL.
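Pulling an identifier out of a URL can be sketched with the standard library. This is a minimal sketch under stated assumptions: the function name `extract_identifier` is hypothetical and not part of the project, it covers only the share-link code and the cid argument described above, and the cid URL in the usage example is an illustrative value. How the result should be written into locations.yaml is not shown here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_identifier(url: str) -> Optional[str]:
    """Return the cid value or the share-link code from a Google Maps URL.

    Hypothetical helper: returns None when neither identifier is found.
    """
    parsed = urlparse(url)

    # Case 1: the URL carries a cid query argument.
    cid_values = parse_qs(parsed.query).get("cid")
    if cid_values:
        return cid_values[0]

    # Case 2: a share link, whose last path segment is the code.
    if parsed.netloc == "maps.app.goo.gl":
        code = parsed.path.strip("/").split("/")[-1]
        return code or None

    return None


if __name__ == "__main__":
    print(extract_identifier("https://maps.app.goo.gl/c7Z7qpVrx7UujUFp8"))
    # Illustrative cid URL; the number is made up for this example.
    print(extract_identifier("https://www.google.com/maps?cid=1234567890"))
```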
Set up your environment and install dependencies using pip for a standard Python environment or pipenv for managing project-specific dependencies.
Using pip:

    pip install -r requirements.txt

or using pipenv:

    pipenv install

Start the crawler by executing the main script. Make sure to activate the correct environment depending on your installation method.

    python src/main.py

or using pipenv:

    pipenv run python src/main.py

The project is organized as follows:

    .
    ├── .env.example - Environment settings for scripts.
    ├── CHANGELOG.md - Log of all notable changes made to the project.
    ├── conftest.py - Configuration for pytest.
    ├── data
    │   └── - Directory for output data and logs.
    ├── database
    │   ├── .env.example - Environment settings for Docker Compose.
    │   └── docker-compose.yaml.example - Sample Docker Compose configuration.
    ├── locations.yaml - Configurations for locations to be crawled.
    ├── Pipfile - Pipenv file for managing project dependencies.
    ├── Pipfile.lock - Lock file for dependencies, ensuring consistency.
    ├── requirements.txt - Python package requirements for environments not using Pipenv.
    ├── README.md - The README file for the project.
    ├── src
    │   ├── errors.py - Custom defined errors.
    │   ├── export.py - Handles data export to PostgreSQL.
    │   ├── main.py - Main script, run this to start the application.
    │   ├── objects.py - Definitions of custom objects used in the project.
    │   ├── util.py - Utility functions for general use across the project.
    │   ├── web_action.py - Manages Selenium actions on websites.
    │   └── web_parser.py - Parses content or objects crawled by Selenium.
    ├── tests
    │   ├── data
    │   │   ├── Taichung Intercontinental Baseball Stadium - Overview Page.html
    │   │   ├── Taichung Intercontinental Baseball Stadium - Review Page.html
    │   │   ├── Toyama Town Hotel - Overview Page.html
    │   │   └── Toyama Town Hotel - Review Page.html
    │   ├── test_parser.py - Tests for the parser functionalities.
    │   └── test_web_action.py - Tests for web action functionalities.
    └── version.py - Contains the version information of the project.