This project consists of two Python scripts and a batch file designed to scrape job listings from Indeed.com for various countries and job positions:
- main.py
- job_scraper_utils.py
- scrape.bat
Before using the scripts, make sure you have the following installed:
- Python 3.7+
- See requirements.txt for a full list of required packages
- Install the required dependencies:
pip install -r requirements.txt
- Ensure the Chrome browser is installed on your system.
main.py: This is the main script that you'll run to scrape job listings.
- It's currently set up to search for "Banker" jobs in "Melbourne", Australia.
- The script will create a 'csv_files' directory in the same location as the script.
- The scraped job data will be saved as a CSV file in this directory.
To run:
python main.py
To modify the search parameters, edit the following variables in the main() function:
- country = australia (Choose from the list of country variables at the top of the script)
- job_position = 'Banker'
- job_location = 'Melbourne'
- date_posted = 10 (Number of days to look back)
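As a rough illustration of the settings listed above, the sketch below groups them the way they might appear inside main(). The `australia` value and the `search_parameters` helper are hypothetical stand-ins. The real country variables are defined at the top of main.py and may hold something different.

```python
# Hypothetical stand-in for one of the country variables at the top of main.py.
# The value actually used by the script may differ.
australia = "https://au.indeed.com"


def search_parameters():
    """Return the four search settings described above as a dict."""
    country = australia          # choose one of the country variables
    job_position = 'Banker'      # job title to search for
    job_location = 'Melbourne'   # city or region to search in
    date_posted = 10             # number of days to look back
    return {
        "country": country,
        "job_position": job_position,
        "job_location": job_location,
        "date_posted": date_posted,
    }


if __name__ == "__main__":
    print(search_parameters())
```

To search for a different role or city, only the assigned values change; the variable names stay the same.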
job_scraper_utils.py: This script contains utility functions used by main.py. It includes functions for:
- Configuring the webdriver
- Searching for jobs
- Scraping job data
- Cleaning and sorting the scraped data
You don't need to run this script directly, but you can modify its functions to change how the scraping works.
Output: The script will create a CSV file named in the format: {job_position}{job_location}{current_date}.csv
This file will contain the following information for each job listing:
- Link
- Job Title
- Company
- Date Posted
- Location
- Job Description
- Salary
- Search Query
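A minimal sketch of how the output file could be located and loaded with Python's standard library. It assumes the three filename parts are joined with no separator, exactly as the format above is written, that the date is rendered as YYYY-MM-DD, and that the CSV header row uses the column names listed above. Check a generated file and adjust if your output differs. The helper names `expected_csv_path` and `load_jobs` are illustrative and are not part of the project.

```python
import csv
from datetime import date
from pathlib import Path

# Column names taken from the list above; the real header row may differ.
EXPECTED_COLUMNS = [
    "Link", "Job Title", "Company", "Date Posted",
    "Location", "Job Description", "Salary", "Search Query",
]


def expected_csv_path(job_position, job_location, current_date=None, base_dir="csv_files"):
    """Build the path the scraper is expected to write to (date format is an assumption)."""
    current_date = current_date or date.today()
    name = f"{job_position}{job_location}{current_date.isoformat()}.csv"
    return Path(base_dir) / name


def load_jobs(csv_path):
    """Read the CSV into a list of dicts, failing early if a listed column is missing."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"Missing expected columns: {missing}")
        return list(reader)
```

For example, `load_jobs(expected_csv_path("Banker", "Melbourne"))` would return one dict per job listing if today's file exists under those assumptions.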
Web scraping may be against the terms of service of some websites. Ensure you have permission to scrape data from Indeed.com and use the data responsibly. Be mindful of the rate at which you're making requests to avoid overloading the server.
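One way to keep the request rate modest is to pause for a randomized interval between page loads. The sketch below is a generic helper, not code from this project. The name `polite_pause` and the 2 to 5 second range are assumptions to tune for your own use.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration between min_seconds and max_seconds and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)  # injectable so the helper can be exercised without waiting
    return delay
```

Calling `polite_pause()` after each page load would spread requests out. Where exactly to call it depends on how the scraping loop in job_scraper_utils.py is written.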
scrape.bat: Use this batch file when you have multiple copies of main.py for different positions and locations. It searches the directory for *.py files and executes them.
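For readers not on Windows, the behaviour described above can be approximated in Python. This is a sketch of the idea, not the contents of scrape.bat. It assumes each script should be run once, in sorted filename order, with the current interpreter. The function name `run_all_scripts` and its `skip` parameter are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_all_scripts(directory=".", skip=()):
    """Run every *.py file in `directory` in sorted order; return {filename: exit code}."""
    results = {}
    for script in sorted(Path(directory).glob("*.py")):
        if script.name in skip:
            continue  # e.g. this runner itself, if it lives in the same directory
        # Run with the same interpreter, using the script's directory as the working directory.
        completed = subprocess.run([sys.executable, script.name], cwd=directory)
        results[script.name] = completed.returncode
    return results
```

If this runner is saved in the same directory as the scrapers, pass its own filename in `skip` so it does not launch itself.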