# jobs_scraper

`jobs_scraper` is a simple job-postings scraper for the website Indeed. It is written in Python and based on the `requests` and `BeautifulSoup` libraries.

Run the following to install the package:

```shell
pip install jobs_scraper
```
To use `jobs_scraper` you need to create a new `JobsScraper` object and provide the following arguments to its constructor:

- `country`: Indeed country prefix (e.g. `nl` for the Dutch version of the portal).
- `position`: job position.
- `location`: job location.
- `pages`: number of pages to be scraped.
```python
from jobs_scraper import JobsScraper

# Let's create a new JobsScraper object and perform the scraping for a given query.
scraper = JobsScraper(country="nl", position="Data Engineer", location="Amsterdam", pages=3)
df = scraper.scrape()
```
This scrapes the first three pages of results for the example query "Data Engineer" based in "Amsterdam" on the Dutch version of the Indeed portal. The `scrape` method returns a Pandas DataFrame, so it can easily be exported to a CSV file.
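For instance, the export could look like the following sketch. The column names below are illustrative stand-ins, not the package's documented schema:

```python
import pandas as pd

# Illustrative stand-in for the DataFrame returned by scrape();
# the real column names may differ.
df = pd.DataFrame({
    "title": ["Data Engineer"],
    "location": ["Amsterdam"],
    "url": ["https://nl.indeed.com/..."],
})

# Export the scraped postings to a CSV file (the file name is arbitrary).
df.to_csv("jobs.csv", index=False)
```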
The constructor also accepts two optional parameters:

- `max_delay`: bearing in mind that this package is meant only for educational purposes, a delay between requests can be provided. By setting `max_delay` in the constructor, every job posting will be scraped after a random interval between `0` and `max_delay` seconds.

  ```python
  scraper = JobsScraper(country="...", position="...", location="...", pages=..., max_delay=5)
  ```
- `full_urls`: since most of the scraped job URLs are quite long, the returned Pandas DataFrame truncates them, which makes them awkward to access. Setting `full_urls` to `True`, the scraped URLs will not be truncated.

  ```python
  scraper = JobsScraper(country="...", position="...", location="...", pages=..., full_urls=True)
  ```
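The random-delay behaviour described for `max_delay` is presumably along these lines; `polite_sleep` below is a hypothetical helper for illustration, not part of the package's API:

```python
import random
import time

def polite_sleep(max_delay: float) -> float:
    # Wait for a random interval between 0 and max_delay seconds
    # before the next request, and report how long we slept.
    delay = random.uniform(0, max_delay)
    time.sleep(delay)
    return delay
```

Calling `polite_sleep(5)` between two page requests would then pause the scraper for up to five seconds.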
To do:

- Add rotating proxies to prevent the scraper from being blocked when too many requests are sent.
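Rotating proxies are not implemented yet; a minimal sketch of how they could work with `requests` might look like this (the proxy addresses and the `make_proxies` helper are hypothetical):

```python
import random

import requests

# Hypothetical proxy pool; replace with real proxy addresses.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def make_proxies(pool):
    # Pick a random proxy from the pool and build the mapping
    # expected by the `proxies` argument of requests.get().
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}

# Each page request would then go out through a randomly chosen proxy:
# requests.get(url, proxies=make_proxies(PROXY_POOL), timeout=10)
```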