# Frequently Asked Questions (FAQ)

## General Questions

1. Q: What is id-jobs?
   A: id-jobs is a tool that collects job listings from various Indonesian job boards and company websites and organizes them into one easy-to-access Google Sheet.

2. Q: Is id-jobs free to use?
   A: Yes. id-jobs is free and open-source under the GPL-3.0 license.

3. Q: How often is the job data updated?
   A: The job data is updated daily through our automated scraping process.

4. Q: Can I use id-jobs for job markets outside of Indonesia?
   A: id-jobs is designed specifically for the Indonesian job market. You could potentially adapt it for other markets, but it does not support them out of the box.

## Data and Usage

5. Q: How can I access the job data?
   A: You can view the job data in our Google Sheet at [https://s.id/id-jobs-v2](https://s.id/id-jobs-v2).

6. Q: Can I download the data for my own analysis?
   A: Yes, you can download the data from the Google Sheet. Please remember to credit id-jobs if you use the data in any public work.

7. Q: What job boards does id-jobs currently scrape?
   A: We currently scrape Jobstreet Indonesia, Glints Indonesia, Kalibrr, TopKarir, Indeed Indonesia, and various company career pages.

8. Q: Is the salary information always available?
   A: Not all job listings include salary information. We collect it when it is available, but many listings do not provide it.
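
Because salary fields are optional, downstream code should treat them as nullable. As a hedged illustration (this helper is not part of the id-jobs codebase; the format it parses is a common Indonesian salary notation), a scraper might normalize salary text like this:

```python
import re
from typing import Optional, Tuple

def parse_salary_idr(text: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a salary range like 'Rp 5.000.000 - Rp 8.000.000' into
    (low, high) in rupiah, or return None when no salary is listed."""
    if not text:
        return None
    # Match numbers such as 5.000.000 (dots as thousand separators) or 5000000.
    numbers = [int(n.replace(".", "")) for n in re.findall(r"\d[\d.]*\d|\d", text)]
    if not numbers:
        return None
    return (min(numbers), max(numbers))
```

Returning `None` rather than a sentinel value keeps "salary not listed" distinct from any real salary figure.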

## Technical Questions

9. Q: What programming language is id-jobs written in?
   A: id-jobs is primarily written in Python, using libraries such as Scrapy and Playwright.

10. Q: Can I contribute to the id-jobs project?
    A: Absolutely! We welcome contributions. Please see our Contributing Guidelines for more information.

11. Q: I'm having trouble setting up the project locally. What should I do?
    A: First, make sure you have followed all the steps in our QUICKSTART.md guide. If you are still having issues, please open a GitHub issue with details about the problem you're experiencing.

12. Q: How can I add a new job board to scrape?
    A: To add a new job board, create a new spider in the `spiders` directory. Please see our documentation on creating new spiders, or open an issue for guidance.

## Legal and Ethical Questions

13. Q: Is web scraping legal?
    A: Web scraping can be legal, but it depends on how it is done and on the website's terms of service. id-jobs is designed to scrape responsibly and in accordance with each site's robots.txt file.
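
Python's standard library can check robots.txt rules directly. A small self-contained illustration (the rules below are made up for the example, not taken from any real job board):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, line by line.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /jobs/",
]

parser = RobotFileParser()
parser.parse(rules)

# Public listing pages are allowed; the disallowed path is not.
print(parser.can_fetch("*", "https://example.com/jobs/123"))      # True
print(parser.can_fetch("*", "https://example.com/private/data"))  # False
```

Scrapy performs this check automatically when `ROBOTSTXT_OBEY` is enabled in the project settings.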

14. Q: Do you store personal information from job listings?
    A: We only collect publicly available information from job listings. We do not collect or store personal information about job seekers or employers.

15. Q: What should I do if I notice incorrect data in the job listings?
    A: Please open a GitHub issue with details about the incorrect data. We will investigate and correct it as soon as possible.

If you have a question that's not answered here, please feel free to open an issue on our GitHub repository.
# Quick Start Guide for id-jobs

This guide provides detailed instructions for setting up and running the id-jobs project on your local machine. The steps are beginner-friendly and cover macOS, Linux, and Windows.

## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Step 1: Clone the Project](#step-1-clone-the-project)
3. [Step 2: Set Up Python Environment](#step-2-set-up-python-environment)
4. [Step 3: Install Necessary Tools](#step-3-install-necessary-tools)
5. [Step 4: Run the Scrapers](#step-4-run-the-scrapers)
6. [Step 5: Explore and Contribute](#step-5-explore-and-contribute)
7. [Project Structure](#project-structure)
8. [Updating the Project](#updating-the-project)
9. [Troubleshooting](#troubleshooting)
10. [Common Issues and Solutions](#common-issues-and-solutions)
11. [Responsible Scraping](#responsible-scraping)

## Prerequisites

- Git
- Python 3.15 or higher

## Step 1: Clone the Project

1. Install Git from [git-scm.com](https://git-scm.com/downloads) if you haven't already.
2. Open your terminal (Command Prompt on Windows, Terminal on macOS/Linux).
3. Navigate to where you want to store the project:
   ```
   cd path/to/your/preferred/directory
   ```
4. Clone the repository:
   ```
   git clone https://github.com/ceroberoz/id-jobs.git
   ```
5. Move into the project directory:
   ```
   cd id-jobs
   ```

## Step 2: Set Up Python Environment

1. Install Python 3.15 or higher from [python.org](https://www.python.org/downloads/).
2. Create a virtual environment:
   - On Windows:
     ```
     python -m venv venv
     ```
   - On macOS/Linux:
     ```
     python3 -m venv venv
     ```
3. Activate the virtual environment:
   - On Windows:
     ```
     venv\Scripts\activate
     ```
   - On macOS/Linux:
     ```
     source venv/bin/activate
     ```

## Step 3: Install Necessary Tools

1. Upgrade pip (Python's package installer):
   ```
   pip install --upgrade pip
   ```
2. Install project dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Install Playwright browsers:
   ```
   playwright install
   ```

## Step 4: Run the Scrapers

1. To see results in the terminal:
   ```
   scrapy crawl jobstreet
   ```
2. To save results to a CSV file:
   ```
   scrapy crawl jobstreet -o jobstreet_jobs.csv
   ```
   Scrapy infers the CSV format from the file extension, so the deprecated `-t csv` flag is not needed. Use `-O` instead of `-o` if you want to overwrite an existing file rather than append to it.

Replace `jobstreet` with other scraper names to collect data from different sources.

## Step 5: Explore and Contribute

- Check the `spiders` folder to see all available scrapers.
- To modify a scraper, open its file (e.g., `jobstreet.py`) in a text editor.
- If you make improvements, consider contributing back to the project:
  1. Fork the repository on GitHub
  2. Create a new branch for your feature
  3. Commit your changes and push to your fork
  4. Open a pull request with a clear description of your changes

## Project Structure

Here's an overview of the main directories and files in the project:

- `spiders/`: Contains individual scraper files for each job board
- `items.py`: Defines the structure of scraped data
- `pipelines.py`: Handles data processing and storage
- `settings.py`: Contains project settings and configurations
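
In a Scrapy project, `items.py` declares the fields each spider should fill in. The actual field list in id-jobs may differ; as a rough standard-library analogue (a plain dataclass rather than a `scrapy.Item`, with illustrative field names):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobPosting:
    """Stand-in for a scraped job record; field names are illustrative."""
    job_title: str
    company: str
    job_location: str
    job_url: str
    salary: Optional[str] = None  # many boards omit salary information

job = JobPosting("Data Engineer", "Acme", "Jakarta", "https://example.com/jobs/1")
print(job.salary is None)  # True: salary defaults to None when not listed
```

Keeping every spider's output to one shared schema is what lets the pipeline merge many boards into a single sheet.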

## Updating the Project

To update your local copy of the project:

1. Ensure you're in the project directory
2. Pull the latest changes:
   ```
   git pull origin main
   ```
3. Update dependencies:
   ```
   pip install -r requirements.txt
   ```

## Troubleshooting

- If you encounter any issues, first ensure your Python version is 3.15 or higher:
  ```
  python --version
  ```
- On macOS/Linux, you might need to use `python3` instead of `python` in commands.
- If a website's structure changes, its scraper might stop working. Check the project's issues on GitHub or report a new one.

## Common Issues and Solutions

1. **Scraper not working for a specific website**
   - Check if the website structure has changed
   - Verify your internet connection
   - Ensure you're using the latest version of the scraper
2. **Import errors when running scrapers**
   - Make sure you've activated the virtual environment
   - Verify all dependencies are installed correctly
3. **Permission issues (especially on Linux/macOS)**
   - Ensure you have the necessary permissions to write to the output directory
   - Try running the command with `sudo` (use with caution)

## Responsible Scraping

Remember to always scrape responsibly:

- Respect each website's `robots.txt` file and terms of service.
- Implement reasonable delays between requests to avoid overloading servers.
- Only collect publicly available data that you have permission to access.
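
Within Scrapy, request delays are handled by the `DOWNLOAD_DELAY` setting. For scripts outside Scrapy, a minimal standard-library throttle might look like this (the one-second interval is an arbitrary example, not a recommendation for any particular site):

```python
import time

class RequestThrottle:
    """Ensure at least `min_interval` seconds pass between requests."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last_request = 0.0

    def wait(self) -> None:
        # Sleep just long enough to honor the minimum interval.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()

throttle = RequestThrottle(min_interval=1.0)
# Call throttle.wait() immediately before each HTTP request.
```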

For any questions or issues not covered here, please open an issue on the GitHub repository.