# Frequently Asked Questions (FAQ)

## General Questions

1. Q: What is id-jobs?
   A: id-jobs is a tool that collects job listings from various Indonesian job boards and company websites and organizes them into one easy-to-access Google Sheet.

2. Q: Is id-jobs free to use?
   A: Yes. id-jobs is free and open-source under the GPL-3.0 license.

3. Q: How often is the job data updated?
   A: The job data is updated daily through our automated scraping process.

4. Q: Can I use id-jobs for job markets outside of Indonesia?
   A: id-jobs is designed specifically for the Indonesian job market. You could potentially adapt it for other markets, but it does not support them out of the box.

## Data and Usage

5. Q: How can I access the job data?
   A: You can view the job data in our Google Sheet at [https://s.id/id-jobs-v2](https://s.id/id-jobs-v2).

6. Q: Can I download the data for my own analysis?
   A: Yes, you can download the data from the Google Sheet. Please remember to credit id-jobs if you use the data in any public work.

7. Q: What job boards does id-jobs currently scrape?
   A: We currently scrape Jobstreet Indonesia, Glints Indonesia, Kalibrr, TopKarir, Indeed Indonesia, and various company career pages.

8. Q: Is the salary information always available?
   A: Not all job listings include salary information. We collect it when it is available, but many listings do not provide it.
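
Because salary fields are optional, downstream code should treat them as nullable. As a hedged illustration (this helper is not part of the id-jobs codebase; the format it parses is a common Indonesian salary notation), a scraper might normalize salary text like this:

```python
import re
from typing import Optional, Tuple

def parse_salary_idr(text: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a salary range like 'Rp 5.000.000 - Rp 8.000.000' into
    (low, high) in rupiah, or return None when no salary is listed."""
    if not text:
        return None
    # Match numbers such as 5.000.000 (dots as thousand separators) or 5000000.
    numbers = [int(n.replace(".", "")) for n in re.findall(r"\d[\d.]*\d|\d", text)]
    if not numbers:
        return None
    return (min(numbers), max(numbers))
```

Returning `None` rather than a sentinel value keeps "salary not listed" distinct from any real salary figure.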

## Technical Questions

9. Q: What programming language is id-jobs written in?
   A: id-jobs is primarily written in Python, using libraries such as Scrapy and Playwright.

10. Q: Can I contribute to the id-jobs project?
    A: Absolutely! We welcome contributions. Please see our Contributing Guidelines for more information.

11. Q: I'm having trouble setting up the project locally. What should I do?
    A: First, make sure you have followed all the steps in our QUICKSTART.md guide. If you are still having issues, please open a GitHub issue with details about the problem you're experiencing.

12. Q: How can I add a new job board to scrape?
    A: To add a new job board, create a new spider in the `spiders` directory. Please see our documentation on creating new spiders, or open an issue for guidance.

## Legal and Ethical Questions

13. Q: Is web scraping legal?
    A: Web scraping can be legal, but it depends on how it is done and on the website's terms of service. id-jobs is designed to scrape responsibly and in accordance with each site's robots.txt file.
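
Python's standard library can check robots.txt rules directly. A small self-contained illustration (the rules below are made up for the example, not taken from any real job board):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, line by line.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /jobs/",
]

parser = RobotFileParser()
parser.parse(rules)

# Public listing pages are allowed; the disallowed path is not.
print(parser.can_fetch("*", "https://example.com/jobs/123"))      # True
print(parser.can_fetch("*", "https://example.com/private/data"))  # False
```

Scrapy performs this check automatically when `ROBOTSTXT_OBEY` is enabled in the project settings.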

14. Q: Do you store personal information from job listings?
    A: We only collect publicly available information from job listings. We do not collect or store personal information about job seekers or employers.

15. Q: What should I do if I notice incorrect data in the job listings?
    A: Please open a GitHub issue with details about the incorrect data. We will investigate and correct it as soon as possible.

If you have a question that's not answered here, please feel free to open an issue on our GitHub repository.
# Quick Start Guide for id-jobs

This guide provides detailed instructions for setting up and running the id-jobs project on your local machine. The steps are beginner-friendly and cover macOS, Linux, and Windows.

## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Step 1: Clone the Project](#step-1-clone-the-project)
3. [Step 2: Set Up Python Environment](#step-2-set-up-python-environment)
4. [Step 3: Install Necessary Tools](#step-3-install-necessary-tools)
5. [Step 4: Run the Scrapers](#step-4-run-the-scrapers)
6. [Step 5: Explore and Contribute](#step-5-explore-and-contribute)
7. [Project Structure](#project-structure)
8. [Updating the Project](#updating-the-project)
9. [Troubleshooting](#troubleshooting)
10. [Common Issues and Solutions](#common-issues-and-solutions)
11. [Responsible Scraping](#responsible-scraping)

## Prerequisites

- Git
- Python 3.15 or higher

## Step 1: Clone the Project

1. Install Git from [git-scm.com](https://git-scm.com/downloads) if you haven't already.
2. Open your terminal (Command Prompt on Windows, Terminal on macOS/Linux).
3. Navigate to where you want to store the project:
   ```
   cd path/to/your/preferred/directory
   ```
4. Clone the repository:
   ```
   git clone https://github.com/ceroberoz/id-jobs.git
   ```
5. Move into the project directory:
   ```
   cd id-jobs
   ```

## Step 2: Set Up Python Environment

1. Install Python 3.15 or higher from [python.org](https://www.python.org/downloads/).
2. Create a virtual environment:
   - On Windows:
     ```
     python -m venv venv
     ```
   - On macOS/Linux:
     ```
     python3 -m venv venv
     ```
3. Activate the virtual environment:
   - On Windows:
     ```
     venv\Scripts\activate
     ```
   - On macOS/Linux:
     ```
     source venv/bin/activate
     ```

## Step 3: Install Necessary Tools

1. Upgrade pip (Python's package installer):
   ```
   pip install --upgrade pip
   ```
2. Install project dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Install Playwright browsers:
   ```
   playwright install
   ```

## Step 4: Run the Scrapers

1. To see results in the terminal:
   ```
   scrapy crawl jobstreet
   ```
2. To save results to a CSV file:
   ```
   scrapy crawl jobstreet -o jobstreet_jobs.csv
   ```
   Scrapy infers the CSV format from the file extension, so the deprecated `-t csv` flag is not needed. Use `-O` instead of `-o` if you want to overwrite an existing file rather than append to it.

Replace `jobstreet` with other scraper names to collect data from different sources.

## Step 5: Explore and Contribute

- Check the `spiders` folder to see all available scrapers.
- To modify a scraper, open its file (e.g., `jobstreet.py`) in a text editor.
- If you make improvements, consider contributing back to the project:
  1. Fork the repository on GitHub
  2. Create a new branch for your feature
  3. Commit your changes and push to your fork
  4. Open a pull request with a clear description of your changes

## Project Structure

Here's an overview of the main directories and files in the project:

- `spiders/`: Contains individual scraper files for each job board
- `items.py`: Defines the structure of scraped data
- `pipelines.py`: Handles data processing and storage
- `settings.py`: Contains project settings and configurations
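
In a Scrapy project, `items.py` declares the fields each spider should fill in. The actual field list in id-jobs may differ; as a rough standard-library analogue (a plain dataclass rather than a `scrapy.Item`, with illustrative field names):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobPosting:
    """Stand-in for a scraped job record; field names are illustrative."""
    job_title: str
    company: str
    job_location: str
    job_url: str
    salary: Optional[str] = None  # many boards omit salary information

job = JobPosting("Data Engineer", "Acme", "Jakarta", "https://example.com/jobs/1")
print(job.salary is None)  # True: salary defaults to None when not listed
```

Keeping every spider's output to one shared schema is what lets the pipeline merge many boards into a single sheet.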

## Updating the Project

To update your local copy of the project:

1. Ensure you're in the project directory
2. Pull the latest changes:
   ```
   git pull origin main
   ```
3. Update dependencies:
   ```
   pip install -r requirements.txt
   ```

## Troubleshooting

- If you encounter any issues, first ensure your Python version is 3.15 or higher:
  ```
  python --version
  ```
- On macOS/Linux, you might need to use `python3` instead of `python` in commands.
- If a website's structure changes, its scraper might stop working. Check the project's issues on GitHub or report a new one.

## Common Issues and Solutions

1. **Scraper not working for a specific website**
   - Check if the website structure has changed
   - Verify your internet connection
   - Ensure you're using the latest version of the scraper
2. **Import errors when running scrapers**
   - Make sure you've activated the virtual environment
   - Verify all dependencies are installed correctly
3. **Permission issues (especially on Linux/macOS)**
   - Ensure you have the necessary permissions to write to the output directory
   - Try running the command with `sudo` (use with caution)

## Responsible Scraping

Remember to always scrape responsibly:

- Respect each website's `robots.txt` file and terms of service.
- Implement reasonable delays between requests to avoid overloading servers.
- Only collect publicly available data that you have permission to access.
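
Within Scrapy, request delays are handled by the `DOWNLOAD_DELAY` setting. For scripts outside Scrapy, a minimal standard-library throttle might look like this (the one-second interval is an arbitrary example, not a recommendation for any particular site):

```python
import time

class RequestThrottle:
    """Ensure at least `min_interval` seconds pass between requests."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last_request = 0.0

    def wait(self) -> None:
        # Sleep just long enough to honor the minimum interval.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()

throttle = RequestThrottle(min_interval=1.0)
# Call throttle.wait() immediately before each HTTP request.
```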

For any questions or issues not covered here, please open an issue on the GitHub repository.