This is an Instagram hashtag and user reels scraping tool 🤖 that allows you to extract data from Instagram's search results and user profiles. It utilizes Selenium with ChromeDriver 🚗 for web automation and BeautifulSoup 🥣 for parsing HTML content.
- Search Results Scraping: Extract post links based on specific hashtags 🔍.
- User Reels Scraping: Scrape reel links from user profiles 🎬.
- Configurable Options: Customize behavior via `config.json` (e.g., headless mode, timeouts) ⚙️.
- Error Handling: Robust error handling for network issues, login failures, and unexpected changes in Instagram's layout 🚨.
- Logging: Detailed logs for debugging and tracking progress 📝.
- Progress Visualization: Real-time progress updates using the `rich` library 📊 (a brief sketch follows this list).
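For a sense of how that progress display can be produced, here is a minimal `rich` sketch. It is illustrative only and not taken from the scraper; the per-post work is stubbed out with a sleep.

```python
# Minimal sketch of rich-based progress reporting (illustrative, not the scraper's code).
import time

from rich.progress import track

target_count = 100  # e.g. the search_post_count value from config.json

for _ in track(range(target_count), description="Scraping posts..."):
    time.sleep(0.01)  # stand-in for the real per-post scraping work
```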
To run this scraper, ensure you have the following installed:
- Python 3.8 or higher 🐍
- Google Chrome (latest version recommended) 🌐
- ChromeDriver (matching your Chrome version) 🚗
Run the following command to install the required Python packages:

```bash
pip install -r requirements.txt
```

Dependencies include:

- `selenium`
- `beautifulsoup4`
- `rich`
- `webdriver-manager` (optional, for automatic driver management)
- Clone the Repository 📂

  ```bash
  git clone https://github.com/xlastfire/Instagram-Hashtag-Scraper.git
  cd Instagram-Hashtag-Scraper
  ```

- Install Dependencies 💻

  ```bash
  pip install -r requirements.txt
  ```

- Download ChromeDriver 🚗
  - Download the ChromeDriver executable that matches your installed version of Google Chrome from the official site.
  - Place the `chromedriver` executable in a directory of your choice and update the path in `config.json` (or see the `webdriver-manager` sketch after this list).

- Update Configuration 📝

  Edit the `config.json` file to include:
  - Your Instagram credentials (`username` and `password`) 🔑.
  - Desired queries (hashtags or usernames) under the `queries` section.

- Run the Script ▶️

  Execute the scraper:

  ```bash
  python scraper.py
  ```
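If you prefer not to download ChromeDriver by hand, the optional `webdriver-manager` package can fetch a matching driver automatically. The sketch below shows that approach; it is not something `scraper.py` does out of the box, so adapting the driver setup this way is an assumption on your part.

```python
# Optional alternative to a manual ChromeDriver download (illustrative sketch).
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a driver that matches the installed Chrome version
# and returns its path, so driver_executable_path in config.json is not needed here.
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.get("https://www.instagram.com/")
driver.quit()
```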
The `config.json` file contains all configurable options. Below is an example configuration:
```json
{
  "username": "your_instagram_username",
  "password": "your_instagram_password",
  "driver_executable_path": "/path/to/chromedriver",
  "headless": true,
  "disable_images": true,
  "disable_videos": false,
  "default_timeout": 10,
  "search_post_count": 100,
  "follow_user_reels": false,
  "log_file": "logs/instagram.log",
  "search_path": "data/search/",
  "user_reels_path": "data/user_reels/",
  "queries": {
    "#nature": "search",
    "natgeo": "user_reels"
  },
  "completed_queries": {}
}
```
- `username`, `password`: Your Instagram login credentials 🔑.
- `driver_executable_path`: Path to the ChromeDriver executable 🚗.
- `headless`: Run the browser in headless mode (no GUI) 👻.
- `disable_images`, `disable_videos`: Optimize performance by disabling media loading 🖼️🎬.
- `search_post_count`: Target number of posts to scrape per search query 🔢.
- `follow_user_reels`: Automatically follow users when scraping their reels 👥.
- `queries`: Dictionary of queries to process, where keys are hashtags or usernames and values are either `"search"` or `"user_reels"`.
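To show how these options can map onto the browser setup, here is a hedged sketch that loads `config.json` and applies a few of the flags. The key names come from the example above; everything else (the image-blocking preference, the dispatch loop) is an assumption rather than the scraper's actual implementation.

```python
# Illustrative sketch: read config.json and apply a few of its options.
import json

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

options = webdriver.ChromeOptions()
if config.get("headless"):
    options.add_argument("--headless=new")
if config.get("disable_images"):
    # Common Chrome preference for blocking image loads (an assumption for this project).
    options.add_experimental_option(
        "prefs", {"profile.managed_default_content_settings.images": 2}
    )

driver = webdriver.Chrome(service=Service(config["driver_executable_path"]), options=options)
driver.implicitly_wait(config.get("default_timeout", 10))

for query, mode in config["queries"].items():
    print(f"{mode}: {query}")  # the real scraper dispatches to search or user_reels logic here

driver.quit()
```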
Add your desired hashtags to the `queries` section in `config.json` with the value `"search"`. For example:
"queries": {
"#nature": "search",
"#travel": "search"
}
Run the script:

```bash
python scraper.py
```

Scraped data will be saved in the directory specified by `search_path`.
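Instagram loads results incrementally as you scroll, so reaching `search_post_count` generally means scrolling and re-parsing the page. The sketch below illustrates that idea only; the URL, the `/p/` link filter, and the fixed sleep are assumptions, and in practice a logged-in session is required.

```python
# Illustrative sketch: scroll a hashtag results page until enough post links are collected.
import time

from bs4 import BeautifulSoup
from selenium import webdriver

TARGET = 100  # e.g. search_post_count

driver = webdriver.Chrome()
driver.get("https://www.instagram.com/explore/tags/nature/")  # hypothetical; normally needs login

links = set()
for _ in range(50):  # cap the number of scrolls so the sketch cannot loop forever
    soup = BeautifulSoup(driver.page_source, "html.parser")
    links.update(a["href"] for a in soup.find_all("a", href=True) if "/p/" in a["href"])
    if len(links) >= TARGET:
        break
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # crude wait for new results to load

print(f"Collected {len(links)} post links")
driver.quit()
```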
Add usernames to the `queries` section with the value `"user_reels"`. For example:
"queries": {
"natgeo": "user_reels",
"bbcearth": "user_reels"
}
Run the script:

```bash
python scraper.py
```

Reel links will be saved in the directory specified by `user_reels_path`.
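As an illustration of the output stage, here is a hedged sketch that writes collected reel links under `user_reels_path`, one link per line. The actual file names and format used by the scraper may differ; the links and the `natgeo.txt` name are placeholders.

```python
# Illustrative sketch: persist collected reel links under user_reels_path.
import json
from pathlib import Path

with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

reel_links = [  # placeholder data; the scraper collects these from the profile page
    "https://www.instagram.com/reel/ABC123/",
    "https://www.instagram.com/reel/DEF456/",
]

out_dir = Path(config["user_reels_path"])
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "natgeo.txt").write_text("\n".join(reel_links), encoding="utf-8")  # hypothetical filename
```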
Detailed logs are stored in the file specified by `log_file`. Use these logs to debug issues or track progress.
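If you extend the scraper and want your own messages in the same place, here is a minimal sketch of file logging against the configured path. It is an assumption about the wiring, using only the standard `logging` module.

```python
# Illustrative sketch: send log messages to the file named in config.json.
import json
import logging
from pathlib import Path

with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

log_path = Path(config.get("log_file", "logs/instagram.log"))
log_path.parent.mkdir(parents=True, exist_ok=True)

logging.basicConfig(
    filename=log_path,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("Scraper started")
```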
Contributions are welcome! If you'd like to contribute, please follow these steps:
- Fork the repository 🍴.
- Create a new branch for your feature or bug fix 🌿.
- Submit a pull request with a clear description of your changes 📝.
Please ensure your code adheres to the project's coding standards and includes appropriate documentation.
This project is licensed under the MIT License. See the LICENSE file for details.
This tool is intended for educational and research purposes only. Ensure you comply with Instagram's Terms of Service before using it. The author is not responsible for any misuse or violations of terms resulting from the use of this script.