This repository contains scripts and notebooks for scraping Instagram data, downloading images and videos, and processing the collected data. The project uses the Bright Data API for data collection and includes functionality for triggering data collection, fetching snapshot statuses, and downloading media content.
```
.env
.gitignore
data/
    demo_table.xlsx
    ...
images/
instagram_scraper/
    __init__.py
    download_images.py
    download_videos.py
    fetch_snapshot_data.py
    get_snapshots_status.py
    trigger_data_collection.py
notebook.ipynb
requirements.txt
videos/
```
- Clone the repository

```shell
git clone https://github.com/leokinzinger/BrightData-Instagram-Scraper.git
cd BrightData-Instagram-Scraper
```

- Create a virtual environment and activate it

```shell
conda create -n instagram-scraper python=3.10
conda activate instagram-scraper
```

- Install the required packages

```shell
pip install -r requirements.txt
```

- Create a `.env` file and add the following environment variables

```
BRIGHT_DATA_API_KEY=your_bright_data_api_key
```
- Initialise the Bright Data API key in Python

```python
import os
from dotenv import load_dotenv

load_dotenv()
api_token = os.getenv("BRIGHT_DATA_API_KEY")
```

Use the trigger_data_collection function to trigger data collection for a list of URLs.
```python
from instagram_scraper import trigger_data_collection

dataset_id = "your_dataset_id"
urls = ["url1", "url2", ...]
response = trigger_data_collection(api_token, dataset_id, urls)
print(response)
```

Use the get_snapshots_status function to fetch the status of all snapshots for a given dataset ID.
```python
from instagram_scraper import get_snapshots_status

api_token = "your_api_token"
dataset_id = "your_dataset_id"
status = get_snapshots_status(api_token, dataset_id)
print(status)
```

Use the fetch_snapshot_data function to fetch data for a specific snapshot ID and add a UUID column.
```python
from instagram_scraper import fetch_snapshot_data

snapshot_id = "your_snapshot_id"
data = fetch_snapshot_data(api_token, snapshot_id)
print(data)
```

The `notebook.ipynb` file contains code for processing the collected data, downloading images and videos, and saving the results to Excel files. Open the notebook in Jupyter to explore and run the code.
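For orientation, here is a minimal sketch of what a wrapper like trigger_data_collection might do under the hood. It assumes Bright Data's Datasets v3 trigger endpoint (`POST https://api.brightdata.com/datasets/v3/trigger` with a bearer token and a JSON list of URL objects); the exact endpoint, parameters, and payload shape are assumptions and should be checked against the Bright Data API documentation and the repository's own source.

```python
import requests

# Assumed Bright Data Datasets v3 trigger endpoint (verify against the docs).
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(api_token, dataset_id, urls):
    """Assemble headers, query params, and JSON body for a trigger call."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    params = {"dataset_id": dataset_id}
    payload = [{"url": url} for url in urls]
    return headers, params, payload


def trigger(api_token, dataset_id, urls):
    """Send the trigger request; the response is expected to contain a
    snapshot id that can later be polled for status and data."""
    headers, params, payload = build_trigger_request(api_token, dataset_id, urls)
    response = requests.post(TRIGGER_URL, headers=headers, params=params, json=payload)
    response.raise_for_status()
    return response.json()
```

This is a sketch, not the repository's actual implementation; prefer the provided trigger_data_collection function in real use.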
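Since fetch_snapshot_data is described as adding a UUID column, the post-processing step might look roughly like the sketch below, assuming snapshot records arrive as a list of dicts and are handled with pandas; the column name and record fields here are illustrative, not taken from the repository.

```python
import uuid

import pandas as pd


def add_uuid_column(records):
    """Build a DataFrame from snapshot records and give each row a unique id.

    Sketch of the UUID step fetch_snapshot_data is described as performing;
    the real column name and record schema may differ.
    """
    df = pd.DataFrame(records)
    df["uuid"] = [str(uuid.uuid4()) for _ in range(len(df))]
    return df


# Illustrative records; real snapshot rows contain the scraped post fields.
posts = [{"url": "url1", "likes": 10}, {"url": "url2", "likes": 3}]
df = add_uuid_column(posts)
```

The per-row UUID gives each scraped post a stable identifier, which is useful when the downloaded images and videos need to be matched back to rows in the exported Excel files.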