leokinzinger/BrightData-Instagram-Scraper

Instagram Scraper

This repository contains scripts and notebooks for scraping Instagram data, downloading images and videos, and processing the collected data. The project uses the Bright Data API for data collection and includes functionality for triggering data collection, fetching snapshot statuses, and downloading media content.

Project Structure

.env
.gitignore
data/
    demo_table.xlsx
    ...
images/
instagram_scraper/
    __init__.py
    download_images.py
    download_videos.py
    fetch_snapshot_data.py
    get_snapshots_status.py
    trigger_data_collection.py
notebook.ipynb
requirements.txt
videos/

Installation

  1. Clone the repository
git clone https://github.com/leokinzinger/BrightData-Instagram-Scraper.git
cd BrightData-Instagram-Scraper
  2. Create a virtual environment and activate it
conda create -n instagram-scraper python=3.10
conda activate instagram-scraper
  3. Install the required packages
pip install -r requirements.txt
  4. Create a .env file and add the following environment variable
BRIGHT_DATA_API_KEY=your_bright_data_api_key
  5. Initialize the Bright Data API key in Python
import os
from dotenv import load_dotenv

# Read the variables defined in .env into the process environment
load_dotenv()

api_token = os.getenv("BRIGHT_DATA_API_KEY")
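
os.getenv returns None when the variable is missing, so a typo in the .env file only surfaces later as a failed API call. The sketch below shows one way to fail early instead. The helper name require_api_token is hypothetical and not part of this repository.

```python
import os


def require_api_token(env=None, name="BRIGHT_DATA_API_KEY"):
    """Return the API token from the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration; it is not part of instagram_scraper.
    """
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return token
```

Call it after load_dotenv(), for example api_token = require_api_token().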

Usage

Trigger Data Collection

Use the trigger_data_collection function to trigger data collection for a list of URLs.

from instagram_scraper import trigger_data_collection

dataset_id = "your_dataset_id"
urls = ["url1", "url2", ...]

response = trigger_data_collection(api_token, dataset_id, urls)
print(response)
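
For orientation, here is a rough sketch of the kind of request such a trigger call might send. It only builds the request and does not send it. The endpoint URL, the Bearer header, and the payload shape are assumptions based on Bright Data's dataset API and are not taken from this repository; build_trigger_request is a hypothetical name, and the actual implementation in trigger_data_collection.py may differ.

```python
import json

# Assumed endpoint; verify against the Bright Data documentation before use.
TRIGGER_ENDPOINT = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(api_token, dataset_id, urls):
    """Describe (but do not send) a trigger request for a list of URLs."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    return {
        "method": "POST",
        "url": TRIGGER_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        # The dataset is selected through a query parameter (assumption).
        "params": {"dataset_id": dataset_id},
        # One input object per URL (assumed payload shape).
        "body": json.dumps([{"url": url} for url in urls]),
    }
```

The returned dictionary can be passed to an HTTP client of your choice, which keeps the request construction easy to inspect and test without network access.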

Fetch Snapshot Status

Use the get_snapshots_status function to fetch the status of all snapshots for a given dataset ID.

from instagram_scraper import get_snapshots_status

api_token = "your_api_token"
dataset_id = "your_dataset_id"

status = get_snapshots_status(api_token, dataset_id)
print(status)
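
A snapshot is usually not ready right after it is triggered, so the status check tends to be repeated. The sketch below polls until a snapshot is ready. It makes several assumptions: check_status is a hypothetical callable that returns a status string for one snapshot ID, and the values "ready" and "failed" are illustrative. The return shape of get_snapshots_status is not documented here, so adapt the callable to whatever it returns.

```python
import time


def wait_for_snapshot(check_status, snapshot_id, interval=10.0,
                      max_attempts=30, sleep=time.sleep):
    """Poll check_status(snapshot_id) until it reports "ready".

    Raises RuntimeError on "failed" and TimeoutError after max_attempts checks.
    """
    for attempt in range(max_attempts):
        status = check_status(snapshot_id)
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        # Wait between checks, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        f"snapshot {snapshot_id} not ready after {max_attempts} checks"
    )
```

The sleep function is a parameter so that the loop can be exercised without real delays.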

Fetch Snapshot Data

Use the fetch_snapshot_data function to fetch data for a specific snapshot ID and add a UUID column.

from instagram_scraper import fetch_snapshot_data

snapshot_id = "your_snapshot_id"

data = fetch_snapshot_data(api_token, snapshot_id)
print(data)
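
To illustrate what adding a UUID column means, here is a minimal sketch that works on a list of dictionaries. The function name add_uuid_column and the column name "uuid" are hypothetical, and the real fetch_snapshot_data may return a different structure (for example a DataFrame).

```python
import uuid


def add_uuid_column(records, column="uuid"):
    """Return copies of the records, each with a fresh random UUID string."""
    return [{**record, column: str(uuid.uuid4())} for record in records]
```

A stable identifier per row makes it easier to link downloaded media files back to the record they came from.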

Jupyter Notebook

The notebook.ipynb file contains code for processing the collected data, downloading images and videos, and saving the results to Excel files. Open the notebook in Jupyter to explore and run the code.
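
The media download step can be sketched with the standard library alone. This is not the code from download_images.py or download_videos.py; download_media and its parameters are hypothetical, and the file extension is guessed from the URL path.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def download_media(url, dest_dir, stem, default_ext=".jpg"):
    """Stream the resource at url into dest_dir and return the saved path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Reuse the extension from the URL path, falling back to default_ext.
    ext = Path(urlparse(url).path).suffix or default_ext
    target = dest_dir / f"{stem}{ext}"
    with urlopen(url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For videos, the same function could be called with dest_dir="videos" and default_ext=".mp4".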
