This repository contains scripts and notebooks for scraping Instagram data, downloading images and videos, and processing the collected data. The project uses the Bright Data API for data collection and includes functionality for triggering data collection, fetching snapshot statuses, and downloading media content.
```
.env
.gitignore
data/
    demo_table.xlsx
    ...
images/
instagram_scraper/
    __init__.py
    download_images.py
    download_videos.py
    fetch_snapshot_data.py
    get_snapshots_status.py
    trigger_data_collection.py
notebook.ipynb
requirements.txt
videos/
```
- Clone the repository

```shell
git clone https://github.com/leokinzinger/BrightData-Instagram-Scraper.git
cd BrightData-Instagram-Scraper
```

- Create a virtual environment and activate it

```shell
conda create -n instagram-scraper python=3.10
conda activate instagram-scraper
```

- Install the required packages

```shell
pip install -r requirements.txt
```

- Create a `.env` file and add the following environment variables

```
BRIGHT_DATA_API_KEY=your_bright_data_api_key
```
- Initialise the Bright Data API key in Python

```python
import os
from dotenv import load_dotenv

load_dotenv()
api_token = os.getenv("BRIGHT_DATA_API_KEY")
```

Use the trigger_data_collection function to trigger data collection for a list of URLs.
```python
from instagram_scraper import trigger_data_collection

dataset_id = "your_dataset_id"
urls = ["url1", "url2", ...]
response = trigger_data_collection(api_token, dataset_id, urls)
print(response)
```

Use the get_snapshots_status function to fetch the status of all snapshots for a given dataset ID.
```python
from instagram_scraper import get_snapshots_status

api_token = "your_api_token"
dataset_id = "your_dataset_id"
status = get_snapshots_status(api_token, dataset_id)
print(status)
```

Use the fetch_snapshot_data function to fetch data for a specific snapshot ID and add a UUID column.
```python
from instagram_scraper import fetch_snapshot_data

snapshot_id = "your_snapshot_id"
data = fetch_snapshot_data(api_token, snapshot_id)
print(data)
```

The `notebook.ipynb` file contains code for processing the collected data, downloading images and videos, and saving the results to Excel files. Open the notebook in Jupyter to explore and run the code.
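For orientation, here is a minimal sketch of what a wrapper like trigger_data_collection might do under the hood. It assumes Bright Data's Datasets v3 trigger endpoint (`POST https://api.brightdata.com/datasets/v3/trigger` with a bearer token and a JSON list of URL objects); the exact endpoint, parameters, and payload shape are assumptions and should be checked against the Bright Data API documentation and the repository's own source.

```python
import requests

# Assumed Bright Data Datasets v3 trigger endpoint (verify against the docs).
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(api_token, dataset_id, urls):
    """Assemble headers, query params, and JSON body for a trigger call."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    params = {"dataset_id": dataset_id}
    payload = [{"url": url} for url in urls]
    return headers, params, payload


def trigger(api_token, dataset_id, urls):
    """Send the trigger request; the response is expected to contain a
    snapshot id that can later be polled for status and data."""
    headers, params, payload = build_trigger_request(api_token, dataset_id, urls)
    response = requests.post(TRIGGER_URL, headers=headers, params=params, json=payload)
    response.raise_for_status()
    return response.json()
```

This is a sketch, not the repository's actual implementation; prefer the provided trigger_data_collection function in real use.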
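Since fetch_snapshot_data is described as adding a UUID column, the post-processing step might look roughly like the sketch below, assuming snapshot records arrive as a list of dicts and are handled with pandas; the column name and record fields here are illustrative, not taken from the repository.

```python
import uuid

import pandas as pd


def add_uuid_column(records):
    """Build a DataFrame from snapshot records and give each row a unique id.

    Sketch of the UUID step fetch_snapshot_data is described as performing;
    the real column name and record schema may differ.
    """
    df = pd.DataFrame(records)
    df["uuid"] = [str(uuid.uuid4()) for _ in range(len(df))]
    return df


# Illustrative records; real snapshot rows contain the scraped post fields.
posts = [{"url": "url1", "likes": 10}, {"url": "url2", "likes": 3}]
df = add_uuid_column(posts)
```

The per-row UUID gives each scraped post a stable identifier, which is useful when the downloaded images and videos need to be matched back to rows in the exported Excel files.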