DigiMoviez Scraper

A robust Rust-based web scraper designed to collect movie information and download links from DigiMoviez.com, storing the data in MongoDB.

Overview

This scraper systematically collects movie metadata, including titles, ratings, cast information, and download links. It backs off after errors, rate-limits its requests, and tracks progress so that an interrupted run can resume.

Features

Scrapes comprehensive movie metadata (title, IMDb rating, duration, genres, etc.)
Collects download links with quality and size information
Stores data in MongoDB with upsert functionality

Prerequisites

Rust
MongoDB
A .env file containing the environment variables listed below

Required Environment Variables

# MongoDB connection string (example: mongodb://localhost:27017)
MONGO_URI=your_mongodb_connection_string
# Database name (example: "digimoviez")
DB_NAME=your_database_name
# Name of your login cookie from digimoviez.com
DM_COOKIE_NAME=your_cookie_name
# Value of that cookie
DM_COOKIE_VALUE=your_cookie_value
# Expiration of that cookie (example: "2025-02-19T06:11:29.470Z")
DM_COOKIE_EXPIRES=your_cookie_expiration
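
As an illustration of how these five variables might be read at startup, here is a minimal sketch using only the Rust standard library. The `Config` struct, its field names, and both functions are hypothetical and not taken from the scraper's source. The project lists `dotenv` as a dependency, which would normally load the .env file into the process environment before a lookup like this runs.

```rust
use std::env;

/// The settings this README lists as required.
/// Struct and field names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub mongo_uri: String,
    pub db_name: String,
    pub cookie_name: String,
    pub cookie_value: String,
    pub cookie_expires: String,
}

/// Build a `Config` from any key lookup, so the logic can be exercised
/// without touching the real process environment.
pub fn config_from<F>(lookup: F) -> Result<Config, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Return the value for `key`, or an error naming the missing variable.
    let get = |key: &str| -> Result<String, String> {
        match lookup(key) {
            Some(value) if !value.trim().is_empty() => Ok(value),
            _ => Err(format!("missing required environment variable: {key}")),
        }
    };

    Ok(Config {
        mongo_uri: get("MONGO_URI")?,
        db_name: get("DB_NAME")?,
        cookie_name: get("DM_COOKIE_NAME")?,
        cookie_value: get("DM_COOKIE_VALUE")?,
        cookie_expires: get("DM_COOKIE_EXPIRES")?,
    })
}

/// Read the settings from the process environment.
pub fn config_from_env() -> Result<Config, String> {
    config_from(|key| env::var(key).ok())
}
```

Failing fast with the name of the missing variable tends to be easier to debug than a later connection or authentication error.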

Data Structure

Movie Schema

struct Movie {
    title: String,
    imdb_id: String,
    imdb_rating: String,
    duration: String,
    genres: Vec<String>,
    director: String,
    stars: Vec<String>,
    country: String,
    description: String,
    metacritic_score: String,
    awards: String,
    image_url: String,
    has_subtitle: bool,
    trailer_link: String,
    page_number: u32,
    content_type: String,
    slug: Option<String>,
    source: String
}

Download Links Schema

struct DownloadLinks {
    imdb_id: String,
    slug: String,
    last_updated: DateTime,
    sections: Vec<DownloadSection>,
    source: String
}

Authentication Setup

To run the scraper, you need valid authentication cookies from DigiMoviez.com. Follow these steps:

1. Log in to DigiMoviez.com with your account
2. Open browser Developer Tools:
- Chrome/Edge/Firefox: Press F12 or Ctrl+Shift+I
- Safari: Enable the Develop menu in Preferences → Advanced
3. Navigate to:
- Chrome/Edge: Application → Cookies
- Firefox: Storage → Cookies
- Safari: Storage → Cookies
4. Find the cookie whose name starts with "wordpress_logged_in"

Extract the following information:

Cookie Name example: wordpress_logged_in_d13b2bvd21d06301434df5f427acb040
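Cookie Value example (the value begins with your DigiMoviez username): your-username%7C1739974278%7C182Q7p5IpD7eQ8gDwqNEYdAk21wsXtPwLJcxlUb656v%7C0263e859b2eefcf214d19ce002445da249116a01b792dbc06bfa4cbd6e0325d8
Cookie Expiration example: Thu, 20 Feb 2025 02:11:18 GMT (the DM_COOKIE_EXPIRES example above uses the ISO 8601 form, such as 2025-02-19T06:11:29.470Z)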
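
To make the cookie fields more concrete, the sketch below shows two small helpers. Both function names are hypothetical and are not part of the scraper. The first formats a name/value pair the way an HTTP `Cookie` request header expects it. The second relies on the standard WordPress layout of a logged-in cookie value (username, expiry timestamp, token, and hash, separated by `|`, which appears URL-encoded as `%7C`) and assumes DigiMoviez follows that layout, as the example value above suggests.

```rust
/// Format a cookie name/value pair as the value of an HTTP `Cookie` header.
pub fn cookie_header(name: &str, value: &str) -> String {
    format!("{name}={value}")
}

/// Extract the Unix expiry timestamp (second field) from a WordPress
/// logged-in cookie value. Assumes the standard field layout
/// `username|expiry|token|hash`, with `|` possibly encoded as `%7C`.
pub fn session_expiry_unix(value: &str) -> Option<u64> {
    // Decode the separator in either letter case before splitting.
    let decoded = value.replace("%7C", "|").replace("%7c", "|");
    decoded.split('|').nth(1)?.parse().ok()
}
```

Checking the embedded timestamp before a long run is one way to notice that the cookie is about to expire and needs to be copied from the browser again.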

Installation

Clone the repository:

git clone [repository-url]

Build the project (this also downloads and compiles the dependencies):

cargo build

Set up environment variables in a .env file
Run the scraper:

cargo run

How It Works

Progress Tracking: The scraper starts from the last scraped page (defaults to 889 if no progress is found)
Movie Collection:
- Fetches movie metadata from each page
- Extracts download links for each movie
Data Storage:
- Stores movie data in the movies collection
- Stores download links in the download_links collection, which can be queried by imdb_id or slug
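
The storage behaviour described above (upsert on write, lookup by imdb_id or slug) can be modelled in memory as follows. This is only a stand-in that uses a `HashMap` in place of MongoDB. The `LinkRecord` and `LinkStore` types are hypothetical, and using imdb_id as the upsert key is an assumption, since the README does not say which field the scraper matches on.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a document in the download_links collection.
#[derive(Debug, Clone, PartialEq)]
pub struct LinkRecord {
    pub imdb_id: String,
    pub slug: String,
    pub urls: Vec<String>,
}

/// In-memory model of the collection, keyed by imdb_id (an assumption).
#[derive(Default)]
pub struct LinkStore {
    by_imdb_id: HashMap<String, LinkRecord>,
}

impl LinkStore {
    /// Upsert: insert the record, or replace the existing one with the
    /// same imdb_id. Returns true when an existing record was replaced.
    pub fn upsert(&mut self, record: LinkRecord) -> bool {
        self.by_imdb_id
            .insert(record.imdb_id.clone(), record)
            .is_some()
    }

    /// Look up a record by its IMDb identifier.
    pub fn find_by_imdb_id(&self, imdb_id: &str) -> Option<&LinkRecord> {
        self.by_imdb_id.get(imdb_id)
    }

    /// Look up a record by its slug (linear scan in this model).
    pub fn find_by_slug(&self, slug: &str) -> Option<&LinkRecord> {
        self.by_imdb_id.values().find(|r| r.slug == slug)
    }

    /// Number of stored records.
    pub fn len(&self) -> usize {
        self.by_imdb_id.len()
    }
}
```

Because a second write with the same key replaces the first, re-scraping a page does not create duplicate records, which is the property an upsert provides in the real database.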

Features in Detail

Rate Limiting

1-second delay between successful requests
5-second delay after errors
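
The two delays can be expressed as a single function of the previous request's outcome. The sketch below uses only the standard library; the constant and function names are hypothetical, and the blocking `std::thread::sleep` is a simplification of whatever the scraper does internally.

```rust
use std::time::Duration;

/// Pause after a request that succeeded.
pub const SUCCESS_DELAY: Duration = Duration::from_secs(1);
/// Longer pause after a request that failed.
pub const ERROR_DELAY: Duration = Duration::from_secs(5);

/// Choose how long to wait based on the outcome of the last request.
pub fn delay_after<T, E>(outcome: &Result<T, E>) -> Duration {
    if outcome.is_ok() {
        SUCCESS_DELAY
    } else {
        ERROR_DELAY
    }
}

/// Block the current thread for the chosen delay.
pub fn pause_after<T, E>(outcome: &Result<T, E>) {
    std::thread::sleep(delay_after(outcome));
}
```

In async code running on tokio, which this project lists as a dependency, `tokio::time::sleep(delay_after(&outcome)).await` would take the place of the blocking sleep.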

Progress Tracking

Stores last scraped page in MongoDB
Enables resume functionality
Updates progress after successful page processing
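
The resume behaviour can be sketched with a small trait and an in-memory implementation. All names here are hypothetical, and the in-memory store stands in for the MongoDB progress collection. The default of 889 comes from the "How It Works" section. The sketch deliberately does not decide in which direction pages are visited, because the README does not state it.

```rust
/// Page to start from when no progress has been recorded.
pub const DEFAULT_START_PAGE: u32 = 889;

/// Minimal interface for persisting the last scraped page.
pub trait ProgressStore {
    fn last_page(&self) -> Option<u32>;
    fn save_page(&mut self, page: u32);
}

/// In-memory stand-in for the MongoDB progress collection.
#[derive(Default)]
pub struct MemoryProgress {
    last: Option<u32>,
}

impl ProgressStore for MemoryProgress {
    fn last_page(&self) -> Option<u32> {
        self.last
    }

    fn save_page(&mut self, page: u32) {
        self.last = Some(page);
    }
}

/// Resume from the last scraped page, or fall back to the default.
pub fn start_page(store: &impl ProgressStore) -> u32 {
    store.last_page().unwrap_or(DEFAULT_START_PAGE)
}

/// Process one page and record progress only if processing succeeded.
pub fn process_and_record<S, F, E>(store: &mut S, page: u32, process: F) -> Result<(), E>
where
    S: ProgressStore,
    F: FnOnce(u32) -> Result<(), E>,
{
    process(page)?; // on error, return early without saving progress
    store.save_page(page);
    Ok(())
}
```

Saving progress only after a page succeeds means that a crash or network error causes the same page to be retried on the next run instead of being skipped.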

MongoDB Collections

movies: Stores movie metadata
download_links: Stores download links and quality information
progress: Tracks scraping progress

Dependencies

tokio: Async runtime
reqwest: HTTP client
mongodb: MongoDB driver
scraper: HTML parsing
serde: Serialization/Deserialization
lazy_static: Static initialization
dotenv: Environment variable management

Limitations

Dependent on site structure stability
Requires valid cookie credentials
Sequential page processing
Single-threaded operation

Future Improvements

Implement parallel processing
Add proxy support
Enhance error recovery
Add data validation
Implement retry queues
Add metrics collection
Implement backup functionality

Author

Created and maintained by PocketJack (Rez Khaleghi)

GitHub: https://github.com/rezkhaleghi
Email: rezaxkhaleghi@gmail.com
