
Podcast Crawler 🎙️

Overview

Podcast Crawler is an advanced, async Rust-based podcast management and crawling system designed for efficient podcast data retrieval, storage, and analysis.

Features

  • 🚀 Asynchronous Rust implementation
  • 📦 Diesel ORM for PostgreSQL database interactions
  • 🔍 Flexible podcast and episode crawling
  • 📊 Advanced querying capabilities
  • 🛡️ Robust error handling
  • 📝 Comprehensive logging
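The README does not show the crawler's own error types. As a hypothetical illustration of one common Rust pattern behind "robust error handling", the sketch below uses a single crate-level error enum with Display, Error and From implementations, so that ? can convert lower-level failures automatically. The names CrawlerError and parse_episode_number are assumptions made up for this example.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical crate-wide error type (names are illustrative only).
#[derive(Debug)]
pub enum CrawlerError {
    /// Input from a feed could not be interpreted.
    Parse(String),
    /// A numeric field, such as an episode number, was malformed.
    InvalidNumber(ParseIntError),
}

impl fmt::Display for CrawlerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrawlerError::Parse(msg) => write!(f, "parse error: {msg}"),
            CrawlerError::InvalidNumber(e) => write!(f, "invalid number: {e}"),
        }
    }
}

impl std::error::Error for CrawlerError {
    // Expose the underlying cause so callers and loggers can walk the chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            CrawlerError::InvalidNumber(e) => Some(e),
            CrawlerError::Parse(_) => None,
        }
    }
}

// Lets `?` turn a ParseIntError into a CrawlerError automatically.
impl From<ParseIntError> for CrawlerError {
    fn from(e: ParseIntError) -> Self {
        CrawlerError::InvalidNumber(e)
    }
}

/// Parse an episode number, propagating failures with `?`.
pub fn parse_episode_number(raw: &str) -> Result<u32, CrawlerError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(CrawlerError::Parse("empty episode number".to_string()));
    }
    Ok(trimmed.parse::<u32>()?)
}
```

Crates such as thiserror can generate the Display and From boilerplate; whether this project uses one is not stated here.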

Technology Stack

  • Language: Rust (Edition 2021)
  • Async Runtime: Tokio
  • ORM: Diesel
  • Web Framework: Actix Web
  • Logging: Tracing

Prerequisites

  • Rust 1.67+ (stable)
  • PostgreSQL 12+
  • Cargo
  • diesel_cli

Installation

1. Clone the Repository

git clone https://github.com/Erinable/podcast_crawler.git
cd podcast_crawler

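2. Install the Diesel CLI

Cargo fetches the Rust crate dependencies automatically on the first build. The Diesel CLI is installed separately; built with PostgreSQL support only, it needs the PostgreSQL client library (libpq) on your system:

cargo install diesel_cli --no-default-features --features postgres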

3. Database Setup

# Create databases
createdb podcast
createdb podcast_test

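# Run migrations (diesel reads DATABASE_URL from the environment or from .env,
# so complete step 4 first or pass --database-url explicitly)
diesel migration run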

4. Configuration

Copy .env.example to .env and configure your settings:

cp .env.example .env
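The .env file holds KEY=VALUE lines. Diesel CLI reads DATABASE_URL from the environment or from .env (for example DATABASE_URL=postgres://localhost/podcast); the other variables in this project's .env.example are not listed here. The sketch below is a hypothetical, std-only parse_env function that only illustrates the file format; it is not the loader the crawler actually uses.

```rust
use std::collections::HashMap;

/// Parse `.env`-style text into key/value pairs (minimal sketch).
/// Skips blank lines and `#` comments, trims whitespace around keys and
/// values, and strips one pair of surrounding double quotes. It does not
/// handle `export` prefixes, escapes, or multi-line values.
pub fn parse_env(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim();
            let unquoted = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .unwrap_or(value);
            vars.insert(key.trim().to_string(), unquoted.to_string());
        }
    }
    vars
}
```

To try it on the real file, pass the result of std::fs::read_to_string(".env") to parse_env. Dotenv-style loaders usually let variables already set in the process environment take precedence over .env, which this sketch does not model.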

5. Build and Run

# Development build
cargo run

# Release build
cargo run --release

6. Running Tests and Lints

# Run the test suite
cargo test

# Lint with Clippy
cargo clippy

Makefile Tools 🛠️

The project includes a Makefile with the following utility commands:

Development Commands

  • make run: Run the project (development mode by default)

    # Run in dev mode (default)
    make run
    
    # Run in release mode
    make run BUILD_TYPE=--release

Log Analysis

  • make average: Calculate average duration from the most recent log file

    make average
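The log format that make average reads is not documented in this README. Purely as an illustration of the calculation, the sketch below assumes, hypothetically, that each timed log line carries a token such as duration=123ms, and averages every such value it finds. The token format and the function names are assumptions; the real target may parse a different field.

```rust
/// Extract the millisecond value from a token such as `duration=123ms`.
/// The `duration=` key and `ms` suffix are hypothetical; adjust them to
/// the crawler's actual log format.
fn parse_duration_ms(line: &str) -> Option<f64> {
    line.split_whitespace()
        .find_map(|tok| tok.strip_prefix("duration="))
        .and_then(|v| v.strip_suffix("ms"))
        .and_then(|v| v.parse::<f64>().ok())
}

/// Average all durations found in the log text; `None` if there are none.
pub fn average_duration_ms(log: &str) -> Option<f64> {
    let (sum, count) = log
        .lines()
        .filter_map(parse_duration_ms)
        .fold((0.0_f64, 0u32), |(sum, n), d| (sum + d, n + 1));
    if count == 0 {
        None
    } else {
        Some(sum / f64::from(count))
    }
}
```

Lines without a matching token are skipped rather than treated as zero, so unrelated log output does not drag the average down.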

Quality Checks

  • make pre-commit: Run pre-commit checks to ensure code quality

    make pre-commit

Maintenance Commands

  • make clean: Clean project build artifacts
  • make test: Run project tests
  • make doc: Generate project documentation

Pro Tips 💡

  • Pass BUILD_TYPE=--release to make run for an optimized build
  • Pre-commit checks help maintain code quality
  • Log analysis provides insights into crawler performance

Performance Optimization

  • Uses native CPU target optimizations
  • Async design for high concurrency
  • Connection pooling
  • Efficient database queries
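This README does not name the pooling crate (Diesel setups often use r2d2, for example). The sketch below is only a std-only illustration of the idea behind connection pooling: a fixed set of connections is created up front, and callers borrow one and hand it back instead of reconnecting for every query. Pool, PooledConn and the generic connection type C are hypothetical names. The sketch blocks a thread while waiting; an async pool for Tokio would await instead.

```rust
use std::ops::Deref;
use std::sync::{Condvar, Mutex};

/// Minimal blocking pool: idle connections wait in a Vec, and `get`
/// blocks until one is free. Real pools also add timeouts, health
/// checks, and reconnection; this sketch does not.
pub struct Pool<C> {
    idle: Mutex<Vec<C>>,
    available: Condvar,
}

/// A borrowed connection that returns to the pool when dropped.
pub struct PooledConn<'a, C> {
    pool: &'a Pool<C>,
    conn: Option<C>,
}

impl<C> Pool<C> {
    pub fn new(conns: Vec<C>) -> Self {
        Pool { idle: Mutex::new(conns), available: Condvar::new() }
    }

    /// Borrow a connection, blocking while all of them are in use.
    pub fn get(&self) -> PooledConn<'_, C> {
        let mut idle = self.idle.lock().unwrap();
        while idle.is_empty() {
            idle = self.available.wait(idle).unwrap();
        }
        let conn = idle.pop();
        PooledConn { pool: self, conn }
    }

    pub fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

impl<'a, C> Deref for PooledConn<'a, C> {
    type Target = C;
    fn deref(&self) -> &C {
        self.conn.as_ref().expect("connection present until drop")
    }
}

impl<'a, C> Drop for PooledConn<'a, C> {
    fn drop(&mut self) {
        if let Some(conn) = self.conn.take() {
            self.pool.idle.lock().unwrap().push(conn);
            // Wake one thread blocked in `get`.
            self.pool.available.notify_one();
        }
    }
}
```

Returning the connection in Drop means a caller cannot forget to release it, even on an early return.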

Security

  • Environment-based configuration
  • Secret detection in pre-commit hooks
  • Dependency vulnerability scanning
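The README does not say which secret-scanning tool the pre-commit hooks run. As a rough illustration of what such a check does, the hypothetical find_suspect_lines function below flags KEY=value lines whose key looks sensitive and whose value is non-empty. The keyword list is an assumption; dedicated scanners add entropy checks and provider-specific patterns that this sketch lacks.

```rust
/// Key fragments treated as sensitive (an assumed, illustrative list).
const SENSITIVE: [&str; 4] = ["PASSWORD", "SECRET", "TOKEN", "API_KEY"];

/// Return the 1-based line numbers that appear to assign a literal secret.
pub fn find_suspect_lines(text: &str) -> Vec<usize> {
    text.lines()
        .enumerate()
        .filter_map(|(i, line)| {
            // Lines without '=' cannot be assignments; skip them.
            let (key, value) = line.split_once('=')?;
            let key = key.trim().to_ascii_uppercase();
            let value = value.trim().trim_matches('"');
            let sensitive = SENSITIVE.iter().any(|s| key.contains(*s));
            // Empty placeholders (e.g. in .env.example) are allowed.
            (sensitive && !value.is_empty()).then(|| i + 1)
        })
        .collect()
}
```

Allowing empty values keeps template files such as .env.example committable while still catching filled-in credentials.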

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Write commit messages in the Conventional Commits format
  4. Run pre-commit hooks
  5. Submit a pull request
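The Conventional Commits specification defines a commit header of the form type(scope)!: description, where the scope and the breaking-change marker ! are optional and the colon must be followed by a space. The hypothetical is_conventional_header function below only checks that shape; it does not restrict the type to a fixed list such as feat or fix, and the repository's hooks may use their own tooling instead.

```rust
/// Check whether a commit subject line has the Conventional Commits header
/// shape `type(scope)!: description` (scope and `!` optional).
pub fn is_conventional_header(header: &str) -> bool {
    // The spec requires a colon followed by a space before the description.
    let (prefix, description) = match header.split_once(": ") {
        Some(parts) => parts,
        None => return false,
    };
    if description.trim().is_empty() {
        return false;
    }
    // An optional trailing '!' marks a breaking change.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    // Separate the type from an optional "(scope)".
    let (kind, scope) = match prefix.find('(') {
        Some(i) => match prefix[i..].strip_prefix('(').and_then(|s| s.strip_suffix(')')) {
            Some(scope) => (&prefix[..i], Some(scope)),
            None => return false,
        },
        None => (prefix, None),
    };
    let valid_type = !kind.is_empty() && kind.chars().all(|c| c.is_ascii_alphabetic());
    let valid_scope =
        scope.map_or(true, |s| !s.is_empty() && !s.contains(|c: char| c == '(' || c == ')'));
    valid_type && valid_scope
}
```

For example, feat(api)!: drop v1 endpoints and fix: handle empty feed pass, while update stuff fails because it has no type prefix.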

License

This project is licensed under the MIT License - see the LICENSE file for details.

Roadmap

  • Enhanced podcast discovery
  • Machine learning recommendations
  • Advanced analytics
  • Multi-database support

Contact

Arrow Tunner - Mr.han76@outlook.com

Project Link: https://github.com/Erinable/podcast_crawler