🗼 A flexible async framework for building high-performance crawlers and scrapers, designed for developers who need extensible pipelines, strong concurrency, and robust middleware support.


spire-rs/spire


Spire


Check out other spire projects here.

The flexible crawler & scraper framework powered by tokio and tower.

Overview

Spire is a modular web scraping and crawling framework for Rust that combines the power of async/await with the composability of tower's middleware ecosystem. It supports both HTTP-based scraping and browser automation through pluggable backends.

Features

  • Multiple Backends: HTTP (reqwest) and browser automation (thirtyfour) support
  • Tower Integration: Composable middleware using the tower ecosystem
  • Type-Safe Routing: Tag-based routing with compile-time guarantees
  • Ergonomic Extractors: Clean, type-safe data extraction from requests
  • Async/Await: Built on tokio for high-performance concurrent scraping
  • Observability: Optional tracing and metrics support
  • Graceful Shutdown: Proper resource cleanup and cancellation support
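To make the tag-based routing idea concrete, here is a minimal, standard-library-only sketch of dispatching work by tag. It is not Spire's implementation: the `Tag`, `Router`, and `Handler` types below are hypothetical stand-ins, the lookup happens at runtime, and the sketch does not attempt to reproduce the compile-time guarantees mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a routing tag (not Spire's `Tag` type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Tag(String);

impl Tag {
    fn new(name: &str) -> Self {
        Tag(name.to_string())
    }
}

/// A handler receives a fetched body and returns a short summary.
type Handler = fn(&str) -> String;

/// Minimal tag-to-handler table (illustrative only).
#[derive(Default)]
struct Router {
    routes: HashMap<Tag, Handler>,
}

impl Router {
    fn new() -> Self {
        Self::default()
    }

    /// Register a handler for a tag, builder style.
    fn route(mut self, tag: Tag, handler: Handler) -> Self {
        self.routes.insert(tag, handler);
        self
    }

    /// Run the handler registered for `tag`, or return `None` if there is none.
    fn dispatch(&self, tag: &Tag, body: &str) -> Option<String> {
        self.routes.get(tag).map(|handler| handler(body))
    }
}

/// Example handler: report the size of the body.
fn count_bytes(body: &str) -> String {
    format!("Scraped {} bytes", body.len())
}

/// Example handler: return the first line of the body.
fn first_line(body: &str) -> String {
    body.lines().next().unwrap_or("").to_string()
}
```

Each request carries a tag, and the router uses that tag to pick a handler, so different page types (for example a listing page and a detail page) can be processed by different functions while sharing one queue.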

Quick Start

Add spire to your Cargo.toml:

[dependencies]
spire = { version = "0.2.0", features = ["reqwest"] }

Basic HTTP scraping example:

use spire::prelude::*;
use spire::extract::Text;
use spire::context::{RequestQueue, Tag};
use spire::reqwest_backend::HttpClient;
use spire::dataset::InMemDataset;

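// The `Text` extractor hands the handler the response body as a string.
async fn handler(Text(html): Text) -> Result<(), Box<dyn std::error::Error>> {
    println!("Scraped {} bytes", html.len());
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Route every request tagged "main" to `handler`.
    let router = Router::new()
        .route(Tag::new("main"), handler);

    // Combine the HTTP backend and the router into a client,
    // with an in-memory request queue and an in-memory dataset.
    let backend = HttpClient::default();
    let client = Client::new(backend, router)
        .with_request_queue(InMemDataset::stack())
        .with_dataset(InMemDataset::<String>::new());

    // Seed the queue with the first URL, tagged so it reaches `handler`.
    client.queue()
        .push(Tag::new("main"), "https://example.com")
        .await?;

    // Start processing queued requests.
    client.run().await?;
    Ok(())
}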

See the main crate documentation for more examples and detailed usage.
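The Quick Start flow (seed a queue, then run until the work is done) can be pictured with a small, standard-library-only sketch of a queue-driven crawl loop. Everything here is an assumption made for illustration: the `Request` struct and `crawl` function are hypothetical, the queue is treated as last-in, first-out on the guess that a "stack" queue behaves that way, and the deduplication of URLs is added for the example rather than taken from Spire.

```rust
use std::collections::HashSet;

/// Hypothetical queued request: a routing tag plus a URL.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Request {
    tag: String,
    url: String,
}

/// Drain a LIFO queue starting from `seed`.
///
/// `visit` is called once per unique URL and returns follow-up requests,
/// which are pushed back onto the queue. The return value lists the
/// processed requests as "tag:url", in processing order.
fn crawl<F>(seed: Request, mut visit: F) -> Vec<String>
where
    F: FnMut(&Request) -> Vec<Request>,
{
    let mut queue = vec![seed];
    let mut seen = HashSet::new();
    let mut order = Vec::new();

    // `Vec::pop` takes the most recently pushed request first (stack order).
    while let Some(request) = queue.pop() {
        // Skip URLs that were already processed.
        if !seen.insert(request.url.clone()) {
            continue;
        }
        order.push(format!("{}:{}", request.tag, request.url));
        queue.extend(visit(&request));
    }
    order
}
```

With a stack-ordered queue the most recently discovered links are visited first, which gives a depth-first traversal. Swapping the `Vec` for a `VecDeque` and popping from the front would give a breadth-first one.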

Contributing

We welcome contributions! Please read our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.
