AM10101010/redquill

Redquill

Open-source .NET SDK for AI-powered contract analysis using Retrieval-Augmented Generation (RAG).

Alpha — under active development. APIs will change. Not for production use yet.

Redquill turns legal contracts into structured, searchable, AI-queryable data. Point it at a PDF, ask a question in plain English, and get a grounded answer that cites specific sections and pages.

# Ingest a contract (parse, chunk, embed, store)
redquill ingest ./nda-acme-2025.pdf

# Ask questions — answers cite specific sections
redquill ask --query "What are the termination conditions?"
redquill ask --query "How is confidential information defined?"

Demo

Querying an ingested NDA from the command line — the answer cites specific contract sections.

dotnet run --project Redquill.Cli -- ask --query "What are the termination conditions?" --db ../redquill.db

[Figure: Redquill ask command demo]

Overview

Redquill is a .NET SDK and CLI tool for building applications that read and query legal contracts. It handles the RAG pipeline end-to-end — parsing PDFs, splitting them into clause-aligned chunks, embedding those chunks into vector space, storing them persistently, and retrieving the right context to ground LLM-generated answers.

The SDK is the product. The CLI is a thin wrapper that demonstrates what the SDK can do. Everything is consumable as a library via a single AddRedquill() registration call.

The Problem

LLMs are unreliable at contract analysis out of the box. A Stanford/Yale study found that LLMs hallucinated between 69% and 88% of the time on legal citation tasks. The CUAD benchmark documents how poorly general-purpose models perform on contract understanding without structured retrieval.

The root issues are structural. Contracts have numbered clauses, nested sub-sections, cross-references, and defined terms. Naive text chunking (for example, splitting every 500 tokens) tears these structures apart, so the LLM loses the context it needs to answer accurately. A chunk boundary in the middle of an indemnification clause, or a definition separated from the clause that references it, degrades retrieval quality because the retrieved fragment no longer carries its full meaning.

How Redquill Solves This

Redquill takes a legal-domain-specific approach to every stage of the RAG pipeline.

Clause-aware chunking. Instead of splitting on fixed token windows, Redquill splits at natural contract boundaries — section headings, numbered provisions, definition blocks. Each chunk maps to a logical unit of the contract. When the LLM receives context, it gets a complete clause about a single topic, not a fragment spanning two unrelated sections.
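As a rough illustration of splitting at contract boundaries rather than token windows, here is a minimal sketch. It is not Redquill's chunker: `ClauseSplitter`, its heading regex, and the "preamble" label are all assumptions made for this example, and a real contract would need a richer pattern set (lettered clauses, "Article IV", definition blocks).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Redquill's API.
public static class ClauseSplitter
{
    // Assumed heading pattern: a line that starts with a section number such as
    // "1.", "2.3" or "10.1.4", followed by a capitalised word.
    // This is a heuristic and will misfire on lines like "30 Days after notice".
    private static readonly Regex Heading =
        new Regex(@"^\s*(\d+(?:\.\d+)*)\.?\s+[A-Z]", RegexOptions.Compiled);

    // Returns (section reference, clause text) pairs.
    // Text before the first numbered heading is labelled "preamble".
    public static List<(string Section, string Text)> Split(string contractText)
    {
        var chunks = new List<(string Section, string Text)>();
        var current = new StringBuilder();
        var section = "preamble";

        foreach (var line in contractText.Split('\n'))
        {
            var match = Heading.Match(line);
            if (match.Success)
            {
                // A new heading closes the chunk collected so far.
                Flush();
                section = match.Groups[1].Value;
            }
            current.AppendLine(line.TrimEnd('\r'));
        }
        Flush();
        return chunks;

        // Emits the buffered lines as one chunk, skipping empty buffers.
        void Flush()
        {
            var text = current.ToString().Trim();
            if (text.Length > 0) chunks.Add((section, text));
            current.Clear();
        }
    }
}
```

Each returned chunk carries the section number it started with, which is the kind of reference a later citation step can point back to.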

Structural detection. PdfPig's layout analysis identifies headings, sub-headings, definitions, signature blocks, and body text based on formatting heuristics (font size, position, weight). This structural map drives the chunker's boundary decisions and annotates every chunk with its section reference.
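To make the idea of formatting heuristics concrete, the sketch below classifies lines by font size relative to the document's median size. The `TextLine` record, the `BlockKind` enum, and the 1.3 threshold are hypothetical; they are not PdfPig or Redquill types, and the real detector may weigh position and other signals differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; these are not PdfPig or Redquill types.
public enum BlockKind { Heading, SubHeading, Body }

public record TextLine(string Text, double FontSize, bool IsBold);

public static class StructureHeuristics
{
    // Classifies each line relative to the median font size,
    // which is assumed here to be the body text size.
    public static List<(TextLine Line, BlockKind Kind)> Classify(IReadOnlyList<TextLine> lines)
    {
        var result = new List<(TextLine Line, BlockKind Kind)>();
        if (lines.Count == 0) return result;

        var sizes = lines.Select(l => l.FontSize).OrderBy(s => s).ToList();
        var bodySize = sizes[sizes.Count / 2];

        foreach (var line in lines)
        {
            BlockKind kind;
            if (line.FontSize >= bodySize * 1.3)
                kind = BlockKind.Heading;      // clearly larger than body text (assumed threshold)
            else if (line.FontSize > bodySize || line.IsBold)
                kind = BlockKind.SubHeading;   // slightly larger, or bold at body size
            else
                kind = BlockKind.Body;

            result.Add((line, kind));
        }
        return result;
    }
}
```

A chunker could then treat every `Heading` or `SubHeading` line as a candidate chunk boundary.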

Grounded answers with mandatory citations. The query pipeline instructs the LLM to answer only from provided contract excerpts, cite specific section references for every claim, and explicitly state when the contract doesn't contain the requested information.
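The three instructions described above could be assembled into a prompt along these lines. This is only a sketch: `GroundedPrompt` is a hypothetical helper and the wording is an assumption, not the prompt Redquill actually sends.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the prompt wording is an assumption, not Redquill's actual prompt.
public static class GroundedPrompt
{
    // Builds a prompt that restricts the model to the supplied excerpts
    // and labels each excerpt with its section reference so it can be cited.
    public static string Build(string question, IEnumerable<(string Section, string Text)> excerpts)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer using ONLY the contract excerpts below.");
        sb.AppendLine("Cite the section reference in square brackets for every claim.");
        sb.AppendLine("If the excerpts do not contain the answer, say so explicitly.");
        sb.AppendLine();

        foreach (var (section, text) in excerpts)
        {
            sb.AppendLine($"[Section {section}]");
            sb.AppendLine(text);
            sb.AppendLine();
        }

        sb.AppendLine($"Question: {question}");
        return sb.ToString();
    }
}
```

Placing the excerpts before the question keeps each section label next to the text it covers, so the model has a reference to cite for every passage it draws on.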

Persistent storage. Contracts are ingested once into a local SQLite database. Subsequent queries run against the stored embeddings without re-parsing or re-embedding — making interactive exploration practical even for long documents.
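One common way to persist an embedding in SQLite is to store the float vector as a BLOB. The sketch below shows only that round trip. `EmbeddingBlob` is a hypothetical helper, and Redquill's actual schema and serialization format are not described here, so treat this as an assumption about how such storage could work.

```csharp
using System;

// Hypothetical helper; Redquill's actual storage format may differ.
public static class EmbeddingBlob
{
    // Copies the raw bytes of the vector (4 bytes per float, native byte order).
    public static byte[] ToBytes(float[] vector)
    {
        var bytes = new byte[vector.Length * sizeof(float)];
        Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Rebuilds the vector from a blob produced by ToBytes on the same architecture.
    public static float[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % sizeof(float) != 0)
            throw new ArgumentException("Blob length is not a multiple of 4 bytes.", nameof(bytes));

        var vector = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
        return vector;
    }
}
```

Because the bytes are written in native byte order, a database produced this way is only portable between machines with the same endianness.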

Pluggable architecture. Every component — parser, chunker, embedding service, vector store, chat service — is defined as an interface. Swap OpenAI for Azure OpenAI, or SQLite for Qdrant, by changing a single DI registration. No pipeline code changes.

Quick Start

Prerequisites

.NET 9.0 SDK (Redquill is built from source) and an OpenAI API key.

CLI Usage

# Set your API key
export OPENAI_API_KEY="sk-..."

# Ingest a contract (persists to local SQLite database)
redquill ingest ./contract.pdf

# Ask questions — fast, no re-ingestion
redquill ask --query "What are the termination conditions?"
redquill ask --query "Who bears the indemnification obligations?"

# Inspect document structure without AI
redquill parse ./contract.pdf --show-blocks

SDK Usage

using Redquill.Core;
using Microsoft.Extensions.DependencyInjection;

// Register all services with one call
var services = new ServiceCollection();
services.AddRedquill(opts =>
{
    opts.ApiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
    opts.StoragePath = "./data/redquill.db"; // Persistent storage
});

var provider = services.BuildServiceProvider();

// Ingest a contract
var ingestion = provider.GetRequiredService<IIngestionPipeline>();
var result = await ingestion.IngestAsync("./contract.pdf");
// → "Ingested 30 chunks from 12 pages in 4.2s"

// Ask a question — grounded answer with citations
var query = provider.GetRequiredService<IQueryPipeline>();
var answer = await query.AskAsync("What is the referral fee?");

Console.WriteLine(answer.Answer);
foreach (var source in answer.SourceChunks)
{
    Console.WriteLine($"  [{source.Score:F2}] {source.Chunk.SectionReference}");
}

Swap Providers

// Use defaults, then override what you need
services.AddRedquill(opts => { opts.ApiKey = "..."; });

// Swap embedding provider
services.AddSingleton<IEmbeddingService, MyAzureEmbeddingService>();

// Swap vector store
services.AddSingleton<IVectorStore, MyQdrantVectorStore>();

Architecture

graph LR
    subgraph Ingestion["<b>Ingestion Pipeline</b> — redquill ingest"]
        direction LR
        PDF["PDF Contract"] --> PARSE["Parse<br/><i>PdfPig</i>"]
        PARSE --> CHUNK["Chunk<br/><i>Clause-Based</i>"]
        CHUNK --> EMBED_I["Embed<br/><i>OpenAI</i>"]
        EMBED_I --> STORE["Store"]
    end

    subgraph Query["<b>Query Pipeline</b> — redquill ask"]
        direction LR
        Q["Question"] --> EMBED_Q["Embed"]
        EMBED_Q --> SEARCH["Search<br/><i>Cosine Similarity</i>"]
        SEARCH --> GROUND["Ground<br/><i>Build Cited Prompt</i>"]
        GROUND --> LLM["LLM<br/><i>gpt-4o-mini</i>"]
        LLM --> ANSWER["Cited Answer"]
    end

    DB[("SQLite<br/>redquill.db")]
    STORE --> DB
    DB --> SEARCH

Every component is defined as an interface and wired via AddRedquill(). Swap the embedding provider, vector store, or chat model by changing a single DI registration — no pipeline code changes. See docs/architecture.mermaid for the full component diagram.
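The Search step in the query pipeline ranks stored chunks by cosine similarity to the question's embedding. A minimal brute-force version might look like the following. `CosineSearch` and its tuple shapes are hypothetical, and the `IVectorStore` implementation in Redquill may be organised differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not Redquill's IVectorStore implementation.
public static class CosineSearch
{
    // cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector has zero length.
    public static double Similarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every chunk against the query and returns the k best, highest score first.
    public static List<(string Section, double Score)> TopK(
        float[] query,
        IEnumerable<(string Section, float[] Embedding)> chunks,
        int k)
    {
        return chunks
            .Select(c => (c.Section, Score: Similarity(query, c.Embedding)))
            .OrderByDescending(r => r.Score)
            .Take(k)
            .ToList();
    }
}
```

A linear scan like this is reasonable for a single contract with a few dozen chunks. A dedicated vector database becomes more attractive as the number of stored chunks grows.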

Project Structure

src/
  Redquill.Core/         # SDK library — all business logic
  Redquill.Cli/          # CLI tool — thin wrapper, no logic
tests/
  Redquill.Tests/        # xUnit, FluentAssertions, NSubstitute
specs/
  002-embed-and-query/     # Sprint 2 specification (complete)
  003-persist-and-search/  # Sprint 3 specification (complete)

What Works Today

Redquill can currently parse a PDF contract, split it into clause-aligned chunks, embed those chunks via OpenAI, store them in a local SQLite database, and answer natural-language questions with section-cited responses. It has been tested against real-world contracts from the CUAD dataset (Contract Understanding Atticus Dataset).

It is not yet packaged for distribution (no NuGet package). You build it from source.

What's Next

| Sprint | Status | Focus |
| --- | --- | --- |
| 1. Parse & Chunk | Complete | PDF parsing, clause-based chunking, structural detection |
| 2. Embed & Query | Complete | Semantic Kernel integration, RAG pipeline, in-memory search |
| 3. Persist & Search | Complete | SQLite persistence, ingest command, chunker quality fix |
| 4. Package & Ship | Planned | NuGet package, CI/CD, documentation |

Longer-term directions include hybrid search (combining vector similarity with keyword matching), DOCX support, multi-document cross-querying, and a web API host. These are tracked in the specs directory but are not yet scheduled.

Tech Stack

Redquill is built on .NET 9.0 with Microsoft Semantic Kernel for LLM orchestration, PdfPig for PDF text extraction with layout analysis, SQLite (via Microsoft.Data.Sqlite) for persistent vector storage, System.CommandLine for the CLI, and Spectre.Console for terminal output.

Contributing

This is an early-stage project with a solo maintainer. If you're interested in contributing, start by opening an issue to discuss your idea. The specs directory contains detailed feature specifications that explain the design decisions behind the current implementation.

License

Apache 2.0 — see LICENSE for details.
