Redquill

Open-source .NET SDK for AI-powered contract analysis using Retrieval-Augmented Generation (RAG).

Alpha — under active development. APIs will change. Not for production use yet.

Redquill turns legal contracts into structured, searchable, AI-queryable data. Point it at a PDF, ask a question in plain English, and get a grounded answer that cites specific sections and pages.

# Ingest a contract (parse, chunk, embed, store)
redquill ingest ./nda-acme-2025.pdf

# Ask questions — answers cite specific sections
redquill ask --query "What are the termination conditions?"
redquill ask --query "How is confidential information defined?"

Demo

Querying an ingested NDA from the command line — the answer cites specific contract sections.

dotnet run --project Redquill.Cli -- ask --query "What are the termination conditions?" --db ../redquill.db

Overview

Redquill is a .NET SDK and CLI tool for building applications that read and query legal contracts. It handles the RAG pipeline end-to-end — parsing PDFs, splitting them into clause-aligned chunks, embedding those chunks into vector space, storing them persistently, and retrieving the right context to ground LLM-generated answers.

The SDK is the product. The CLI is a thin wrapper that demonstrates what the SDK can do. Everything is consumable as a library via a single AddRedquill() registration call.

The Problem

LLMs are unreliable at contract analysis out of the box. A Stanford/Yale study found that LLMs hallucinated between 69% and 88% of the time on legal citation tasks. The CUAD benchmark documents how poorly general-purpose models perform on contract understanding without structured retrieval.

The root issues are structural. Contracts have numbered clauses, nested sub-sections, cross-references, and defined terms. Naive text chunking (for example, splitting every 500 tokens regardless of content) tears these structures apart, so the LLM loses the context it needs to answer accurately. A chunk boundary in the middle of an indemnification clause, or a definition separated from the clause that references it, means the retriever returns fragments instead of complete provisions.

How Redquill Solves This

Redquill takes a legal-domain-specific approach to every stage of the RAG pipeline.

Clause-aware chunking. Instead of splitting on fixed token windows, Redquill splits at natural contract boundaries — section headings, numbered provisions, definition blocks. Each chunk maps to a logical unit of the contract. When the LLM receives context, it gets a complete clause about a single topic, not a fragment spanning two unrelated sections.
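As a rough illustration of the idea, the sketch below splits plain text at numbered headings such as "1. Definitions" or "2.1 Early Termination". It is not Redquill's actual chunker: the `SplitIntoClauses` helper, the regular expression, and the sample text are all assumptions made for this example, and a real contract would need more robust heading detection (a body line starting with a number and a capital letter would be misread as a heading here).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: split contract text at numbered headings.
// Each returned item holds the clause number and the full clause text.
static List<(string Section, string Text)> SplitIntoClauses(string contractText)
{
    // A heading is a line that starts with a clause number ("1", "1.", "7.2"),
    // followed by whitespace and a capitalised title.
    var heading = new Regex(
        @"^(?<num>\d+(?:\.\d+)*)\.?[ \t]+(?<title>[A-Z].*)$",
        RegexOptions.Multiline);

    var matches = heading.Matches(contractText);
    var clauses = new List<(string Section, string Text)>();

    for (int i = 0; i < matches.Count; i++)
    {
        // A clause runs from its heading to the start of the next heading.
        int start = matches[i].Index;
        int end = i + 1 < matches.Count ? matches[i + 1].Index : contractText.Length;
        clauses.Add((matches[i].Groups["num"].Value, contractText[start..end].Trim()));
    }

    return clauses;
}

var sample = string.Join("\n",
    "1. Definitions",
    "\"Confidential Information\" means any non-public information.",
    "2. Term",
    "This Agreement lasts two years.",
    "2.1 Early Termination",
    "Either party may terminate on 30 days' notice.");

var clauses = SplitIntoClauses(sample);
foreach (var (section, text) in clauses)
{
    Console.WriteLine($"[{section}] {text.Length} chars");
}
```

Because every chunk starts at a heading, the clause number can travel with the chunk and later be used as its citation.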

Structural detection. PdfPig's layout analysis identifies headings, sub-headings, definitions, signature blocks, and body text based on formatting heuristics (font size, position, weight). This structural map drives the chunker's boundary decisions and annotates every chunk with its section reference.
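To make the heuristic concrete, here is a minimal sketch of classifying text blocks by font size and weight relative to the body text. The tuple shape, the `Classify` and `MedianFontSize` helpers, the 1.3x threshold, and the `" means ` test for definitions are assumptions chosen for illustration; they do not reflect PdfPig's API or Redquill's real rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: the (upper) median font size stands in for "body text size".
static double MedianFontSize(IReadOnlyList<(string Text, double FontSize, bool IsBold)> blocks)
{
    var sizes = blocks.Select(b => b.FontSize).OrderBy(s => s).ToList();
    return sizes[sizes.Count / 2];
}

// Hypothetical classification rules, checked in order of priority.
static string Classify((string Text, double FontSize, bool IsBold) block, double bodySize)
{
    if (block.FontSize >= bodySize * 1.3) return "Heading";       // clearly larger than body text
    if (block.IsBold && block.FontSize >= bodySize) return "SubHeading";
    if (block.Text.Contains("\" means ")) return "Definition";    // e.g. "X" means ...
    return "Body";
}

var blocks = new List<(string Text, double FontSize, bool IsBold)>
{
    ("NON-DISCLOSURE AGREEMENT", 16, true),
    ("1. Definitions", 11, true),
    ("\"Confidential Information\" means any non-public information.", 11, false),
    ("Each party shall protect the other's information.", 11, false),
    ("This Agreement is governed by the laws of Delaware.", 11, false),
};

double bodySize = MedianFontSize(blocks);
var labels = blocks.Select(b => Classify(b, bodySize)).ToList();

for (int i = 0; i < blocks.Count; i++)
{
    Console.WriteLine($"{labels[i],-10} {blocks[i].Text}");
}
```

Measuring against the document's own median size, rather than a fixed point size, keeps the rules usable across contracts typeset at different scales.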

Grounded answers with mandatory citations. The query pipeline instructs the LLM to answer only from provided contract excerpts, cite specific section references for every claim, and explicitly state when the contract doesn't contain the requested information.
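The three instructions described above could be assembled into a prompt along the following lines. This is only a sketch: `BuildGroundedPrompt`, the wording of the instructions, and the sample excerpt are hypothetical and are not taken from Redquill's source.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical prompt builder: instructions first, then the retrieved
// excerpts labelled with their section references, then the question.
static string BuildGroundedPrompt(
    string question,
    IEnumerable<(string Section, string Text)> excerpts)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer using ONLY the contract excerpts below.");
    sb.AppendLine("Cite the section reference in square brackets for every claim.");
    sb.AppendLine("If the excerpts do not contain the answer, say so explicitly.");
    sb.AppendLine();

    foreach (var (section, text) in excerpts)
    {
        sb.AppendLine($"[Section {section}]");
        sb.AppendLine(text);
        sb.AppendLine();
    }

    sb.AppendLine($"Question: {question}");
    return sb.ToString();
}

var prompt = BuildGroundedPrompt(
    "What are the termination conditions?",
    new[] { ("7.2", "Either party may terminate on 30 days' written notice.") });

Console.WriteLine(prompt);
```

Labelling each excerpt with its section reference is what lets the model cite sections it was actually shown rather than inventing them.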

Persistent storage. Contracts are ingested once into a local SQLite database. Subsequent queries run against the stored embeddings without re-parsing or re-embedding — making interactive exploration practical even for long documents.
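SQLite has no native vector type, so one common way to persist an embedding is to pack the `float[]` into a byte array and store it in a BLOB column. The sketch below shows only that round trip. The `ToBlob` and `FromBlob` helpers are hypothetical, and Redquill's actual schema and encoding may differ.

```csharp
using System;

// Hypothetical helper: pack a float[] embedding into bytes suitable for a
// SQLite BLOB column. Byte order follows the host CPU.
static byte[] ToBlob(float[] embedding)
{
    var blob = new byte[embedding.Length * sizeof(float)];
    Buffer.BlockCopy(embedding, 0, blob, 0, blob.Length);
    return blob;
}

// Hypothetical helper: unpack bytes read from the BLOB column.
// Assumes blob.Length is a multiple of sizeof(float).
static float[] FromBlob(byte[] blob)
{
    var embedding = new float[blob.Length / sizeof(float)];
    Buffer.BlockCopy(blob, 0, embedding, 0, blob.Length);
    return embedding;
}

var original = new[] { 0.25f, -1.5f, 3f };
var stored = ToBlob(original);
var restored = FromBlob(stored);

Console.WriteLine($"{stored.Length} bytes for {restored.Length} floats");
```

The copy is bit-exact, so a query against stored embeddings sees the same vectors that were produced at ingestion time.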

Pluggable architecture. Every component — parser, chunker, embedding service, vector store, chat service — is defined as an interface. Swap OpenAI for Azure OpenAI, or SQLite for Qdrant, by changing a single DI registration. No pipeline code changes.

Quick Start

Prerequisites

.NET 9.0 SDK
An OpenAI API key with access to text-embedding-3-small and gpt-4o-mini

CLI Usage

# Set your API key
export OPENAI_API_KEY="sk-..."

# Ingest a contract (persists to local SQLite database)
redquill ingest ./contract.pdf

# Ask questions — fast, no re-ingestion
redquill ask --query "What are the termination conditions?"
redquill ask --query "Who bears the indemnification obligations?"

# Inspect document structure without AI
redquill parse ./contract.pdf --show-blocks

SDK Usage

using Redquill.Core;
using Microsoft.Extensions.DependencyInjection;

// Register all services with one call
var services = new ServiceCollection();
services.AddRedquill(opts =>
{
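    // GetEnvironmentVariable returns null when the variable is unset, so fail fast
    opts.ApiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
        ?? throw new InvalidOperationException("OPENAI_API_KEY is not set.");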
    opts.StoragePath = "./data/redquill.db"; // Persistent storage
});

var provider = services.BuildServiceProvider();

// Ingest a contract
var ingestion = provider.GetRequiredService<IIngestionPipeline>();
var result = await ingestion.IngestAsync("./contract.pdf");
// → "Ingested 30 chunks from 12 pages in 4.2s"

// Ask a question — grounded answer with citations
var query = provider.GetRequiredService<IQueryPipeline>();
var answer = await query.AskAsync("What is the referral fee?");

Console.WriteLine(answer.Answer);
foreach (var source in answer.SourceChunks)
{
    Console.WriteLine($"  [{source.Score:F2}] {source.Chunk.SectionReference}");
}

Swap Providers

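// Use defaults, then override what you need.
// Register overrides after AddRedquill(): the container resolves the last
// registration made for a given service type.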
services.AddRedquill(opts => { opts.ApiKey = "..."; });

// Swap embedding provider
services.AddSingleton<IEmbeddingService, MyAzureEmbeddingService>();

// Swap vector store
services.AddSingleton<IVectorStore, MyQdrantVectorStore>();

Architecture

graph LR
    subgraph Ingestion["<b>Ingestion Pipeline</b> — redquill ingest"]
        direction LR
        PDF["PDF Contract"] --> PARSE["Parse<br/><i>PdfPig</i>"]
        PARSE --> CHUNK["Chunk<br/><i>Clause-Based</i>"]
        CHUNK --> EMBED_I["Embed<br/><i>OpenAI</i>"]
        EMBED_I --> STORE["Store"]
    end

    subgraph Query["<b>Query Pipeline</b> — redquill ask"]
        direction LR
        Q["Question"] --> EMBED_Q["Embed"]
        EMBED_Q --> SEARCH["Search<br/><i>Cosine Similarity</i>"]
        SEARCH --> GROUND["Ground<br/><i>Build Cited Prompt</i>"]
        GROUND --> LLM["LLM<br/><i>gpt-4o-mini</i>"]
        LLM --> ANSWER["Cited Answer"]
    end

    DB[("SQLite<br/>redquill.db")]
    STORE --> DB
    DB --> SEARCH

Every component is defined as an interface and wired via AddRedquill(). Swap the embedding provider, vector store, or chat model by changing a single DI registration — no pipeline code changes. See docs/architecture.mermaid for the full component diagram.
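The Search step in the diagram ranks stored chunks by cosine similarity to the question's embedding. A minimal brute-force version is sketched below. The `CosineSimilarity` and `TopK` helpers and the two-dimensional sample vectors are made up for illustration, and Redquill's own search code may be organised differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical helper: score every chunk and keep the k best matches.
static List<(string Section, double Score)> TopK(
    float[] query,
    IEnumerable<(string Section, float[] Embedding)> chunks,
    int k) =>
    chunks.Select(c => (c.Section, Score: CosineSimilarity(query, c.Embedding)))
          .OrderByDescending(r => r.Score)
          .Take(k)
          .ToList();

var query = new[] { 1f, 0f };
var chunks = new List<(string Section, float[] Embedding)>
{
    ("3.1", new[] { 0f, 1f }),
    ("7.2", new[] { 0.9f, 0.1f }),
    ("12",  new[] { 0.7f, 0.7f }),
};

var top = TopK(query, chunks, 2);
foreach (var (section, score) in top)
{
    Console.WriteLine($"[{score:F2}] Section {section}");
}
```

A linear scan like this is usually adequate for a single contract's worth of chunks; larger collections are where a dedicated vector store behind the same interface starts to pay off.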

Project Structure

src/
  Redquill.Core/         # SDK library — all business logic
  Redquill.Cli/          # CLI tool — thin wrapper, no logic
tests/
  Redquill.Tests/        # xUnit, FluentAssertions, NSubstitute
specs/
  002-embed-and-query/   # Sprint 2 specification (complete)
  003-persist-and-search/ # Sprint 3 specification (complete)

What Works Today

Redquill can currently parse a PDF contract, split it into clause-aligned chunks, embed those chunks via OpenAI, store them in a local SQLite database, and answer natural-language questions with section-cited responses. It has been tested against real-world contracts from the CUAD dataset (Contract Understanding Atticus Dataset).

It is not yet packaged for distribution (no NuGet package). You build it from source.

What's Next

| Sprint | Status | Focus |
| --- | --- | --- |
| 1. Parse & Chunk | Complete | PDF parsing, clause-based chunking, structural detection |
| 2. Embed & Query | Complete | Semantic Kernel integration, RAG pipeline, in-memory search |
| 3. Persist & Search | Complete | SQLite persistence, `ingest` command, chunker quality fix |
| 4. Package & Ship | Planned | NuGet package, CI/CD, documentation |

Longer-term directions include hybrid search (combining vector similarity with keyword matching), DOCX support, multi-document cross-querying, and a web API host. These are tracked in the specs directory but are not yet scheduled.

Tech Stack

Redquill is built on .NET 9.0 with Microsoft Semantic Kernel for LLM orchestration, PdfPig for PDF text extraction with layout analysis, SQLite (via Microsoft.Data.Sqlite) for persistent vector storage, System.CommandLine for the CLI, and Spectre.Console for terminal output.

Contributing

This is an early-stage project with a solo maintainer. If you're interested in contributing, start by opening an issue to discuss your idea. The specs directory contains detailed feature specifications that explain the design decisions behind the current implementation.

License

Apache 2.0 — see LICENSE for details.
