Open-source .NET SDK for AI-powered contract analysis using Retrieval-Augmented Generation (RAG).
Alpha — under active development. APIs will change. Not for production use yet.
Redquill turns legal contracts into structured, searchable, AI-queryable data. Point it at a PDF, ask a question in plain English, and get a grounded answer that cites specific sections and pages.
```bash
# Ingest a contract (parse, chunk, embed, store)
redquill ingest ./nda-acme-2025.pdf

# Ask questions — answers cite specific sections
redquill ask --query "What are the termination conditions?"
redquill ask --query "How is confidential information defined?"
```

*Querying an ingested NDA from the command line — the answer cites specific contract sections.*
```bash
dotnet run --project Redquill.Cli -- ask --query "What are the termination conditions?" --db ../redquill.db
```

Redquill is a .NET SDK and CLI tool for building applications that read and query legal contracts. It handles the RAG pipeline end-to-end — parsing PDFs, splitting them into clause-aligned chunks, embedding those chunks into vector space, storing them persistently, and retrieving the right context to ground LLM-generated answers.
The SDK is the product. The CLI is a thin wrapper that demonstrates what the SDK can do. Everything is consumable as a library via a single AddRedquill() registration call.
LLMs are unreliable at contract analysis out of the box. A Stanford/Yale study found that LLMs hallucinated between 69% and 88% of the time on legal citation tasks. The CUAD benchmark documents how poorly general-purpose models perform on contract understanding without structured retrieval.
The root issues are structural. Contracts have numbered clauses, nested sub-sections, cross-references, and defined terms. Naive text chunking — splitting every 500 tokens regardless of content — tears these structures apart, so the LLM loses the context it needs to answer accurately. A chunk boundary in the middle of an indemnification clause, or a definition separated from the clause that references it, degrades retrieval quality significantly.
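A toy illustration of the failure mode (not Redquill's code): a fixed word-count window cuts an indemnification clause mid-sentence, leaving fragments that no retriever can match well. The contract text and window size here are made up for demonstration.

```csharp
using System;
using System.Collections.Generic;

class NaiveChunkingDemo
{
    // Split text into fixed-size word windows, ignoring structure entirely.
    static List<string> FixedWindowChunks(string text, int wordsPerChunk)
    {
        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        for (int i = 0; i < words.Length; i += wordsPerChunk)
            chunks.Add(string.Join(' ', words[i..Math.Min(i + wordsPerChunk, words.Length)]));
        return chunks;
    }

    static void Main()
    {
        var clause = "8. Indemnification. The Receiving Party shall indemnify " +
                     "the Disclosing Party against all losses arising from any breach " +
                     "of Section 3 (Confidentiality) by the Receiving Party.";
        // A 12-word window slices the single clause into three fragments,
        // the second starting mid-sentence at "losses arising from...".
        foreach (var chunk in FixedWindowChunks(clause, 12))
            Console.WriteLine("--- " + chunk);
    }
}
```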
Redquill takes a legal-domain-specific approach to every stage of the RAG pipeline.
Clause-aware chunking. Instead of splitting on fixed token windows, Redquill splits at natural contract boundaries — section headings, numbered provisions, definition blocks. Each chunk maps to a logical unit of the contract. When the LLM receives context, it gets a complete clause about a single topic, not a fragment spanning two unrelated sections.
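The core idea can be sketched with a regex that splits only at numbered-provision boundaries, so each chunk carries one complete clause. This is a simplification; Redquill's actual chunker is driven by the structural map from PDF layout analysis, not a regex alone.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class ClauseChunkerSketch
{
    // Split at numbered-provision boundaries ("1.", "2.", ...). The
    // zero-width lookahead keeps each heading attached to its clause.
    static List<string> SplitAtClauses(string text)
    {
        var parts = Regex.Split(text, @"(?=^\d+\.\s)", RegexOptions.Multiline);
        var chunks = new List<string>();
        foreach (var part in parts)
            if (!string.IsNullOrWhiteSpace(part))
                chunks.Add(part.Trim());
        return chunks;
    }

    static void Main()
    {
        var text = "1. Definitions. \"Confidential Information\" means...\n" +
                   "2. Term. This Agreement remains in force for two years.\n" +
                   "3. Termination. Either party may terminate on 30 days' notice.";
        foreach (var chunk in SplitAtClauses(text))
            Console.WriteLine(chunk); // one complete clause per chunk
    }
}
```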
Structural detection. PdfPig's layout analysis identifies headings, sub-headings, definitions, signature blocks, and body text based on formatting heuristics (font size, position, weight). This structural map drives the chunker's boundary decisions and annotates every chunk with its section reference.
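A hypothetical sketch of formatting-heuristic classification: treat the document's most frequent font size as body text, and flag lines as headings when they are noticeably larger or match a numbered-heading pattern. Redquill's detector runs over PdfPig's extracted layout; here the layout data is faked as (text, point size) pairs to keep the example self-contained.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class StructureSketch
{
    // Heading if the font is noticeably larger than body text, or the
    // line looks like a numbered provision ("1. Definitions", "2.1 Scope").
    static bool LooksLikeHeading(string text, double size, double bodySize) =>
        size > bodySize * 1.1
        || Regex.IsMatch(text, @"^\d+(\.\d+)*\.?\s+\p{Lu}");

    static void Main()
    {
        var lines = new (string Text, double Size)[]
        {
            ("MUTUAL NON-DISCLOSURE AGREEMENT", 16.0),
            ("1. Definitions", 12.0),
            ("\"Confidential Information\" means any information disclosed...", 10.0),
            ("\"Disclosing Party\" means the party disclosing such information.", 10.0),
            ("2. Term", 12.0),
            ("This Agreement remains in force for two (2) years.", 10.0),
        };

        // Take the most frequent font size as body text.
        var bodySize = lines.GroupBy(l => l.Size)
                            .OrderByDescending(g => g.Count())
                            .First().Key;

        foreach (var (text, size) in lines)
        {
            var label = LooksLikeHeading(text, size, bodySize) ? "HEADING" : "BODY";
            Console.WriteLine($"{label,-8}{text}");
        }
    }
}
```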
Grounded answers with mandatory citations. The query pipeline instructs the LLM to answer only from provided contract excerpts, cite specific section references for every claim, and explicitly state when the contract doesn't contain the requested information.
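One plausible shape for such a grounding prompt, assembled from retrieved chunks (the exact instructions and section labels Redquill sends to the model may differ from this sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class GroundedPromptSketch
{
    // Build a prompt that restricts the model to the given excerpts and
    // demands a section citation for every claim.
    static string BuildPrompt(string question,
                              IEnumerable<(string Section, string Text)> chunks)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer ONLY from the contract excerpts below.");
        sb.AppendLine("Cite the section reference (e.g. [Section 8.2]) for every claim.");
        sb.AppendLine("If the excerpts do not contain the answer, say so explicitly.");
        sb.AppendLine();
        foreach (var (section, text) in chunks)
            sb.AppendLine($"[{section}] {text}");
        sb.AppendLine();
        sb.AppendLine($"Question: {question}");
        return sb.ToString();
    }

    static void Main()
    {
        var prompt = BuildPrompt(
            "What are the termination conditions?",
            new[] { ("Section 9.1", "Either party may terminate on 30 days' written notice.") });
        Console.WriteLine(prompt);
    }
}
```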
Persistent storage. Contracts are ingested once into a local SQLite database. Subsequent queries run against the stored embeddings without re-parsing or re-embedding — making interactive exploration practical even for long documents.
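One plausible way to persist embeddings in SQLite is to store each vector as a BLOB column alongside the chunk text and section reference. The schema and serialization below are illustrative, not Redquill's actual on-disk format:

```csharp
using System;

class EmbeddingBlobSketch
{
    // Hypothetical schema:
    //   CREATE TABLE chunks (id INTEGER PRIMARY KEY, section TEXT,
    //                        text TEXT, embedding BLOB);
    // The float[] embedding is packed into raw bytes for the BLOB column.
    static byte[] ToBlob(float[] vector)
    {
        var bytes = new byte[vector.Length * sizeof(float)];
        Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static float[] FromBlob(byte[] blob)
    {
        var vector = new float[blob.Length / sizeof(float)];
        Buffer.BlockCopy(blob, 0, vector, 0, blob.Length);
        return vector;
    }

    static void Main()
    {
        var embedding = new float[] { 0.12f, -0.5f, 0.98f };
        var roundTrip = FromBlob(ToBlob(embedding));
        // Verify the round trip is lossless.
        Console.WriteLine(roundTrip.Length == 3
            && roundTrip[0] == 0.12f
            && roundTrip[1] == -0.5f
            && roundTrip[2] == 0.98f);
    }
}
```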
Pluggable architecture. Every component — parser, chunker, embedding service, vector store, chat service — is defined as an interface. Swap OpenAI for Azure OpenAI, or SQLite for Qdrant, by changing a single DI registration. No pipeline code changes.
- .NET 9.0 SDK
- An OpenAI API key with access to `text-embedding-3-small` and `gpt-4o-mini`
```bash
# Set your API key
export OPENAI_API_KEY="sk-..."

# Ingest a contract (persists to local SQLite database)
redquill ingest ./contract.pdf

# Ask questions — fast, no re-ingestion
redquill ask --query "What are the termination conditions?"
redquill ask --query "Who bears the indemnification obligations?"

# Inspect document structure without AI
redquill parse ./contract.pdf --show-blocks
```

```csharp
using Redquill.Core;
using Microsoft.Extensions.DependencyInjection;

// Register all services with one call
var services = new ServiceCollection();
services.AddRedquill(opts =>
{
    opts.ApiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
    opts.StoragePath = "./data/redquill.db"; // Persistent storage
});
var provider = services.BuildServiceProvider();

// Ingest a contract
var ingestion = provider.GetRequiredService<IIngestionPipeline>();
var result = await ingestion.IngestAsync("./contract.pdf");
// → "Ingested 30 chunks from 12 pages in 4.2s"

// Ask a question — grounded answer with citations
var query = provider.GetRequiredService<IQueryPipeline>();
var answer = await query.AskAsync("What is the referral fee?");
Console.WriteLine(answer.Answer);
foreach (var source in answer.SourceChunks)
{
    Console.WriteLine($" [{source.Score:F2}] {source.Chunk.SectionReference}");
}
```

```csharp
// Use defaults, then override what you need
services.AddRedquill(opts => { opts.ApiKey = "..."; });

// Swap embedding provider
services.AddSingleton<IEmbeddingService, MyAzureEmbeddingService>();

// Swap vector store
services.AddSingleton<IVectorStore, MyQdrantVectorStore>();
```

```mermaid
graph LR
    subgraph Ingestion["<b>Ingestion Pipeline</b> — redquill ingest"]
        direction LR
        PDF["PDF Contract"] --> PARSE["Parse<br/><i>PdfPig</i>"]
        PARSE --> CHUNK["Chunk<br/><i>Clause-Based</i>"]
        CHUNK --> EMBED_I["Embed<br/><i>OpenAI</i>"]
        EMBED_I --> STORE["Store"]
    end

    subgraph Query["<b>Query Pipeline</b> — redquill ask"]
        direction LR
        Q["Question"] --> EMBED_Q["Embed"]
        EMBED_Q --> SEARCH["Search<br/><i>Cosine Similarity</i>"]
        SEARCH --> GROUND["Ground<br/><i>Build Cited Prompt</i>"]
        GROUND --> LLM["LLM<br/><i>gpt-4o-mini</i>"]
        LLM --> ANSWER["Cited Answer"]
    end

    DB[("SQLite<br/>redquill.db")]
    STORE --> DB
    DB --> SEARCH
```
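The "Search" step of the query pipeline scores the query embedding against every stored chunk embedding by cosine similarity. A minimal sketch of that step, with tiny made-up vectors standing in for real embeddings:

```csharp
using System;

class CosineSearchSketch
{
    // Cosine similarity: dot product of the vectors over the product
    // of their magnitudes.
    static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    static void Main()
    {
        var query = new float[] { 1f, 0f, 1f };
        var chunks = new (string Section, float[] Embedding)[]
        {
            ("Section 3 (Confidentiality)", new float[] { 0f, 1f, 0f }),
            ("Section 9 (Termination)",     new float[] { 1f, 0.1f, 0.9f }),
        };

        // Pick the chunk whose embedding is closest to the query.
        var best = chunks[0];
        var bestScore = Cosine(query, best.Embedding);
        foreach (var c in chunks)
        {
            var score = Cosine(query, c.Embedding);
            if (score > bestScore) { bestScore = score; best = c; }
        }
        Console.WriteLine(best.Section);
    }
}
```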
Every component is defined as an interface and wired via AddRedquill(). Swap the embedding provider, vector store, or chat model by changing a single DI registration — no pipeline code changes. See docs/architecture.mermaid for the full component diagram.
```
src/
  Redquill.Core/           # SDK library — all business logic
  Redquill.Cli/            # CLI tool — thin wrapper, no logic
tests/
  Redquill.Tests/          # xUnit, FluentAssertions, NSubstitute
specs/
  002-embed-and-query/     # Sprint 2 specification (complete)
  003-persist-and-search/  # Sprint 3 specification (complete)
```
Redquill can currently parse a PDF contract, split it into clause-aligned chunks, embed those chunks via OpenAI, store them in a local SQLite database, and answer natural-language questions with section-cited responses. It has been tested against real-world contracts from the CUAD dataset (Contract Understanding Atticus Dataset).
It is not yet packaged for distribution (no NuGet package), so for now you build it from source.
| Sprint | Status | Focus |
|---|---|---|
| 1. Parse & Chunk | Complete | PDF parsing, clause-based chunking, structural detection |
| 2. Embed & Query | Complete | Semantic Kernel integration, RAG pipeline, in-memory search |
| 3. Persist & Search | Complete | SQLite persistence, ingest command, chunker quality fix |
| 4. Package & Ship | Planned | NuGet package, CI/CD, documentation |
Longer-term directions include hybrid search (combining vector similarity with keyword matching), DOCX support, multi-document cross-querying, and a web API host. These are tracked in the specs directory but are not yet scheduled.
Redquill is built on .NET 9.0 with Microsoft Semantic Kernel for LLM orchestration, PdfPig for PDF text extraction with layout analysis, SQLite (via Microsoft.Data.Sqlite) for persistent vector storage, System.CommandLine for the CLI, and Spectre.Console for terminal output.
This is an early-stage project with a solo maintainer. If you're interested in contributing, start by opening an issue to discuss your idea. The specs directory contains detailed feature specifications that explain the design decisions behind the current implementation.
Apache 2.0 — see LICENSE for details.
