A high-performance Go implementation of the llms.txt generator. It uses Firecrawl to map and scrape websites, and an LLM provider (currently OpenAI or Anthropic) to generate the concise titles and descriptions that make up structured llms.txt files.
Important: This project is in the alpha stage. Flags, configuration, behavior, and design may change significantly.
- Overview
- Features
- What is llms.txt?
- Installation
- Prerequisites
- Quick Start
- Configuration
- Usage Examples
- Output Format
- Performance
- API Documentation
- Troubleshooting
- Contributing
- Acknowledgments
- License
llmstxt-generator is a command-line tool that automatically generates llms.txt and llms-full.txt files from any website.
It intelligently crawls websites, extracts content, and uses AI to create meaningful summaries that help LLMs understand and navigate your site's structure.
- Automated Discovery: Automatically maps your entire website structure
- AI-Powered Summaries: Uses OpenAI or Anthropic to generate concise, meaningful descriptions
- Performance Optimized: Concurrent processing with configurable batching and rate limiting
- Flexible Output: Generates both summary (llms.txt) and full content (llms-full.txt) versions
- 🚀 High-Performance Concurrent Processing: Process multiple URLs simultaneously with configurable worker pools
- 🤖 Multiple AI Model Support: Compatible with GPT-4, Claude Opus, and other OpenAI and Anthropic models
- 📊 Intelligent Batching: Process URLs in configurable batches with automatic rate limiting
- 🔧 Highly Configurable: Extensive CLI flags and environment variable support
- 📝 Dual Output Formats: Generate both concise summaries and full-text versions
- 🛡️ Robust Error Handling: Graceful failure recovery and comprehensive error reporting
- 🔍 Smart Content Extraction: Focuses on main content while filtering out navigation and boilerplate
- ⏱️ Timeout Management: Configurable timeouts for reliable processing of large sites
- 📈 Progress Tracking: Real-time progress updates with detailed logging options
The llms.txt format is a structured way to help Large Language Models (LLMs) understand and navigate websites more effectively. It provides:
- llms.txt: A concise index with titles, URLs, and brief descriptions
- llms-full.txt: Complete content from all pages for comprehensive context
This standardized format enables LLMs to quickly understand site structure, find relevant information, and provide better assistance to users asking about your website.
- The /llms.txt file – llms-txt
- Official llms.txt specification
- AnswerDotAI/llms-txt: The /llms.txt file, helping language models use your website
# Requires Go 1.24 or higher
go install github.com/zchee/llmstxt-generator@latest
git clone https://github.com/zchee/llmstxt-generator.git
cd llmstxt-generator
go build -o llmstxt-generator
Before using llmstxt-generator, you'll need:
- Firecrawl API Key: Sign up at firecrawl.dev to get your API key
- OpenAI or Anthropic API Key: Get an OpenAI key from the OpenAI Platform, or an Anthropic key from the Anthropic Console
- Go 1.24+: Required if building from source
Set your API keys as environment variables:
export FIRECRAWL_API_KEY="your-firecrawl-api-key"
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
Or pass them directly via command-line flags.
Generate llms.txt files for a website:
# Basic usage
llmstxt-generator https://example.com
# With custom output directory
llmstxt-generator https://example.com --output-dir ./output
# Process more URLs with higher concurrency
llmstxt-generator https://example.com --max-urls 100 --max-workers 10
# Use a specific OpenAI model
llmstxt-generator https://example.com --model gpt-4-turbo-preview
| Flag | Description | Default |
|---|---|---|
| --model | OpenAI or Anthropic model for generating summaries | gpt-4.1-mini, claude-opus-4-1 |
| --max-urls | Maximum number of URLs to process | 20 |
| --output-dir | Directory to save output files | . (current) |
| --firecrawl-api-key | Firecrawl API key | $FIRECRAWL_API_KEY |
| --api-key | OpenAI or Anthropic API key | $OPENAI_API_KEY or $ANTHROPIC_API_KEY |
| --no-full-text | Skip generating llms-full.txt | false |
| --verbose | Enable verbose logging | false |
| --batch-size | Number of URLs per batch | 10 |
| --max-workers | Maximum concurrent workers | 5 |
| --batch-delay | Delay between batches | 1s |
| --timeout | Timeout for URL processing | 30s |
| --max-content-length | Maximum content length sent to the LLM provider | 4000 |
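To illustrate what a content-length cap such as --max-content-length might do, here is a minimal sketch that trims page content before it is sent for summarization. It assumes the limit is counted in characters (runes) rather than tokens, which this README does not state, and truncateContent is a hypothetical helper, not part of the tool's API.

```go
package main

import "fmt"

// truncateContent returns at most maxLen runes of s.
// Assumption: the limit counts characters, not tokens.
// Converting to []rune avoids cutting a multi-byte UTF-8 character in half.
func truncateContent(s string, maxLen int) string {
	if maxLen <= 0 {
		return ""
	}
	runes := []rune(s)
	if len(runes) <= maxLen {
		return s
	}
	return string(runes[:maxLen])
}

func main() {
	page := "Welcome to Example.com! We are the leading provider of example services..."
	fmt.Println(truncateContent(page, 23)) // Welcome to Example.com!
}
```

A larger limit gives the model more context per page at the cost of larger requests.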
- FIRECRAWL_API_KEY: Your Firecrawl API key
- OPENAI_API_KEY: Your OpenAI API key
- ANTHROPIC_API_KEY: Your Anthropic API key
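The flag defaults above suggest that an explicit flag value wins and the environment variable is the fallback. The sketch below shows one way such a lookup could work. resolveKey is a hypothetical helper, and the order in which the two provider variables are checked is an assumption, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveKey returns flagValue when it is set; otherwise it returns the first
// non-empty environment variable among envNames (checked in the given order).
func resolveKey(flagValue string, envNames ...string) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	for _, name := range envNames {
		if v := os.Getenv(name); v != "" {
			return v, nil
		}
	}
	return "", errors.New("no API key provided")
}

func main() {
	// An empty first argument stands in for an unset --api-key flag.
	key, err := resolveKey("", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved an API key of length", len(key))
}
```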
# Generate files for a simple website
llmstxt-generator https://myblog.com
# Process up to 500 URLs with increased concurrency
llmstxt-generator https://docs.example.com \
--max-urls 500 \
--max-workers 20 \
--batch-size 50 \
--output-dir ./documentation \
--verbose
# Production settings with timeouts and rate limiting
llmstxt-generator https://enterprise.example.com \
--model gpt-4-turbo-preview \
--max-urls 1000 \
--max-workers 10 \
--batch-size 25 \
--batch-delay 2s \
--timeout 45s \
--max-content-length 8000 \
--output-dir /var/www/llms-files \
--verbose
# https://example.com llms.txt
- [Homepage](https://example.com): Welcome to Example.com - Your trusted source for examples
- [About Us](https://example.com/about): Learn about our mission, team, and company history
- [Products](https://example.com/products): Browse our complete catalog of innovative products
- [Contact](https://example.com/contact): Get in touch with our support team today
# https://example.com llms-full.txt
<|firecrawl-page-1-lllmstxt|>
## Homepage
Welcome to Example.com! We are the leading provider of example services...
[Full page content]
<|firecrawl-page-2-lllmstxt|>
## About Us
Founded in 2020, Example.com has grown to become...
[Full page content]
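The two samples above can be produced with a few lines of string building. The following is a sketch only: the Page type and both render functions are hypothetical names, and the exact blank-line placement is an assumption based on the samples rather than on the tool's source.

```go
package main

import (
	"fmt"
	"strings"
)

// Page is a hypothetical record holding what the generator knows about one URL.
type Page struct {
	Title, URL, Description, Markdown string
}

// renderIndex builds the llms.txt body: a header plus one link line per page.
func renderIndex(site string, pages []Page) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# %s llms.txt\n\n", site)
	for _, p := range pages {
		fmt.Fprintf(&b, "- [%s](%s): %s\n", p.Title, p.URL, p.Description)
	}
	return b.String()
}

// renderFull builds the llms-full.txt body: each page is preceded by a
// numbered separator (1-based) and followed by its full content.
func renderFull(site string, pages []Page) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# %s llms-full.txt\n\n", site)
	for i, p := range pages {
		fmt.Fprintf(&b, "<|firecrawl-page-%d-lllmstxt|>\n## %s\n%s\n\n", i+1, p.Title, p.Markdown)
	}
	return b.String()
}

func main() {
	pages := []Page{{
		Title:       "About Us",
		URL:         "https://example.com/about",
		Description: "Learn about our mission, team, and company history",
		Markdown:    "Founded in 2020, Example.com has grown to become...",
	}}
	fmt.Print(renderIndex("https://example.com", pages))
	fmt.Print(renderFull("https://example.com", pages))
}
```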
- CLI Layer (cmd/): Handles command-line parsing and user interaction
- Configuration (config/): Manages settings, validation, and defaults
- Generator (generator/): Core business logic for content generation
- API Clients (gollm/): Abstracted interfaces for OpenAI and Anthropic services
- Concurrent Processing: Utilizes Go's goroutines for parallel URL processing
- Intelligent Batching: Reduces API overhead by processing URLs in batches
- Rate Limiting: Prevents API throttling with configurable delays
- Memory Efficiency: Pre-allocated buffers and efficient string building
- Context Cancellation: Proper cleanup and resource management
Processing performance varies based on website size and API response times:
- Small sites (< 50 pages): ~1-2 minutes
- Medium sites (50-200 pages): ~5-10 minutes
- Large sites (200-1000 pages): ~15-30 minutes
Note: Actual performance depends on API rate limits and network conditions
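Part of the total run time is simply the configured delay between batches, which can be estimated up front. The sketch below computes that lower bound, assuming one delay after every batch except the last. batchDelayOverhead is a hypothetical helper, and actual scraping and summarization time comes on top of this figure.

```go
package main

import (
	"fmt"
	"time"
)

// batchDelayOverhead returns the total time spent waiting between batches,
// assuming one delay after every batch except the last.
func batchDelayOverhead(urls, batchSize int, delay time.Duration) time.Duration {
	if urls <= 0 || batchSize <= 0 {
		return 0
	}
	batches := (urls + batchSize - 1) / batchSize // ceiling division
	return time.Duration(batches-1) * delay
}

func main() {
	// Defaults: 20 URLs, batches of 10, 1s delay -> 2 batches, 1 delay.
	fmt.Println(batchDelayOverhead(20, 10, time.Second)) // 1s
	// 1000 URLs, batches of 25, 2s delay -> 40 batches, 39 delays.
	fmt.Println(batchDelayOverhead(1000, 25, 2*time.Second)) // 1m18s
}
```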
The main generator provides a simple API for programmatic use:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/zchee/llmstxt-generator/generator"
)

func main() {
	// Root context; wrap it with a timeout or cancellation as needed.
	ctx := context.Background()

	// Create firecrawlClient, openaiClient and options...
	// .
	// .
	// .

	// Create a new generator
	gen := generator.NewLLMsTxtGenerator(
		firecrawlClient,
		openaiClient,
		options,
	)

	// Generate llms.txt files
	result, err := gen.GenerateLLMsTXT(ctx, "https://example.com")
	if err != nil {
		log.Fatal(err)
	}

	// Access generated content
	fmt.Println(result.LLMsTxt)
	fmt.Println(result.LLMsFullTxt)
}
Error: Firecrawl API key not provided
Ensure your API keys are set correctly:
export FIRECRAWL_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
Error: API rate limit exceeded
Increase batch delay or reduce worker count:
llmstxt-generator https://example.com --batch-delay 5s --max-workers 3
Error: Context deadline exceeded
Increase timeout duration:
llmstxt-generator https://example.com --timeout 60s
For very large sites, consider:
- Processing in smaller batches with --max-urls
- Reducing concurrent workers with --max-workers
- Increasing --max-content-length for better summaries
Enable verbose logging for detailed troubleshooting:
llmstxt-generator https://example.com --verbose
We welcome contributions! Please follow these guidelines:
- Fork the repository and create your feature branch
- Write tests for new functionality
- Follow Go conventions and run go fmt
- Update documentation for user-facing changes
- Submit a pull request with a clear description
# Clone the repository
git clone https://github.com/zchee/llmstxt-generator.git
cd llmstxt-generator
# Install dependencies
go mod download
# Run tests
go test ./...
# Build and run locally
go build -o llmstxt-generator
./llmstxt-generator https://example.com
- mendableai/create-llmstxt-py: This tool is a Go port of that repository by Firecrawl (Mendable). Special thanks to the original author for creating such a useful tool.
- The Go community for excellent libraries and tools
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.