llmstxt-generator

A high-performance Go implementation of the llms.txt generator. It uses Firecrawl to map and scrape websites and an LLM provider (currently OpenAI or Anthropic) to generate concise titles and descriptions, producing structured llms.txt files.

Important

This project is in the alpha stage.

Flags, configuration, behavior, and design may change significantly.

Overview

llmstxt-generator is a command-line tool that automatically generates llms.txt and llms-full.txt files from any website.

It intelligently crawls websites, extracts content, and uses AI to create meaningful summaries that help LLMs understand and navigate your site's structure.

Key Benefits

Automated Discovery: Automatically maps your entire website structure
AI-Powered Summaries: Uses OpenAI or Anthropic to generate concise, meaningful descriptions
Performance Optimized: Concurrent processing with configurable batching and rate limiting
Flexible Output: Generates both summary (llms.txt) and full content (llms-full.txt) versions

Features

🚀 High-Performance Concurrent Processing: Process multiple URLs simultaneously with configurable worker pools
🤖 Multiple AI Model Support: Compatible with GPT-4, Claude Opus, and other OpenAI and Anthropic models
📊 Intelligent Batching: Process URLs in configurable batches with automatic rate limiting
🔧 Highly Configurable: Extensive CLI flags and environment variable support
📝 Dual Output Formats: Generate both concise summaries and full-text versions
🛡️ Robust Error Handling: Graceful failure recovery and comprehensive error reporting
🔍 Smart Content Extraction: Focuses on main content while filtering out navigation and boilerplate
⏱️ Timeout Management: Configurable timeouts for reliable processing of large sites
📈 Progress Tracking: Real-time progress updates with detailed logging options

What is llms.txt?

The llms.txt format is a structured way to help Large Language Models (LLMs) understand and navigate websites more effectively. It provides:

llms.txt: A concise index with titles, URLs, and brief descriptions
llms-full.txt: Complete content from all pages for comprehensive context

This standardized format enables LLMs to quickly understand site structure, find relevant information, and provide better assistance to users asking about your website.

- The /llms.txt file (llms-txt): Official llms.txt specification
- AnswerDotAI/llms-txt: The /llms.txt file, helping language models use your website

Installation

From Source

# Requires Go 1.24 or higher
go install github.com/zchee/llmstxt-generator@latest

Build from Repository

git clone https://github.com/zchee/llmstxt-generator.git
cd llmstxt-generator
go build -o llmstxt-generator

Prerequisites

Before using llmstxt-generator, you'll need:

Firecrawl API Key: Sign up at firecrawl.dev to get your API key
LLM API Key: an OpenAI API key (from the OpenAI Platform) or an Anthropic API key (from the Anthropic Console)
Go 1.24+: Required if building from source

Setting up API Keys

Set your API keys as environment variables:

export FIRECRAWL_API_KEY="your-firecrawl-api-key"
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"

Alternatively, pass them via the --firecrawl-api-key and --api-key command-line flags.

Quick Start

Generate llms.txt files for a website:

# Basic usage
llmstxt-generator https://example.com

# With custom output directory
llmstxt-generator https://example.com --output-dir ./output

# Process more URLs with higher concurrency
llmstxt-generator https://example.com --max-urls 100 --max-workers 10

# Use a specific OpenAI model
llmstxt-generator https://example.com --model gpt-4-turbo-preview

Configuration

Command-Line Flags

| Flag | Description | Default |
|---|---|---|
| `--model` | LLM model used to generate summaries (OpenAI or Anthropic) | `gpt-4.1-mini` (OpenAI), `claude-opus-4-1` (Anthropic) |
| `--max-urls` | Maximum number of URLs to process | `20` |
| `--output-dir` | Directory to save output files | `.` (current directory) |
| `--firecrawl-api-key` | Firecrawl API key | `$FIRECRAWL_API_KEY` |
| `--api-key` | OpenAI or Anthropic API key | `$OPENAI_API_KEY` or `$ANTHROPIC_API_KEY` |
| `--no-full-text` | Skip generating llms-full.txt | `false` |
| `--verbose` | Enable verbose logging | `false` |
| `--batch-size` | Number of URLs per batch | `10` |
| `--max-workers` | Maximum number of concurrent workers | `5` |
| `--batch-delay` | Delay between batches | `1s` |
| `--timeout` | Timeout for processing each URL | `30s` |
| `--max-content-length` | Maximum page content length sent to the LLM | `4000` |

Environment Variables

FIRECRAWL_API_KEY: Your Firecrawl API key
OPENAI_API_KEY: Your OpenAI API key
ANTHROPIC_API_KEY: Your Anthropic API key

Usage Examples

Basic Website Processing

# Generate files for a simple website
llmstxt-generator https://myblog.com

Large Website with Custom Settings

# Process up to 500 URLs with increased concurrency
llmstxt-generator https://docs.example.com \
  --max-urls 500 \
  --max-workers 20 \
  --batch-size 50 \
  --output-dir ./documentation \
  --verbose

Production Deployment

# Production settings with timeouts and rate limiting
llmstxt-generator https://enterprise.example.com \
  --model gpt-4-turbo-preview \
  --max-urls 1000 \
  --max-workers 10 \
  --batch-size 25 \
  --batch-delay 2s \
  --timeout 45s \
  --max-content-length 8000 \
  --output-dir /var/www/llms-files \
  --verbose

Output Format

llms.txt Example

# https://example.com llms.txt

- [Homepage](https://example.com): Welcome to Example.com - Your trusted source for examples
- [About Us](https://example.com/about): Learn about our mission, team, and company history
- [Products](https://example.com/products): Browse our complete catalog of innovative products
- [Contact](https://example.com/contact): Get in touch with our support team today
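The llms.txt layout above is simple enough to reproduce directly. A minimal sketch, assuming each page already has a generated title and description; the `page` type and `renderLLMsTxt` function are illustrative and not the generator's internals:

```go
package main

import (
	"fmt"
	"strings"
)

// page holds the per-URL data that goes into llms.txt. The type is
// illustrative; the generator's internal types may differ.
type page struct {
	URL         string
	Title       string
	Description string
}

// renderLLMsTxt formats pages in the layout shown above: a header line,
// a blank line, then one "- [Title](URL): Description" entry per page.
func renderLLMsTxt(site string, pages []page) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# %s llms.txt\n\n", site)
	for _, p := range pages {
		fmt.Fprintf(&b, "- [%s](%s): %s\n", p.Title, p.URL, p.Description)
	}
	return b.String()
}

func main() {
	out := renderLLMsTxt("https://example.com", []page{
		{URL: "https://example.com", Title: "Homepage", Description: "Welcome to Example.com - Your trusted source for examples"},
		{URL: "https://example.com/about", Title: "About Us", Description: "Learn about our mission, team, and company history"},
	})
	fmt.Print(out)
}
```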

llms-full.txt Example

# https://example.com llms-full.txt

<|firecrawl-page-1-lllmstxt|>
## Homepage
Welcome to Example.com! We are the leading provider of example services...
[Full page content]

<|firecrawl-page-2-lllmstxt|>
## About Us
Founded in 2020, Example.com has grown to become...
[Full page content]

Key Components

CLI Layer (cmd/): Handles command-line parsing and user interaction
Configuration (config/): Manages settings, validation, and defaults
Generator (generator/): Core business logic for content generation
API Clients (gollm/): Abstracted interfaces for OpenAI and Anthropic services

Performance

Optimization Strategies

Concurrent Processing: Utilizes Go's goroutines for parallel URL processing
Intelligent Batching: Reduces API overhead by processing URLs in batches
Rate Limiting: Prevents API throttling with configurable delays
Memory Efficiency: Pre-allocated buffers and efficient string building
Context Cancellation: Proper cleanup and resource management

Benchmarks

Processing performance varies based on website size and API response times:

Small sites (< 50 pages): ~1-2 minutes
Medium sites (50-200 pages): ~5-10 minutes
Large sites (200-1000 pages): ~15-30 minutes

Note: Actual performance depends on API rate limits, network conditions, and the --max-workers, --batch-size, and --batch-delay settings.

API Documentation

Generator Package

The main generator provides a simple API for programmatic use:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/zchee/llmstxt-generator/generator"
)

func main() {
	ctx := context.Background()

	// Create firecrawlClient, openaiClient, and options here
	// (construction omitted).

	// Create a new generator from the Firecrawl client, the LLM client, and options.
	gen := generator.NewLLMsTxtGenerator(
		firecrawlClient,
		openaiClient,
		options,
	)

	// Generate the llms.txt and llms-full.txt contents for the site.
	result, err := gen.GenerateLLMsTXT(ctx, "https://example.com")
	if err != nil {
		log.Fatal(err)
	}

	// Access the generated content.
	fmt.Println(result.LLMsTxt)
	fmt.Println(result.LLMsFullTxt)
}

Troubleshooting

Common Issues

API Key Errors

Error: Firecrawl API key not provided

Solution

Ensure your API keys are set correctly:

export FIRECRAWL_API_KEY="your-key"
export OPENAI_API_KEY="your-key"

Rate Limiting

Error: API rate limit exceeded

Solution

Increase batch delay or reduce worker count:

llmstxt-generator https://example.com --batch-delay 5s --max-workers 3

Timeout Errors

Error: Context deadline exceeded

Solution

Increase timeout duration:

llmstxt-generator https://example.com --timeout 60s

Memory Issues

For very large sites, consider:

Processing fewer URLs per run with --max-urls, or smaller batches with --batch-size
Reducing concurrent workers with --max-workers
Reducing --max-content-length, at some cost to summary detail

Debug Mode

Enable verbose logging for detailed troubleshooting:

llmstxt-generator https://example.com --verbose

Contributing

We welcome contributions! Please follow these guidelines:

Fork the repository and create your feature branch
Write tests for new functionality
Follow Go conventions and run go fmt
Update documentation for user-facing changes
Submit a pull request with a clear description

Development Setup

# Clone the repository
git clone https://github.com/zchee/llmstxt-generator.git
cd llmstxt-generator

# Install dependencies
go mod download

# Run tests
go test ./...

# Build and run locally
go build -o llmstxt-generator
./llmstxt-generator https://example.com

Acknowledgments

mendableai/create-llmstxt-py
- This project is a Go port of that repository by Firecrawl (Mendable). Special thanks to the original author for creating such a useful tool.
The Go community for excellent libraries and tools

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
