Discourse llms.txt Generator

Automatically generates llms.txt files for LLM optimization (GEO) on Discourse forums


English version | Russian version


📚 About llms.txt

This project implements the llms.txt standard for Discourse forums, making your community content discoverable and accessible to Large Language Models (LLMs) and AI systems.

What is llms.txt?

The llms.txt standard is a convention proposed in September 2024 by Jeremy Howard of Answer.AI for providing LLM-friendly content from websites. Think of it as "robots.txt for AI": a standardized way for websites to expose their content structure to AI systems.

Thousands of sites, including some of the world's largest tech companies, have already implemented the llms.txt standard on their own domains. Examples include:

  • ✅ Amazon AWS — https://docs.aws.amazon.com/llms.txt
  • ✅ Cloudflare — https://developers.cloudflare.com/llms.txt
  • ✅ Stripe — https://stripe.com/llms.txt
  • ✅ Angular — https://angular.dev/llms.txt
  • ✅ Redis — https://redis.io/llms.txt
  • ✅ Docker — https://docs.docker.com/llms.txt
  • ✅ Model Context Protocol — https://modelcontextprotocol.io/llms-full.txt

When large companies adopt a standard at scale before it becomes official, it is a strong signal that llms.txt addresses a real problem: sitewide changes like this are rarely rolled out without a strategic reason. The rapid adoption of llms.txt across the tech industry shows how important structured content for AI has become, with the industry itself driving adoption ahead of formal standards bodies.


📋 Table of Contents

  • Getting Started
  • Core Features
  • Configuration
  • Advanced Topics
  • Resources


🎯 What This Does

This Discourse plugin automatically generates LLM-friendly documentation files for your forum:

1. Main Navigation File (for AI discovery)

/llms.txt - A structured overview helping LLMs understand your forum's organization, categories, and latest discussions.

2. Full Content Index (for AI training)

/llms-full.txt - Complete forum index with all topics, categorized and ready for LLM consumption.

3. Dynamic Resource Files (for targeted content)

Generate llms.txt for any category, topic, or tag on-demand:

  • /c/category-name/123/llms.txt - All topics in a category
  • /t/topic-slug/456/llms.txt - Complete topic with all posts
  • /tag/tutorial/llms.txt - All topics with specific tag

4. Sitemap Index (for crawler discovery)

/sitemaps.txt - Complete list of all llms.txt URLs for efficient AI crawler indexing.

The Result: Your forum content becomes discoverable by ChatGPT, Claude, and other AI systems, improving GEO (Generative Engine Optimization) and increasing visibility in AI-generated responses.


💡 Why You Need This

The Problem Without This

Before:

  • AI systems can't efficiently understand your forum structure
  • LLMs parse HTML pages (slow, inefficient, error-prone)
  • Your valuable community knowledge stays hidden from AI
  • AI chatbots can't cite or reference your discussions
  • No control over how AI systems access your content

The Solution With This

After:

  • ✅ Clean, structured, LLM-friendly content format
  • ✅ AI systems understand your forum organization instantly
  • ✅ Your content appears in ChatGPT, Claude, and other AI responses
  • ✅ Control what AI systems see (bot blocking, content filtering)
  • ✅ Better GEO (Generative Engine Optimization) for AI discovery

Real-World Impact

Before (without llms.txt):

User: "How do I install XYZ on Ubuntu?"
AI: "I don't have specific information about XYZ installation..."

After (with llms.txt):

User: "How do I install XYZ on Ubuntu?"
AI: "According to the XYZ Forum, here are the installation steps:
     [Detailed answer from your forum]
     Source: https://your-forum.com/t/ubuntu-install/123"

Your forum gets:

  • 🎯 Increased visibility in AI responses
  • 🔗 Direct attribution and backlinks
  • 📈 More traffic from AI-powered search
  • 🌟 Recognition as authoritative source

📦 Installation

Quick Install (5 minutes)

Step 1: Add plugin to Discourse

For Docker installations (recommended), edit containers/app.yml:

hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/kaktaknet/discourse-llms-txt-generator.git

Step 2: Rebuild container

cd /var/discourse
./launcher rebuild app

Step 3: Verify installation

After rebuild completes (~5 minutes), check:

curl https://your-forum.com/llms.txt

Done! You should see your forum's llms.txt navigation file.


Manual Installation (Alternative)

For non-Docker or development installations:

cd /var/www/discourse/plugins
git clone https://github.com/kaktaknet/discourse-llms-txt-generator.git
cd /var/www/discourse
bundle exec rake plugin:install

Restart Discourse:

systemctl restart discourse

🌟 Key Features

Feature 1: Automatic Generation

What it does: Dynamically generates llms.txt files on-demand without pre-generation or manual updates. Files are created in real-time when requested.

When to use: Always enabled - files appear automatically after installation.

Example:

GET /llms.txt → Generated instantly with current forum state
GET /c/support/2/llms.txt → Category-specific file created on-demand

Benefits:

  • ✅ No maintenance required
  • ✅ Always up-to-date
  • ✅ Zero storage overhead

Feature 2: Dynamic Per-Resource llms.txt

What it does: Creates virtual llms.txt files for any category, topic, or tag in your forum without physically storing them.

When to use:

  • AI needs specific category content
  • Developers want targeted topic information
  • Crawlers request granular data

Example:

# Request category llms.txt
GET /c/support/2/llms.txt

# Returns:
# Support Category
> Category: My Forum

Get help with installation and troubleshooting.

## Topics
- [How to install on Ubuntu](url) (1523 views)
- [Common errors](url) (892 views)

Benefits:

  • ✅ Granular content control
  • ✅ Faster AI indexing
  • ✅ Better topic discovery
  • ✅ Reduced bandwidth

Feature 3: Smart Caching

What it does: Intelligent hourly cache that only regenerates when new content is created, not on every request.

When to use: Automatic - runs in background every hour.

Example:

Hour 1: New topic created → Cache regenerated
Hour 2: No new content → Cache reused (saves resources)
Hour 3: Post edited → Cache regenerated

Benefits:

  • ✅ Faster response times (<50ms vs 1-2 seconds)
  • ✅ Reduced server load
  • ✅ Content stays fresh (max 1 hour old)

Feature 4: Bot Control

What it does: Block specific AI crawler bots from accessing llms.txt files while allowing forum access.

When to use:

  • Block low-quality AI scrapers
  • Control which AI services use your content
  • Reduce bandwidth from aggressive crawlers

Example:

# Configuration
llms_txt_blocked_user_agents: "Omgilibot, ChatGPT-User"

# Generates in robots.txt:
User-agent: Omgilibot
Disallow: /llms.txt
Disallow: /llms-full.txt

Benefits:

  • ✅ Quality control over AI attribution
  • ✅ Bandwidth reduction
  • ✅ Selective AI access

Feature 5: SEO Integration

What it does: Automatically integrates with robots.txt and sitemap.xml, includes canonical URLs to prevent duplicate content penalties.

When to use: Always active - automatic SEO protection.

Example:

# HTTP Response
Link: <https://forum.com/t/topic/123>; rel="canonical"

# Content Footer
**Canonical:** https://forum.com/t/topic/123
**Original content:** https://forum.com/t/topic/123

Benefits:

  • ✅ No SEO penalties
  • ✅ Proper attribution
  • ✅ Search engines index canonical URLs
  • ✅ Complies with RFC 6596

Generated Files

/llms.txt - Navigation File (Lightweight)

Provides a structured overview of your forum:

  • Site metadata and description
  • Introduction text
  • Categories with Subcategories - Hierarchical tree structure
  • Latest 50 Topics - Most recent discussions (configurable)
  • Links to additional resources
  • Link to full content file

Example structure:

# My Forum
> Forum description

Introduction text...

## Categories and Subcategories
### [General Discussion](link)
Description of category

- [Help & Support](link): Support subcategory
- [Feature Requests](link): Requests subcategory

## Latest Topics
- [Topic Title](link) - Category Name (2025-11-09)
- [Another Topic](link) - Another Category (2025-11-08)
...

/llms-full.txt - Full Content File

Contains complete forum index in simplified format:

  • Custom forum description (optional, appears first)
  • Categories and subcategories with detailed descriptions
  • Topic list in format: **Category** - [Title](link)
  • Optional post excerpts (up to 500 characters) - disabled by default

Why simplified format? LLMs can follow links to read full topic content when needed. This approach:

  • Reduces file size significantly
  • Speeds up generation (no post processing)
  • Prevents overwhelming LLMs with too much text
  • Allows LLMs to selectively read topics of interest

Example format without excerpts (default):

**[General Discussion](url)** - [Welcome to our community](url)
**[Help & Support](url)** - [How to install on Ubuntu](url)

Example format with excerpts enabled:

**[General Discussion](url)** - [Welcome to our community](url)
  > Welcome everyone! This is a place where you can introduce yourself...

**[Help & Support](url)** - [How to install on Ubuntu](url)
  > This guide will walk you through installing our software...

/sitemaps.txt - Index of All llms.txt Files

Contains a complete list of all available llms.txt URLs:

https://forum.com/llms.txt
https://forum.com/llms-full.txt
https://forum.com/c/general/1/llms.txt
https://forum.com/c/support/2/llms.txt
https://forum.com/t/welcome-post/123/llms.txt
https://forum.com/t/installation-guide/456/llms.txt
https://forum.com/tag/announcement/llms.txt
...

Purpose:

  • Helps AI crawlers discover all llms.txt resources
  • Listed in robots.txt as Sitemap: directive
  • Automatically updated when content changes
  • Respects same privacy and blocking rules
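
For illustration, here is a small Ruby script (not part of the plugin) showing how a crawler could walk sitemaps.txt and fetch every listed llms.txt resource; the base URL is a placeholder:

require "net/http"
require "uri"

base = "https://your-forum.com" # placeholder; use your forum's URL

# Fetch the index, then request each listed llms.txt resource
index = Net::HTTP.get(URI("#{base}/sitemaps.txt"))
index.each_line.map(&:strip).reject(&:empty?).each do |url|
  body = Net::HTTP.get(URI(url))
  puts "#{url} -> #{body.bytesize} bytes"
end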

Dynamic llms.txt Files

One of the most powerful features: virtual llms.txt files for any resource.

How It Works

These files don't physically exist on your server - they're generated on-demand when requested:

| URL Pattern | Description | Example |
|-------------|-------------|---------|
| `/c/{category-slug}/{id}/llms.txt` | Category with all its topics | `/c/general-discussion/1/llms.txt` |
| `/c/{parent}/{child}/{id}/llms.txt` | Subcategory | `/c/support/installation/15/llms.txt` |
| `/t/{topic-slug}/{id}/llms.txt` | Complete topic with all posts | `/t/how-to-install/123/llms.txt` |
| `/tag/{tag-name}/llms.txt` | All topics with a specific tag | `/tag/tutorial/llms.txt` |

Example: Category llms.txt

Request: https://forum.com/c/support/2/llms.txt

Generates:

# Support
> Category: My Forum

Get help with installation, configuration, and troubleshooting.

**Category URL:** https://forum.com/c/support/2
**Canonical:** https://forum.com/c/support/2
**Original content:** https://forum.com/c/support/2

## Subcategories

- [Installation Help](https://forum.com/c/install/10): Installation guides and issues
- [Configuration](https://forum.com/c/config/11): Configuration questions

## Topics

- [How to install on Ubuntu](https://forum.com/t/ubuntu-install/456) (1523 views, 45 replies)
- [Common installation errors](https://forum.com/t/install-errors/457) (892 views, 23 replies)
...

Example: Topic llms.txt

Request: https://forum.com/t/installation-guide/456/llms.txt

Generates:

# Complete Installation Guide

**Category:** [Support](https://forum.com/c/support/2)
**Created:** 2025-11-09 10:30 UTC
**Views:** 1523
**Replies:** 12
**URL:** https://forum.com/t/installation-guide/456
**Canonical:** https://forum.com/t/installation-guide/456
**Original content:** https://forum.com/t/installation-guide/456

---

## Post #1 by @admin

This guide will walk you through installing the software...

[Full post content in Markdown]

---

## Post #2 by @user123

Thanks for this guide! I had an issue with...

[Full post content]

---

...

Example: Tag llms.txt

Request: https://forum.com/tag/tutorial/llms.txt

Generates:

# Tag: tutorial
> My Forum

**Tag URL:** https://forum.com/tag/tutorial
**Canonical:** https://forum.com/tag/tutorial
**Original content:** https://forum.com/tag/tutorial

## Topics with this tag

- [Getting Started Tutorial](https://forum.com/t/getting-started/123) - General (450 views)
- [Advanced Configuration Tutorial](https://forum.com/t/advanced-config/124) - Configuration (320 views)
...

Why This Is Powerful

For AI Crawlers:

  • Can request specific content without parsing entire forum
  • Get exactly the context they need
  • Reduce bandwidth usage
  • Faster, more targeted indexing

For LLM Understanding:

  • Deep dive into specific discussions
  • Get full context of conversations
  • Access individual topic threads
  • Better comprehension of specific subjects

For Your SEO:

  • Every resource has its own llms.txt
  • Granular control over what AI systems see
  • Better topic-level discovery
  • Improved GEO (Generative Engine Optimization)

Performance Notes

  • No physical files created - all generated on-demand
  • No pre-generation needed - created only when requested
  • Smart caching - sitemaps.txt is cached
  • Permission-aware - respects Discourse visibility rules
  • 404 for private content - hidden topics/categories return 404

⚙️ Configuration

Navigate to Admin → Settings → Plugins → discourse-llms-txt-generator

Main Settings

| Setting | Default | Description |
|---------|---------|-------------|
| `llms_txt_enabled` | `true` | Enable/disable the plugin |
| `llms_txt_allow_indexing` | `true` | Allow AI crawlers (affects robots.txt) |
| `llms_txt_blocked_user_agents` | `""` | Comma-separated bot names to block |
| `llms_txt_intro_text` | Custom text | Introduction for the llms.txt file |
| `llms_txt_full_description` | `""` | Custom description for llms-full.txt |

Content Settings

| Setting | Default | Description |
|---------|---------|-------------|
| `llms_txt_min_views` | `50` | Minimum topic views for inclusion in llms-full.txt |
| `llms_txt_posts_limit` | `medium` | Topic count limit (500/2500/5000/all) |
| `llms_txt_include_excerpts` | `false` | Include post excerpts in llms-full.txt |
| `llms_txt_post_excerpt_length` | `500` | Maximum excerpt length in characters (100-5000) |
| `llms_txt_latest_topics_count` | `50` | Number of latest topics shown (max 50 recommended) |

⚠️ WARNING: Enabling llms_txt_include_excerpts with llms_txt_posts_limit set to "all" may cause:

  • Extremely large file sizes (potentially 10-100+ MB)
  • High server load during generation
  • Long generation times (30+ seconds)
  • Memory issues on large forums

Recommended: Only enable excerpts with the small or medium post limit. If you have a large forum (10,000+ topics), keep excerpts disabled or use the small limit.

Performance Settings

| Setting | Default | Description |
|---------|---------|-------------|
| `llms_txt_cache_minutes` | `60` | Cache duration for the navigation file (minutes) |
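
These settings can also be changed from the Rails console (for Docker installs: cd /var/discourse && ./launcher enter app, then rails c); the names match the tables above:

SiteSetting.llms_txt_min_views = 100        # raise the inclusion threshold
SiteSetting.llms_txt_posts_limit = "medium" # cap how many topics are listed
SiteSetting.llms_txt_cache_minutes = 120    # keep the navigation cache longer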

🔬 Advanced Topics

SEO & Canonical URLs

We've taken care of search engine duplicate content concerns:

Every dynamic llms.txt file includes canonical URL information in two ways:

1. HTTP Link Header:

Link: <https://forum.com/t/topic-slug/123>; rel="canonical"

The server automatically sends a Link header with rel="canonical" pointing to the original forum resource URL. Search engines and AI crawlers recognize this standard header and understand that:

  • The llms.txt file is derivative/supplementary content
  • The canonical (original) content is at the forum URL
  • They should attribute content to the forum URL, not the llms.txt URL

2. Content Footer:

**Canonical:** https://forum.com/t/topic-slug/123
**Original content:** https://forum.com/t/topic-slug/123

At the bottom of each dynamic llms.txt file, we explicitly state the canonical URL and original content location. This helps:

  • AI systems understand content provenance
  • Search engines avoid duplicate content penalties
  • Users and developers see the source of truth

Benefits:

  • ✅ No SEO penalty for duplicate content
  • ✅ Proper attribution to your forum
  • ✅ Search engines index the canonical URL
  • ✅ AI systems link back to original content
  • ✅ Complies with web standards (RFC 6596)
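
For illustration, a minimal sketch (not the plugin's actual code) of how a Rails controller action can attach this header before rendering the plain-text file:

def render_llms_txt(body, canonical_url)
  # RFC 6596 canonical relation in a standard HTTP Link header
  response.headers["Link"] = "<#{canonical_url}>; rel=\"canonical\""
  render plain: body, content_type: "text/plain"
end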

Bot Control & Blocking

What it is: A setting that allows you to block specific AI crawler bots from accessing your llms.txt and llms-full.txt files, while still allowing them to access your main forum.

Why would you want to block bots?

  1. Quality Control: Some AI bots provide poor attribution or misrepresent content
  2. Competitive Reasons: You might not want certain AI services using your content
  3. Bandwidth: Reduce load from aggressive crawlers
  4. Testing: Block bots during testing phase before opening to all

How it works:

The plugin automatically generates robots.txt rules for each blocked bot:

# LLM Documentation Files
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /sitemaps.txt
Allow: /c/*/llms.txt
Allow: /t/*/llms.txt
Allow: /tag/*/llms.txt

Sitemap: https://your-forum.com/sitemaps.txt

# Blocked bots for llms.txt files
User-agent: Omgilibot
Disallow: /llms.txt
Disallow: /llms-full.txt
Disallow: /sitemaps.txt
Disallow: /c/*/llms.txt
Disallow: /t/*/llms.txt
Disallow: /tag/*/llms.txt

User-agent: ChatGPT-User
Disallow: /llms.txt
Disallow: /llms-full.txt
Disallow: /sitemaps.txt
Disallow: /c/*/llms.txt
Disallow: /t/*/llms.txt
Disallow: /tag/*/llms.txt

Important: These bots can still crawl and index your main forum content - they're only blocked from the llms.txt files.

How to configure:

  1. Navigate to: Admin → Settings → Plugins → discourse-llms-txt-generator
  2. Find: llms_txt_blocked_user_agents
  3. Enter bot names separated by commas: Omgilibot, ChatGPT-User, AnotherBot
  4. Save settings
  5. Check your /robots.txt to verify rules were added

Common bots you might want to block:

  • Omgilibot - Omgili web crawler
  • ChatGPT-User - OpenAI's ChatGPT crawler (if you prefer API access only)
  • CCBot - Common Crawl bot
  • anthropic-ai - Anthropic's crawler
  • Google-Extended - Google's AI training crawler

Note: Bot blocking is advisory - well-behaved bots will respect robots.txt, but malicious crawlers might ignore it.
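
A minimal sketch of how such rules can be derived from the comma-separated setting (plain Ruby, illustrative rather than the plugin's actual implementation):

LLMS_PATHS = %w[
  /llms.txt /llms-full.txt /sitemaps.txt
  /c/*/llms.txt /t/*/llms.txt /tag/*/llms.txt
].freeze

# One User-agent block per configured bot name
def blocked_bot_rules(setting_value)
  setting_value.split(",").map(&:strip).reject(&:empty?).map do |bot|
    (["User-agent: #{bot}"] + LLMS_PATHS.map { |p| "Disallow: #{p}" }).join("\n")
  end.join("\n\n")
end

puts blocked_bot_rules("Omgilibot, ChatGPT-User")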


Smart Cache Management

How it works:

The plugin uses intelligent caching to balance performance and freshness:

Automatic Hourly Checks

Every hour, a background job runs and checks:

  1. Was there new content?
     • Checks whether any topics were created since the last update
     • Checks whether any categories were updated since the last update
  2. If YES → update the cache:
     • Clears the old cached navigation
     • Regenerates llms.txt with the new content
     • Updates the timestamp
     • Logs: [llms.txt] Updating cache due to new content
  3. If NO → skip the update:
     • Keeps the existing cache
     • Saves server resources
     • Logs: [llms.txt] No new content, skipping cache update
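
For illustration, a minimal sketch of what such an hourly job could look like using Discourse's scheduled-job API; the job name, cache key, and store key are assumptions, not the plugin's actual identifiers:

module Jobs
  class LlmsTxtHourlyCheck < ::Jobs::Scheduled
    every 1.hour

    def execute(args)
      raw = PluginStore.get("discourse-llms-txt-generator", "last_cache_update")
      last_run = raw ? Time.zone.parse(raw) : 1.hour.ago

      # Regenerate only if something changed since the last run
      changed = Topic.where("created_at > ?", last_run).exists? ||
                Category.where("updated_at > ?", last_run).exists?

      if changed
        Rails.logger.info("[llms.txt] Updating cache due to new content")
        Discourse.cache.delete("llms_txt_navigation") # illustrative cache key
        PluginStore.set("discourse-llms-txt-generator", "last_cache_update",
                        Time.zone.now.iso8601)
      else
        Rails.logger.info("[llms.txt] No new content, skipping cache update")
      end
    end
  end
end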

Manual Cache Clear

Cache is also cleared immediately when:

  • A new post is created
  • A post is edited
  • Settings are changed
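
A hedged sketch of how this immediate invalidation can be wired up; :post_created and :post_edited are real Discourse events, while the cache key is illustrative:

%i[post_created post_edited].each do |event|
  DiscourseEvent.on(event) do |_post, *_rest|
    # Drop the cached navigation so the next request regenerates it
    Discourse.cache.delete("llms_txt_navigation")
  end
end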

Why This Matters

Old approach (every request):

User requests /llms.txt
  → Generate file (slow, 1-2 seconds)
  → Return to user
Every request = regeneration!

New approach (smart caching):

Hourly job runs:
  → Check: "New topics since last hour?"
  → NO: Skip regeneration (save resources)
  → YES: Regenerate and cache

User requests /llms.txt:
  → Return cached version (instant, <50ms)

Benefits:

  • ✅ Files stay fresh (max 1 hour old)
  • ✅ Faster response times (cached)
  • ✅ Less server load (only regenerate when needed)
  • ✅ Automatic updates (no manual intervention)

Monitoring Cache Updates

Check logs to see cache activity:

tail -f /var/www/discourse/logs/production.log | grep llms.txt

You'll see:

[llms.txt] Updating cache due to new content
[llms.txt] Cache updated successfully

or

[llms.txt] No new content, skipping cache update

Performance Optimization

Caching

  • Navigation file (llms.txt) is cached for 60 minutes by default
  • Full content file is generated on-demand (not cached due to size)
  • Cache is automatically cleared when posts are created/edited

Optimization Tips

  1. Set appropriate limits: Don't include all content if you have a large forum
  2. Adjust minimum views: Filter out low-quality topics
  3. Monitor access: Check analytics to see how often files are accessed
  4. Use CDN: Consider CDN caching for frequently accessed files

Resource Usage

  • Small forum (<1000 topics): Negligible impact
  • Medium forum (1000-10000 topics): ~1-2 seconds generation time for full file
  • Large forum (>10000 topics): Use "small" or "medium" posts_limit setting

Custom Forum Description

What is it? An optional text field that appears at the top of your llms-full.txt file, right after the site title and description.

Why do you need it? LLMs work best when they have clear context about what your forum is about. This field allows you to provide:

  • The main purpose of your forum
  • What topics are discussed
  • Who your target audience is
  • Any special focus areas or expertise

What to write:

GOOD examples:

This forum is dedicated to discussing Python programming, with a focus on
web development, data science, and machine learning. Our community includes
beginners and experienced developers sharing practical solutions and best practices.

A technical support community for XYZ Software users. We help troubleshoot
installation issues, configuration problems, and provide guides for advanced features.
Members range from new users to certified administrators.

BAD examples (avoid these):

🎉 Join the BEST community ever! 🚀 Amazing discussions!
Limited time offer - sign up now! [marketing spam]
We are the world's leading #1 forum with millions of experts! [exaggeration/false claims]

How LLMs use this: When an LLM reads your llms-full.txt, it first reads this description to understand the context. This helps it:

  • Give more accurate answers about your forum
  • Better match user queries to your content
  • Understand the expertise level of discussions

Configuration:

  • Navigate to: Admin → Settings → Plugins → discourse-llms-txt-generator
  • Find: llms_txt_full_description
  • Enter: 2-4 sentences describing your forum factually
  • Leave empty if you don't need it (optional field)

Integration with Discourse

Robots.txt

The plugin automatically adds entries to robots.txt via a view connector:

# LLM Documentation Files
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /sitemaps.txt
Allow: /c/*/llms.txt
Allow: /t/*/llms.txt
Allow: /tag/*/llms.txt

Sitemap: https://your-forum.com/sitemaps.txt

Or if indexing is disabled:

# LLM Documentation Files
Disallow: /llms.txt
Disallow: /llms-full.txt
Disallow: /sitemaps.txt
Disallow: /c/*/llms.txt
Disallow: /t/*/llms.txt
Disallow: /tag/*/llms.txt

Sitemap.xml

Automatically adds entries to sitemap:

<url>
  <loc>https://your-forum.com/llms.txt</loc>
  <priority>1.0</priority>
  <changefreq>daily</changefreq>
</url>
<url>
  <loc>https://your-forum.com/llms-full.txt</loc>
  <priority>0.9</priority>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>https://your-forum.com/sitemaps.txt</loc>
  <priority>0.8</priority>
  <changefreq>weekly</changefreq>
</url>

Analytics

The plugin tracks:

  • Access count for each file
  • Last access timestamp
  • User agent (via server logs)

Access stats via Rails console:

PluginStore.get("discourse-llms-txt-generator", "access_count_index")
PluginStore.get("discourse-llms-txt-generator", "access_count_full")
PluginStore.get("discourse-llms-txt-generator", "last_access_index")

🔒 Privacy & Security

Private Content Protection

Your private categories and topics are SAFE:

Private categories (read_restricted: true) are completely excluded from:

  • /llms.txt - Latest topics list
  • /llms-full.txt - Full content index
  • /sitemaps.txt - Sitemap index
  • /tag/{name}/llms.txt - Tag-based topic lists

Topics from private categories will never appear in public llms.txt files.

Dynamic files (per-category/topic) use Guardian permission checks:

  • /c/{category}/llms.txt → Returns 404 for unauthorized users
  • /t/{topic}/llms.txt → Returns 404 for unauthorized users

Automatic updates: When you change a category from private to public, its topics automatically appear in the public files (within the cache refresh window).
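
A hedged sketch of the Guardian-based check behind these 404s; guardian.can_see? and Discourse::NotFound are real Discourse APIs, while the surrounding method and helper are illustrative:

def show_topic_llms_txt
  topic = Topic.find_by(id: params[:topic_id])
  # Unauthorized or missing content renders a 404, never a partial leak
  raise Discourse::NotFound if topic.nil? || !guardian.can_see?(topic)
  render plain: build_llms_txt_for(topic), content_type: "text/plain" # build_llms_txt_for is hypothetical
end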

Security Features

  • Respects Discourse Permissions: Only includes publicly accessible content
  • No Authentication Required: Public files work like sitemaps
  • No Personal Data: Only publicly visible forum content
  • CSRF Safe: All security checks properly handled
  • No XSS Risk: Content is served as plain text
  • Category-Level Filtering: SQL-level filtering ensures private topics never leak
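
A hedged sketch of the category-level filtering idea; the query is illustrative, not the plugin's actual code, though the columns and setting name come from Discourse and the tables above:

# Exclude topics in read-restricted categories at the query level,
# and apply the configured minimum-views threshold
public_topics = Topic
  .joins(:category)
  .where(categories: { read_restricted: false })
  .where("topics.views >= ?", SiteSetting.llms_txt_min_views)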

🐛 Troubleshooting

Issue: Files not accessible

Symptoms:

  • /llms.txt returns 404
  • /llms-full.txt not found

Solutions:

  1. Check plugin is enabled:

    Admin → Settings → Plugins → llms_txt_enabled = true
    
  2. Verify indexing is allowed:

    Admin → Settings → llms_txt_allow_indexing = true
    
  3. Check plugin installation:

    Admin → Plugins → Look for "discourse-llms-txt-generator"
    
  4. Check logs:

    tail -f /var/www/discourse/logs/production.log | grep llms

Issue: Empty or incomplete content

Symptoms:

  • Files exist but show no topics
  • Missing categories

Solutions:

  1. Verify you have public topics with sufficient views:

    • Topics must have at least llms_txt_min_views views (default 50)
  2. Check minimum views setting:

    Admin → Settings → llms_txt_min_views
    

    If too high, lower it to include more topics

  3. Check posts limit:

    Admin → Settings → llms_txt_posts_limit
    

    Try changing to "medium" or "all"

  4. Ensure categories are public:

    • Private categories (read_restricted: true) are excluded
    • Check: Admin → Categories → [Category] → Security

Issue: Performance issues

Symptoms:

  • Slow generation times
  • High server load
  • Timeouts

Solutions:

  1. Reduce posts limit:

    Admin → Settings → llms_txt_posts_limit = "small" or "medium"
    
  2. Increase minimum views:

    Admin → Settings → llms_txt_min_views = 100
    

    Filter out low-quality topics

  3. Disable excerpts:

    Admin → Settings → llms_txt_include_excerpts = false
    
  4. Increase cache duration:

    Admin → Settings → llms_txt_cache_minutes = 120
    

Issue: robots.txt doesn't show llms.txt entries

Symptoms:

  • curl https://forum.com/robots.txt doesn't show "LLM Documentation Files"

Solutions:

  1. Clear robots.txt cache:

    cd /var/www/discourse
    rails runner "Rails.cache.delete('robots_txt')"
  2. Use plugin rake task:

    bundle exec rake llms_txt:refresh
  3. Restart Discourse:

    cd /var/discourse
    ./launcher restart app
  4. Verify after 30 seconds:

    curl https://your-forum.com/robots.txt | grep -A 10 "LLM Documentation"

🤝 Support & Contributing

Contributing

See CONTRIBUTING.md for:

  • Development setup and testing
  • Architecture and how the plugin works internally
  • Code guidelines and standards
  • How to submit changes

Quick contribution guide:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📝 Changelog

Version 1.2.0 (2025-11-11)

Bug Fixes:

  • 🐛 Fixed a critical robots.txt integration bug: replaced the non-existent :robots_txt event with a view connector
  • 🐛 Fixed URL encoding for Cyrillic and other international characters (RFC 3986 compliance)
  • 🐛 Fixed compatibility across Discourse versions, including 3.6.0.beta3

Major Features:

  • ✨ Added canonical URLs in HTTP Link headers (RFC 6596)
  • ✨ Added canonical URL footers in dynamic llms.txt files
  • ✨ Proper SEO protection against duplicate content penalties

Improvements:

  • 🔧 Improved sitemap integration using DiscourseEvent hooks
  • 🔧 Added rake tasks for maintenance (llms_txt:refresh, llms_txt:check)
  • 🔧 Enhanced CONTRIBUTING.md with sitemap/robots testing documentation

Version 1.0.0 (2025-11-08)

  • Initial release
  • Basic llms.txt navigation generation
  • Full llms-full.txt content generation
  • Configurable settings
  • Robots.txt and sitemap.xml integration
  • Caching support
  • Analytics tracking
  • Multi-language foundation (English)

📄 License

MIT License

This is free, open-source software. Use it, modify it, share it!

See LICENSE file for details.


🌟 Credits


Made with ❤️ for the Discourse community
