6 changes: 6 additions & 0 deletions README.md
@@ -49,6 +49,12 @@ Ralph reads these specs and builds the entire project autonomously.
|----------|-------------|------------|
| [react-native-app](specs/mobile/react-native-app.md) | Cross-platform mobile app | Intermediate |

### SEO & AEO
| Template | Description | Difficulty |
|----------|-------------|------------|
| [aeo-toolkit](specs/seo/aeo-toolkit.md) | Answer Engine Optimization with llms.txt, AI crawlers, citations | Advanced |
| [seo-toolkit](specs/seo/seo-toolkit.md) | Technical SEO with metadata, sitemaps, Core Web Vitals | Intermediate |

### Tools
| Template | Description | Difficulty |
|----------|-------------|------------|
159 changes: 159 additions & 0 deletions specs/seo/aeo-toolkit.md
@@ -0,0 +1,159 @@
# AEO Toolkit — Answer Engine Optimization

Build Answer Engine Optimization into an existing project so AI crawlers (GPTBot, ClaudeBot, PerplexityBot) can discover, parse, and cite the site's content.

## Overview

An AEO (Answer Engine Optimization) implementation that Ralph drops into the user's existing project. Ralph first reads `package.json` to detect the framework (Next.js, Nuxt, Astro, Remix, Express, or static), then generates only the files that match that stack. The toolkit covers the core AEO primitives: `robots.txt` with AI crawler directives, `llms.txt` / `llms-full.txt` per the llmstxt.org spec, AI-optimized sitemaps, structured data (JSON-LD) tuned for answer extraction, and a CLI auditor that scores the site's AEO readiness.

AI crawlers do not execute JavaScript — all critical content must be in the initial HTML response. This is the fundamental constraint the toolkit addresses.

## Features

- Framework auto-detection from `package.json`
- `robots.txt` with AI crawler directives (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot)
- `llms.txt` and `llms-full.txt` generation per llmstxt.org spec
- AI-optimized XML sitemap with priority scoring
- JSON-LD structured data (Article, FAQ, HowTo, Product, Organization)
- Bot detection middleware (identifies AI crawlers by user-agent)
- Markdown endpoint support (`.md` versions of pages for LLM consumption)
- AEO audit CLI that scores a site 0-100
- Entity consistency checker

## Tasks

### Task 1: Detect Framework and Setup

- [ ] Read `package.json` to detect framework (next, nuxt, astro, remix, express)
- [ ] Create `aeo.config.ts` with Zod-validated schema (site name, URL, crawler policies)
- [ ] Create `src/lib/aeo/` directory for core utilities
- [ ] Install dependencies (zod, unified/remark, xml2js)

### Task 2: robots.txt with AI Crawler Directives

- [ ] Generate `robots.txt` using the detected framework's routing pattern
- [ ] Include directives for: GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, PerplexityBot, Google-Extended, CCBot, Meta-ExternalAgent, Bytespider, Applebot-Extended
- [ ] Support three policies via config: `allow-all`, `block-training`, `selective`
- [ ] Add Sitemap and Crawl-delay directives
- [ ] Write tests
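
A minimal sketch of what the Task 2 generator could look like. The function name, the `RobotsPolicy` type, and the split of which crawlers collect training data are all illustrative assumptions, not part of the spec:

```typescript
// All AI crawlers the spec names in Task 2.
const AI_CRAWLERS = [
  'GPTBot', 'ChatGPT-User', 'ClaudeBot', 'Claude-Web', 'PerplexityBot',
  'Google-Extended', 'CCBot', 'Meta-ExternalAgent', 'Bytespider', 'Applebot-Extended',
];

// Illustrative classification: bots primarily used for training-data
// collection, as opposed to fetching pages on behalf of a live user query.
const TRAINING_CRAWLERS = new Set([
  'GPTBot', 'ClaudeBot', 'Google-Extended', 'CCBot',
  'Bytespider', 'Applebot-Extended', 'Meta-ExternalAgent',
]);

type RobotsPolicy = 'allow-all' | 'block-training' | 'selective';

function generateRobotsTxt(policy: RobotsPolicy, siteUrl: string, allowed: string[] = []): string {
  const lines: string[] = [];
  for (const bot of AI_CRAWLERS) {
    const blocked =
      policy === 'block-training' ? TRAINING_CRAWLERS.has(bot)
      : policy === 'selective' ? !allowed.includes(bot)
      : false;
    lines.push(`User-agent: ${bot}`, blocked ? 'Disallow: /' : 'Allow: /', '');
  }
  // Regular search crawlers stay allowed regardless of the AI policy.
  lines.push('User-agent: *', 'Allow: /', '', `Sitemap: ${siteUrl}/sitemap.xml`);
  return lines.join('\n');
}
```

Under `selective`, only the bots explicitly listed in `allowed` get access; everything else is blocked.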

### Task 3: llms.txt and llms-full.txt

- [ ] Generate `llms.txt` from config content map (H1 title, blockquote summary, H2 link sections)
- [ ] Generate `llms-full.txt` with full site documentation (company overview, products, audience)
- [ ] Serve both files via the framework's routing system
- [ ] Add `lastUpdated` timestamp
- [ ] Write tests
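
As a sketch, the llms.txt shape Task 3 describes (H1 title, blockquote summary, H2 link sections) could be produced like this — `generateLlmsTxt` and its argument types are assumptions:

```typescript
interface LlmsSection {
  title: string;
  pages: { path: string; label: string }[];
}

function generateLlmsTxt(
  site: { name: string; url: string; description: string },
  sections: LlmsSection[],
): string {
  // llmstxt.org layout: H1 title, then a blockquote summary, then
  // H2-headed sections of markdown links.
  const parts = [`# ${site.name}`, '', `> ${site.description}`, ''];
  for (const section of sections) {
    parts.push(`## ${section.title}`, '');
    for (const page of section.pages) {
      parts.push(`- [${page.label}](${site.url}${page.path})`);
    }
    parts.push('');
  }
  return parts.join('\n');
}
```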

### Task 4: AI-Optimized Sitemap

- [ ] Generate XML sitemap with priority scoring (landing pages > docs > blog > archives)
- [ ] Add `lastmod` timestamps and `changefreq` hints
- [ ] Reference sitemap in robots.txt
- [ ] Write tests

### Task 5: Structured Data (JSON-LD)

- [ ] Create JSON-LD generator functions: Article, FAQ, HowTo, Product, Organization, BreadcrumbList
- [ ] Create component/helper to inject `<script type="application/ld+json">` into pages
- [ ] Add schema validator utility
- [ ] Write tests for each schema type
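
For example, the FAQ generator might map question/answer pairs onto schema.org's `FAQPage` type. The `@type` nesting follows schema.org; the function name and signature are illustrative:

```typescript
// Build a schema.org FAQPage object ready for JSON.stringify into a
// <script type="application/ld+json"> tag.
function faqPageJsonLd(faqs: { question: string; answer: string }[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map((f) => ({
      '@type': 'Question',
      name: f.question,
      acceptedAnswer: { '@type': 'Answer', text: f.answer },
    })),
  };
}
```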

### Task 6: Bot Detection Middleware

- [ ] Create `isAIBot(userAgent)` utility with patterns for all known AI crawlers
- [ ] Create middleware that tags AI bot requests (sets header or context flag)
- [ ] Add HTML meta tags: `<meta name="robots" content="max-snippet:-1, max-image-preview:large">`
- [ ] Add canonical URL helper
- [ ] Write tests
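
A possible shape for the `isAIBot(userAgent)` utility, with the pattern list mirroring the crawlers named in Task 2 (new bots will need to be added as they appear):

```typescript
// User-agent substrings for the AI crawlers this toolkit targets.
const AI_BOT_PATTERNS =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|PerplexityBot|Google-Extended|CCBot|Meta-ExternalAgent|Bytespider|Applebot-Extended/i;

function isAIBot(userAgent: string | null | undefined): boolean {
  return !!userAgent && AI_BOT_PATTERNS.test(userAgent);
}
```

Note that user-agent matching is best-effort: anyone can spoof these strings, so this is for tagging and analytics, not access control.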

### Task 7: Markdown Endpoints

- [ ] Create middleware/route to serve `.md` versions of pages
- [ ] Convert HTML to clean markdown (strip nav, footer, boilerplate)
- [ ] Add front-matter metadata (title, description, date)
- [ ] Write tests

### Task 8: AEO Audit CLI

- [ ] Create `scripts/aeo-audit.ts` runnable via `npx tsx scripts/aeo-audit.ts`
- [ ] Check: robots.txt has AI bot directives, llms.txt exists and is valid, structured data present, critical content is in static HTML
- [ ] Score 0-100 with actionable recommendations
- [ ] Output as terminal table
- [ ] Write tests
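
The 0-100 score could be a simple weighted pass/fail sum — the check names and weights below are illustrative, not a fixed rubric:

```typescript
interface Check {
  name: string;
  weight: number;
  passed: boolean;
}

// Score = share of total weight earned by passing checks, scaled to 0-100.
function scoreAudit(checks: Check[]): number {
  const total = checks.reduce((sum, c) => sum + c.weight, 0);
  const earned = checks.reduce((sum, c) => sum + (c.passed ? c.weight : 0), 0);
  return total === 0 ? 0 : Math.round((earned / total) * 100);
}
```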

### Task 9: Entity Consistency Checker

- [ ] Define entity registry in config (brand names with canonical forms and variants)
- [ ] Scan source files for inconsistent naming
- [ ] Report with file locations and suggested fixes
- [ ] Write tests

## Tech Stack

- TypeScript
- Zod (config validation)
- Unified / Remark (markdown processing)
- xml2js (XML generation)
- Vitest (testing)
- Framework detected at generation time from `package.json`

## Files to Create

- `aeo.config.ts`
- `src/lib/aeo/config.ts`
- `src/lib/aeo/robots.ts`
- `src/lib/aeo/llms-txt.ts`
- `src/lib/aeo/sitemap.ts`
- `src/lib/aeo/structured-data.ts`
- `src/lib/aeo/bot-detector.ts`
- `src/lib/aeo/markdown-endpoints.ts`
- `src/lib/aeo/entity-checker.ts`
- `scripts/aeo-audit.ts`
- `tests/aeo/robots.test.ts`
- `tests/aeo/llms-txt.test.ts`
- `tests/aeo/structured-data.test.ts`
- `tests/aeo/bot-detector.test.ts`
- `tests/aeo/audit.test.ts`

## Files to Modify

- `package.json` — Add dependencies
- Framework-specific routing files (detected at generation time) — Wire robots.txt, llms.txt, sitemap routes

## Configuration

### aeo.config.ts

```typescript
import { defineAEOConfig } from './src/lib/aeo/config';

export default defineAEOConfig({
site: {
name: 'My Company',
url: 'https://example.com',
description: 'Short description for llms.txt blockquote',
},
robots: {
policy: 'allow-all', // 'allow-all' | 'block-training' | 'selective'
},
llmsTxt: {
sections: [
{ title: 'Documentation', pages: ['/docs', '/guides'] },
{ title: 'Blog', pages: ['/blog'] },
],
},
entities: {
'Next.js': ['NextJS', 'Next JS', 'Nextjs'],
},
});
```

## Notes

- Ralph reads `package.json` to detect the framework and generates routes/middleware using that framework's patterns
- AI crawlers do NOT execute JavaScript — all critical content must be in the initial HTML
- `llms.txt` is an emerging standard (not yet formally adopted by major AI companies) but is low-risk to implement
- `robots.txt` remains the primary mechanism for AI crawler control
- ChatGPT crawls ~8x more than Googlebot; Perplexity ~3x (Conductor research)
- Requires Node.js 18+
172 changes: 172 additions & 0 deletions specs/seo/seo-toolkit.md
@@ -0,0 +1,172 @@
# SEO Toolkit — Technical SEO

Build technical SEO into an existing project covering metadata, structured data, sitemaps, performance monitoring, and on-page optimization.

## Overview

A Technical SEO implementation that Ralph drops into the user's existing project. Ralph first reads `package.json` to detect the framework (Next.js, Nuxt, Astro, Remix, Express, or static), then generates only the files that match that stack. Covers every pillar of modern SEO: metadata management, Open Graph / Twitter Cards, canonical URLs, structured data (JSON-LD), XML sitemaps, RSS feeds, robots.txt, and an SEO audit CLI that scores the site.

While the AEO Toolkit focuses on AI answer engines, this toolkit targets traditional search engines (Google, Bing) and social platforms (Facebook, Twitter/X, LinkedIn).

## Features

- Framework auto-detection from `package.json`
- Metadata helpers (title templates, OG tags, Twitter Cards, canonical URLs, hreflang)
- JSON-LD structured data (Article, Product, Organization, BreadcrumbList, LocalBusiness, FAQPage, WebSite with SearchAction)
- XML sitemap generation
- RSS / Atom feed generation
- Dynamic `robots.txt`
- Heading hierarchy validator
- Image alt text auditor
- Internal linking analyzer with orphan page detection
- SEO audit CLI that scores a site 0-100

## Tasks

### Task 1: Detect Framework and Setup

- [ ] Read `package.json` to detect framework (next, nuxt, astro, remix, express)
- [ ] Create `seo.config.ts` with Zod-validated schema (site name, URL, social accounts, defaults)
- [ ] Create `src/lib/seo/` directory for core utilities
- [ ] Install dependencies (zod, xml2js, feed, cheerio)

### Task 2: Metadata Helpers

- [ ] Create `generateMetaTags(page, config)` that returns title, description, OG, and Twitter Card tags
- [ ] Support title templates (`%s | Site Name`)
- [ ] Add canonical URL generation with trailing slash normalization
- [ ] Add hreflang generation for multi-language pages
- [ ] Create framework-specific integration (Next.js Metadata API, Nuxt useHead, etc.)
- [ ] Write tests
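
A sketch of `generateMetaTags` under the shapes above — the return structure and config fields are assumptions, and the real integration would feed this into Next.js's Metadata API or Nuxt's `useHead`:

```typescript
interface PageMeta {
  title: string;
  description: string;
  path: string;
}

interface SeoConfig {
  siteName: string;
  url: string;
  titleTemplate: string; // e.g. '%s | Site Name'
}

function generateMetaTags(page: PageMeta, config: SeoConfig) {
  const title = config.titleTemplate.replace('%s', page.title);
  // Normalize trailing slashes so /docs and /docs/ resolve to one
  // canonical URL instead of registering as duplicate pages.
  const canonical = new URL(page.path, config.url).href.replace(/\/+$/, '');
  return {
    title,
    description: page.description,
    canonical,
    og: {
      'og:title': title,
      'og:description': page.description,
      'og:url': canonical,
      'og:site_name': config.siteName,
    },
    twitter: {
      'twitter:card': 'summary_large_image',
      'twitter:title': title,
    },
  };
}
```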

### Task 3: Structured Data (JSON-LD)

- [ ] Create JSON-LD generator functions: WebSite, Organization, Article, Product, BreadcrumbList, LocalBusiness, FAQPage
- [ ] Create component/helper to inject `<script type="application/ld+json">` into pages
- [ ] Add JSON-LD validator utility
- [ ] Write tests for each schema type

### Task 4: XML Sitemap and RSS Feed

- [ ] Generate XML sitemap with `lastmod`, `priority`, and `changefreq`
- [ ] Support sitemap index for large sites (>50,000 URLs)
- [ ] Generate RSS feed and Atom feed for content pages
- [ ] Serve via the framework's routing system
- [ ] Write tests

### Task 5: robots.txt and Crawl Control

- [ ] Generate `robots.txt` with per-section Allow/Disallow rules
- [ ] Support per-environment rules (block crawlers on staging)
- [ ] Add `Sitemap` directive
- [ ] Add `noindex` meta tag helper for excluded pages
- [ ] Write tests

### Task 6: Heading Hierarchy Validator

- [ ] Create `validateHeadings(html)` — checks single H1, proper nesting, empty headings
- [ ] Return structured report with issues and locations
- [ ] Write tests
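
A minimal sketch of `validateHeadings` using a regex scan instead of the Cheerio parse the toolkit would actually use — enough to show the three checks (single H1, no skipped levels, no empty headings):

```typescript
interface HeadingIssue {
  type: 'multiple-h1' | 'skipped-level' | 'empty-heading';
  detail: string;
}

function validateHeadings(html: string): HeadingIssue[] {
  const issues: HeadingIssue[] = [];
  // Extract headings in document order; strip inner tags to get the text.
  const headings = [...html.matchAll(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi)].map((m) => ({
    level: Number(m[1]),
    text: m[2].replace(/<[^>]*>/g, '').trim(),
  }));
  const h1Count = headings.filter((h) => h.level === 1).length;
  if (h1Count > 1) {
    issues.push({ type: 'multiple-h1', detail: `${h1Count} <h1> elements found` });
  }
  let prev = 0;
  for (const h of headings) {
    // A jump of more than one level (h1 -> h3) breaks the outline.
    if (prev > 0 && h.level > prev + 1) {
      issues.push({ type: 'skipped-level', detail: `h${prev} followed by h${h.level}` });
    }
    if (!h.text) {
      issues.push({ type: 'empty-heading', detail: `empty <h${h.level}>` });
    }
    prev = h.level;
  }
  return issues;
}
```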

### Task 7: Image Alt Text Auditor

- [ ] Create `auditImages(html)` — detects missing or generic alt text
- [ ] Flag missing `width`/`height` attributes (CLS risk)
- [ ] Generate image sitemap entries
- [ ] Write tests

### Task 8: Internal Linking Analyzer

- [ ] Create `analyzeLinkGraph(pages)` — build link graph from page list
- [ ] Detect orphan pages (no internal links pointing to them)
- [ ] Calculate link depth from homepage
- [ ] Flag broken internal links (404s)
- [ ] Write tests
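
Orphan detection reduces to a set-membership check over the link graph; the `Page` shape here is an assumption about how the analyzer models crawled pages:

```typescript
interface Page {
  path: string;
  links: string[]; // internal paths this page links to
}

// A page is an orphan if no other page links to it. The homepage is
// seeded as reachable so it is never reported.
function findOrphans(pages: Page[], home = '/'): string[] {
  const linkedTo = new Set<string>([home]);
  for (const page of pages) {
    for (const link of page.links) linkedTo.add(link);
  }
  return pages.map((p) => p.path).filter((path) => !linkedTo.has(path));
}
```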

### Task 9: SEO Audit CLI

- [ ] Create `scripts/seo-audit.ts` runnable via `npx tsx scripts/seo-audit.ts`
- [ ] Check: meta tags, OG tags, canonical URLs, structured data, headings, images, robots.txt
- [ ] Category scores: Metadata, Structured Data, Content, Crawlability
- [ ] Score 0-100 with prioritized recommendations (critical, warning, info)
- [ ] Output as terminal table and optional JSON
- [ ] Write tests

## Tech Stack

- TypeScript
- Zod (config validation)
- Cheerio (HTML parsing)
- xml2js (XML generation)
- feed (RSS/Atom)
- Vitest (testing)
- Framework detected at generation time from `package.json`

## Files to Create

- `seo.config.ts`
- `src/lib/seo/config.ts`
- `src/lib/seo/metadata.ts`
- `src/lib/seo/structured-data.ts`
- `src/lib/seo/sitemap.ts`
- `src/lib/seo/feed.ts`
- `src/lib/seo/robots.ts`
- `src/lib/seo/heading-validator.ts`
- `src/lib/seo/image-audit.ts`
- `src/lib/seo/link-analyzer.ts`
- `scripts/seo-audit.ts`
- `tests/seo/metadata.test.ts`
- `tests/seo/structured-data.test.ts`
- `tests/seo/sitemap.test.ts`
- `tests/seo/robots.test.ts`
- `tests/seo/heading-validator.test.ts`
- `tests/seo/audit.test.ts`

## Files to Modify

- `package.json` — Add dependencies
- Framework-specific routing files (detected at generation time) — Wire sitemap, robots.txt, feed routes
- Framework-specific layout/head (detected at generation time) — Integrate metadata helpers

## Configuration

### seo.config.ts

```typescript
import { defineSEOConfig } from './src/lib/seo/config';

export default defineSEOConfig({
site: {
name: 'My Company',
url: 'https://example.com',
description: 'We build great software.',
locale: 'en_US',
twitter: '@mycompany',
},
metadata: {
titleTemplate: '%s | My Company',
defaultOgImage: '/og-default.png',
},
structuredData: {
organization: {
name: 'My Company',
logo: 'https://example.com/logo.png',
sameAs: ['https://twitter.com/mycompany', 'https://github.com/mycompany'],
},
},
sitemap: {
exclude: ['/admin/*', '/api/*'],
},
redirects: [
{ source: '/old-page', destination: '/new-page', permanent: true },
],
});
```

## Notes

- Ralph reads `package.json` to detect the framework and generates routes/components using that framework's patterns
- Core Web Vitals thresholds: LCP < 2.5s, CLS < 0.1, INP < 200ms
- The internal link analyzer runs on-demand via CLI — not in production
- Structured data is validated against schema.org specs
- Requires Node.js 18+
23 changes: 23 additions & 0 deletions templates.json
@@ -153,6 +153,24 @@
"tags": ["chrome", "extension", "react", "typescript", "manifest-v3", "browser"],
"difficulty": "intermediate",
"path": "specs/tools/chrome-extension.md"
},
{
"id": "aeo-toolkit",
"name": "AEO Toolkit",
"description": "Build an Answer Engine Optimization toolkit with llms.txt, AI crawler management, structured data, and citation tracking.",
"category": "seo",
"tags": ["aeo", "llms-txt", "robots-txt", "structured-data", "ai-crawlers", "typescript"],
"difficulty": "advanced",
"path": "specs/seo/aeo-toolkit.md"
},
{
"id": "seo-toolkit",
"name": "SEO Toolkit",
    "description": "Build a comprehensive technical SEO toolkit with metadata, sitemaps, Core Web Vitals, and an SEO audit CLI.",
"category": "seo",
"tags": ["seo", "metadata", "sitemap", "structured-data", "json-ld", "typescript"],
"difficulty": "intermediate",
"path": "specs/seo/seo-toolkit.md"
}
],
"categories": [
@@ -180,6 +198,11 @@
"id": "tools",
"name": "Tools",
"description": "CLI tools and browser extensions"
},
{
"id": "seo",
"name": "SEO & AEO",
"description": "Search engine and answer engine optimization toolkits"
}
]
}