12 changes: 12 additions & 0 deletions .env.local.example
@@ -0,0 +1,12 @@
# SEO Indexing Control
# Set to 'true' to allow search engines to index the site
# Set to 'false' to block search engines (useful for staging/development)
NEXT_PUBLIC_ALLOW_INDEXING=false

# Site URL for robots.txt sitemap reference
NEXT_PUBLIC_SITE_URL=https://help.vtex.com

# Add your other environment variables here
# GITHUB_APPID=your_github_app_id
# GITHUB_INSTALLATIONID=your_installation_id
# ISR_REVALIDATE_SECONDS=600
108 changes: 108 additions & 0 deletions docs/seo-indexing-control.md
@@ -0,0 +1,108 @@
# SEO Indexing Control

This feature lets you control search engine indexing through environment variables. It's particularly useful for keeping staging and development environments out of search engine results.

## How It Works

The system implements a multi-layer approach to SEO control:

1. **Meta Robots Tags**: Conditionally adds `noindex, nofollow` meta tags to HTML pages
2. **Dynamic robots.txt**: Generates a robots.txt file that either allows crawling or disallows all crawlers
3. **Environment-based**: Driven by the `NEXT_PUBLIC_ALLOW_INDEXING` environment variable (see the sketch after this list)
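
Below is a condensed, illustrative sketch of the shared gating logic; the real implementations live in `src/components/seo-control.tsx` and `src/pages/robots.txt.tsx`, shown later in this PR:

```ts
// Both layers key off the same environment check
const allowIndexing = process.env.NEXT_PUBLIC_ALLOW_INDEXING === 'true'

// Layer 1: robots meta tag, rendered only when indexing is disabled
const robotsMeta = allowIndexing
  ? null
  : '<meta name="robots" content="noindex, nofollow" />'

// Layer 2: robots.txt, served dynamically from /robots.txt
const robotsTxt = allowIndexing
  ? 'User-agent: *\nAllow: /'
  : 'User-agent: *\nDisallow: /'
```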

## Environment Variables

### `NEXT_PUBLIC_ALLOW_INDEXING`
- **Type**: String (`'true'` or `'false'`)
- **Default**: `'true'`
- **Description**: Controls whether search engines can index the site

### `NEXT_PUBLIC_SITE_URL`
- **Type**: String (URL)
- **Default**: `'https://help.vtex.com'`
- **Description**: The site's base URL, used for the sitemap reference in robots.txt

## Configuration Examples

### Development Environment (`.env.local`)
```bash
NEXT_PUBLIC_ALLOW_INDEXING=false
NEXT_PUBLIC_SITE_URL=http://localhost:3000
```

### Staging Environment
```bash
NEXT_PUBLIC_ALLOW_INDEXING=false
NEXT_PUBLIC_SITE_URL=https://staging.help.vtex.com
```

### Production Environment
```bash
NEXT_PUBLIC_ALLOW_INDEXING=true
NEXT_PUBLIC_SITE_URL=https://help.vtex.com
```

## Components

### `SEOControl` Component
- Located: `src/components/seo-control.tsx`
- Purpose: Adds robots meta tags when indexing is disabled
- Usage: Automatically included in `_app.tsx`; a per-page override is sketched below
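
A usage sketch for the per-page override: the component is mounted globally in `_app.tsx`, but the optional `allowIndexing` prop can force a different behavior on a single page (the page below is hypothetical):

```tsx
import SEOControl from 'components/seo-control'

// Hypothetical page that should stay out of search results even in production
export default function InternalPreviewPage() {
  return (
    <>
      {/* The prop override takes precedence over NEXT_PUBLIC_ALLOW_INDEXING */}
      <SEOControl allowIndexing={false} />
      <main>Internal preview content</main>
    </>
  )
}
```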

### Dynamic `robots.txt`
- Located: `src/pages/robots.txt.tsx`
- Purpose: Generates robots.txt based on environment settings
- URL: `/robots.txt`

## Behavior

### When `NEXT_PUBLIC_ALLOW_INDEXING=false`:
- Adds `<meta name="robots" content="noindex, nofollow">` to all pages
- Adds `<meta name="googlebot" content="noindex, nofollow">` for Google-specific control
- Generates robots.txt that disallows all crawlers
- Includes `nosnippet` directive to prevent AI Overview usage

### When `NEXT_PUBLIC_ALLOW_INDEXING=true`:
- No restrictive meta tags are added
- Generates a robots.txt that allows crawling, with disallows for the API, admin, `_next`, and editor routes
- Includes sitemap reference in robots.txt

## Testing

### Local Testing
1. Set `NEXT_PUBLIC_ALLOW_INDEXING=false` in `.env.local`
2. Run `npm run dev`
3. Visit `http://localhost:3000` and check page source for robots meta tags
4. Visit `http://localhost:3000/robots.txt` to verify the robots.txt content, or run the check script sketched below
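
A small verification script, assuming Node 18+ (for the global `fetch`) and the dev server running on port 3000; the file name `check-seo.ts` and the `npx tsx check-seo.ts` runner are just examples:

```ts
// check-seo.ts: quick local check of both SEO layers
const base = 'http://localhost:3000'

async function main() {
  const html = await (await fetch(base)).text()
  const robots = await (await fetch(`${base}/robots.txt`)).text()

  // A blanket "Disallow: /" line (not "/api/" etc.) means all crawlers are blocked
  const blocksAll = robots
    .split('\n')
    .some((line) => line.trim() === 'Disallow: /')

  // With NEXT_PUBLIC_ALLOW_INDEXING=false both checks should print true
  console.log('noindex meta present:', html.includes('noindex, nofollow'))
  console.log('robots.txt blocks all crawlers:', blocksAll)
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})
```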

### Production Testing
1. Deploy with `NEXT_PUBLIC_ALLOW_INDEXING=true`
2. Check that no noindex meta tags are present
3. Verify robots.txt allows crawling and includes sitemap

## Best Practices

1. **Always disable indexing for non-production environments**
2. **Enable indexing only for the main production domain**
3. **Use staging environments to test SEO changes before production**
4. **Monitor robots.txt accessibility after deployment**

## Troubleshooting

### Meta tags not appearing
- Check that `NEXT_PUBLIC_ALLOW_INDEXING` is properly set
- Verify the environment variable is prefixed with `NEXT_PUBLIC_`
- Restart the development server after changing environment variables

### robots.txt not working
- Ensure the route `/robots.txt` is accessible
- Check server logs for any errors in the robots.txt.tsx file
- Verify environment variables are available at build time

## SEO Impact

This implementation follows current best practices:
- **Meta robots tags are directives**, honored by major search engines (robots.txt, by contrast, only controls crawling)
- **Limits AI reuse**: the `nosnippet` directive keeps content out of Google's AI Overviews and AI Mode
- **Covers multiple crawlers**: the generic `robots` tag applies to all crawlers, with a Googlebot-specific tag for finer control
- **Provides redundancy** with both meta tags and robots.txt
4 changes: 4 additions & 0 deletions next.config.js
@@ -43,6 +43,10 @@ const nextConfig = {
contentOrg: '',
contentRepo: '',
contentBranch: '',
NEXT_PUBLIC_ALLOW_INDEXING:
process.env.NEXT_PUBLIC_ALLOW_INDEXING || 'true',
NEXT_PUBLIC_SITE_URL:
process.env.NEXT_PUBLIC_SITE_URL || 'https://help.vtex.com',
},
async redirects() {
return []
36 changes: 36 additions & 0 deletions src/components/seo-control.tsx
@@ -0,0 +1,36 @@
import Head from 'next/head'

interface SEOControlProps {
allowIndexing?: boolean
}

/**
* SEO Control Component
*
* Conditionally adds robots meta tags to prevent search engine indexing
* based on environment variables or props. This is particularly useful for
* staging environments where you don't want content to be indexed.
*
* @param allowIndexing - Optional override for indexing permission
*/
export default function SEOControl({ allowIndexing }: SEOControlProps) {
// Check the environment variable (next.config.js supplies a default of 'true')
const envAllowIndexing = process.env.NEXT_PUBLIC_ALLOW_INDEXING === 'true'

// Use prop override if provided, otherwise use environment variable
const shouldIndex =
allowIndexing !== undefined ? allowIndexing : envAllowIndexing

// Only render noindex tags if indexing should be disabled
if (shouldIndex) {
return null
}

return (
<Head>
<meta name="robots" content="noindex, nofollow" />
<meta name="googlebot" content="noindex, nofollow, nosnippet" />
{/* Prevent content from being used in AI Overviews and AI Mode */}
</Head>
)
}
2 changes: 2 additions & 0 deletions src/pages/_app.tsx
@@ -11,6 +11,7 @@ import '@vtexdocs/components/dist/index.css'
import '@fortawesome/fontawesome-free/css/all.css'

import Layout from 'components/layout'
import SEOControl from 'components/seo-control'

type Props = AppProps & {
Component: Page
@@ -39,6 +40,7 @@ function MyApp({ Component, pageProps }: Props) {
content={pageProps.locale || currentLocale}
/>
</Head>
<SEOControl />
<PreviewContextProvider>
<Layout
// ❌ REMOVED: sidebarfallback (now loaded client-side)
53 changes: 53 additions & 0 deletions src/pages/robots.txt.tsx
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
import { GetServerSideProps } from 'next'

/**
* Dynamic robots.txt Generator
*
* This page generates a robots.txt file dynamically based on environment variables.
* When NEXT_PUBLIC_ALLOW_INDEXING is false (staging/dev), it disallows all crawlers.
* When true (production), it allows crawling and includes sitemap reference.
*/
export default function Robots() {
// This component will never be rendered as we handle everything in getServerSideProps
return null
}

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
// Check if indexing is allowed via environment variable
const allowIndexing = process.env.NEXT_PUBLIC_ALLOW_INDEXING === 'true'

// Get the site URL from environment or use default
const siteUrl = process.env.NEXT_PUBLIC_SITE_URL || 'https://help.vtex.com'

// Generate robots.txt content based on indexing permission
const robotsTxt = allowIndexing
? `# Robots.txt - Production (Indexing Allowed)
User-agent: *
Allow: /

# Disallow admin and API routes
Disallow: /api/
Disallow: /admin/
Disallow: /_next/
Disallow: /editor/

# Sitemap location
Sitemap: ${siteUrl}/sitemap.xml`
: `# Robots.txt - Development/Staging (Indexing Disabled)
User-agent: *
Disallow: /

# Block all crawlers in non-production environments`

// Set appropriate headers
res.setHeader('Content-Type', 'text/plain; charset=utf-8')
res.setHeader('Cache-Control', 'public, max-age=3600, s-maxage=3600')

// Send the robots.txt content
res.write(robotsTxt)
res.end()

return {
props: {},
}
}