Fix frontend robots.txt #4186
Conversation
Good work figuring out what changed with the traffic!
Suggested change:

```diff
-: `# Block crawlers from the staging site
+: `# Block everyone from the staging site
```
Ah! I suppose we should have a similar one for the staging API docs, then?
LGTM! I left some non-blocking comments and questions inline.
Note on the testing instructions: you need to run `DEPLOYMENT_ENV=production just frontend/run dev:only` to see the production values, otherwise you'll see the "block everyone" robots.txt :)
```
@@ -1,5 +1,45 @@
const { LOCAL, PRODUCTION } = require("../constants/deploy-env")

const AI_ROBOTS_CONTENT = `
```
Would it be easier to maintain a list of user-agent names, and create the block string using it:

```js
uaList.map(ua => `User-agent: ${ua}\nDisallow: /\n`).join("\n")
```
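For illustration, a minimal sketch of that approach, assuming a `uaList` of user-agent names (the names below are placeholders, not the PR's actual list):

```js
// Illustrative user-agent names; placeholders, not the PR's actual list.
const uaList = ["ExampleBot", "OtherBot"]

// Build one "User-agent: …\nDisallow: /" stanza per crawler, separated by
// blank lines, matching the suggested map/join approach.
const AI_ROBOTS_CONTENT = uaList
  .map((ua) => `User-agent: ${ua}\nDisallow: /\n`)
  .join("\n")

// AI_ROBOTS_CONTENT now reads:
// User-agent: ExampleBot
// Disallow: /
//
// User-agent: OtherBot
// Disallow: /
```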
```
@@ -10,13 +50,17 @@ export default function robots(_, res) {
  deployEnv === PRODUCTION
    ? `# Block search result pages
```
Suggested change:

```diff
-    ? `# Block search result pages
+    ? `# Block search result pages and single result pages
```
```
Disallow: /search/audio/
Disallow: /search/image/
Disallow: /search/
Disallow: /image/
Disallow: /audio/

crawl-delay:
```
What is the purpose of `crawl-delay` with no value here? Could you add a comment?
Typo!
Fixes
Fixes a regression introduced in #4077
Description
This PR returns to using a server middleware for robots.txt in the Openverse frontend. It also adds robots.txt to the frontend .gitignore to help prevent us from accidentally adding a static robots.txt file in the future.
It additionally adds `Crawl-delay` to slow down Bing and other search engines when they crawl the pages we do allow them to crawl.
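For illustration, a minimal sketch of what such a server middleware can look like, assuming a handler with the `(req, res)` signature shown in the diff above; `DISALLOWED_PATHS` and the delay value are placeholders, not the PR's exact contents:

```js
// Minimal sketch of a robots.txt server middleware.
// DISALLOWED_PATHS and the Crawl-delay value are illustrative placeholders.
const DISALLOWED_PATHS = ["/search/", "/image/", "/audio/"]

export default function robots(_, res) {
  // One Disallow rule per blocked path.
  const rules = DISALLOWED_PATHS.map((path) => `Disallow: ${path}`).join("\n")
  res.setHeader("Content-Type", "text/plain")
  res.end(`User-agent: *\n${rules}\nCrawl-delay: 10\n`)
}
```

Note that `Crawl-delay` takes a value in seconds between requests; Bing and Yandex honor it, while Googlebot ignores the directive.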
Testing Instructions
Run the frontend locally with `just frontend/run dev` and verify that you see the correct file contents as defined in `/frontend/src/server-middleware/robots.js`.