Add collection redirect page and consolidate htaccess configuration files #261

Open
wants to merge 22 commits into
base: main

Conversation

oraNod
Collaborator

@oraNod oraNod commented Nov 11, 2024

This PR makes the following changes:

  • Updates https://docs.ansible.com/collections.html to act as a stub page for redirecting plugin and module pages that existed prior to collections.
  • Adds rules to redirect users from plugin and module pages to the new collections.html page.
  • Consolidates all the dynamic redirects, except for version 2.9, in the ansible subdirectory.
  • Removes all consolidated htaccess configuration files in the ansible subdirectory.

As a result of this change, users who access plugin and module pages are redirected to collections.html. For example, this link is available from an ansible.com blog post: https://docs.ansible.com/ansible/latest/modules/k8s_module.html

A rule exists to redirect this page to the corresponding collection page, which has been moved and results in a 404 when the redirect is followed:

RedirectMatch "^(/ansible/[^/]+)/modules/k8s_module.html" "$1/collections/community/kubernetes/k8s_module.html"

Instead of maintaining all the rules in the htaccess configuration files between releases, we can redirect users to a single page. While it is true that this requires users to search for the appropriate documentation, it avoids 404s and any resulting SEO degradation.

For any reviewers, here's a link to the collections.html page on the RTD preview build: https://ansible--261.org.readthedocs.build/collections.html

To evaluate these changes on the test server, use this branch in my fork, which tests the collections.html page there. Note that we can't fully evaluate these changes on the test server: the rsync command that the Jenkins job uses doesn't include the --delete flag, so the old .htaccess configuration files aren't removed, and they conflict with the redirects here. Plan B is to stand up a test server somewhere. I don't want to prune old files off the actual test server just to validate these redirects.

<ul class="fa-ul">
<li>
<span class="fa-li"><i class="fas fa-book"></i></span>
<a href="{{ index.quicklinks.collection_index.link }}" target="_blank">{{ index.quicklinks.collection_index.label }}</a>
Collaborator

builtin modules are by far the most popular. Can we add a quick link here for https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html#plugins-in-ansible-builtin ?

@samccann
Collaborator

samccann commented Nov 11, 2024

I copied the .htaccess to https://htaccess.madewithlove.com/ and typed in a bunch of urls to see what it would redirect to:

  • Verify all test cases from Documentation Checklist: Ansible 9 release ansible-documentation#428 (comment) point to the new collection stub page.
  • Verify 2.9 module pages do NOT redirect
  • Verify Ansible 9 and latest pages do NOT redirect
  • Verify 2.3-2.6 module pages redirect to the new collection stub page
  • Verify 2.3, 2.4, 2.5 and 2.6 specific redirects still work (not related to vault, guides, or modules - see next comment for details)
  • Verify top 20 pages based on web analytics work as expected.

So ansible (precollections) 2.3 through 2.6 redirect today to /latest/ and we republished those guides to an archive site so those who do still need them can find them. We didn't do this with 2.7 and 2.8 because I couldn't republish them (jenkins updates made it too difficult so I gave up). Just putting that here for history on why these proposed new redirects don't go beyond 2.6.

@samccann
Collaborator

Some problems (running same test as prior comment):

@samccann
Collaborator

My general thoughts - though it does 'lower' the user experience (aka I used to get to the correct module page from an ancient stack overflow page that predated collections), it is acceptable since collections are now 4+ years old. I agree with @oraNod that this is necessary both from a maintenance point of view (see k8s module currently 404s because we can't test/update 1k redirects to keep them all accurate), and from the overall strategy to move to RTD (which cannot handle 1k redirects anyway).

@samccann
Collaborator

oh last thoughts - once we get enough approvers, we should blast out a warning in matrix before we merge (and don't merge on a Friday lol) in case it all blows up...

@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

oh last thoughts - once we get enough approvers, we should blast out a warning in matrix before we merge (and don't merge on a Friday lol) in case it all blows up...

@samccann I plan to create a forum post on this but would like to get some thoughts from folks on the review list first. I'll also announce this on the bullhorn for at least two editions.

@oraNod oraNod requested a review from gundalow November 11, 2024 20:09
@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

Some problems (running same test as prior comment):

* 2.3 - 2.6 guides redirect to a strange location. (Prior to this PR, they would redirect to their related pages on /latest/) https://docs.ansible.com/ansible/2.6/installation_guide/intro_configuration.html is one example I used.

* vault redirect (2.3, 2.4) doesn't seem to work https://docs.ansible.com//ansible/2.4/vault.html

* vault redirects (2.5,2.6) go to archive instead of latest https://docs.ansible.com/ansible/2.5/user_guide/vault.html

thanks for the initial test findings! I'm going to create a separate PR for docs.testing.ansible.com so we can validate everything.

can you be more specific about the redirects you were testing in this comment, please?

* 2.3 - 2.6 guides redirect to a strange location. (Prior to this PR, they would redirect to their related pages on /latest/) https://docs.ansible.com/ansible/2.6/installation_guide/intro_configuration.html is one example I used.

Are you referring to the (plugins|modules) rules here?

RedirectMatch permanent "^/ansible/(2\.(10|[3-6]))/(plugins|modules)/(.+)\.html$" "https://docs.ansible.com/collections.html"

Or did you observe something weird with the specific guide redirects here?

# Vault redirects (2.3, 2.4, 2.5, 2.6)

There aren't any rules configured for installation_guide/intro_configuration.html so if you were looking at that page the weirdness could be from something else.
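
A quick way to check which paths the (plugins|modules) rule applies to is to run the same pattern through Python's `re` module. This is only a sketch: it assumes the pattern behaves the same under Python's regex engine as under Apache's, and `redirect_for` is a hypothetical helper, not part of this PR.

```python
import re

# Pattern copied from the RedirectMatch rule quoted above.
PLUGIN_MODULE_RULE = re.compile(
    r"^/ansible/(2\.(10|[3-6]))/(plugins|modules)/(.+)\.html$"
)
TARGET = "https://docs.ansible.com/collections.html"


def redirect_for(path):
    """Return the redirect target for a URL path, or None if the rule does not apply."""
    return TARGET if PLUGIN_MODULE_RULE.search(path) else None


for path in (
    "/ansible/2.6/modules/k8s_module.html",
    "/ansible/2.6/installation_guide/intro_configuration.html",
    "/ansible/2.9/modules/k8s_module.html",
):
    print(path, "->", redirect_for(path))
```

Only the first path matches. The installation guide page and the 2.9 module page fall through, which is consistent with the guide page being handled by some other rule.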

@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

also @samccann I think that catch all rule at the end might be causing a problem:

# EOL Archive Redirects for all the rest

I haven't set this up on the test server yet (will do that tomorrow morning b/c vpn...) but I'm getting a server 500 error now. this rule might be causing an issue. specifically the $1 backreference might not be capturing the right path. I'm not sure but that could be what is behind that strange location you were hitting.

that catch all redirect came from the original config file that Toshio set up:

RedirectMatch permanent "^/ansible/2.6/?(.+)?.html" "/ansible/latest/$1.html"

I bet we could just nuke that. it's not even in all the old 2.x config files and likely more hassle than it's worth. I think we really only need to be concerned with the pre-collections plugins and modules stuff.

@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

My general thoughts - though it does 'lower' the user experience (aka I used to get to the correct module page from an ancient stack overflow page that predated collections), it is acceptable since collections are now 4+ years old. I agree with @oraNod that this is necessary both from a maintenance point of view (see k8s module currently 404s because we can't test/update 1k redirects to keep them all accurate), and from the overall strategy to move to RTD (which cannot handle 1k redirects anyway).

wrt user experience I think it will be possible to embed a straightforward enough search implementation with JavaScript - or better, a ReadTheDocs query with Elasticsearch indexing - so that you can search through https://docs.ansible.com/ansible/latest/collections/* and find matching results.

at the same time I don't think we should go overkill with it and create another maintenance nightmare for ourselves, especially as collections are 4+ years old at this point. tbh avoiding 404s and the loss of SEO authority from thousands of broken links is more of a concern than users not being taken directly to the corresponding collection.

now that I think about it we should also point users to the forum to ask for help as well as the collection index and other doc links...

@samccann
Collaborator

samccann commented Nov 11, 2024

There aren't any rules configured for installation_guide/intro_configuration.html so if you were looking at that page the weirdness could be from something else.

So yes, based on that tool I was using, all the guides will go someplace wonky. So anything that is NOT a plugin or NOT that vault page, seems to end up in a strange place.

just tried https://docs.ansible.com/ansible/2.3/reference_appendices/YAMLSyntax.html and in that tool, it triggers this redirect:
RedirectMatch permanent "^/ansible/(2\.(10|[3-6]))/(.+)\.html$" "/ansible/latest/$1.html"
And ends up at https://docs.ansible.com/ansible/latest/2.3.html

So you are correct, it's caused by the combined redirect that should be pushing older EOL docs to latest.
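
The wrong destination can be reproduced outside Apache. The sketch below applies the combined rule to the YAMLSyntax path with Python's `re` module (an assumption: the capture groups are numbered the same way as in Apache) and prints what each group captures. Group 1 is the version and group 3 is the page path, so a target built from `$1` lands on `/ansible/latest/2.3.html`.

```python
import re

# Pattern copied from the combined catch-all rule quoted above.
RULE = r"^/ansible/(2\.(10|[3-6]))/(.+)\.html$"
path = "/ansible/2.3/reference_appendices/YAMLSyntax.html"

match = re.match(RULE, path)
print(match.groups())
# ('2.3', '3', 'reference_appendices/YAMLSyntax')

# Apache's $N is written \g<N> in a Python replacement template.
with_group_1 = match.expand(r"/ansible/latest/\g<1>.html")
with_group_3 = match.expand(r"/ansible/latest/\g<3>.html")
print(with_group_1)  # /ansible/latest/2.3.html
print(with_group_3)  # /ansible/latest/reference_appendices/YAMLSyntax.html
```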

@samccann
Collaborator

that catch all redirect came from the original config file that Toshio set up:

RedirectMatch permanent "^/ansible/2.6/?(.+)?.html" "/ansible/latest/$1.html"

The catchall redirect came from me, not toshio. It's the redirect that takes all links to 2.6 and redirects to latest. We need to keep this category of redirects as that is what ensures those top google hits that used to point to 2.4-2.6 docs now end up on latest.

@samccann
Collaborator

The vault redirects aren't working and I wonder if they ever did. I'll try to test this more tomorrow, but the original vault location for 2.3, 2.4 for example was https://docs.ansible.com/archive/ansible/2.4/vault.html but that doesn't match what I put in the EOL redirects in the 2.4 directory....

@oraNod
Collaborator Author

oraNod commented Nov 13, 2024

Thanks for the extra details @samccann - I'll have to come back to that. I was thinking that catchall redirect might have been causing a conflict with other redirect rules that was borking the test server.

RedirectMatch permanent "^/ansible/2.6/?(.+)?.html" "/ansible/latest/$1.html"

I think the good news here is that we can add these to Read The Docs like this:

Type: Exact Redirect
From URL: /ansible/2.6/*
To URL: /projects/ansible/latest/:splat

See https://docs.readthedocs.io/en/stable/user-defined-redirects.html#redirecting-a-directory

I know we're limited to 100 redirects per RTD project but if all this consolidation works as expected then we'll be well below that limit for the top-level project.
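
To make the intent of the `*` / `:splat` pair concrete, here is a small local emulation. It is an illustration of the substitution only, not Read The Docs code, and `apply_prefix_redirect` is a hypothetical name.

```python
def apply_prefix_redirect(path, from_url, to_url):
    """Emulate a wildcard redirect: from_url ends in *, to_url may contain :splat."""
    if not from_url.endswith("*"):
        raise ValueError("this sketch only handles patterns that end in *")
    prefix = from_url[:-1]
    if not path.startswith(prefix):
        return None  # the rule does not apply to this path
    # Everything the * matched is substituted for :splat in the target.
    splat = path[len(prefix):]
    return to_url.replace(":splat", splat)


print(
    apply_prefix_redirect(
        "/ansible/2.6/user_guide/vault.html",
        "/ansible/2.6/*",
        "/projects/ansible/latest/:splat",
    )
)
# /projects/ansible/latest/user_guide/vault.html
```

One prefix rule per EOL version would cover what the catch-all RedirectMatch does today, which is why the total should stay well under the per-project limit.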

@oraNod oraNod force-pushed the redirect-collections branch from 7905896 to a4b3ab2 Compare November 13, 2024 17:55
@oraNod oraNod force-pushed the redirect-collections branch 2 times, most recently from d1dd82c to f64b482 Compare November 21, 2024 11:46
@oraNod oraNod requested a review from samccann November 21, 2024 11:46
@oraNod
Collaborator Author

oraNod commented Nov 29, 2024

Testing the changes in this PR

I've been testing the consolidated redirect rules in the .htaccess configuration file in this PR. Sharing my findings here. I'll try to keep it as brief as possible.

Test methodology

I created a couple of scripts to iterate over URLs and return the http status code for pages in the Ansible package docs. You can find the scripts and all the test details here: https://github.com/oraNod/url-status-checker (I plan to add that Python script to this repo but I want to make a few tweaks to it first.)

I used docs.testing.ansible.com as the test environment because it contains Ansible 2.x and 3-11 builds of the package docs. Before running any tests, I removed any existing .htaccess files to avoid conflicting redirect rules. I also temporarily modified the build jobs that were deploying to the test environment so changes weren't lost in the middle of a run.

Complete details about how everything works are available in the README here: https://github.com/oraNod/url-status-checker/blob/main/README.md
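
For readers who don't want to open the repository, the general approach can be sketched as follows. This is not the script from url-status-checker. It is a minimal stand-in, and the function names and CSV columns are assumptions. It sends a HEAD request per URL without following redirects and records the status code and any Location header.

```python
import csv
import http.client
import io
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location", "")
    finally:
        conn.close()


def check_urls(urls, fetch=fetch_status):
    """Return one (url, status, location) row per URL."""
    rows = []
    for url in urls:
        status, location = fetch(url)
        rows.append((url, status, location))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "status", "location"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(check_urls(list_of_urls))` would produce a report along the lines of the attached csv files. Passing a different `fetch` callable makes it possible to dry-run the loop without touching the network.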

Test results

I've attached txt and csv files generated from the url checker script in this tarball:

redirect-reports.tar.gz

  • All plugins and modules pages return a 301 status and redirect to the collections.html page.
  • Ansible 3 - 10 pages return a 200 status and are not redirected.
  • Ansible 2.9 pages return a 200 status.
  • Redirect rules for Ansible 2.9 return a 404 status. This is deceptive though because the Ansible 2.9 redirects are a special case and used for the version switcher. When Ansible 2.9 is removed from the version switcher and reaches EOL, we can add that to the consolidated redirects and validate them.

Target page 404s

Approximately 870 target pages (the page to which you are redirected) return a 404 status.

A large number of these pages get created by the catch all redirect rules when a page from an older version gets redirected to a non-existent latest version. For example, this page in 2.10 docs: https://docs.ansible.com/ansible/2.10/collections/amazon/aws/aws_az_facts_module.html

The catch all rule constructs a redirect to https://docs.ansible.com/ansible/latest/collections/amazon/aws/aws_az_facts_module.html which returns a 404 because the originating page is deprecated.

Another example is pages that were renamed, such as /dev_guide/developing_modules_python3.html which was renamed to /dev_guide/developing_python_3.html. There are also around 30 404 pages for various scenario guides and other pages that were moved or renamed, such as 2.x community roadmaps.

Many of these are 404 pages that already exist outside of this PR. For example:

Some of the 404 pages are caused by the catch all rules in this PR though. For example:

I think it's reasonable to accept the fact that older pages will break and 404s are going to happen. The target page 404s are all for Ansible 2.x versions and there are fewer than 900 of them out of 19,646 URLs tested. That is less than 5% of Ansible 2.x redirects leading to a 404 page.
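
The percentage is easy to re-derive from the two figures quoted in this comment (about 870 target 404s, 19,646 URLs tested). The variable names below are just for illustration.

```python
total_urls = 19_646   # URLs tested, from the report above
not_found = 870       # approximate number of target pages returning 404
upper_bound = 900     # the "fewer than 900" ceiling

share = not_found / total_urls
ceiling = upper_bound / total_urls

print(f"{share:.2%}")    # about 4.4%
print(f"{ceiling:.2%}")  # still under 5%
```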

@oraNod oraNod force-pushed the redirect-collections branch from 8a6da4a to a10f424 Compare December 4, 2024 18:40
    link: "https://docs.ansible.com/ansible/latest/collections/all_plugins.html"
  builtin_index:
    label: Index of all modules and plugins contained in ansible-core
    link: "https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html"
Contributor

It would be nicer if this link goes to https://docs.ansible.com/ansible-core/..., but unfortunately there's no latest equivalent for ansible-core...

.htaccess Outdated
#####################################################################

# Redirect plugin and module pages for devel and latest
RedirectMatch permanent "^/ansible/(devel|latest)/(plugins|modules)/(.+)\.html$" "/collections.html"
Contributor

One big downside of this redirect is that it breaks a lot of existing links on the web (in blog posts, stack overflow answers, mailing list answers, ...) from Ansible 2.9 and before that right now still work.

I have no idea how frequently these URLs are still used though. Finding the module/plugin in question isn't too hard with the new collections.html page, though, so I guess it will be OK...

Collaborator Author

Thanks for taking a look, @felixfontein. I can also see this is going to hose the links to these pages: https://docs.ansible.com/ansible/latest/plugins/

I overlooked those plugin pages, but glad I caught them. I'll raise this at the next DaWGs meeting but maybe we should remove this redirect rule and avoid globbing any latest or devel urls to the collections.html page.

Either that or we add a negative lookahead for the plugin pages which do exist, such as:

RedirectMatch permanent "^/ansible/(devel|latest)/modules/(.+)\.html$" "/collections.html"
RedirectMatch permanent "^/ansible/(devel|latest)/plugins/(?!action\.html|become\.html|cache\.html|callback\.html|cliconf\.html|connection\.html|docs_fragment\.html|filter\.html|httpapi\.html|inventory\.html|lookup\.html|module\.html|module_util\.html|netconf\.html|plugins\.html|shell\.html|strategy\.html|terminal\.html|test\.html|vars\.html)(.+)\.html$" "/collections.html"
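
To see how a negative lookahead separates the plugin pages that still exist from the old per-plugin pages, the idea can be tried in Python. The sketch below is a variant of the rule, not a copy of it. It assumes the lookahead sits immediately after `plugins/`, it anchors the excluded names with `\.html$`, and `KEPT_PAGES` and `should_redirect` are hypothetical names.

```python
import re

# Page names taken from the lookahead in the rule above.
KEPT_PAGES = (
    "action", "become", "cache", "callback", "cliconf", "connection",
    "docs_fragment", "filter", "httpapi", "inventory", "lookup", "module",
    "module_util", "netconf", "plugins", "shell", "strategy", "terminal",
    "test", "vars",
)

PLUGIN_RULE = re.compile(
    r"^/ansible/(devel|latest)/plugins/(?!(?:"
    + "|".join(KEPT_PAGES)
    + r")\.html$)(.+)\.html$"
)


def should_redirect(path):
    """True when the path would be sent to /collections.html by this variant."""
    return PLUGIN_RULE.search(path) is not None


for path in (
    "/ansible/latest/plugins/lookup.html",       # existing page, left alone
    "/ansible/latest/plugins/lookup/file.html",  # old per-plugin page, redirected
    "/ansible/devel/plugins/module_util.html",   # existing page, left alone
):
    print(path, "->", should_redirect(path))
```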

An alternative is to take all the latest source pages and redirect them to collections using the https://pypi.org/project/sphinx-reredirects/ extension.

It doesn't remove the maintenance overhead and maybe adds to the build time. We could probably add some regex to reduce the number of redirects that we'd need and point to specific collections, for example:

redirects = {
     "modules/azure_*": "collections/azure/index.html",
}

I guess we can see but we need to rethink the devel and latest rules here.

Collaborator Author

Yeah, that works like a champ. I tried an experimental commit: oraNod/ansible-documentation@ec0790a

And deployed it to the test site on pages. All the latest redirects are there and it doesn't seem to add to the build time.

@oraNod
Collaborator Author

oraNod commented Dec 18, 2024

@samccann (and all) Heads up that I've pushed a final commit: 467753a

This restores the redirects for any "latest" urls so that the changes in this PR are limited in scope to only the 2.x versions.

We've got a couple of options for the "latest" urls and can do them separately. I'll send related PRs and update the forum post to explain everything.

.htaccess Outdated
RedirectMatch permanent "^/ansible/2.6/vmware/index.html" "/ansible/latest/collections/community/vmware/index.html"

# EOL Archive Redirects for all the rest
RedirectMatch permanent "^/ansible/(2\.(10|[3-8]))/(.+)\.html$" "/ansible/latest/$3.html"
Collaborator

My redirect foo is bad - is this actually redirecting ansible 2.10, and 3-8 to latest? We don't have any archive for these docs so that would mean any current users would have no way to see those docs. Web analytics shows that Ansible 3-5 each have 5k visitors a quarter still (tho i don't know why cuz we never advertised that as a url. We've always had just latest and devel on the version switcher for package docs).

Collaborator Author

I wonder if that is because of bookmarks. Should users be going to those pages if they are EOL?

This redirect is the "catch all" that will match ansible/2.10, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8/ in the url then redirect whatever is after that to the corresponding latest page.

For example ansible/2.10/foo.html redirects to ansible/latest/foo.html.

There are catch all redirects in the 2.3, 2.4, 2.5, and 2.6 files such as this one:

RedirectMatch permanent "^/ansible/2.3/?(.+)?.html" "/ansible/latest/$1.html"

However these don't actually work as intended because the backreference $1 points to the wrong part of the url. This was uncovered while doing all the testing.

No one is going to get a 404 out of this but they will get taken to the latest version of the docs. If someone is using a very old version of Ansible, I don't really see the harm in that to be honest.

It might also be an interesting thing to find out. If anyone yells at us, then we can have a conversation.

But maybe we should put all the 2.x docs up on the archive. I got all the files off the server and have them in tarballs so we could just move them without having to do a build. We can also deploy them to Read The Docs if we want to preserve all the old versions there.

Collaborator

Ah so I think the logic should be only for 2.3-2.7. We shouldn't redirect 2.10 or 2.8 because we have no archives for those docs (and last I checked, the jenkins jobs couldn't republish them).

To clarify - the 2.3-2.7 redirects were in place not because they were EOL, but because they were getting huge amounts of traffic from very old blogs/stack overflow etc and people were getting stale info. So instead, I published them to an /archive/ url, and then redirected each to latest after that.

And just because I was curious, we've had 900 visitors to the archive in the past 3 months. Given it's a 'newish' url, I'm guessing that many people took the time to dig out the old docs that we moved to the archive. (now the debate on whether all that effort was worthwhile... 🤷🏻)

Collaborator Author

@samccann I've updated the catch all rule to apply to only 2.3 to 2.7. Note that this also meant changing the back reference from $3 to $2 because the regex changed. There are only 2 capture groups in the regex now instead of 3.

You can do a quick verification on https://htaccess.madewithlove.com/

Try something like https://docs.ansible.com/ansible/2.3/my_page.html as the url to which you want to apply the rule and then paste the rule into the form. It should indicate that the new url is https://docs.ansible.com/ansible/latest/my_page.html.
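
The same check can be scripted. The updated rule is not quoted in this thread, so the pattern below is an assumption about its shape (two capture groups, versions 2.3 to 2.7), and `apache_redirect` is a hypothetical helper that expands Apache-style `$N` references.

```python
import re


def apache_redirect(pattern, target, path):
    """Emulate RedirectMatch: return the expanded target, or None if no match."""
    match = re.search(pattern, path)
    if match is None:
        return None
    # Replace each Apache-style $N with the text captured by group N.
    return re.sub(
        r"\$(\d)",
        lambda ref: match.group(int(ref.group(1))) or "",
        target,
    )


# Assumed form of the updated catch-all rule; not copied from the PR.
PATTERN = r"^/ansible/(2\.[3-7])/(.+)\.html$"
TARGET = "/ansible/latest/$2.html"

for path in (
    "/ansible/2.3/my_page.html",
    "/ansible/2.8/my_page.html",
    "/ansible/2.10/my_page.html",
):
    print(path, "->", apache_redirect(PATTERN, TARGET, path))
```

Under that assumption the 2.3 path maps to `/ansible/latest/my_page.html`, while the 2.8 and 2.10 paths are left alone.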

Another note here, I can see that 2.7 docs are still reachable. https://docs.ansible.com/ansible/2.7/index.html

The catch all seems to be in place for 2.3 to 2.6 currently. Should we make this rule apply to just those versions?

@oraNod oraNod force-pushed the redirect-collections branch from 467753a to b172512 Compare January 20, 2025 14:15
4 participants