-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add collection redirect page and consolidate htaccess configuration files #261
base: main
Are you sure you want to change the base?
Conversation
templates/collections.html
Outdated
<ul class="fa-ul"> | ||
<li> | ||
<span class="fa-li"><i class="fas fa-book"></i></span> | ||
<a href="{{ index.quicklinks.collection_index.link }}" target="_blank">{{ index.quicklinks.collection_index.label }}</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
builtin modules are by far the most popular. Can we add a quick link here for https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html#plugins-in-ansible-builtin
I copied the .httaccess to https://htaccess.madewithlove.com/ and typed in a bunch of urls to see what it would redirect to:
So ansible (precollections) 2.3 through 2.6 redirect today to /latest/ and we republished those guides to an archive site so those who do still need them can fine them. We didn't do this with 2.7 and 2.8 because I couldn't republish them (jenkins updates made it too difficult so I gave up). Just putting that here for history on why these proposed new redirects don't go beyond 2.6. |
Some problems (running same test as prior comment):
|
My general thoughts - though it does 'lower' the user experience ( aka I used to get to the correct module page from an ancient stack overflow page that predated collections), it is acceptable since collections are now 4+ years old. I agree with @oraNod that this is necessary both from a maintenance point of view (see k8s module currently 404s because we can't test/update 1k redirects to keep them all accurate), and from the overall stratey to move to RTD (which cannot handle 1k redirects anyway). |
oh last thoughts - once we get enough approvers, we should blast out a warning in matrix before we merge (and don't merge on a Friday lol) in case it all blows up... |
@samccann I plan to create a forum post on this but would like to get some thoughts from folks on the review list first. I'll also announce this on the bullhorn for at least two editions. |
thanks for the initial test findings! I'm going to create a separate PR for can you be more specific about the redirects you were testing in this comment, please?
Are you referring to the Line 22 in bee311b
Or did you observe something weird with the specific guide redirects here? Line 24 in bee311b
There aren't any rules configured for |
also @samccann I think that catch all rule at the end might be causing a problem: Line 37 in bee311b
I haven't set this up on the test server yet (will do that tomorrow morning b/c vpn...) but I'm getting a server 500 error now. this rule might be causing an issue. specifically the that catch all redirect came from the original config file that Toshio set up: Line 4051 in 2ee83dc
I bet we could just nuke that. it's not even in all the old 2.x config files and likely more hassle than it's worth. I think we really only need to be concerned with the pre-collections plugins and modules stuff. |
wrt user experience I think it will be possible to embed a straightforward enough search implementation with Javascript - or better ReadTheDocs query with ElasticSearch indexing - so that you can search through at the same time I don't think we should go overkill with it and create another maintenance nightmare for ourselves, especially as collections are 4+ years old at this point. tbh avoiding 404s and degrading SEO authority from thousands of broken links is more of a concern than users not being taken directly to the corresponding collection. now that I think about it we should also point users to the forum to ask for help as well as the collection index and other doc links... |
So yes, based on that tool I was using, all the guides will go someplace wonky. So anything that is NOT a plugin or NOT that vault page, seems to end up in a strange place. just tried https://docs.ansible.com/ansible/2.3/reference_appendices/YAMLSyntax.html and in that tool, it triggers this redirect: So you are correct, it's caused by the combined redirect that should be pushing older EOL docs to latest. |
The catchall redirect came from me, not toshio. It's the redirect that takes all links to 2.6 and redirects to latest. We need to keep this category of redirects as that is what ensures those top google hits that used to point to 2.4-2.6 docs now end up on latest. |
The vault redirects aren't working and I wonder if they ever did. I'll try to test this more tomorrow, but the original vault location for 2.3, 2.4 for example was https://docs.ansible.com/archive/ansible/2.4/vault.html but that doesn't match what I put in the EOL redirects in the 2.4 directory.... |
Thanks for the extra details @samccann - I'll have to come back to that. I was thinking that catchall redirect might have been causing a conflict with other redirect rules that was borking the test server.
I think the good news here is that we can add these to Read The Docs like this:
See https://docs.readthedocs.io/en/stable/user-defined-redirects.html#redirecting-a-directory I know we're limited to 100 redirects per RTD project but if all this consolidation works as expected then we'll be well below that limit for the top-level project. |
7905896
to
a4b3ab2
Compare
d1dd82c
to
f64b482
Compare
Testing the changes in this PRI've been testing the consolidated redirect rules in the Test methodologyI created a couple of scripts to iterate over URLs and return the http status code for pages in the Ansible package docs. You can find the scripts and all the test details here: https://github.com/oraNod/url-status-checker (I plan to add that Python script to this repo but I want to make a few tweaks to it first.) I used Complete details about how everything works is available in the README here: https://github.com/oraNod/url-status-checker/blob/main/README.md Test resultsI've attached txt and csv files generated from the url checker script in this tarball:
Target page 404sApproximately 870 target pages (the page to which you are redirected) return a 404 status. A large number of these pages get created by the catch all redirect rules when a page from an older version gets redirected to a non-existent latest version. For example, this page in 2.10 docs: https://docs.ansible.com/ansible/2.10/collections/amazon/aws/aws_az_facts_module.html The catch all rule constructs a redirect to Another example are pages that get renamed, such as Many of these are 404 pages that already exist outside of this PR. For example:
Some of the 404 pages are caused by the catch all rules in this PR though. For example:
I think it's reasonable to accept the fact that older pages will break and 404s are going to happen. The target page 404s are all for Ansible 2.x versions and there are less than 900 of them out of 19,646 URLs tested. That is less than 5% of Ansible 2.x redirects leading to a 404 page. |
8a6da4a
to
a10f424
Compare
link: "https://docs.ansible.com/ansible/latest/collections/all_plugins.html" | ||
builtin_index: | ||
label: Index of all modules and plugins contained in ansible-core | ||
link: "https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nicer if this link goes to https://docs.ansible.com/ansible-core/...
, but unfortunately there's no latest
equivalent for ansible-core...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about https://docs.ansible.com/ansible-core/devel ?
.htaccess
Outdated
##################################################################### | ||
|
||
# Redirect plugin and module pages for devel and latest | ||
RedirectMatch permanent "^/ansible/(devel|latest)/(plugins|modules)/(.+)\.html$" "/collections.html" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One big downside of this redirect is that it breaks a lot of existing links on the web (in blog posts, stack overflow answers, mailing list answers, ...) from Ansible 2.9 and before that right now still work.
I have no idea how frequently these URLs are still used though. Finding the module/plugin in question isn't too hard with the new collections.html page, though, so I guess it will be OK...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @felixfontein I can also see this is going to hose the links to these pages: https://docs.ansible.com/ansible/latest/plugins/
I overlooked those plugin pages, but glad I caught them. I'll raise this at the next DaWGs meeting but maybe we should remove this redirect rule and avoid globbing any latest or devel urls to the collections.html
page.
Either that or we add a negative lookahead for the plugin pages which do exist, such as:
RedirectMatch permanent "^/ansible/(devel|latest)/modules/(.+)\.html$" "/collections.html"
RedirectMatch permanent "^/ansible/(devel|latest)/plugins/(?!action\.html|become\.html|cache\.html|callback\.html|cliconf\.html|connection\.html|docs_fragment\.html|filter\.html|httpapi\.html|inventory\.html|lookup\.html|module\.html|module_util\.html|netconf\.html|plugins\.html|shell\.html|strategy\.html|terminal\.html|test\.html|vars\.html)/(.+)\.html$" "/collections.html"
An alternative is to take all the latest source pages and redirect them to collections using the https://pypi.org/project/sphinx-reredirects/ extension.
It doesn't remove the maintenance overhead and maybe adds to the build time. We could probably add some regex to reduce the number of redirects that we'd need and point to specific collections, for example:
redirects = {
"modules/azure_*": "collections/azure/index.html",
}
I guess we can see but we need to rethink the devel and latest rules here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that works like a champ. I tried an experimental commit: oraNod/ansible-documentation@ec0790a
And deployed it to the test site on pages. All the latest redirects are there and it doesn't seem to add to the build time.
@samccann (and all) Heads up that I've pushed a final commit: 467753a This restores the redirects for any "latest" urls so that the changes in this PR are limited in scope to only the 2.x versions. We've got a couple of options for the "latest" urls and can do them separately. I'll send related PRs and update the forum post to explain everything. |
.htaccess
Outdated
RedirectMatch permanent "^/ansible/2.6/vmware/index.html" "/ansible/latest/collections/community/vmware/index.html" | ||
|
||
# EOL Archive Redirects for all the rest | ||
RedirectMatch permanent "^/ansible/(2\.(10|[3-8]))/(.+)\.html$" "/ansible/latest/$3.html" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My redirect foo is bad - is this actually redirecting ansible 2.10, and 3-8 to latest? We don't have any archive for these docs so that would mean any current users would have no way to see those docs. Web analytics shows that Ansible 3-5 each have 5k visitors a quarter still (tho i don't know why cuz we never advertised that as a url. We've always had just latest and devel on the version switcher for package docs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if that is because of bookmarks. Should users be going to those pages if they are EOL?
This redirect is the "catch all" that will match ansible/2.10, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8/
in the url then redirect whatever is after that to the corresponding latest page.
For example ansible/2.10/foo.html
redirects to ansible/latest/foo.html
.
There are catch all redirects in the 2.3, 2.4, 2.5, and 2.6 files such as this one:
Line 4050 in a4ddf59
RedirectMatch permanent "^/ansible/2.3/?(.+)?.html" "/ansible/latest/$1.html" |
However these don't actually work as intended because the backreference $1
points to the wrong part of the url. This was uncovered while doing all the testing.
No one is going to get a 404 out of this but they will get taken to the latest version of the docs. If someone is using a very old version of Ansible, I don't really see the harm in that to be honest.
It might also be an interesting thing to find out. If anyone yells at us, then we can have a conversation.
But maybe we should put all the 2.x docs up on the archive. I got all the files off the server and have them in tarballs so we could just move them without having to do a build. We can also deploy them to Read The Docs if we want to preserve all the old versions there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so I think the logic should be only for 2.3-2.7. We shouldn't redirect 2.10 or 2.8 because we have no archives for those docs (and last I checked, the jenkins jobs couldn't republish them).
To clarify - the 2.3-2.7 redirects were in place not because they were EOL, but because they were getting huge amounts of traffic from very old blogs/stack overflow etc and people were getting stale info. So instead, I published them to an /archive/ url, and then redirected each to latest after that.
And just because I was curious, we've had 900 visitors to the archive in the past 3 months. Given it's a 'newish' url, I'm guessing that many people took the time to dig out the old docs that we moved to the archive. (now the debate on whether all that effort was worthwhile... 🤷🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samccann I've updated the catch all rule to apply to only 2.3 to 2.7. Note that this also meant changing the back reference from $3
to $2
because the regex changed. There are only 2 capture groups in the regex now instead of 3.
You can do a quick verification on https://htaccess.madewithlove.com/
Try something like https://docs.ansible.com/ansible/2.3/my_page.html
as the url to which you want to apply the rule and then paste the rule into the form. It should indicate that the new url is https://docs.ansible.com/ansible/latest/my_page.html
.
Another note here, I can see that 2.7 docs are still reachable. https://docs.ansible.com/ansible/2.7/index.html
The catch all seems to be in place for 2.3 to 2.6 currently. Should we make this rule apply to just those versions?
link: "https://docs.ansible.com/ansible/latest/collections/all_plugins.html" | ||
builtin_index: | ||
label: Index of all modules and plugins contained in ansible-core | ||
link: "https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about https://docs.ansible.com/ansible-core/devel ?
467753a
to
b172512
Compare
This PR makes the following changes:
As a result of this change users that access plugin and module pages will be redirected to collections.html. For example, this link is available from an
ansible.com
blog post: https://docs.ansible.com/ansible/latest/modules/k8s_module.htmlA rule exists to redirect this page to the corresponding collection page, which has been moved and results in a 404 when the redirect is followed:
docsite/ansible/10/.htaccess
Line 2466 in 2ee83dc
Instead of having to maintain all the rules in the htaccess configuration files between releases, we can redirect users to a single page. While true that this will require users to search for the appropriate documentation, it does avoid 404s and any resulting SEO degradation.
For any reviewers, here's a link to the
collections.html
page on the RTD preview build: https://ansible--261.org.readthedocs.build/collections.htmlTo evaluate these changes on the test server, use this branch in my fork because it tests the
collections.html
page on the test server. Note to say that we can't actually evaluate these changes with the test server because the rsync command that the jenkins job uses doesn't include the--delete
flag so the old.htaccess
configuration files aren't getting removed. They aren't playing nicely with the redirects here. Plan B is to stand up a test server somewhere. I don't want to prune old files off the actual test server just to validate these redirects.