feat(algolia): add facets based on version for search#1051
haarchri wants to merge 3 commits into crossplane:master
✅ Deploy Preview for crossplane ready!
jbw976
left a comment
This is very interesting @haarchri! Hopefully everything works smoothly after this PR gets merged 😂
One immediate question for you - do you know what the UX is going to be for showing multiple versions in the search? will the search popup make it clear which version each result is from? Will it favor newer versions over older versions? are you able to see this in your local testing, or do we have to merge this, update Algolia config, and see what happens?
jbw976
left a comment
RE: the Algolia config instructions
Add:
filterOnly(docsearch:version)
to Attributes for faceting
Looks like we have a version attribute for faceting already on https://dashboard.algolia.com/apps/9UXKYX61NK/explorer/configuration/crossplane/facets, is that useful?
I was able to do a network inspect on the preview site and when searching it does look like facetFilters are being sent in the request, so that's a good sign 😉
facetFilters=%5B%22docsearch%3Aversion%3A0.0.0-master%22%5D
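The encoded query parameter above is easier to read once decoded. A minimal sketch using only standard JavaScript built-ins; the helper name `decodeFacetFilters` is hypothetical and not part of this PR:

```javascript
// Hypothetical helper: decode a URL-encoded facetFilters value into an array.
function decodeFacetFilters(encoded) {
  // decodeURIComponent turns %5B, %22, %3A and %5D back into [ " : ]
  return JSON.parse(decodeURIComponent(encoded));
}

// Value copied from the network request on the preview site.
const filters = decodeFacetFilters(
  "%5B%22docsearch%3Aversion%3A0.0.0-master%22%5D",
);
console.log(filters); // one entry: "docsearch:version:0.0.0-master"
```

So the preview site is filtering on the `docsearch:version` attribute with the value `0.0.0-master`.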
@jbw976 the facet needs to match the tagName ... can you add
added
and i think the answer to this is that we don't show multiple versions in the results - we only show results for the version you're currently on. is that right @haarchri?
by the way, trying to also make sense of this comment in the netlify_build.sh: is this configured somewhere in this docs repo that we need to update? on the algolia side? i haven't found it yet 😇
edit: ah, i think it's controlled by the sitemap.xml: https://docs.crossplane.io/sitemap.xml, which is generated from https://github.com/crossplane/docs/blob/master/themes/geekboot/layouts/_default/sitemap.xml. Algolia doesn't appear to be configured to only look at latest; our sitemap/canonical/permalinks/etc. make it so any search engine (including Algolia) will only look at /latest and /contribute. So i think we'd have to update that logic in this PR also for Algolia to start indexing those? If so, does that mean that google etc. will start indexing old versions too? we probably don't want old versions showing up in google results, only having /latest there is probably the right thing to do. this is starting to get confusing, what do you think @haarchri? 😂
Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
2f73f76 to 6afdf98
Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
Signed-off-by: Christopher Haar <christopher.haar@upbound.io>
updated crawler config - removed appId and apiKey for comparison
old:
new Crawler({
  rateLimit: 8,
  maxDepth: 10,
  renderJavaScript: false,
  sitemaps: ["https://docs.crossplane.io/sitemap.xml"],
  ignoreCanonicalTo: false,
  schedule: "at 01:00 am",
  actions: [
    {
      name: "CrossplaneCrawler",
      indexName: "crossplane",
      pathsToMatch: [
        "https://docs.crossplane.io/latest/**",
        "https://docs.crossplane.io/knowledge-base/**",
        "https://docs.crossplane.io/contribute/**",
        "!https://docs.crossplane.io/latest/concepts/composition/",
      ],
      recordExtractor: ({ $, helpers }) => {
        // Strip all the headers out of the content in order to index them outside of the content
        const headlines = $(
          ".DocSearch-content h1, .DocSearch-content h2, .DocSearch-content h3, .DocSearch-content h4,.DocSearch-content h5, .DocSearch-content h6",
        );
        headlines.each(function (i, elem) {
          $(this).remove();
          $(".DocSearch-content").append($(this));
        });
        // Collection of elements to remove from the content
        const removeElems = $(
          ".DocSearch-content .admonition-title, .DocSearch-content .lnlinks, .DocSearch-content table thead",
        );
        removeElems.each(function (i, elem) {
          $(this).remove();
        });
        // Collection of elements to wrap in spaces
        const wrapElems = $(
          ".DocSearch-content .admonition-content, .DocSearch-content p, .DocSearch-content table td",
        );
        wrapElems.each(function (i, elem) {
          var innerText = $(this).text();
          $(this).text(" " + innerText + " ");
        });
        // Collection of elements to add a period and trailing space
        const periodElems = $(".DocSearch-content li");
        periodElems.each(function (i, elem) {
          var innerText = $(this).text();
          $(this).text(innerText + ". ");
        });
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: [".expansion-link", "h1"],
            },
            lvl1: [".kind", "h2"],
            lvl2: ["h3"],
            lvl3: ["h4"],
            lvl4: ["h5"],
            lvl5: ["h6"],
            content: [".description", ".DocSearch-content"],
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  appId: "xxxx",
  apiKey: "xxxx",
});
new:
new Crawler({
  rateLimit: 8,
  maxDepth: 10,
  renderJavaScript: false,
  sitemaps: ["https://docs.crossplane.io/sitemap.xml"],
  ignoreCanonicalTo: true,
  schedule: "at 01:00 am",
  actions: [
    {
      name: "CrossplaneCrawler",
      indexName: "crossplane",
      pathsToMatch: [
        "https://docs.crossplane.io/latest/**",
        "https://docs.crossplane.io/master/**",
        "https://docs.crossplane.io/v2.1/**",
        "https://docs.crossplane.io/v2.0/**",
        "https://docs.crossplane.io/v1.20/**",
        "https://docs.crossplane.io/knowledge-base/**",
        "https://docs.crossplane.io/contribute/**",
        "!https://docs.crossplane.io/latest/concepts/composition/",
      ],
      recordExtractor: ({ $, helpers }) => {
        // Strip all the headers out of the content in order to index them outside of the content
        const headlines = $(
          ".DocSearch-content h1, .DocSearch-content h2, .DocSearch-content h3, .DocSearch-content h4,.DocSearch-content h5, .DocSearch-content h6",
        );
        headlines.each(function (i, elem) {
          $(this).remove();
          $(".DocSearch-content").append($(this));
        });
        // Collection of elements to remove from the content
        const removeElems = $(
          ".DocSearch-content .admonition-title, .DocSearch-content .lnlinks, .DocSearch-content table thead, .DocSearch-content .bd-callout-info",
        );
        removeElems.each(function (i, elem) {
          $(this).remove();
        });
        // Collection of elements to wrap in spaces
        const wrapElems = $(
          ".DocSearch-content .admonition-content, .DocSearch-content p, .DocSearch-content table td",
        );
        wrapElems.each(function (i, elem) {
          var innerText = $(this).text();
          $(this).text(" " + innerText + " ");
        });
        // Collection of elements to add a period and trailing space
        const periodElems = $(".DocSearch-content li");
        periodElems.each(function (i, elem) {
          var innerText = $(this).text();
          $(this).text(innerText + ". ");
        });
        // Extract version from meta tag
        const version =
          $('meta[name="docsearch:version"]').attr("content") || "";
        const records = helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: [".expansion-link", "h1"],
            },
            lvl1: [".kind", "h2"],
            lvl2: ["h3"],
            lvl3: ["h4"],
            lvl4: ["h5"],
            lvl5: ["h6"],
            content: [".description", ".DocSearch-content"],
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
        return records.map((record) => ({
          ...record,
          version: version,
        }));
      },
    },
  ],
  appId: "xxxx",
  apiKey: "xxxx",
});
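Apart from the extra paths and `ignoreCanonicalTo: true`, the key change in the new config is the last step of `recordExtractor`: the value of the `docsearch:version` meta tag is copied onto every record. A minimal sketch of that step in isolation, using plain objects; `attachVersion` and the sample records are hypothetical and not part of the crawler API:

```javascript
// Hypothetical stand-in for the final step of the recordExtractor:
// copy the page's version onto every record returned by helpers.docsearch().
function attachVersion(records, version) {
  return records.map((record) => ({
    ...record,
    // Fall back to an empty string when no version is available,
    // mirroring the `|| ""` fallback used in the crawler config.
    version: version || "",
  }));
}

// Sample records are made up for illustration.
const sample = [{ objectID: "a" }, { objectID: "b" }];
console.log(attachVersion(sample, "v2.1"));
```

Each record ends up with a top-level `version` attribute, which is presumably the attribute the facet filter and the Algolia faceting configuration need to reference.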





When browsing older documentation versions (for example, v1.20), search results currently all come from latest.
This leads to confusion when users discover features or APIs in search results that don’t actually exist in the version they’re viewing.
This PR introduces version-aware search filtering using Algolia facet filters (see https://www.algolia.com/doc/guides/managing-results/refine-results/faceting).
Search results will be restricted to the documentation version the user is currently viewing.
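The filtering described above could be sketched as follows. This is an illustration under assumptions, not the code in this PR: `buildFacetFilters` is a hypothetical helper, `pageVersion` is assumed to be the same string the page exposes as its version (for example `v1.20` or `0.0.0-master`), and returning no filter for a missing version is my own choice.

```javascript
// Hypothetical helper: build the facetFilters array for the current page.
function buildFacetFilters(pageVersion) {
  if (!pageVersion) {
    // Assumption: with no known version, do not restrict the search.
    return [];
  }
  // Facet filters use the "attribute:value" form.
  return [`version:${pageVersion}`];
}

console.log(buildFacetFilters("0.0.0-master"));
console.log(buildFacetFilters("v1.20"));
```

The resulting array would be passed as the `facetFilters` search parameter, so only records whose `version` attribute matches the current page are returned.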
The version comes from .Page.Params.version.
The facetFilters value is set dynamically: for master it is version:0.0.0-master, which matches the existing meta tag convention where master is labeled 0.0.0-master to keep it de-ranked in search.
Background
Each page already includes a docsearch:version meta tag, which the Algolia crawler indexes.
Required: Algolia configuration
After this PR is merged, someone with Algolia admin access must enable faceting:
Open Algolia Dashboard
Go to the crossplane index
Navigate to Configuration → Facets
Add:
searchable(version)
to Attributes for faceting
Save
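For reference, the "Attributes for faceting" field in the dashboard corresponds to the `attributesForFaceting` index setting. A sketch of the equivalent settings fragment, assuming the attribute name `version` from the steps above; how it is applied (dashboard, API client, or CLI) is left open here:

```javascript
// Settings fragment equivalent to the dashboard steps above (a sketch).
// "searchable(version)" declares `version` as a facet whose values can also be searched.
const indexSettings = {
  attributesForFaceting: ["searchable(version)"],
};

console.log(JSON.stringify(indexSettings, null, 2));
```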
Testing:
To verify this PR:
we see the facetFilters
Fixes: #1008