Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 31 additions & 6 deletions vtex/handlers/sitemap.ts
Original file line number Diff line number Diff line change
Expand Up @@ -27,14 +27,35 @@ const includeSiteMaps = (
: currentXML;
};

const excludeSitemapEntries = (
xml: string,
exclude?: string[],
): string => {
if (!exclude?.length) return xml;
return xml.replace(
/<sitemap>\s*<loc>([^<]*)<\/loc>[\s\S]*?<\/sitemap>/gi,
(block, loc) => {
const shouldExclude = exclude.some(
(entry) => loc.includes(entry) || loc.endsWith(entry),
);
return shouldExclude ? "" : block;
},
);
};

export interface Props {
include?: string[];
/**
* @title Sitemap entries to remove from the sitemap index
* @description URLs or path suffixes to match; any <sitemap> whose <loc> contains or ends with one of these will be removed.
*/
excludeSiteMapEntry?: string[];
}
/**
* @title Sitemap Proxy
*/
export default function Sitemap(
{ include }: Props,
{ include, excludeSiteMapEntry }: Props,
{ publicUrl: url, usePortalSitemap, account }: AppContext,
) {
return async (
Expand Down Expand Up @@ -62,12 +83,16 @@ export default function Sitemap(
const reqUrl = new URL(req.url);
const text = await response.text();

const withIncludes = includeSiteMaps(
text.replaceAll(publicUrl, `${reqUrl.origin}/`),
reqUrl.origin,
include,
);

const filtered = excludeSitemapEntries(withIncludes, excludeSiteMapEntry);

return new Response(
includeSiteMaps(
text.replaceAll(publicUrl, `${reqUrl.origin}/`),
reqUrl.origin,
include,
),
filtered,
Comment on lines +86 to +95
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Exclude upstream entries before re-injecting custom ones.

Because vtex/loaders/proxy.ts:131-155 pushes includeSiteMapWithHandler paths into include, filtering after includeSiteMaps() removes the replacement entry too. Excluding /sitemap/foo.xml and re-adding the same path currently leaves no <sitemap> block for it in /sitemap.xml.

🐛 Proposed fix
-    const withIncludes = includeSiteMaps(
-      text.replaceAll(publicUrl, `${reqUrl.origin}/`),
-      reqUrl.origin,
-      include,
-    );
-
-    const filtered = excludeSitemapEntries(withIncludes, excludeSiteMapEntry);
+    const withoutExcluded = excludeSitemapEntries(
+      text.replaceAll(publicUrl, `${reqUrl.origin}/`),
+      excludeSiteMapEntry,
+    );
+
+    const filtered = includeSiteMaps(
+      withoutExcluded,
+      reqUrl.origin,
+      include,
+    );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const withIncludes = includeSiteMaps(
text.replaceAll(publicUrl, `${reqUrl.origin}/`),
reqUrl.origin,
include,
);
const filtered = excludeSitemapEntries(withIncludes, excludeSiteMapEntry);
return new Response(
includeSiteMaps(
text.replaceAll(publicUrl, `${reqUrl.origin}/`),
reqUrl.origin,
include,
),
filtered,
const withoutExcluded = excludeSitemapEntries(
text.replaceAll(publicUrl, `${reqUrl.origin}/`),
excludeSiteMapEntry,
);
const filtered = includeSiteMaps(
withoutExcluded,
reqUrl.origin,
include,
);
return new Response(
filtered,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vtex/handlers/sitemap.ts` around lines 86 - 95, The current flow calls
includeSiteMaps(...) then excludeSitemapEntries(...), which removes any upstream
entries that were just replaced and prevents re-injecting custom sitemap
entries; instead call excludeSitemapEntries on the original response content
(the value passed into includeSiteMaps — i.e., text.replaceAll(publicUrl,
`${reqUrl.origin}/`)) first using the same excludeSiteMapEntry predicate, then
pass that filtered content into includeSiteMaps along with reqUrl.origin and
include so upstream entries are removed before re-injection; update references
in sitemap handler (functions includeSiteMaps, excludeSitemapEntries and the
variables text, publicUrl, reqUrl.origin, include, excludeSiteMapEntry) and
return the result of includeSiteMaps(filteredContent, reqUrl.origin, include).

{
headers: response.headers,
status: response.status,
Expand Down
86 changes: 76 additions & 10 deletions vtex/loaders/proxy.ts
Original file line number Diff line number Diff line change
Expand Up @@ -23,23 +23,59 @@ export const VTEX_PATHS_THAT_REQUIRES_SAME_REFERER = ["/no-cache/AviseMe.aspx"];

const decoSiteMapUrl = "/sitemap/deco.xml";

/** @title {{__title}} */
export interface IncludeSiteMapEntry {
/** @title Title (CMS only) */
__title?: string;
/** @title Path */
path: string;
/** @title Handler */
handler?: string;
/** @title Paths to exclude from the sitemap */
excludePaths?: string[];
}

const normalizeIncludeSiteMap = (
includeSiteMap?: IncludeSiteMapEntry[],
): IncludeSiteMapEntry[] => (includeSiteMap ?? []);

const includeEntriesToPaths = (entries: IncludeSiteMapEntry[]): string[] =>
entries.map((e) => e.path);
Comment on lines +38 to +43
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Normalize and dedupe sitemap paths before materializing them.

As vtex/handlers/sitemap.ts:8-27 currently emits one <sitemap> block per include item, the raw concatenation here can produce duplicate <loc> entries. Repeated IncludeSiteMapEntry.path values also generate duplicate custom routes.

♻️ Proposed fix
 const normalizeIncludeSiteMap = (
   includeSiteMap?: IncludeSiteMapEntry[],
-): IncludeSiteMapEntry[] => (includeSiteMap ?? []);
+): IncludeSiteMapEntry[] => {
+  const byPath = new Map<string, IncludeSiteMapEntry>();
+  for (const entry of includeSiteMap ?? []) {
+    byPath.set(entry.path, entry);
+  }
+  return [...byPath.values()];
+};
+
+const dedupePaths = (paths: string[]) => [...new Set(paths)];
 
 const includeEntriesToPaths = (entries: IncludeSiteMapEntry[]): string[] =>
   entries.map((e) => e.path);
@@
     const [include, routes] = generateDecoSiteMap
       ? [
-        [
+        dedupePaths([
           ...includeEntriesToPaths(entries),
           ...(includeSiteMap ?? []),
           decoSiteMapUrl,
-        ],
+        ]),
         [
           ...customSitemapRoutes,
           {
@@
       ]
       : [
-        [...includeEntriesToPaths(entries), ...(includeSiteMap ?? [])],
+        dedupePaths([
+          ...includeEntriesToPaths(entries),
+          ...(includeSiteMap ?? []),
+        ]),
         customSitemapRoutes,
       ];

Also applies to: 131-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vtex/loaders/proxy.ts` around lines 38 - 43, normalizeIncludeSiteMap and
includeEntriesToPaths currently return/emit raw entries which can contain
duplicate IncludeSiteMapEntry.path values causing duplicate <loc> and custom
routes; update normalizeIncludeSiteMap to both normalize (default to empty
array) and deduplicate entries by path, and change includeEntriesToPaths to
produce a list of unique, normalized paths (e.g., map to path, filter out falsy,
use a Set or keyed dedupe) so downstream sitemap materialization and route
generation use unique paths; apply the same dedupe/normalization logic to the
analogous functions around the other block mentioned (the functions referenced
by includeEntriesToPaths and normalizeIncludeSiteMap in the later section).


const includeEntriesToRoutes = (entries: IncludeSiteMapEntry[]): Route[] =>
entries
.map(({ path, handler, excludePaths, ...rest }) => ({
pathTemplate: path,
handler: {
value: {
excludePaths,
__resolveType: handler ?? "website/handlers/sitemap.ts",
...rest,
},
},
}));

const buildProxyRoutes = (
{
publicUrl,
extraPaths,
includeSiteMap,
includeSiteMapWithHandler,
includePathToDecoSitemap,
generateDecoSiteMap,
excludePathsFromDecoSiteMap,
excludeSiteMapEntry,
includeScriptsToHead,
includeScriptsToBody,
}: {
publicUrl?: string;
extraPaths: string[];
includeSiteMap?: string[];
includeSiteMapWithHandler?: IncludeSiteMapEntry[];
includePathToDecoSitemap?: string[];
generateDecoSiteMap?: boolean;
excludePathsFromDecoSiteMap: string[];
excludeSiteMapEntry?: string[];
includeScriptsToHead?: {
includes?: Script[];
};
Expand All @@ -53,6 +89,8 @@ const buildProxyRoutes = (
}

try {
const entries = normalizeIncludeSiteMap(includeSiteMapWithHandler);
const customSitemapRoutes = includeEntriesToRoutes(entries);
const hostname = (new URL(
publicUrl?.startsWith("http") ? publicUrl : `https://${publicUrl}`,
)).hostname;
Expand Down Expand Up @@ -91,17 +129,30 @@ const buildProxyRoutes = (
);

const [include, routes] = generateDecoSiteMap
? [[...(includeSiteMap ?? []), decoSiteMapUrl], [{
pathTemplate: decoSiteMapUrl,
handler: {
value: {
excludePaths: excludePathsFromDecoSiteMap,
includePaths: includePathToDecoSitemap,
__resolveType: "website/handlers/sitemap.ts",
? [
[
...includeEntriesToPaths(entries),
...(includeSiteMap ?? []),
decoSiteMapUrl,
],
[
...customSitemapRoutes,
{
pathTemplate: decoSiteMapUrl,
handler: {
value: {
excludePaths: excludePathsFromDecoSiteMap,
includePaths: includePathToDecoSitemap,
__resolveType: "website/handlers/sitemap.ts",
},
},
},
},
}]]
: [includeSiteMap, []];
],
]
: [
[...includeEntriesToPaths(entries), ...(includeSiteMap ?? [])],
customSitemapRoutes,
];

return [
...routes,
Expand All @@ -110,6 +161,7 @@ const buildProxyRoutes = (
handler: {
value: {
include,
excludeSiteMapEntry,
__resolveType: "vtex/handlers/sitemap.ts",
},
},
Expand Down Expand Up @@ -137,6 +189,11 @@ export interface Props {
* @title Other site maps to include
*/
includeSiteMap?: string[];
/**
* @title Other site maps to include
* @description URL path (e.g. "/sitemap/blog.xml") or object with path + handler (__resolveType) to register a route and add to the index. Use the object form for dynamic sitemaps (e.g. from Sanity).
*/
includeSiteMapWithHandler?: IncludeSiteMapEntry[];
Comment on lines +192 to +196
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The schema description still implies raw strings are valid here.

includeSiteMapWithHandler is typed as IncludeSiteMapEntry[], but this text reads like the field accepts either "/sitemap/foo.xml" or an object. That is likely to trip up anyone editing site.json directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vtex/loaders/proxy.ts` around lines 192 - 196, The schema comment for
includeSiteMapWithHandler is misleading because IncludeSiteMapEntry[] does not
accept raw string paths; update the JSDoc above includeSiteMapWithHandler to
remove the suggestion that a plain string like "/sitemap/blog.xml" is valid and
instead state that entries must be objects with path and handler (mention
__resolveType for dynamic sitemaps) or, if you intend to support raw strings,
change the TypeScript type to include string (e.g., IncludeSiteMapEntry |
string) and update related validation/consumers accordingly; reference
includeSiteMapWithHandler and IncludeSiteMapEntry when making the change.

/**
* @title Paths to include in the deco sitemap
*/
Expand All @@ -149,6 +206,11 @@ export interface Props {
* @title Exclude paths from /deco-sitemap.xml
*/
excludePathsFromDecoSiteMap?: string[];
/**
* @title Sitemap entries to remove from the sitemap index
* @description Path or URL substrings; any &lt;sitemap&gt; in /sitemap.xml whose &lt;loc&gt; contains one of these will be removed.
*/
excludeSiteMapEntry?: string[];
/**
* @title Scripts to include on Html head
*/
Expand All @@ -171,9 +233,11 @@ function loader(
{
extraPathsToProxy = [],
includeSiteMap = [],
includeSiteMapWithHandler = [],
includePathToDecoSitemap = [],
generateDecoSiteMap = true,
excludePathsFromDecoSiteMap = [],
excludeSiteMapEntry = [],
includeScriptsToHead = { includes: [] },
includeScriptsToBody = { includes: [] },
}: Props,
Expand All @@ -183,7 +247,9 @@ function loader(
return buildProxyRoutes({
generateDecoSiteMap,
excludePathsFromDecoSiteMap,
excludeSiteMapEntry,
includeSiteMap,
includeSiteMapWithHandler,
includePathToDecoSitemap,
publicUrl: ctx.publicUrl,
extraPaths: extraPathsToProxy,
Expand Down
Loading