Update Broken link Script #3808


Open

logu-c8y wants to merge 13 commits into develop from Update_broken_link_script

Contributor

logu-c8y commented Oct 15, 2025

No description provided.


          Fix the script to extract the correct glossary links

d244563

logu-c8y requested a review from BeateRixen as a code owner

October 15, 2025 10:52

Contributor

github-actions bot commented Oct 15, 2025

Preview available here


          Update uncaught exceptions

fc7b046

carlosceia approved these changes

BeateRixen approved these changes

pawel-rynarzewski-c8y reviewed

broken-links-script/Extractlinks.js (outdated, resolved)

broken-links-script/Extractlinks.js (outdated, resolved)

logu-c8y and others added 5 commits

October 17, 2025 13:41


          Update broken-links-script/Extractlinks.js

2a8d7d8

Co-authored-by: Paweł Rynarzewski <92171763+pawel-rynarzewski-c8y@users.noreply.github.com>


          handle valid npm package URLs and exclude private github repositories links in link checker

1a94888


          fix: resolve merge conflicts in Extractlinks.js

faf4296


          Update the link checker to test the fragments

f11e465


          Merge branch 'develop' into Update_broken_link_script

72cbea4

pawel-rynarzewski-c8y reviewed

broken-links-script/cypress/e2e/link-checker.cy.js (outdated, resolved)

broken-links-script/cypress/e2e/link-checker.cy.js (outdated, resolved)

broken-links-script/cypress/e2e/link-checker.cy.js (outdated, resolved)


          Remove normalize fragments and privategithubrepository condition

a07f658

pawel-rynarzewski-c8y reviewed

broken-links-script/cypress/e2e/link-checker.cy.js (outdated, resolved)

broken-links-script/cypress/e2e/link-checker.cy.js

Comment on lines +17 to +26

    
    const iframes = doc.querySelectorAll('iframe, frame');
    for (const frame of iframes) {
      try {
        const frameDoc = frame.contentDocument || frame.contentWindow?.document;
        if (frameDoc) {
          allFragments = allFragments.concat(collectFragments(frameDoc));
        }
      } catch (e) {
      }
    }

Contributor

pawel-rynarzewski-c8y Oct 24, 2025

Can you give me an example of the case you're addressing here? I ran some tests: if a document contains an iframe and the iframe contains an anchor, you cannot use a URL like /main-document.html#a-name-from-iframe to link to the anchor inside the iframe, i.e. the user won't be scrolled to the right position within the iframe.

Contributor Author

logu-c8y Oct 28, 2025

The intention here isn’t to support direct navigation to iframe anchors via #fragment, but to verify that referenced anchors (even inside embedded documents) actually exist. We have some pages that load documentation or content in iframes, so this logic helps our validation detect missing anchors there too.
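
For illustration, the validation described above could be sketched roughly as follows. This is a minimal sketch, not the code from the PR: the body of `collectFragments` is not shown in this thread, so the version here is hypothetical, and it assumes anchors are identified by `id` or `a[name]` attributes.

```javascript
// Hypothetical helper: collect every anchor target (id or name) in a document,
// then recurse into frames whose documents are readable.
function collectFragments(doc) {
  const fragments = [];
  for (const el of doc.querySelectorAll('[id], a[name]')) {
    const value = el.getAttribute('id') || el.getAttribute('name');
    if (value) fragments.push(value);
  }
  for (const frame of doc.querySelectorAll('iframe, frame')) {
    try {
      const frameDoc = frame.contentDocument || frame.contentWindow?.document;
      if (frameDoc) fragments.push(...collectFragments(frameDoc));
    } catch (e) {
      // Frames that refuse access (e.g. cross-origin) are skipped,
      // so their anchors cannot be validated.
    }
  }
  return fragments;
}
```

Note that a fragment found only inside a frame would then count as "existing" for the outer page, even though the browser will not scroll to it, which is the trade-off raised in the review comment.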

broken-links-script/cypress/e2e/link-checker.cy.js (outdated, resolved)

broken-links-script/cypress/e2e/link-checker.cy.js

Comment on lines +80 to +94

    
    const m = url.match(/^https:\/\/www\.npmjs\.com\/package\/(@[^/]+\/[^#?]+)/);
    const pkg = m ? m[1] : null;
    const encodedUrl = pkg ? url.replace(pkg, encodeURIComponent(pkg)) : url;
    if (pkg) {
      cy.request({
        url: `https://registry.npmjs.org/${pkg}`,
        failOnStatusCode: false,
        headers: { Accept: 'application/vnd.npm.install-v1+json' }
      }).then((res) => {
        expect(res.status, `npm registry status for ${pkg}`).to.eq(200);
      });
    }
    cy.visit(encodedUrl, { timeout: 50000, failOnStatusCode: false });
    cy.url().should('include', '/package/%40');

Contributor

pawel-rynarzewski-c8y Oct 24, 2025

Can you explain this logic? I see you match only scoped package names starting with @. Currently we only have such URLs in the docs, but I wouldn't make it that specific. Could we assume that anything after https://www.npmjs.com/package/ is a full package name?
I also read that https://www.npmjs.com restricts access by bots and that we need to use the registry instead. But what is then the purpose of encoding the URL, visiting it and checking for %40?

Contributor Author

logu-c8y Oct 28, 2025

Currently, I am matching only scoped packages (@scope/pkg), since those are the only ones we have in our docs. The registry call confirms that the package actually exists (and avoids npmjs.com’s bot blocking).
After that, visiting the encoded URL (%40) checks that the page resolves correctly for scoped packages, since npmjs.com automatically redirects them to the encoded form.
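
As a rough sketch of the generalization suggested in the review (treating whatever follows /package/ as the package name, scoped or not), something like the following could work. The function names `npmPackageName` and `registryUrl` are hypothetical and not part of the PR. It also assumes that a path segment after the name, such as /v/1.2.3, is not part of the name, and that encoding the slash of a scoped name as %2F for the registry request is acceptable.

```javascript
// Assumption: the package name runs from /package/ up to the next '/', '?' or '#',
// with an optional leading @scope/ segment.
const NPM_PACKAGE_URL = /^https:\/\/www\.npmjs\.com\/package\/((?:@[^/?#]+\/)?[^/?#]+)/;

// Returns the package name for an npmjs.com package URL, or null for any other URL.
function npmPackageName(url) {
  const m = url.match(NPM_PACKAGE_URL);
  return m ? m[1] : null;
}

// Builds the registry metadata URL; the slash in a scoped name is encoded as %2F.
function registryUrl(pkg) {
  return `https://registry.npmjs.org/${pkg.replace('/', '%2F')}`;
}
```

With this shape the registry request alone decides whether the link is valid, and the `cy.visit` plus `%40` check becomes optional rather than required for scoped names only.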

broken-links-script/cypress/e2e/link-checker.cy.js

    
    const contentType = response.headers['content-type'] || '';
    if (!contentType.includes('text/html')) {
      cy.log(`Non-HTML content detected for ${url}, skipping cy.visit()`);
      expect(response.status).to.be.oneOf([200, 201, 202, 203, 204, 301, 302, 304]);

Contributor

pawel-rynarzewski-c8y Oct 24, 2025

Should we check that response.body is not empty? If we direct the user there, there should be some content. Do you have an example of what such a non-HTML resource might be?

Contributor Author

logu-c8y Oct 28, 2025

For example: https://download.cumulocity.com/Apama/Debian/. These are file repositories or download links, not HTML pages, so I only check that they return a valid status and skip checking the body.
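
The decision described here can be isolated into a small pure function, which may make it easier to reason about and to unit-test outside Cypress. This is only a sketch under assumptions: `classifyResponse` and its return values are hypothetical names, and the status list is copied from the diff excerpt above.

```javascript
// Status codes accepted for non-HTML resources (taken from the diff excerpt).
const OK_STATUSES = [200, 201, 202, 203, 204, 301, 302, 304];

// Returns 'visit' for HTML pages, 'status-only' for non-HTML resources with an
// accepted status, and 'broken' for non-HTML resources with any other status.
function classifyResponse(response) {
  const contentType = response.headers['content-type'] || '';
  if (contentType.includes('text/html')) return 'visit';
  return OK_STATUSES.includes(response.status) ? 'status-only' : 'broken';
}
```

A body-emptiness check, as asked about in the review, could be added to the 'status-only' branch if download links are expected to always return content.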

broken-links-script/Extractlinks.js (outdated, resolved)

broken-links-script/Extractlinks.js (outdated, resolved)

broken-links-script/Extractlinks.js (outdated, resolved)

logu-c8y and others added 5 commits

October 24, 2025 14:54


          rename block to frontmatter

e79ac2b

Co-authored-by: Paweł Rynarzewski <92171763+pawel-rynarzewski-c8y@users.noreply.github.com>


          Update broken-links-script/Extractlinks.js

4dbc2f3

Co-authored-by: Paweł Rynarzewski <92171763+pawel-rynarzewski-c8y@users.noreply.github.com>


          Removed .toLowerCase() for case sensitive comparisons

a3544de


          simplify GitHub fragment check by normalizing user-content- IDs

b5c4efe


          replace path.join with a simple string join like /

f34db0e

Labels

None yet