-
Couldn't load subscription status.
- Fork 26
Update Broken link Script #3808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
Preview available here |
Co-authored-by: Paweł Rynarzewski <92171763+pawel-rynarzewski-c8y@users.noreply.github.com>
… links in link checker
| const iframes = doc.querySelectorAll('iframe, frame'); | ||
| for (const frame of iframes) { | ||
| try { | ||
| const frameDoc = frame.contentDocument || frame.contentWindow?.document; | ||
| if (frameDoc) { | ||
| allFragments = allFragments.concat(collectFragments(frameDoc)); | ||
| } | ||
| } catch (e) { | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give me example of the case you're addressing here? I've made some tests and if you have a document with iframe, and iframe contains an anchor, you cannot use url like /main-document.html#a-name-from-iframe to link to the anchor inside the iframe, i.e. user won't be scrolled to the right position within the iframe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention here isn’t to support direct navigation to iframe anchors via #fragment, but to verify that referenced anchors (even inside embedded documents) actually exist. We have some pages that load documentation or content in iframes, so this logic helps our validation detect missing anchors there too.
| const m = url.match(/^https:\/\/www\.npmjs\.com\/package\/(@[^/]+\/[^#?]+)/); | ||
| const pkg = m ? m[1] : null; | ||
| const encodedUrl = pkg ? url.replace(pkg, encodeURIComponent(pkg)) : url; | ||
|
|
||
| if (pkg) { | ||
| cy.request({ | ||
| url: `https://registry.npmjs.org/${pkg}`, | ||
| failOnStatusCode: false, | ||
| headers: { Accept: 'application/vnd.npm.install-v1+json' } | ||
| }).then((res) => { | ||
| expect(res.status, `npm registry status for ${pkg}`).to.eq(200); | ||
| }); | ||
| } | ||
| cy.visit(encodedUrl, { timeout: 50000, failOnStatusCode: false }); | ||
| cy.url().should('include', '/package/%40'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this logic? I see you match only packages with scoped names starting with @ and currently we only have such urls in docs, but I wouldn't make it that specific. Could we assume that anything after https://www.npmjs.com/package/ is a full package name?
I also read about constraints on accessing https://www.npmjs.com by bots and that we need to use registry instead. But what's the purpose of then encoding url, visiting it and checking for %40?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I am matching only scoped packages (@scope/pkg) since those are the only ones we have in our docs. The registry call is used to confirm that the package actually exists (to avoid npmjs.com’s bot-blocks).
After that, the encoded URL visit (%40) ensures the page resolves correctly for scoped packages, since npmjs.com automatically redirects them to the encoded form
| const contentType = response.headers['content-type'] || ''; | ||
| if (!contentType.includes('text/html')) { | ||
| cy.log(`Non-HTML content detected for ${url}, skipping cy.visit()`); | ||
| expect(response.status).to.be.oneOf([200, 201, 202, 203, 204, 301, 302, 304]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check that response.body is not empty? If we direct user there, there should be some content. Do you have any example of what non-html resource might that be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, Link: https://download.cumulocity.com/Apama/Debian/. These are file repositories or download links, not HTML pages, so I only check that they return a valid status and skip checking the body.
Co-authored-by: Paweł Rynarzewski <92171763+pawel-rynarzewski-c8y@users.noreply.github.com>
Co-authored-by: Paweł Rynarzewski <92171763+pawel-rynarzewski-c8y@users.noreply.github.com>
No description provided.