-
Notifications
You must be signed in to change notification settings - Fork 1
Potential fix for code scanning alert no. 122: Incomplete multi-character sanitization #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -316,9 +316,16 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| private extractTextContent(html: string): string { | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return html | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<script[^>]*>.*?<\/script>/gi, '') | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<style[^>]*>.*?<\/style>/gi, '') | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Repeatedly remove <script> and <style> tags and their content | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| let sanitized = html; | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| let previous; | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| do { | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| previous = sanitized; | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| sanitized = sanitized | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<script[^>]*>.*?<\/script>/gis, '') | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+324
to
+325
Check failureCode scanning / CodeQL Incomplete multi-character sanitization High
This string may still contain
<script Error loading related location Loading
Copilot AutofixAI 6 months ago To fix the incomplete multi-character sanitization, we should ensure that all script and style tags are fully removed, even if they are nested, malformed, or appear consecutively. The best way to do this without changing existing functionality is to repeatedly apply the regular expression replacements for
Suggested changeset
1
src/lib/search-service.ts
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Refresh and try again.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<style[^>]*>.*?<\/style>/gis, ''); | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } while (sanitized !== previous); | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return sanitized | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+319
to
+328
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion ❓ Verification inconclusivePrefer DOM parsing over regex for robust tag stripping (addresses CodeQL alert #122). Regex-based HTML stripping is brittle (nested tags, malformed markup, edge whitespace). Since this runs in a browser context, use DOMParser to remove Proposed replacement within this method: private extractTextContent(html: string): string {
- // Repeatedly remove <script> and <style> tags and their content
- let sanitized = html;
- let previous;
- do {
- previous = sanitized;
- sanitized = sanitized
- .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
- .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '');
- } while (sanitized !== previous);
- return sanitized
- .replace(/<[^>]*>/g, ' ')
- .replace(/\s+/g, ' ')
- .trim();
+ // Prefer DOM parsing for robust removal of script/style content
+ if (typeof DOMParser !== 'undefined') {
+ const parser = new DOMParser();
+ const doc = parser.parseFromString(html, 'text/html');
+ doc.querySelectorAll('script, style').forEach((n) => n.remove());
+ const text = doc.body?.textContent ?? '';
+ return text.replace(/\s+/g, ' ').trim();
+ }
+ // Fallback to regex if DOMParser is unavailable (e.g., non-browser env)
+ let sanitized = html
+ .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
+ .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '')
+ .replace(/<[^>]*>/g, ' ');
+ return sanitized.replace(/\s+/g, ' ').trim();
}Verification: after applying, please re-run CodeQL on this PR to confirm alerts 185 and 186 are resolved. 🏁 Script executed: #!/bin/bash
echo "Checking for DOMParser references and runtime environment indicators..."
grep -R "DOMParser" -n src || echo "No existing DOMParser references in src/"
echo
echo "Checking for jsdom in dependencies..."
grep -R "\"jsdom\"" -n package.json || echo "No jsdom dependency in package.json"
echo
echo "Searching for browser globals usage..."
grep -R "document\." -n src || echo "No document references found in src/"
grep -R "window\." -n src || echo "No window references found in src/"
echo
echo "Locating extractTextContent usages..."
grep -R "extractTextContent" -n src || echo "No calls to extractTextContent found in src/"Length of output: 6218 Use DOMParser for robust Relying on iterative regex is brittle and can miss nested or malformed tags. Since this code runs in a browser (and under jsdom in tests), we can safely parse the HTML, remove unwanted nodes, and extract text. In non-browser environments, fall back to the original iterative regex logic. In private extractTextContent(html: string): string {
- // Repeatedly remove <script> and <style> tags and their content
- let sanitized = html;
- let previous;
- do {
- previous = sanitized;
- sanitized = sanitized
- .replace(/<script[^>]*>.*?<\/script>/gis, '')
- .replace(/<style[^>]*>.*?<\/style>/gis, '');
- } while (sanitized !== previous);
- return sanitized
- .replace(/<[^>]*>/g, ' ')
- .replace(/\s+/g, ' ')
- .trim();
+ // Prefer DOM parsing for robust removal of script/style content
+ if (typeof DOMParser !== 'undefined') {
+ const doc = new DOMParser().parseFromString(html, 'text/html');
+ doc.querySelectorAll('script, style').forEach((n) => n.remove());
+ const text = doc.body?.textContent ?? '';
+ return text.replace(/\s+/g, ' ').trim();
+ }
+ // Fallback to iterative regex removal in non-browser environments
+ let sanitized = html;
+ let previous: string;
+ do {
+ previous = sanitized;
+ sanitized = sanitized
+ .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
+ .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '');
+ } while (sanitized !== previous);
+ sanitized = sanitized.replace(/<[^>]*>/g, ' ');
+ return sanitized.replace(/\s+/g, ' ').trim();
}
After merging, please re-run CodeQL to confirm alerts 185 & 186 are resolved. 📝 Committable suggestion
Suggested change
🧰 Tools🪛 GitHub Check: CodeQL[failure] 324-325: Incomplete multi-character sanitization [failure] 325-325: Bad HTML filtering regexp 🤖 Prompt for AI Agents |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<[^>]*>/g, ' ') | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/\s+/g, ' ') | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .trim(); | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+319
to
331
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Add targeted tests for tricky cases (nested, spaced closing tags, multiline). To prevent regressions and to validate the CodeQL concerns, add unit tests covering:
I can draft a small test suite (Jest/Vitest) for 🧰 Tools🪛 GitHub Check: CodeQL[failure] 324-325: Incomplete multi-character sanitization [failure] 325-325: Bad HTML filtering regexp 🤖 Prompt for AI Agents |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implicit any: type
previousto satisfy strict TS (noImplicitAny).let previous;infersany, violating strict TS and our guidelines. Type it explicitly.Apply this diff:
📝 Committable suggestion
🤖 Prompt for AI Agents