Potential fix for code scanning alert no. 122: Incomplete multi-character sanitization by Jackson57279 · Pull Request #54 · Jackson57279/zapdev

Jackson57279 · 2025-08-12T18:43:00Z

Potential fix for https://github.com/otdoges/zapdev/security/code-scanning/122

The best way to fix this problem is to use a well-tested HTML sanitization library, such as sanitize-html, to reliably remove all <script> and <style> tags and their content, as well as any other potentially dangerous HTML. However, if adding a dependency is not an option, a safer alternative is to repeatedly apply the regular expression replacements for <script> and <style> tags until no more matches are found. This ensures that all instances, including those that may appear after previous replacements, are removed.

To implement this fix in src/lib/search-service.ts, update the extractTextContent method (lines 318-325) to repeatedly remove <script> and <style> tags and their content until none remain, before proceeding to strip all other tags and whitespace. No new imports are needed for the repeated replacement approach.
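
To make the failure mode concrete, here is a minimal sketch of why a single replacement pass is incomplete and what repeating it changes. It is not the code from src/lib/search-service.ts: `stripOnce`, `stripUntilStable`, and the sample payload are hypothetical, and `[\s\S]` is used in place of the `s` flag only so the snippet compiles on older targets.

```typescript
// Hypothetical helpers for illustration; not taken from src/lib/search-service.ts.
// [\s\S]*? matches across newlines, like `.` with the `s` (dotall) flag.
const SCRIPT_BLOCK = /<script[^>]*>[\s\S]*?<\/script>/gi;

// Single pass: removes only the script blocks present in the input as given.
function stripOnce(html: string): string {
  return html.replace(SCRIPT_BLOCK, '');
}

// Repeat the same replacement until a pass changes nothing.
function stripUntilStable(html: string): string {
  let current = html;
  let previous: string;
  do {
    previous = current;
    current = stripOnce(current);
  } while (current !== previous);
  return current;
}

// Removing the inner block splices "<scr" and "ipt>" into a new opening tag.
const payload = '<scr<script></script>ipt>alert(1)</script>';

console.log(stripOnce(payload));                        // <script>alert(1)</script>
console.log(JSON.stringify(stripUntilStable(payload))); // ""
```

The single pass returns a complete script block that was not present in the input, which is the pattern the alert describes; the loop keeps going until nothing is left to remove.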

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

Summary by CodeRabbit

Bug Fixes
- Improved search result text extraction for cleaner, safer snippets by robustly removing scripts, styles, and stray HTML.
- Reduced noise and formatting artifacts in search previews for better readability.
Chores
- Added an HTML sanitization library as a runtime dependency to support safer content processing.
- Updated dependency formatting in configuration to accommodate the new library.

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

vercel · 2025-08-12T18:43:03Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Project	Deployment	Preview	Comments	Updated (UTC)
zapdev	❌ Failed			Aug 12, 2025 6:50pm

coderabbitai · 2025-08-12T18:43:07Z


📥 Commits

Reviewing files that changed from the base of the PR and between 48cd5d9 and 56fd959.

📒 Files selected for processing (2)

package.json (1 hunks)
src/lib/search-service.ts (2 hunks)

Walkthrough

Introduces sanitize-html as a new dependency and updates src/lib/search-service.ts to use it for HTML text extraction, replacing prior regex-based stripping. No public APIs are changed; only internal sanitization logic is updated.

Changes

Cohort / File(s)	Summary
Dependencies `package.json`	Added dependency `sanitize-html@^2.17.0`. Adjusted formatting to include trailing comma after `zod`. No other dependency changes.
Search Service Sanitization `src/lib/search-service.ts`	Imported `sanitize-html`. Reworked `extractTextContent(html: string)` to sanitize HTML (discarding disallowed tags, excluding script/style and their content) and then strip remaining tags and normalize whitespace. No public API changes.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant SearchService
  participant SanitizeHtml

  Caller->>SearchService: extractTextContent(html)
  SearchService->>SanitizeHtml: sanitize(html, options)
  SanitizeHtml-->>SearchService: sanitizedHtml
  SearchService->>SearchService: strip tags, collapse whitespace, trim
  SearchService-->>Caller: textContent

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

In fields of tags I twitch my nose,
Scripts and styles—poof! There it goes.
With sanitized hops through HTML dew,
I gather clean words, crisp and new.
Thump-thump, commit—dependency set,
A tidy burrow of text to net. 🐇✨


claude · 2025-08-12T18:43:19Z

Claude finished @otdoges's task —— View job

Code Review: Security Fix for Incomplete Multi-Character Sanitization

Todo List:

Read and analyze the modified search-service.ts file
Review the extractTextContent method implementation
Evaluate security fix effectiveness
Check for potential edge cases and improvements
Assess code quality and best practices
Provide comprehensive feedback

Review Summary

This PR addresses a legitimate security vulnerability (incomplete multi-character sanitization) by implementing a loop-based approach to repeatedly remove <script> and <style> tags until no more instances are found.

✅ Positive Aspects

Security Fix is Effective: The iterative approach correctly handles nested or overlapping script/style tags that could bypass single-pass regex replacement
Clean Implementation: The loop-based solution is readable and follows the suggested fix pattern
Maintains Existing Functionality: Other HTML stripping and whitespace normalization remains unchanged
Added Regex Flags: The s flag (dotall) ensures multiline script/style content is properly matched

🔍 Code Quality Assessment

File: src/lib/search-service.ts:318-332

The modified extractTextContent method:

private extractTextContent(html: string): string {
  // Repeatedly remove <script> and <style> tags and their content
  let sanitized = html;
  let previous;
  do {
    previous = sanitized;
    sanitized = sanitized
      .replace(/<script[^>]*>.*?<\/script>/gis, '')
      .replace(/<style[^>]*>.*?<\/style>/gis, '');
  } while (sanitized !== previous);
  return sanitized
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

🛡️ Security Analysis

Vulnerability Addressed: ✅ Fixes incomplete sanitization that could allow script injection through nested tags
Defense Depth: ✅ Uses iterative removal to handle edge cases like <scr<script>ipt>alert('xss')</script>
Input Validation: ✅ Existing input validation in validateSearchQuery provides additional protection
XSS Prevention: ✅ Combined with existing sanitization, provides robust protection

⚡ Performance Considerations

Efficiency: The loop has potential for O(n²) complexity in worst case, but:
- Input is limited to website HTML content (reasonable size)
- Loop terminates quickly in normal cases (1-2 iterations max)
- Performance impact is negligible for typical use cases
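
As a rough illustration of the worst case mentioned above, the sketch below builds an input in which every pass exposes exactly one new script block, then counts the passes. `nested` and `countPasses` are hypothetical helpers, and the input is deliberately adversarial rather than typical website HTML.

```typescript
// Illustrative only: `nested` and `countPasses` are hypothetical helpers.
const SCRIPT_BLOCK = /<script[^>]*>[\s\S]*?<\/script>/gi;

// Wraps a script block `depth` times so that each removal reveals one more block.
function nested(depth: number): string {
  let html = '<script></script>';
  for (let i = 0; i < depth; i++) {
    html = '<scr' + html + 'ipt></script>';
  }
  return html;
}

// Counts how often the do/while body runs, including the final no-op pass.
function countPasses(html: string): number {
  let current = html;
  let previous: string;
  let passes = 0;
  do {
    previous = current;
    current = current.replace(SCRIPT_BLOCK, '');
    passes++;
  } while (current !== previous);
  return passes;
}

for (const depth of [0, 1, 5, 20]) {
  console.log(depth, countPasses(nested(depth))); // 0 2, 1 3, 5 7, 20 22
}
```

For this construction the pass count is depth + 2, and each pass rescans the whole string, which is where the quadratic worst case comes from. Ordinary pages without spliced tags finish in two passes.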

🔧 Minor Suggestions

Consider Adding JSDoc: Document the security purpose of this method

/**
 * Safely extracts text content from HTML by iteratively removing script/style tags
 * to prevent incomplete sanitization vulnerabilities.
 */

Optional: Add Loop Guard (defensive programming):

let iterations = 0;
const MAX_ITERATIONS = 10; // Safety guard
do {
  // ... existing logic
  iterations++;
} while (sanitized !== previous && iterations < MAX_ITERATIONS);
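
One caveat with a guard like this: if the limit is reached before the text stops changing, the result may still contain a script block. The sketch below is one possible way to handle that, assuming it is acceptable to return an empty snippet in that case. `stripWithGuard` and `deeplyNested` are hypothetical names, and failing closed is a design choice of this sketch, not something the PR does.

```typescript
// A sketch, not the project's code: `stripWithGuard` and `deeplyNested` are hypothetical.
const MAX_ITERATIONS = 10; // Same illustrative limit as in the suggestion above

function stripWithGuard(html: string): string {
  let sanitized = html;
  let previous: string;
  let iterations = 0;
  do {
    previous = sanitized;
    sanitized = sanitized
      .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
      .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');
    iterations++;
  } while (sanitized !== previous && iterations < MAX_ITERATIONS);

  // The last pass still changed the text, so no stable pass was observed.
  // Fail closed (an assumption of this sketch) rather than return partly cleaned text.
  if (sanitized !== previous) {
    return '';
  }
  return sanitized.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Adversarial input that needs more passes than the guard allows.
function deeplyNested(depth: number): string {
  let html = '<script></script>';
  for (let i = 0; i < depth; i++) {
    html = '<scr' + html + 'ipt></script>';
  }
  return html;
}

console.log(stripWithGuard('<p>Hello <script>x()</script>world</p>')); // Hello world
console.log(JSON.stringify(stripWithGuard(deeplyNested(20))));          // ""
```

The check is conservative: an input that becomes clean exactly on the last allowed pass is also rejected, because the loop never saw a pass that changed nothing.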

🧪 Test Coverage Recommendation

Consider adding unit tests for edge cases:

// Test cases to add:
- Nested script tags: `<scr<script>ipt>alert('xss')</script>`
- Mixed case: `<SCRIPT>alert('xss')</SCRIPT>`
- Malformed tags: `<script>alert('xss')<script>`
- Multiple iterations needed
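
The cases above could be written as executable checks along these lines. This is a sketch under assumptions: `extractTextContent` is reproduced here as a standalone function (in the PR it is a private method), no test framework is assumed, and the expected strings were obtained by tracing the regexes by hand, so they describe what this regex-based variant returns rather than what would be ideal.

```typescript
// Standalone copy of the loop-based method for testing; [\s\S] stands in for the `s` flag.
function extractTextContent(html: string): string {
  let sanitized = html;
  let previous: string;
  do {
    previous = sanitized;
    sanitized = sanitized
      .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
      .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');
  } while (sanitized !== previous);
  return sanitized.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// [name, input, expected output]
const cases: Array<[string, string, string]> = [
  // The inner block is removed in one pass; the unterminated "<scr" stays as text.
  ['nested', "<scr<script>ipt>alert('xss')</script>", '<scr'],
  // The `i` flag makes the match case-insensitive.
  ['mixed case', "<SCRIPT>alert('xss')</SCRIPT>", ''],
  // No closing tag, so only the generic tag stripping applies and the body survives as text.
  ['malformed', "<script>alert('xss')<script>", "alert('xss')"],
  // The first pass splices a new script block together; the second pass removes it.
  ['multiple iterations', '<scr<script></script>ipt>alert(1)</script>ok', 'ok'],
];

for (const [name, input, expected] of cases) {
  const actual = extractTextContent(input);
  if (actual !== expected) {
    throw new Error(`${name}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
}
console.log('all cases passed');
```

Two of these results are worth noticing: the "nested" input from the list leaves a stray `<scr` fragment, and the malformed input keeps the script body as plain text in the snippet.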

📊 Overall Assessment

Aspect	Rating	Notes
Security Fix	✅ Excellent	Properly addresses the vulnerability
Code Quality	✅ Good	Clean, readable implementation
Performance	✅ Acceptable	Minor overhead, negligible impact
Maintainability	✅ Good	Clear intent and simple logic

✅ Recommendation: Approve

This is a solid security fix that effectively addresses the scanning alert. The implementation is clean, secure, and maintains existing functionality while fixing the vulnerability.

The iterative approach is the correct solution for handling incomplete multi-character sanitization, and the code quality is good. The minor suggestions above are optional improvements but not blockers.


claude · 2025-08-12T18:49:11Z

Claude finished @otdoges's task —— View job

Code Review: Security Fix for Incomplete Multi-Character Sanitization

🎯 Review Summary

This PR addresses a legitimate security vulnerability by implementing a superior solution using the sanitize-html library instead of regex-based approaches.

✅ Excellent Security Fix

File: src/lib/search-service.ts:319-333

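private extractTextContent(html: string): string {
  const sanitized = sanitizeHtml(html, {
    // false disables tag filtering, so every tag is passed through at this stage
    allowedTags: false,
    disallowedTagsMode: 'discard',
    // Returning true drops the element together with its content
    exclusiveFilter: (frame) => {
      return frame.tag === 'script' || frame.tag === 'style';
    }
  });
  // Replace the remaining tags with spaces, then collapse whitespace
  return sanitized
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}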

Security Benefits:

✅ Uses production-grade sanitize-html@2.17.0 library
✅ Eliminates incomplete multi-character sanitization vulnerability
✅ Parses the HTML instead of pattern-matching it, which covers nested tags, malformed HTML, and case variations
✅ More robust than regex-based approaches
✅ Properly integrated with existing input validation

🔧 Minor Optimization Suggestion

The current configuration has some redundancy. Consider simplifying:

private extractTextContent(html: string): string {
  const sanitized = sanitizeHtml(html, {
    allowedTags: [], // Empty list discards every tag (false would allow all tags)
    allowedAttributes: {},
    disallowedTagsMode: 'discard'
    // exclusiveFilter not needed since we're removing all tags anyway
  });
  return sanitized
    .replace(/\s+/g, ' ')
    .trim();
}

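Rationale: With allowedTags: [] every tag is discarded, and sanitize-html drops the content of script and style elements by default, so the exclusiveFilter and the final tag-stripping regex become redundant. Note that allowedTags: false, as used in the current code, allows all tags rather than removing them. One behavioral difference to check: the removed regex replaced each tag with a space, so text from adjacent elements may run together without it.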

🚨 GitHub Security Alerts

The security scanner is still flagging issues, but these appear to be false positives given the new implementation:

"Incomplete multi-character sanitization" - No longer applicable with sanitize-html
"Bad HTML filtering regexp" - The remaining regexes run on sanitize-html's output; they only strip leftover tags and normalize whitespace, and script/style removal no longer depends on them

📊 Code Quality Assessment

Aspect	Rating	Notes
Security	✅ Excellent	Production-grade sanitization
Performance	✅ Good	Library overhead is minimal
Maintainability	✅ Excellent	Clear, well-documented approach
Best Practices	✅ Good	Follows security-first principles

🧪 Test Coverage Recommendations

Consider adding unit tests for:

Nested script tags: <scr<script>ipt>alert('xss')</script>
Style tag variations: <STYLE>body{color:red}</STYLE>
Malformed HTML: <script>alert('xss')<script>
Performance with large HTML documents

✅ Final Recommendation: APPROVE

This is an excellent security fix that properly addresses the vulnerability using industry best practices. The sanitize-html approach is significantly more secure and robust than regex-based solutions.

Dependencies: The sanitize-html dependency (^2.17.0) was properly added to package.json.

The minor optimization suggestion is optional - the current implementation is secure and functional.

cursor · 2025-08-12T18:52:24Z

Bugbot found 1 bug

Potential fix for code scanning alert no. 122: Incomplete multi-character sanitization

123026d

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

vercel bot had a problem deploying to Preview August 12, 2025 18:44 Failure

github-advanced-security bot found potential problems Aug 12, 2025

Potential fix for code scanning alert no. 185: Bad HTML filtering regexp

56fd959

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Jackson57279 marked this pull request as ready for review August 12, 2025 18:48

vercel bot had a problem deploying to Preview August 12, 2025 18:50 Failure

Jackson57279 merged commit 86bd788 into main Aug 12, 2025
9 of 11 checks passed

Jackson57279 deleted the alert-autofix-155 branch August 12, 2025 18:51

Jackson57279 pushed a commit that referenced this pull request Sep 11, 2025

Merge pull request #54 from ZLostTK/DarkMode-Theme

3b82f50

feat: Dark mode theme

Jackson57279 merged 2 commits into main from alert-autofix-155
