Potential fix for code scanning alert no. 121: Incomplete multi-character sanitization by Jackson57279 · Pull Request #55 · Jackson57279/zapdev

Jackson57279 · 2025-08-12T18:49:37Z

Potential fix for https://github.com/otdoges/zapdev/security/code-scanning/121

The best way to fix this problem is to ensure that all instances of <script> and <style> tags are completely removed from the input, even if they appear consecutively or are nested. This can be achieved by repeatedly applying the regular expression replacement until no more matches are found, as described in the background. Alternatively, using a well-tested library such as sanitize-html would be preferable, but if that is not an option, the repeated replacement approach is effective.

Specifically, in src/lib/search-service.ts, within the extractTextContent method (lines 318-325), replace the single-pass .replace() calls for <script> and <style> tags with a loop that repeatedly applies the replacements until the input no longer changes. This ensures that all such tags are removed, regardless of their arrangement in the input.

No new methods or imports are needed unless you choose to use a library. For the repeated replacement approach, simply refactor the code within the method.

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

Summary by CodeRabbit

Bug Fixes
- Improved text extraction for search results, ensuring scripts, styles, and HTML artifacts are fully removed.
- Produces cleaner plain-text snippets and previews with more consistent whitespace.
- Reduces edge cases where unwanted content appeared in summaries.

…cter sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

vercel · 2025-08-12T18:49:40Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Project	Deployment	Preview	Comments	Updated (UTC)
zapdev	✅ Ready	Preview	Comment	Aug 12, 2025 6:56pm

coderabbitai · 2025-08-12T18:49:43Z

Warning

Rate limit exceeded

@otdoges has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d486d7 and b53781a.

📒 Files selected for processing (2)

package.json (1 hunks)
src/lib/search-service.ts (2 hunks)

Walkthrough

The extractTextContent method in src/lib/search-service.ts now iteratively removes all <script> and <style> blocks using repeated regex passes before stripping remaining HTML tags and collapsing whitespace. This replaces a single-pass approach with loops to ensure complete removal across multiline and multiple occurrences.

Changes

Cohort / File(s)	Summary of Changes
Search Service Sanitization `src/lib/search-service.ts`	Changed extractTextContent to use do-while loops with global, dot-all regex for removing script/style blocks, then strip remaining tags and collapse whitespace. No exported/public API changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Poem

I nibble code like clover leaves,
Snipping scripts the markup weaves—
Styles trimmed with gentle care,
Tags fall light as springtime air.
Now text stands clean, serene, and bright—
A rabbit’s tidy byte-delight. 🐇✨

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch alert-autofix-121

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

claude · 2025-08-12T18:49:54Z

Claude finished @otdoges's task —— View job

Code Review for Security Fix PR #55

Reviewing the pull request that addresses code scanning alert #121 for incomplete multi-character sanitization.

Todo List:

Read and analyze the modified file (src/lib/search-service.ts)
Evaluate the security fix implementation
Check code quality and best practices
Assess performance considerations
Identify potential bugs or edge cases
Evaluate test coverage needs
Provide comprehensive review feedback

Review Feedback

✅ Security Fix Analysis

The implementation correctly addresses the security vulnerability by replacing single-pass .replace() calls with a loop-based approach in extractTextContent method (src/lib/search-service.ts:318-335). This effectively prevents incomplete multi-character sanitization attacks where nested or consecutive <script> and <style> tags could bypass single-pass removal.

Before (vulnerable):

// Single pass could leave: <scr<script>ipt>alert('xss')</script>
sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');

After (secure):

// Loop ensures complete removal of all script/style tags
do {
  previous = sanitized;
  sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');
} while (sanitized !== previous);

✅ Code Quality & Best Practices

Clean Implementation: The fix follows the existing code style and patterns
Proper Variable Naming: previous variable clearly indicates its purpose
Consistent Approach: Both <script> and <style> tags use the same secure pattern
Regex Flags: Correct use of gis flags (global, case-insensitive, dotall)

⚠️ Performance Considerations

The loop-based approach has potential performance implications:

Worst Case: Deeply nested tags could require multiple iterations
Typical Case: Most legitimate HTML will complete in 1-2 iterations
Recommendation: Consider adding a safety counter to prevent infinite loops:

let iterations = 0;
const maxIterations = 10; // Safety limit
do {
  previous = sanitized;
  sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');
  iterations++;
} while (sanitized !== previous && iterations < maxIterations);

🔍 Potential Edge Cases

Self-closing script tags: <script src="..."/> - Currently handled correctly
Malformed tags: <script> without closing - The regex .*? handles this gracefully
CDATA sections: Not explicitly handled, but unlikely in typical use cases
Case variations: Properly handled with i flag

⚠️ Test Coverage Recommendations

The security fix should include tests for:

Nested script tags: <scr<script>ipt>alert('xss')</script>
Consecutive tags: <script></script><script>alert('xss')</script>
Mixed case: <SCRIPT>alert('xss')</SCRIPT>
Self-closing tags: <script src="evil.js"/>
Style tag variants: Similar patterns with <style> tags

📋 Additional Security Notes

The fix only addresses <script> and <style> tags specifically mentioned in the alert
Other potentially dangerous tags (<iframe>, <object>, etc.) are handled by the final HTML tag removal: sanitized.replace(/<[^>]*>/g, ' ')
The method is used for content extraction, not rendering, which reduces XSS risk

Summary

This is a solid security fix that properly addresses the incomplete multi-character sanitization vulnerability. The implementation is clean and follows security best practices. Main recommendations:

Add iteration safety limit to prevent potential DoS
Add comprehensive test cases to verify the fix works for all edge cases
Consider documenting the security reasoning in code comments

The fix is ready for merge with the suggested performance enhancement.

src/lib/search-service.ts

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/lib/search-service.ts (1)

332-334: Optional: Prefer a DOM-based text extraction for robustness.

Regex-based HTML parsing is inherently brittle. If runtime environment permits, using DOMParser yields more reliable text extraction and handles entities.

For example:

private extractTextContent(html: string): string {
  try {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const text = doc.body?.textContent ?? '';
    return text.replace(/\s+/g, ' ').trim();
  } catch {
    // Fallback to regex approach if DOMParser fails
    let sanitized = html;
    let previous: string;
    do { previous = sanitized; sanitized = sanitized.replace(/<script\b[^>]*>.*?<\/script\s*>/gis, ''); } while (sanitized !== previous);
    do { previous = sanitized; sanitized = sanitized.replace(/<style\b[^>]*>.*?<\/style\s*>/gis, ''); } while (sanitized !== previous);
    sanitized = sanitized.replace(/<[^>]*>/g, ' ');
    return sanitized.replace(/\s+/g, ' ').trim();
  }
}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43a83a6 and 7d486d7.

📒 Files selected for processing (1)

src/lib/search-service.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/convex-security.mdc)

**/*.{ts,tsx}: All Convex queries and mutations MUST use proper authentication. Never accept user IDs from client parameters.
Always verify user owns the data before allowing access.
Use the authenticated user's identity.subject for user references.
Implement proper error messages that don't leak information.
Authentication verification in every function.
Authorization checks for data ownership.
Input validation and sanitization.
Error handling without information leakage.

**/*.{ts,tsx}: Use Sonner for toast notifications to provide consistent user feedback, including success, error, and loading states.
Always handle errors gracefully using try-catch blocks in asynchronous functions, providing user feedback and logging errors.
Provide specific, actionable error messages for form validation errors using toast notifications.
Handle common network error scenarios in catch blocks, providing appropriate toast messages for network errors, authentication errors, and unexpected errors.

If using TypeScript, use an enum to store flag names.

Strict TypeScript must be used with no 'any' types allowed

**/*.{ts,tsx}: NEVER use any type - use proper TypeScript types
Use unknown for truly unknown data types
Implement proper interface definitions
Do not use empty interfaces; use a type alias instead (e.g., type InputProps = ... instead of interface InputProps {})
All function parameters must be typed
All return types should be explicit for public APIs
Use proper generic constraints
Implement discriminated unions for state management
Use proper interface definitions for error handling types (e.g., interface ValidationResult { isValid: boolean; error?: string; })

**/*.{ts,tsx}: Always sanitize user input before storing or displaying using a sanitization function like sanitizeText.
Implement comprehensive input validation, including length checks and detection of malicious patterns, as shown in the validateInput function.
Define and use security constants suc...

Files:

src/lib/search-service.ts

**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/posthog-integration.mdc)

**/*.{js,jsx,ts,tsx}: Use a consistent naming convention for this storage. enum/const object members should be written UPPERCASE_WITH_UNDERSCORE.
If a custom property for a person or event is at any point referenced in two or more files or two or more callsites in the same file, use an enum or const object, as above in feature flags.

Files:

src/lib/search-service.ts

🪛 GitHub Check: CodeQL

src/lib/search-service.ts

[failure] 324-324: Bad HTML filtering regexp
This regular expression does not match script end tags like </script >.

🔇 Additional comments (1)

src/lib/search-service.ts (1)

319-326: Iterative removal approach is a solid improvement.

Looping until convergence ensures all nested or consecutive <script> blocks are removed before general tag stripping.

src/lib/search-service.ts

Potential fix for code scanning alert no. 121: Incomplete multi-chara…

7d486d7

…cter sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Jackson57279 marked this pull request as ready for review August 12, 2025 18:49

github-advanced-security bot found potential problems Aug 12, 2025

View reviewed changes

src/lib/search-service.ts Fixed Show fixed Hide fixed

vercel bot deployed to Preview August 12, 2025 18:51 View deployment

Potential fix for code scanning alert no. 187: Bad HTML filtering regexp

b53781a

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

coderabbitai bot reviewed Aug 12, 2025

View reviewed changes

src/lib/search-service.ts Outdated Show resolved Hide resolved

src/lib/search-service.ts Outdated Show resolved Hide resolved

src/lib/search-service.ts Outdated Show resolved Hide resolved

vercel bot deployed to Preview August 12, 2025 18:56 View deployment

Jackson57279 enabled auto-merge (squash) August 12, 2025 19:08

Jackson57279 merged commit 469c1c7 into main Aug 12, 2025
8 checks passed

Jackson57279 deleted the alert-autofix-121 branch August 12, 2025 19:10

This was referenced Aug 12, 2025

Potential fix for code scanning alert no. 122: Incomplete multi-character sanitization #53

Merged

Potential fix for code scanning alert no. 186: Incomplete multi-character sanitization #62

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential fix for code scanning alert no. 121: Incomplete multi-character sanitization#55

Potential fix for code scanning alert no. 121: Incomplete multi-character sanitization#55
Jackson57279 merged 2 commits intomainfrom
alert-autofix-121

Jackson57279 commented Aug 12, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

vercel bot commented Aug 12, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Aug 12, 2025 •

edited

Loading

Rate limit exceeded

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Status, Documentation and Community

Uh oh!

claude bot commented Aug 12, 2025 •

edited

Loading

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Jackson57279 commented Aug 12, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

vercel bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

claude bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for Security Fix PR #55

Todo List:

Review Feedback

✅ Security Fix Analysis

✅ Code Quality & Best Practices

⚠️ Performance Considerations

🔍 Potential Edge Cases

⚠️ Test Coverage Recommendations

📋 Additional Security Notes

Summary

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Jackson57279 commented Aug 12, 2025 •

edited by coderabbitai bot

Loading

vercel bot commented Aug 12, 2025 •

edited

Loading

coderabbitai bot commented Aug 12, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

claude bot commented Aug 12, 2025 •

edited

Loading