Skip to content

Potential fix for code scanning alert no. 121: Incomplete multi-character sanitization#55

Merged
Jackson57279 merged 2 commits intomainfrom
alert-autofix-121
Aug 12, 2025
Merged

Potential fix for code scanning alert no. 121: Incomplete multi-character sanitization#55
Jackson57279 merged 2 commits intomainfrom
alert-autofix-121

Conversation

@Jackson57279
Copy link
Owner

@Jackson57279 Jackson57279 commented Aug 12, 2025

Potential fix for https://github.com/otdoges/zapdev/security/code-scanning/121

The best way to fix this problem is to ensure that all instances of <script> and <style> tags are completely removed from the input, even if they appear consecutively or are nested. This can be achieved by repeatedly applying the regular expression replacement until no more matches are found, as described in the background. Alternatively, using a well-tested library such as sanitize-html would be preferable, but if that is not an option, the repeated replacement approach is effective.

Specifically, in src/lib/search-service.ts, within the extractTextContent method (lines 318-325), replace the single-pass .replace() calls for <script> and <style> tags with a loop that repeatedly applies the replacements until the input no longer changes. This ensures that all such tags are removed, regardless of their arrangement in the input.

No new methods or imports are needed unless you choose to use a library. For the repeated replacement approach, simply refactor the code within the method.


Suggested fixes powered by Copilot Autofix. Review carefully before merging.

Summary by CodeRabbit

  • Bug Fixes
    • Improved text extraction for search results, ensuring scripts, styles, and HTML artifacts are fully removed.
    • Produces cleaner plain-text snippets and previews with more consistent whitespace.
    • Reduces edge cases where unwanted content appeared in summaries.

…cter sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@vercel
Copy link

vercel bot commented Aug 12, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Project Deployment Preview Comments Updated (UTC)
zapdev Ready Preview Comment Aug 12, 2025 6:56pm

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 12, 2025

Warning

Rate limit exceeded

@otdoges has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d486d7 and b53781a.

📒 Files selected for processing (2)
  • package.json (1 hunks)
  • src/lib/search-service.ts (2 hunks)

Walkthrough

The extractTextContent method in src/lib/search-service.ts now iteratively removes all <script> and <style> blocks using repeated regex passes before stripping remaining HTML tags and collapsing whitespace. This replaces a single-pass approach with loops to ensure complete removal across multiline and multiple occurrences.

Changes

Cohort / File(s) Summary of Changes
Search Service Sanitization
src/lib/search-service.ts
Changed extractTextContent to use do-while loops with global, dot-all regex for removing script/style blocks, then strip remaining tags and collapse whitespace. No exported/public API changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Poem

I nibble code like clover leaves,
Snipping scripts the markup weaves—
Styles trimmed with gentle care,
Tags fall light as springtime air.
Now text stands clean, serene, and bright—
A rabbit’s tidy byte-delight. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch alert-autofix-121

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Jackson57279 Jackson57279 marked this pull request as ready for review August 12, 2025 18:49
@claude
Copy link

claude bot commented Aug 12, 2025

Claude finished @otdoges's task —— View job


Code Review for Security Fix PR #55

Reviewing the pull request that addresses code scanning alert #121 for incomplete multi-character sanitization.

Todo List:

  • Read and analyze the modified file (src/lib/search-service.ts)
  • Evaluate the security fix implementation
  • Check code quality and best practices
  • Assess performance considerations
  • Identify potential bugs or edge cases
  • Evaluate test coverage needs
  • Provide comprehensive review feedback

Review Feedback

Security Fix Analysis

The implementation correctly addresses the security vulnerability by replacing single-pass .replace() calls with a loop-based approach in extractTextContent method (src/lib/search-service.ts:318-335). This effectively prevents incomplete multi-character sanitization attacks where nested or consecutive <script> and <style> tags could bypass single-pass removal.

Before (vulnerable):

// Single pass could leave: <scr<script>ipt>alert('xss')</script>
sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');

After (secure):

// Loop ensures complete removal of all script/style tags
do {
  previous = sanitized;
  sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');
} while (sanitized !== previous);

Code Quality & Best Practices

  • Clean Implementation: The fix follows the existing code style and patterns
  • Proper Variable Naming: previous variable clearly indicates its purpose
  • Consistent Approach: Both <script> and <style> tags use the same secure pattern
  • Regex Flags: Correct use of gis flags (global, case-insensitive, dotall)

⚠️ Performance Considerations

The loop-based approach has potential performance implications:

  • Worst Case: Deeply nested tags could require multiple iterations
  • Typical Case: Most legitimate HTML will complete in 1-2 iterations
  • Recommendation: Consider adding a safety counter to prevent infinite loops:
let iterations = 0;
const maxIterations = 10; // Safety limit
do {
  previous = sanitized;
  sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');
  iterations++;
} while (sanitized !== previous && iterations < maxIterations);

🔍 Potential Edge Cases

  1. Self-closing script tags: <script src="..."/> - Currently handled correctly
  2. Malformed tags: <script> without closing - The regex .*? handles this gracefully
  3. CDATA sections: Not explicitly handled, but unlikely in typical use cases
  4. Case variations: Properly handled with i flag

⚠️ Test Coverage Recommendations

The security fix should include tests for:

  • Nested script tags: <scr<script>ipt>alert('xss')</script>
  • Consecutive tags: <script></script><script>alert('xss')</script>
  • Mixed case: <SCRIPT>alert('xss')</SCRIPT>
  • Self-closing tags: <script src="evil.js"/>
  • Style tag variants: Similar patterns with <style> tags

📋 Additional Security Notes

  • The fix only addresses <script> and <style> tags specifically mentioned in the alert
  • Other potentially dangerous tags (<iframe>, <object>, etc.) are handled by the final HTML tag removal: sanitized.replace(/<[^>]*>/g, ' ')
  • The method is used for content extraction, not rendering, which reduces XSS risk

Summary

This is a solid security fix that properly addresses the incomplete multi-character sanitization vulnerability. The implementation is clean and follows security best practices. Main recommendations:

  1. Add iteration safety limit to prevent potential DoS
  2. Add comprehensive test cases to verify the fix works for all edge cases
  3. Consider documenting the security reasoning in code comments

The fix is ready for merge with the suggested performance enhancement.


Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/lib/search-service.ts (1)

332-334: Optional: Prefer a DOM-based text extraction for robustness.

Regex-based HTML parsing is inherently brittle. If runtime environment permits, using DOMParser yields more reliable text extraction and handles entities.

For example:

private extractTextContent(html: string): string {
  try {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const text = doc.body?.textContent ?? '';
    return text.replace(/\s+/g, ' ').trim();
  } catch {
    // Fallback to regex approach if DOMParser fails
    let sanitized = html;
    let previous: string;
    do { previous = sanitized; sanitized = sanitized.replace(/<script\b[^>]*>.*?<\/script\s*>/gis, ''); } while (sanitized !== previous);
    do { previous = sanitized; sanitized = sanitized.replace(/<style\b[^>]*>.*?<\/style\s*>/gis, ''); } while (sanitized !== previous);
    sanitized = sanitized.replace(/<[^>]*>/g, ' ');
    return sanitized.replace(/\s+/g, ' ').trim();
  }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43a83a6 and 7d486d7.

📒 Files selected for processing (1)
  • src/lib/search-service.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/convex-security.mdc)

**/*.{ts,tsx}: All Convex queries and mutations MUST use proper authentication. Never accept user IDs from client parameters.
Always verify user owns the data before allowing access.
Use the authenticated user's identity.subject for user references.
Implement proper error messages that don't leak information.
Authentication verification in every function.
Authorization checks for data ownership.
Input validation and sanitization.
Error handling without information leakage.

**/*.{ts,tsx}: Use Sonner for toast notifications to provide consistent user feedback, including success, error, and loading states.
Always handle errors gracefully using try-catch blocks in asynchronous functions, providing user feedback and logging errors.
Provide specific, actionable error messages for form validation errors using toast notifications.
Handle common network error scenarios in catch blocks, providing appropriate toast messages for network errors, authentication errors, and unexpected errors.

If using TypeScript, use an enum to store flag names.

Strict TypeScript must be used with no 'any' types allowed

**/*.{ts,tsx}: NEVER use any type - use proper TypeScript types
Use unknown for truly unknown data types
Implement proper interface definitions
Do not use empty interfaces; use a type alias instead (e.g., type InputProps = ... instead of interface InputProps {})
All function parameters must be typed
All return types should be explicit for public APIs
Use proper generic constraints
Implement discriminated unions for state management
Use proper interface definitions for error handling types (e.g., interface ValidationResult { isValid: boolean; error?: string; })

**/*.{ts,tsx}: Always sanitize user input before storing or displaying using a sanitization function like sanitizeText.
Implement comprehensive input validation, including length checks and detection of malicious patterns, as shown in the validateInput function.
Define and use security constants suc...

Files:

  • src/lib/search-service.ts
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/posthog-integration.mdc)

**/*.{js,jsx,ts,tsx}: Use a consistent naming convention for this storage. enum/const object members should be written UPPERCASE_WITH_UNDERSCORE.
If a custom property for a person or event is at any point referenced in two or more files or two or more callsites in the same file, use an enum or const object, as above in feature flags.

Files:

  • src/lib/search-service.ts
🪛 GitHub Check: CodeQL
src/lib/search-service.ts

[failure] 324-324: Bad HTML filtering regexp
This regular expression does not match script end tags like </script >.

🔇 Additional comments (1)
src/lib/search-service.ts (1)

319-326: Iterative removal approach is a solid improvement.

Looping until convergence ensures all nested or consecutive <script> blocks are removed before general tag stripping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant