Potential fix for code scanning alert no. 121: Incomplete multi-character sanitization #55
Jackson57279 merged 2 commits into main
…cter sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Walkthrough
The extractTextContent method in src/lib/search-service.ts now iteratively removes all <script> and <style> blocks using repeated regex passes before stripping remaining HTML tags and collapsing whitespace. This replaces a single-pass approach with loops to ensure complete removal across multiline and multiple occurrences.
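The pipeline described in the walkthrough can be sketched as follows. This is a minimal standalone illustration, not the merged code: `removeUntilStable` is a hypothetical helper name, the function is written outside any class, and the patterns use `[\s\S]` instead of the `s` flag and a more permissive end tag than the PR's own regex.

```typescript
// Hypothetical helper (not in the PR): re-apply a removal pattern until the
// string stops changing, so fragments that recombine into a new match after
// one pass are also removed.
function removeUntilStable(input: string, pattern: RegExp): string {
  let current = input;
  let previous: string;
  do {
    previous = current;
    current = current.replace(pattern, '');
  } while (current !== previous);
  return current;
}

// Standalone sketch of the steps in the walkthrough:
// 1. remove <script> blocks, 2. remove <style> blocks,
// 3. replace remaining tags with spaces, 4. collapse whitespace.
function extractTextContent(html: string): string {
  // [\s\S]*? matches across newlines, like .*? with the `s` flag.
  let sanitized = removeUntilStable(html, /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/gi);
  sanitized = removeUntilStable(sanitized, /<style\b[^>]*>[\s\S]*?<\/style\b[^>]*>/gi);
  sanitized = sanitized.replace(/<[^>]*>/g, ' ');
  return sanitized.replace(/\s+/g, ' ').trim();
}

console.log(extractTextContent('<p>Hello</p><script>alert(1)</script><style>p{}</style> <b>world</b>'));
```

The output is plain text meant for search indexing; a regex pipeline like this is not a substitute for an HTML sanitizer when the result is rendered as markup.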
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~7 minutes
Claude finished @otdoges's task
Code Review for Security Fix PR #55
Reviewing the pull request that addresses code scanning alert #121 for incomplete multi-character sanitization.
Review Feedback
✅ Security Fix Analysis
The implementation addresses the security vulnerability by replacing the single-pass replacement with a loop that repeats until the string stops changing.
Before (vulnerable):
// A single pass turns <scr<script></script>ipt>alert('xss')</scr<script></script>ipt> into <script>alert('xss')</script>
sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');
After (secure):
// Loop ensures complete removal of all script/style tags
do {
  previous = sanitized;
  sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gis, '');
} while (sanitized !== previous);
✅ Code Quality & Best Practices
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/lib/search-service.ts (1)
332-334: Optional: Prefer a DOM-based text extraction for robustness.
Regex-based HTML parsing is inherently brittle. If the runtime environment permits, using DOMParser yields more reliable text extraction and handles entities.
For example:
private extractTextContent(html: string): string {
  try {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    // textContent includes the text inside <script> and <style>, so remove those elements first
    doc.querySelectorAll('script, style').forEach((el) => el.remove());
    const text = doc.body?.textContent ?? '';
    return text.replace(/\s+/g, ' ').trim();
  } catch {
    // Fallback to regex approach if DOMParser fails or is unavailable
    let sanitized = html;
    let previous: string;
    do {
      previous = sanitized;
      sanitized = sanitized.replace(/<script\b[^>]*>.*?<\/script\s*>/gis, '');
    } while (sanitized !== previous);
    do {
      previous = sanitized;
      sanitized = sanitized.replace(/<style\b[^>]*>.*?<\/style\s*>/gis, '');
    } while (sanitized !== previous);
    sanitized = sanitized.replace(/<[^>]*>/g, ' ');
    return sanitized.replace(/\s+/g, ' ').trim();
  }
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lib/search-service.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}
📄 CodeRabbit Inference Engine (.cursor/rules/convex-security.mdc)
**/*.{ts,tsx}: All Convex queries and mutations MUST use proper authentication. Never accept user IDs from client parameters.
Always verify user owns the data before allowing access.
Use the authenticated user's identity.subject for user references.
Implement proper error messages that don't leak information.
Authentication verification in every function.
Authorization checks for data ownership.
Input validation and sanitization.
Error handling without information leakage.
**/*.{ts,tsx}: Use Sonner for toast notifications to provide consistent user feedback, including success, error, and loading states.
Always handle errors gracefully using try-catch blocks in asynchronous functions, providing user feedback and logging errors.
Provide specific, actionable error messages for form validation errors using toast notifications.
Handle common network error scenarios in catch blocks, providing appropriate toast messages for network errors, authentication errors, and unexpected errors.
If using TypeScript, use an enum to store flag names.
Strict TypeScript must be used with no 'any' types allowed
**/*.{ts,tsx}: NEVER use the any type - use proper TypeScript types
Use unknown for truly unknown data types
Implement proper interface definitions
Do not use empty interfaces; use a type alias instead (e.g., type InputProps = ... instead of interface InputProps {})
All function parameters must be typed
All return types should be explicit for public APIs
Use proper generic constraints
Implement discriminated unions for state management
Use proper interface definitions for error handling types (e.g., interface ValidationResult { isValid: boolean; error?: string; })
**/*.{ts,tsx}: Always sanitize user input before storing or displaying using a sanitization function like sanitizeText.
Implement comprehensive input validation, including length checks and detection of malicious patterns, as shown in the validateInput function.
Define and use security constants suc...
Files:
src/lib/search-service.ts
**/*.{js,jsx,ts,tsx}
📄 CodeRabbit Inference Engine (.cursor/rules/posthog-integration.mdc)
**/*.{js,jsx,ts,tsx}: Use a consistent naming convention for this storage. enum/const object members should be written UPPERCASE_WITH_UNDERSCORE.
If a custom property for a person or event is at any point referenced in two or more files or two or more callsites in the same file, use an enum or const object, as above in feature flags.
Files:
src/lib/search-service.ts
🪛 GitHub Check: CodeQL
src/lib/search-service.ts
[failure] 324-324: Bad HTML filtering regexp
This regular expression does not match script end tags like </script >.
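To illustrate the CodeQL finding, here is a small sketch comparing a strict end-tag pattern with a more tolerant one. `STRICT_END` mirrors the closing part of the regex in the PR; `TOLERANT_END` is an assumed alternative (not taken from the PR) that accepts whitespace or stray attributes before the closing `>`, which HTML parsers also treat as a valid end tag.

```typescript
// Closing-tag part of the pattern used in the PR: only matches the exact text "</script>".
const STRICT_END = /<\/script>/i;

// Assumed alternative: the word boundary stops "</scripts>" from matching, and
// [^>]* accepts anything (spaces, newlines, attributes) up to the closing ">".
const TOLERANT_END = /<\/script\b[^>]*>/i;

const endTags: string[] = [
  '</script>',
  '</script >',
  '</SCRIPT\n>',
  '</script foo="bar">',
];

for (const tag of endTags) {
  console.log(JSON.stringify(tag), STRICT_END.test(tag), TOLERANT_END.test(tag));
}
```

Only the first variant matches the strict pattern, so a block closed with any of the other three would survive the removal step.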
🔇 Additional comments (1)
src/lib/search-service.ts (1)
319-326: Iterative removal approach is a solid improvement.
Looping until convergence ensures all nested or consecutive <script> blocks are removed before general tag stripping.
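A short sketch of why convergence matters, using a crafted input in which removing the inner blocks reassembles a new one. The names `stripOnce`, `stripUntilStable`, and `crafted` are illustrative and do not appear in the PR.

```typescript
const SCRIPT_BLOCK = /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/gi;

// Single pass: removes every match present in the original string only.
function stripOnce(input: string): string {
  return input.replace(SCRIPT_BLOCK, '');
}

// Repeated passes: keeps removing until a pass changes nothing.
function stripUntilStable(input: string): string {
  let current = input;
  let previous: string;
  do {
    previous = current;
    current = current.replace(SCRIPT_BLOCK, '');
  } while (current !== previous);
  return current;
}

// Removing the two empty inner blocks joins "<scr" + "ipt>" into a real tag.
const crafted = "<scr<script></script>ipt>alert('xss')</scr<script></script>ipt>";

console.log(stripOnce(crafted));        // <script>alert('xss')</script>
console.log(stripUntilStable(crafted)); // (empty string)
```

The loop terminates because every pass that changes the string makes it strictly shorter.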
Potential fix for https://github.com/otdoges/zapdev/security/code-scanning/121
The best way to fix this problem is to ensure that all instances of <script> and <style> tags are completely removed from the input, even if they appear consecutively or are nested. This can be achieved by repeatedly applying the regular expression replacement until no more matches are found, as described in the background. Alternatively, using a well-tested library such as sanitize-html would be preferable, but if that is not an option, the repeated replacement approach is effective.
Specifically, in src/lib/search-service.ts, within the extractTextContent method (lines 318-325), replace the single-pass .replace() calls for <script> and <style> tags with a loop that repeatedly applies the replacements until the input no longer changes. This ensures that all such tags are removed, regardless of their arrangement in the input.
No new methods or imports are needed unless you choose to use a library. For the repeated replacement approach, simply refactor the code within the method.
Suggested fixes powered by Copilot Autofix. Review carefully before merging.