-
-
Notifications
You must be signed in to change notification settings - Fork 6
refactor(node): add prompt isolation for repository content and centralize CLI error handling #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…alize CLI error handling
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Summary of ChangesHello @calchiwo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the CLI application to improve its robustness, maintainability, and security posture, particularly against prompt injection. By isolating untrusted repository content with explicit delimiters and centralizing error handling for both LLM interactions and file reading, the changes ensure a more reliable and secure operation of the tool. The refactoring also reduces code duplication, making the codebase cleaner and easier to manage. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42ca76fefa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| README content: | ||
| <readme> | ||
| ${readme || "No README provided"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Escape repo text before relying on XML delimiters
The new prompt-isolation approach is bypassable because untrusted repository content is interpolated directly inside XML-like sections without escaping. If a README (or other repo text) contains </readme> followed by instructions, the model sees attacker-controlled text outside the intended data block and can still be steered, which defeats the security goal of this refactor. This affects the same pattern in the other prompt builders too, so delimiter escaping (or structured encoding like JSON) is needed for the isolation to hold.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces excellent refactoring by centralizing error handling for repository file reading and LLM generation into safeReadRepoFiles and generateWithExit helper functions, significantly improving code clarity and maintainability. It also takes a positive step towards prompt isolation by using XML-style delimiters and critical instructions in LLM prompts. However, the implementation is incomplete as it does not escape these delimiters within the untrusted content itself, leaving the system vulnerable to indirect prompt injection where an attacker could craft repository content to manipulate the LLM's output.
| <repository_metadata> | ||
| Name: ${repoName} | ||
| Description: ${description || "No description provided"} | ||
| </repository_metadata> | ||
|
|
||
| README content: | ||
| <readme> | ||
| ${readme || "No README provided"} | ||
| </readme> | ||
|
|
||
| Repo structure: | ||
| <repo_structure> | ||
| ${treeText || "No file tree provided"} | ||
| </repo_structure> | ||
|
|
||
| Key code files: | ||
| <code_files> | ||
| ${filesText || "No code files provided"} | ||
| </code_files> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of XML-style delimiters (e.g., <readme>, </readme>) to isolate untrusted repository content is a good step toward preventing prompt injection. However, the untrusted content (like readme, treeText, and filesText) is interpolated directly into the prompt without escaping. An attacker can include a closing tag like </readme> or </code_files> in their repository to break out of the data block and inject malicious instructions. This is a form of indirect prompt injection.
To remediate this, you should sanitize or escape the untrusted content to ensure it cannot contain the closing tags used as delimiters. For example, you could replace </ with </ or a similar safe representation within the untrusted strings.
| <repository_metadata> | ||
| Name: ${repoName} | ||
| Description: ${description || "No description provided"} | ||
| </repository_metadata> | ||
|
|
||
| README snippet: | ||
| <readme> | ||
| ${readmeSnippet} | ||
| </readme> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| <repository_metadata> | ||
| Name: ${repoName} | ||
| Description: ${description || "No description provided"} | ||
| </repository_metadata> | ||
|
|
||
| README content: | ||
| <readme> | ||
| ${readmeContent} | ||
| </readme> | ||
|
|
||
| Repo structure: | ||
| <repo_structure> | ||
| ${treeContent} | ||
| </repo_structure> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| async function safeReadRepoFiles( | ||
| owner: string, | ||
| repo: string | ||
| ): Promise<RepoReadResult | null> { | ||
| try { | ||
| return await readRepoSignalFiles(owner, repo); | ||
| } catch (e: any) { | ||
| console.warn(`Warning: Could not read repo files: ${e?.message || e}`); | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better type safety, it's recommended to catch errors as unknown instead of any. This forces you to handle the error type explicitly, preventing potential runtime errors if the caught value is not an Error object. The e?.message || e pattern can also produce unhelpful output like [object Object].
| async function safeReadRepoFiles( | |
| owner: string, | |
| repo: string | |
| ): Promise<RepoReadResult | null> { | |
| try { | |
| return await readRepoSignalFiles(owner, repo); | |
| } catch (e: any) { | |
| console.warn(`Warning: Could not read repo files: ${e?.message || e}`); | |
| return null; | |
| } | |
| } | |
| async function safeReadRepoFiles( | |
| owner: string, | |
| repo: string | |
| ): Promise<RepoReadResult | null> { | |
| try { | |
| return await readRepoSignalFiles(owner, repo); | |
| } catch (e: unknown) { | |
| const message = e instanceof Error ? e.message : String(e); | |
| console.warn(`Warning: Could not read repo files: ${message}`); | |
| return null; | |
| } | |
| } |
| async function generateWithExit(prompt: string): Promise<string> { | ||
| try { | ||
| return await generateExplanation(prompt); | ||
| } catch (e: any) { | ||
| console.error("Failed to generate explanation."); | ||
| console.error(`error: ${e?.message || e}`); | ||
| console.error("\nfix:"); | ||
| console.error("- Ensure GEMINI_API_KEY is set"); | ||
| console.error("- Or run: explainthisrepo --doctor"); | ||
| process.exit(1); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the safeReadRepoFiles function, using unknown for the caught error provides better type safety than any. Explicitly checking if e is an Error instance before accessing its message property is a more robust way to handle errors.
| async function generateWithExit(prompt: string): Promise<string> { | |
| try { | |
| return await generateExplanation(prompt); | |
| } catch (e: any) { | |
| console.error("Failed to generate explanation."); | |
| console.error(`error: ${e?.message || e}`); | |
| console.error("\nfix:"); | |
| console.error("- Ensure GEMINI_API_KEY is set"); | |
| console.error("- Or run: explainthisrepo --doctor"); | |
| process.exit(1); | |
| } | |
| } | |
| async function generateWithExit(prompt: string): Promise<string> { | |
| try { | |
| return await generateExplanation(prompt); | |
| } catch (e: unknown) { | |
| console.error("Failed to generate explanation."); | |
| const message = e instanceof Error ? e.message : String(e); | |
| console.error(`error: ${message}`); | |
| console.error("\nfix:"); | |
| console.error("- Ensure GEMINI_API_KEY is set"); | |
| console.error("- Or run: explainthisrepo --doctor"); | |
| process.exit(1); | |
| } | |
| } |
| @@ -0,0 +1 @@ | |||
| {} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new package.json file and the corresponding package-lock.json at the root of the repository appear to be for an empty package. Were they added intentionally? If they are not needed, they should probably be removed to avoid confusion with the main project files located in the node_version directory and to keep the repository root clean.
This PR isolate untrusted repo content using explicit delimiters
safeReadRepoFilesgenerateWithExit, to centralize the error handling logic for LLM generation callstry/catchblocksanywithRepoReadResulttypepackage.jsonandpackage-lock.json