Skip to content

Conversation

@calchiwo
Copy link
Owner

@calchiwo calchiwo commented Feb 12, 2026

This PR isolate untrusted repo content using explicit delimiters

  • Implemented XML-style delimiters around repository content (metadata, README, structure, and code files)
  • Add model instruction to treat all repository content strictly as data and ignore embedded instructions
  • Centralize error handling for LLM generation
  • Centrlize repo file read logic
  • Extracted the logic for reading repository signal files into a new helper function, safeReadRepoFiles
  • Introduced a new helper function, generateWithExit, to centralize the error handling logic for LLM generation calls
  • Remove duplicated try/catch blocks
  • Replace any with RepoReadResult type
  • Improved type safety
  • Bumped version from 0.4.1 to 0.4.2 in package.json and package-lock.json

@vercel
Copy link

vercel bot commented Feb 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
explainthisrepo Ready Ready Preview, Comment Feb 12, 2026 2:13pm

@gemini-code-assist
Copy link

Summary of Changes

Hello @calchiwo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the CLI application to improve its robustness, maintainability, and security posture, particularly against prompt injection. By isolating untrusted repository content with explicit delimiters and centralizing error handling for both LLM interactions and file reading, the changes ensure a more reliable and secure operation of the tool. The refactoring also reduces code duplication, making the codebase cleaner and easier to manage.

Highlights

  • Prompt Isolation and Hardening: Implemented XML-style delimiters around repository content (metadata, README, structure, code files) within the LLM prompts to clearly separate data from instructions. Additionally, explicit instructions were added to the prompts to treat all repository content strictly as data and ignore any embedded instructions, significantly enhancing resistance against prompt injection attacks.
  • Centralized Error Handling for LLM Generation: Introduced a new helper function, generateWithExit, to centralize the error handling logic for LLM generation calls. This consolidates try/catch blocks, provides consistent error messages, and ensures a controlled exit on failure, improving robustness and maintainability.
  • Centralized Repository File Reading: Extracted the logic for reading repository signal files into a new helper function, safeReadRepoFiles. This centralizes the file reading process and its associated error handling, reducing code duplication and making the process more resilient.
  • Code Refactoring and Duplication Removal: Refactored the cli.ts file by replacing multiple duplicated try/catch blocks with calls to the newly introduced generateWithExit and safeReadRepoFiles helper functions, leading to cleaner and more maintainable code.
  • Improved Type Safety: Updated the cli.ts file to import and utilize the RepoReadResult type, replacing generic any types for repository read results and enhancing type safety.
  • Version Update: The project version in package.json and package-lock.json has been bumped from 0.4.1 to 0.4.2.
Changelog
  • node_version/cli.ts
    • Imported RepoReadResult type from repo_reader.js.
    • Added safeReadRepoFiles helper function to encapsulate repository file reading with error handling.
    • Added generateWithExit helper function to centralize LLM generation calls and error handling, including process exit on failure.
    • Replaced direct calls to generateExplanation with generateWithExit in various command flows.
    • Replaced direct calls to readRepoSignalFiles with safeReadRepoFiles.
  • node_version/package-lock.json
    • Updated project version from 0.4.1 to 0.4.2.
  • node_version/package.json
    • Updated project version from 0.4.1 to 0.4.2.
  • node_version/prompt.ts
    • Wrapped repository metadata, README content, repository structure, and key code files with XML-style delimiters (<repository_metadata>, <readme>, <repo_structure>, <code_files>) in buildPrompt, buildQuickPrompt, and buildSimplePrompt functions.
    • Added critical instructions to all prompt builders, explicitly telling the LLM to treat repository content as data and ignore any embedded instructions to prevent prompt injection.
  • package-lock.json
    • Added a new package-lock.json file at the repository root.
  • package.json
    • Added a new package.json file at the repository root.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42ca76fefa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


README content:
<readme>
${readme || "No README provided"}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Escape repo text before relying on XML delimiters

The new prompt-isolation approach is bypassable because untrusted repository content is interpolated directly inside XML-like sections without escaping. If a README (or other repo text) contains </readme> followed by instructions, the model sees attacker-controlled text outside the intended data block and can still be steered, which defeats the security goal of this refactor. This affects the same pattern in the other prompt builders too, so delimiter escaping (or structured encoding like JSON) is needed for the isolation to hold.

Useful? React with 👍 / 👎.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces excellent refactoring by centralizing error handling for repository file reading and LLM generation into safeReadRepoFiles and generateWithExit helper functions, significantly improving code clarity and maintainability. It also takes a positive step towards prompt isolation by using XML-style delimiters and critical instructions in LLM prompts. However, the implementation is incomplete as it does not escape these delimiters within the untrusted content itself, leaving the system vulnerable to indirect prompt injection where an attacker could craft repository content to manipulate the LLM's output.

Comment on lines +13 to +28
<repository_metadata>
Name: ${repoName}
Description: ${description || "No description provided"}
</repository_metadata>

README content:
<readme>
${readme || "No README provided"}
</readme>

Repo structure:
<repo_structure>
${treeText || "No file tree provided"}
</repo_structure>

Key code files:
<code_files>
${filesText || "No code files provided"}
</code_files>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The use of XML-style delimiters (e.g., <readme>, </readme>) to isolate untrusted repository content is a good step toward preventing prompt injection. However, the untrusted content (like readme, treeText, and filesText) is interpolated directly into the prompt without escaping. An attacker can include a closing tag like </readme> or </code_files> in their repository to break out of the data block and inject malicious instructions. This is a form of indirect prompt injection.

To remediate this, you should sanitize or escape the untrusted content to ensure it cannot contain the closing tags used as delimiters. For example, you could replace </ with &lt;/ or a similar safe representation within the untrusted strings.

Comment on lines +77 to +84
<repository_metadata>
Name: ${repoName}
Description: ${description || "No description provided"}
</repository_metadata>

README snippet:
<readme>
${readmeSnippet}
</readme>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Similar to the main prompt, the quick prompt uses XML delimiters but does not escape the untrusted readmeSnippet. An attacker could use </readme> in the README to inject instructions into the quick summary prompt.

Comment on lines +114 to +125
<repository_metadata>
Name: ${repoName}
Description: ${description || "No description provided"}
</repository_metadata>

README content:
<readme>
${readmeContent}
</readme>

Repo structure:
<repo_structure>
${treeContent}
</repo_structure>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The simple prompt is also vulnerable to tag breakout because it does not escape readmeContent or treeContent. An attacker can inject instructions by including </readme> or </repo_structure> in the repository content.

Comment on lines +140 to +150
async function safeReadRepoFiles(
owner: string,
repo: string
): Promise<RepoReadResult | null> {
try {
return await readRepoSignalFiles(owner, repo);
} catch (e: any) {
console.warn(`Warning: Could not read repo files: ${e?.message || e}`);
return null;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better type safety, it's recommended to catch errors as unknown instead of any. This forces you to handle the error type explicitly, preventing potential runtime errors if the caught value is not an Error object. The e?.message || e pattern can also produce unhelpful output like [object Object].

Suggested change
async function safeReadRepoFiles(
owner: string,
repo: string
): Promise<RepoReadResult | null> {
try {
return await readRepoSignalFiles(owner, repo);
} catch (e: any) {
console.warn(`Warning: Could not read repo files: ${e?.message || e}`);
return null;
}
}
async function safeReadRepoFiles(
owner: string,
repo: string
): Promise<RepoReadResult | null> {
try {
return await readRepoSignalFiles(owner, repo);
} catch (e: unknown) {
const message = e instanceof Error ? e.message : String(e);
console.warn(`Warning: Could not read repo files: ${message}`);
return null;
}
}

Comment on lines +152 to +163
async function generateWithExit(prompt: string): Promise<string> {
try {
return await generateExplanation(prompt);
} catch (e: any) {
console.error("Failed to generate explanation.");
console.error(`error: ${e?.message || e}`);
console.error("\nfix:");
console.error("- Ensure GEMINI_API_KEY is set");
console.error("- Or run: explainthisrepo --doctor");
process.exit(1);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the safeReadRepoFiles function, using unknown for the caught error provides better type safety than any. Explicitly checking if e is an Error instance before accessing its message property is a more robust way to handle errors.

Suggested change
async function generateWithExit(prompt: string): Promise<string> {
try {
return await generateExplanation(prompt);
} catch (e: any) {
console.error("Failed to generate explanation.");
console.error(`error: ${e?.message || e}`);
console.error("\nfix:");
console.error("- Ensure GEMINI_API_KEY is set");
console.error("- Or run: explainthisrepo --doctor");
process.exit(1);
}
}
async function generateWithExit(prompt: string): Promise<string> {
try {
return await generateExplanation(prompt);
} catch (e: unknown) {
console.error("Failed to generate explanation.");
const message = e instanceof Error ? e.message : String(e);
console.error(`error: ${message}`);
console.error("\nfix:");
console.error("- Ensure GEMINI_API_KEY is set");
console.error("- Or run: explainthisrepo --doctor");
process.exit(1);
}
}

@@ -0,0 +1 @@
{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new package.json file and the corresponding package-lock.json at the root of the repository appear to be for an empty package. Were they added intentionally? If they are not needed, they should probably be removed to avoid confusion with the main project files located in the node_version directory and to keep the repository root clean.

@calchiwo calchiwo merged commit 1d5cb79 into main Feb 12, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant