Add server instructions and make all instructions externalized and overridable#161
Add server instructions and make all instructions externalized and overridable#161keyurva merged 11 commits intodatacommonsorg:mainfrom
Conversation
Summary of ChangesHello @keyurva, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the Data Commons MCP server to enhance its configurability and modularity. By externalizing server and tool instructions into markdown files and introducing a mechanism for overriding them via an environment variable, the system becomes more adaptable to custom requirements. The introduction of a central Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant refactoring by externalizing server and tool instructions into markdown files, making the system much more configurable, and improves architecture with the DCApp class for global state management. However, a critical security issue was identified where the Data Commons API key is logged in plain text during server initialization. This sensitive information must be excluded from logs. Additionally, there are suggestions to further improve the design by reducing module coupling and simplifying a utility function.
| logger.info( | ||
| "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2) | ||
| ) |
There was a problem hiding this comment.
The application logs the entire configuration object using self.settings.model_dump_json(indent=2), which includes the api_key (aliased from DC_API_KEY). This results in sensitive credentials being written to the application logs in plain text. An attacker with access to the logs could use this API key to make unauthorized requests to the Data Commons API or exhaust the account's quota.
| logger.info( | |
| "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2) | |
| ) | |
| logger.info( | |
| "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2, exclude={"api_key"}) | |
| ) |
References
- Do not log sensitive information like API keys, even if not within an exception object, to prevent sensitive data exposure.
There was a problem hiding this comment.
Changed to not log the key but log an indicator (SET, NOT_SET) on whether one was provided or not.
|
|
||
| from typing import TYPE_CHECKING | ||
|
|
||
| from datacommons_mcp.app import app |
There was a problem hiding this comment.
Importing the global app instance here creates a tight coupling between tools.py and app.py. This can lead to circular dependencies (e.g., server.py -> tools.py -> app.py) which makes the code harder to test and maintain.
To improve this, consider using dependency injection. The client could be passed to the tool functions, for example, by creating a class to hold the tools and initializing it with the client in app.py. This would break the cycle and make the tool functions more self-contained and easier to test.
There was a problem hiding this comment.
Agree in principle but since tools need to be pure functions, this coupling is intentional.
| parts = filename.split("/") | ||
| # Start at instructions package | ||
| resource = importlib.resources.files(package) | ||
|
|
||
| # Traverse down the path | ||
| for part in parts: | ||
| resource = resource.joinpath(part) |
There was a problem hiding this comment.
The logic for constructing the resource path can be simplified. The joinpath method of a Traversable object can handle path segments with / separators directly, so you don't need to split the filename and loop through the parts.
| parts = filename.split("/") | |
| # Start at instructions package | |
| resource = importlib.resources.files(package) | |
| # Traverse down the path | |
| for part in parts: | |
| resource = resource.joinpath(part) | |
| resource = importlib.resources.files(package).joinpath(filename) |
| description="Whether to use the legacy search-indicators endpoint (True) or the client library (False) for fetching indicators.", | ||
| ) | ||
|
|
||
| instructions_dir: str | None = Field( |
There was a problem hiding this comment.
consider calling this either "tool_instructions_dir" or even "tool_descriptions_dir" throughout this PR (including the env var name) to be more explicit of what this represents.
"instructions_dir" is a little vague could be interpreted as instructions for someone running the server
There was a problem hiding this comment.
after looking through the rest of the PR, I see that this dir is for both server instructions and tool descriptions so potentially ignore this comment. although I still think it's a little vague, I don't have a better suggestion 😅
There was a problem hiding this comment.
Agree and couldn't think of a better name myself! Going with this for now and update it if we think of anything better before cutting a prod release.
| }, | ||
| "status": "SUCCESS" | ||
| } | ||
|
|
|
|
||
| [tool.setuptools] | ||
| include-package-data = true | ||
| include-package-data = false |
There was a problem hiding this comment.
can you add a comment for why this change?
There was a problem hiding this comment.
Added comment. When I ran uv build locally, it was picking up certain gitignored file and making this false prevented that.
packages/datacommons-mcp/.env.sample
Outdated
| # Directory containing custom instruction files (markdown) | ||
| # Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md). | ||
| # Files must match standard names (e.g., server.md, tools/get_observations.md). | ||
| # DC_INSTRUCTIONS_DIR=/path/to/custom/instructions |
There was a problem hiding this comment.
Here's a suggestion for how I might reword this, but feel free to ignore!
| # Directory containing custom instruction files (markdown) | |
| # Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md). | |
| # Files must match standard names (e.g., server.md, tools/get_observations.md). | |
| # DC_INSTRUCTIONS_DIR=/path/to/custom/instructions | |
| # Path to directory containing markdown file overrides for server instructions and/or tool descriptions. | |
| # Supports partial overrides: only create files for the specific | |
| # instructions or tools you want to replace. The system will fall back to package defaults for any file not found here. | |
| # | |
| # Expected structure inside this directory: | |
| # - server.md | |
| # - tools/{tool_name}.md | |
| DC_INSTRUCTIONS_DIR=/path/to/custom/instructions |
There was a problem hiding this comment.
This is perfect - thanks! Done.
| # Dump the Pydantic model to a dictionary | ||
| return response.model_dump(exclude_none=True) | ||
| # Register tools | ||
| app.register_tool(tools.get_observations, "tools/get_observations.md") |
There was a problem hiding this comment.
thoughts on registering the tools inside tools.py instead of in server.py?
I don't feel strongly either way, just curious on your opinion here.
There was a problem hiding this comment.
Subjective call on keeping the "wiring" code in server.py. Also, if we want to conditionally add tools in the future, server.py seemed like the better place to do it.
| return response.model_dump(exclude_none=True) | ||
| # Register tools | ||
| app.register_tool(tools.get_observations, "tools/get_observations.md") | ||
| app.register_tool(tools.search_indicators, "tools/search_indicators.md") |
There was a problem hiding this comment.
nit: consider moving the filename strings into constants
keyurva
left a comment
There was a problem hiding this comment.
Thanks for the review!
| description="Whether to use the legacy search-indicators endpoint (True) or the client library (False) for fetching indicators.", | ||
| ) | ||
|
|
||
| instructions_dir: str | None = Field( |
There was a problem hiding this comment.
Agree and couldn't think of a better name myself! Going with this for now and update it if we think of anything better before cutting a prod release.
| }, | ||
| "status": "SUCCESS" | ||
| } | ||
|
|
| logger.info( | ||
| "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2) | ||
| ) |
There was a problem hiding this comment.
Changed to not log the key but log an indicator (SET, NOT_SET) on whether one was provided or not.
| # Dump the Pydantic model to a dictionary | ||
| return response.model_dump(exclude_none=True) | ||
| # Register tools | ||
| app.register_tool(tools.get_observations, "tools/get_observations.md") |
There was a problem hiding this comment.
Subjective call on keeping the "wiring" code in server.py. Also, if we want to conditionally add tools in the future, server.py seemed like the better place to do it.
| return response.model_dump(exclude_none=True) | ||
| # Register tools | ||
| app.register_tool(tools.get_observations, "tools/get_observations.md") | ||
| app.register_tool(tools.search_indicators, "tools/search_indicators.md") |
packages/datacommons-mcp/.env.sample
Outdated
| # Directory containing custom instruction files (markdown) | ||
| # Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md). | ||
| # Files must match standard names (e.g., server.md, tools/get_observations.md). | ||
| # DC_INSTRUCTIONS_DIR=/path/to/custom/instructions |
There was a problem hiding this comment.
This is perfect - thanks! Done.
|
|
||
| [tool.setuptools] | ||
| include-package-data = true | ||
| include-package-data = false |
There was a problem hiding this comment.
Added comment. When I ran uv build locally, it was picking up certain gitignored file and making this false prevented that.
DC_INSTRUCTIONS_DIRconfiguration to allow overriding instructions at runtime.DCAppwrapper to wrap all global objects and move tool implementations totools.py.New Features
server.md.DC_INSTRUCTIONS_DIR=<path>to override system prompts (server.md) and tool descriptions (tools/<tool_name>.md).Testing Instructions
uv run datacommons-mcp serve http --port 8080With this, the server uses your custom server and tool instructions.
```