Add server instructions and make all instructions externalized and overridable #161

Merged
keyurva merged 11 commits into datacommonsorg:main from keyurva:instructions
Feb 13, 2026

@keyurva keyurva commented Feb 13, 2026

  • Refactor the MCP server to load tool and server instructions from external markdown files.
  • Add DC_INSTRUCTIONS_DIR configuration to allow overriding instructions at runtime.
  • Create a DCApp wrapper to wrap all global objects and move tool implementations to tools.py.

New Features

  • Server Instructions: Server instructions are now loaded from server.md.
  • Instruction Overrides: Set DC_INSTRUCTIONS_DIR=<path> to override system prompts (server.md) and tool descriptions (tools/<tool_name>.md).

Testing Instructions

  1. Run the server with default instructions:
    uv run datacommons-mcp serve http --port 8080
  2. Override instructions by creating a custom directory:
    mkdir -p custom_instructions/tools
    echo "Custom Data Commons server instructions." > custom_instructions/server.md
    echo "Custom get_observations tool instructions." > custom_instructions/tools/get_observations.md
    DC_INSTRUCTIONS_DIR=custom_instructions uv run datacommons-mcp serve http --port 8080
    

With this, the server uses your custom server and tool instructions.

@gemini-code-assist

Summary of Changes

Hello @keyurva, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Data Commons MCP server to enhance its configurability and modularity. By externalizing server and tool instructions into markdown files and introducing a mechanism for overriding them via an environment variable, the system becomes more adaptable to custom requirements. The introduction of a central DCApp class streamlines application setup and tool management, leading to a cleaner and more maintainable codebase.

Highlights

  • Externalized Instructions: Refactored the MCP server to load both server-level system prompts (server.md) and individual tool descriptions (tools/<tool_name>.md) from external markdown files. This enhances flexibility and allows for easier customization of prompts.
  • Instruction Overrides via Configuration: Introduced a new configuration option, DC_INSTRUCTIONS_DIR, which allows users to specify a custom directory for instruction files. If set, the server will prioritize loading instructions from this directory, falling back to default package resources if a file is not found, enabling runtime overriding of prompts.
  • Centralized Application Logic (DCApp): Created a DCApp wrapper class to encapsulate global objects such as settings and the Data Commons client, and to manage the FastMCP instance. This centralizes application setup and provides a structured way to register tools with their dynamically loaded instructions.
  • Modular Tool Implementations: Moved the implementation details of get_observations and search_indicators tools from server.py into a new tools.py module. This improves modularity and separation of concerns, making server.py primarily responsible for routing and tool registration.
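
The override-with-fallback behavior described in the highlights above can be sketched as follows. This is a minimal illustration, not the PR's code: `load_instruction`, `default_dir`, and `override_dir` are hypothetical names, and the defaults are read from a plain directory here, whereas the PR reads them from package resources.

```python
from pathlib import Path
from typing import Optional


def load_instruction(
    relative_path: str,
    default_dir: Path,
    override_dir: Optional[Path] = None,
) -> str:
    """Return the override file's text if it exists, else the default's."""
    if override_dir is not None:
        candidate = override_dir / relative_path
        if candidate.is_file():
            # A file in the override directory wins over the default.
            return candidate.read_text(encoding="utf-8")
    # No override directory, or no such file in it: use the default.
    return (default_dir / relative_path).read_text(encoding="utf-8")
```

Because the lookup is per file, a custom directory that contains only server.md would still get the default text for tools/get_observations.md, which is the partial-override behavior the summary describes.
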

Changelog

  • packages/datacommons-mcp/.env.sample
    • Added DC_INSTRUCTIONS_DIR environment variable to allow specifying a custom directory for instruction files.
  • packages/datacommons-mcp/datacommons_mcp/app.py
    • Added new file app.py to introduce the DCApp class, which centralizes application initialization, settings, client creation, and FastMCP instance management.
    • Implemented logic within DCApp to load server instructions and register tools, supporting both default package resources and external overrides.
  • packages/datacommons-mcp/datacommons_mcp/data_models/settings.py
    • Added instructions_dir field to DCSettings model to store the path for custom instruction files.
  • packages/datacommons-mcp/datacommons_mcp/instructions/server.md
    • Added new file server.md containing the default system prompt for the DC MCP Server.
  • packages/datacommons-mcp/datacommons_mcp/instructions/tools/get_observations.md
    • Added new file get_observations.md containing detailed markdown instructions for the get_observations tool.
  • packages/datacommons-mcp/datacommons_mcp/instructions/tools/search_indicators.md
    • Added new file search_indicators.md containing detailed markdown instructions for the search_indicators tool.
  • packages/datacommons-mcp/datacommons_mcp/server.py
    • Removed direct FastMCP initialization and client creation, now handled by the DCApp instance.
    • Removed inline get_observations and search_indicators tool definitions, delegating their implementation to tools.py.
    • Updated to import and use the global mcp instance from app.py.
    • Registered tools using app.register_tool with references to the new tools.py implementations and their respective markdown instruction files.
  • packages/datacommons-mcp/datacommons_mcp/tools.py
    • Added new file tools.py to house the concrete implementations of get_observations and search_indicators functions, separating them from the server.py module.
  • packages/datacommons-mcp/datacommons_mcp/utils.py
    • Added read_external_content function to read file content from a specified external path.
    • Added read_package_content function to read file content from Python package resources.
  • packages/datacommons-mcp/tests/conftest.py
    • Added create_test_file pytest fixture to simplify the creation of temporary files for testing purposes.
  • packages/datacommons-mcp/tests/test_app.py
    • Added new file test_app.py containing unit tests for the DCApp class, covering its initialization, instruction loading, and tool registration logic.
  • packages/datacommons-mcp/tests/test_utils.py
    • Added new test cases for read_external_content and read_package_content utility functions.
  • uv.lock
    • Updated datacommons-mcp package version from 1.1.6 to 1.1.7.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring by externalizing server and tool instructions into markdown files, making the system much more configurable, and improves architecture with the DCApp class for global state management. However, a critical security issue was identified where the Data Commons API key is logged in plain text during server initialization. This sensitive information must be excluded from logs. Additionally, there are suggestions to further improve the design by reducing module coupling and simplifying a utility function.

Comment on lines +47 to +49
    logger.info(
        "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2)
    )

Contributor

security-high high

The application logs the entire configuration object using self.settings.model_dump_json(indent=2), which includes the api_key (aliased from DC_API_KEY). This results in sensitive credentials being written to the application logs in plain text. An attacker with access to the logs could use this API key to make unauthorized requests to the Data Commons API or exhaust the account's quota.

Suggested change
-logger.info(
-    "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2)
-)
+logger.info(
+    "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2, exclude={"api_key"})
+)

References
  1. Do not log sensitive information like API keys, even if not within an exception object, to prevent sensitive data exposure.

Contributor

+1

Contributor Author

Changed to log an indicator (SET or NOT_SET) of whether a key was provided, instead of the key itself.
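
A minimal sketch of the SET / NOT_SET approach described in this reply, assuming the settings are available as a plain dictionary. `loggable_settings` and `secret_fields` are hypothetical names; the thread does not show the code that was actually merged.

```python
import json
import logging
from typing import Any, Dict, Iterable

logger = logging.getLogger(__name__)


def loggable_settings(
    settings: Dict[str, Any],
    secret_fields: Iterable[str] = ("api_key",),
) -> Dict[str, Any]:
    """Return a copy of settings with each secret replaced by SET or NOT_SET."""
    safe = dict(settings)  # shallow copy so the caller's dict is untouched
    for field in secret_fields:
        if field in safe:
            safe[field] = "SET" if safe[field] else "NOT_SET"
    return safe


# Placeholder values for illustration only.
settings = {"api_key": "not-a-real-key", "instructions_dir": None}
logger.info("Loaded DC settings:\n%s", json.dumps(loggable_settings(settings), indent=2))
```

Compared with excluding the field outright, this keeps a useful signal in the logs (whether a key was configured at all) without exposing its value.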


from typing import TYPE_CHECKING

from datacommons_mcp.app import app

Contributor

medium

Importing the global app instance here creates a tight coupling between tools.py and app.py. This can lead to circular dependencies (e.g., server.py -> tools.py -> app.py) which makes the code harder to test and maintain.

To improve this, consider using dependency injection. The client could be passed to the tool functions, for example, by creating a class to hold the tools and initializing it with the client in app.py. This would break the cycle and make the tool functions more self-contained and easier to test.

Contributor Author

Agree in principle but since tools need to be pure functions, this coupling is intentional.
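
For reference, the dependency-injection alternative raised in this thread could look roughly like the sketch below. Every name here (`ObservationClient`, `fetch_obs`, `Tools`, `FakeClient`) is hypothetical and chosen for illustration; none of them is the Data Commons client API or code from this PR.

```python
from typing import Any, Dict, Protocol


class ObservationClient(Protocol):
    """The minimal client surface the tools depend on (hypothetical)."""

    def fetch_obs(self, variable_dcid: str, place_dcid: str) -> Dict[str, Any]: ...


class Tools:
    """Holds tool implementations and receives its client from the caller."""

    def __init__(self, client: ObservationClient) -> None:
        self._client = client

    def get_observations(self, variable_dcid: str, place_dcid: str) -> Dict[str, Any]:
        # The client is injected, so no global app object is imported here.
        return self._client.fetch_obs(variable_dcid, place_dcid)


class FakeClient:
    """A stand-in client, as a test might supply."""

    def fetch_obs(self, variable_dcid: str, place_dcid: str) -> Dict[str, Any]:
        return {"variable": variable_dcid, "place": place_dcid, "status": "SUCCESS"}


tools = Tools(FakeClient())
result = tools.get_observations("Count_Person", "country/USA")
```

The trade-off is that the tools become bound methods rather than module-level functions, which is the property the author's reply chooses to keep.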

Comment on lines +139 to +145
    parts = filename.split("/")
    # Start at instructions package
    resource = importlib.resources.files(package)

    # Traverse down the path
    for part in parts:
        resource = resource.joinpath(part)

Contributor

medium

The logic for constructing the resource path can be simplified. The joinpath method of a Traversable object can handle path segments with / separators directly, so you don't need to split the filename and loop through the parts.

Suggested change
-parts = filename.split("/")
-# Start at instructions package
-resource = importlib.resources.files(package)
-# Traverse down the path
-for part in parts:
-    resource = resource.joinpath(part)
+resource = importlib.resources.files(package).joinpath(filename)
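
The equivalence claimed in this suggestion can be checked with a small script. The sketch below builds a throwaway package named `demo_instructions` (a hypothetical name) in a temporary directory and reads the same resource both ways. It assumes Python 3.9 or later and a regular on-disk package, where `files()` returns a `pathlib.Path`; other `Traversable` implementations may behave differently depending on the Python version.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_by_loop(package: str, filename: str) -> str:
    """Original approach: join one path segment at a time."""
    resource = importlib.resources.files(package)
    for part in filename.split("/"):
        resource = resource.joinpath(part)
    return resource.read_text(encoding="utf-8")


def read_direct(package: str, filename: str) -> str:
    """Suggested approach: pass the slash-separated path in one call."""
    return importlib.resources.files(package).joinpath(filename).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_instructions"
    (pkg / "tools").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "tools" / "example.md").write_text("Example tool description.", encoding="utf-8")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        results_match = (
            read_by_loop("demo_instructions", "tools/example.md")
            == read_direct("demo_instructions", "tools/example.md")
        )
    finally:
        # Undo the import-path changes made for the demonstration.
        sys.path.remove(tmp)
        sys.modules.pop("demo_instructions", None)
```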

@clincoln8 clincoln8 left a comment

Thanks Keyur!

        description="Whether to use the legacy search-indicators endpoint (True) or the client library (False) for fetching indicators.",
    )

    instructions_dir: str | None = Field(

Contributor

Consider calling this either "tool_instructions_dir" or even "tool_descriptions_dir" throughout this PR (including the env var name) to be more explicit about what it represents.
"instructions_dir" is a little vague and could be interpreted as instructions for someone running the server.

Contributor

after looking through the rest of the PR, I see that this dir is for both server instructions and tool descriptions so potentially ignore this comment. although I still think it's a little vague, I don't have a better suggestion 😅

Contributor Author

Agree, and I couldn't think of a better name myself! Going with this for now; we can update it if we think of anything better before cutting a prod release.

},
"status": "SUCCESS"
}

Contributor

close json block

Contributor Author

Ah - thanks! Done.


[tool.setuptools]
-include-package-data = true
+include-package-data = false

Contributor

can you add a comment explaining the reason for this change?

Contributor Author

Added a comment. When I ran uv build locally, it was picking up certain gitignored files, and setting this to false prevented that.

Comment on lines +50 to +53
# Directory containing custom instruction files (markdown)
# Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md).
# Files must match standard names (e.g., server.md, tools/get_observations.md).
# DC_INSTRUCTIONS_DIR=/path/to/custom/instructions

Contributor

Here's a suggestion for how I might reword this, but feel free to ignore!

Suggested change
-# Directory containing custom instruction files (markdown)
-# Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md).
-# Files must match standard names (e.g., server.md, tools/get_observations.md).
-# DC_INSTRUCTIONS_DIR=/path/to/custom/instructions
+# Path to directory containing markdown file overrides for server instructions and/or tool descriptions.
+# Supports partial overrides: only create files for the specific
+# instructions or tools you want to replace. The system will fall back to package defaults for any file not found here.
+#
+# Expected structure inside this directory:
+# - server.md
+# - tools/{tool_name}.md
+DC_INSTRUCTIONS_DIR=/path/to/custom/instructions

Contributor Author

This is perfect - thanks! Done.

-    # Dump the Pydantic model to a dictionary
-    return response.model_dump(exclude_none=True)
+# Register tools
+app.register_tool(tools.get_observations, "tools/get_observations.md")

Contributor

thoughts on registering the tools inside tools.py instead of in server.py?
I don't feel strongly either way, just curious on your opinion here.

Contributor Author

Subjective call on keeping the "wiring" code in server.py. Also, if we want to conditionally add tools in the future, server.py seemed like the better place to do it.

-    return response.model_dump(exclude_none=True)
+# Register tools
+app.register_tool(tools.get_observations, "tools/get_observations.md")
+app.register_tool(tools.search_indicators, "tools/search_indicators.md")

Contributor

nit: consider moving the filename strings into constants

Contributor Author

Done.
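
The wiring discussed in this thread, with the filenames pulled into constants, might look something like the sketch below. `ToolRegistry` is a hypothetical stand-in for the PR's DCApp, whose implementation is not shown in this conversation; only the `register_tool(function, instruction_file)` call shape comes from the diff above. The constant names and the in-memory `fake_files` loader are assumptions made for illustration.

```python
from typing import Callable, Dict

# Instruction file paths kept in one place, as the nit suggests.
GET_OBSERVATIONS_INSTRUCTIONS = "tools/get_observations.md"
SEARCH_INDICATORS_INSTRUCTIONS = "tools/search_indicators.md"


class ToolRegistry:
    """Hypothetical stand-in that pairs each tool with its loaded description."""

    def __init__(self, load_instruction: Callable[[str], str]) -> None:
        self._load_instruction = load_instruction
        self.descriptions: Dict[str, str] = {}

    def register_tool(self, func: Callable, instruction_file: str) -> None:
        # Look up the description text at registration time.
        self.descriptions[func.__name__] = self._load_instruction(instruction_file)


def get_observations() -> None:
    """Placeholder tool body."""


def search_indicators() -> None:
    """Placeholder tool body."""


# An in-memory substitute for the markdown files, for illustration only.
fake_files = {
    GET_OBSERVATIONS_INSTRUCTIONS: "Describes get_observations.",
    SEARCH_INDICATORS_INSTRUCTIONS: "Describes search_indicators.",
}

registry = ToolRegistry(fake_files.__getitem__)
registry.register_tool(get_observations, GET_OBSERVATIONS_INSTRUCTIONS)
registry.register_tool(search_indicators, SEARCH_INDICATORS_INSTRUCTIONS)
```

Keeping the registration calls together in one module also leaves a single obvious place to add tools conditionally later, which is the reason given above for keeping the wiring in server.py.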

@keyurva keyurva left a comment

Thanks for the review!

@keyurva keyurva merged commit b9372bc into datacommonsorg:main Feb 13, 2026
9 of 10 checks passed
@keyurva keyurva deleted the instructions branch February 13, 2026 21:01