Add server instructions and make all instructions externalized and overridable by keyurva · Pull Request #161 · datacommonsorg/agent-toolkit

keyurva · 2026-02-13T18:09:57Z

Refactor the MCP server to load tool and server instructions from external markdown files.
Add DC_INSTRUCTIONS_DIR configuration to allow overriding instructions at runtime.
Create a DCApp wrapper to wrap all global objects and move tool implementations to tools.py.

New Features

Server Instructions: Server instructions are now loaded from server.md.
Instruction Overrides: Set DC_INSTRUCTIONS_DIR=<path> to override system prompts (server.md) and tool descriptions (tools/<tool_name>.md).

Testing Instructions

Run the server with default instructions:
uv run datacommons-mcp serve http --port 8080

Override instructions by creating a custom directory:

mkdir -p custom_instructions/tools
echo "Custom Data Commons server instructions." > custom_instructions/server.md
echo "Custom get_observations tool instructions." > custom_instructions/tools/get_observations.md
DC_INSTRUCTIONS_DIR=custom_instructions uv run datacommons-mcp serve http --port 8080

With this, the server uses your custom server and tool instructions.

gemini-code-assist · 2026-02-13T18:10:25Z

Summary of Changes

Hello @keyurva, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Data Commons MCP server to enhance its configurability and modularity. By externalizing server and tool instructions into markdown files and introducing a mechanism for overriding them via an environment variable, the system becomes more adaptable to custom requirements. The introduction of a central DCApp class streamlines application setup and tool management, leading to a cleaner and more maintainable codebase.

Highlights

Externalized Instructions: Refactored the MCP server to load both server-level system prompts (server.md) and individual tool descriptions (tools/<tool_name>.md) from external markdown files. This enhances flexibility and allows for easier customization of prompts.
Instruction Overrides via Configuration: Introduced a new configuration option, DC_INSTRUCTIONS_DIR, which allows users to specify a custom directory for instruction files. If set, the server will prioritize loading instructions from this directory, falling back to default package resources if a file is not found, enabling runtime overriding of prompts.
Centralized Application Logic (DCApp): Created a DCApp wrapper class to encapsulate global objects such as settings and the Data Commons client, and to manage the FastMCP instance. This centralizes application setup and provides a structured way to register tools with their dynamically loaded instructions.
Modular Tool Implementations: Moved the implementation details of get_observations and search_indicators tools from server.py into a new tools.py module. This improves modularity and separation of concerns, making server.py primarily responsible for routing and tool registration.
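
The override-then-fallback lookup described in these highlights could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `load_instruction`, `default_dir`, and `override_dir` are hypothetical names, and the packaged defaults are modeled here as a plain directory rather than as package resources.

```python
from pathlib import Path
from typing import Optional


def load_instruction(
    filename: str, default_dir: Path, override_dir: Optional[Path] = None
) -> str:
    """Return the override file's text if it exists, otherwise the default's.

    `filename` is relative, e.g. "server.md" or "tools/get_observations.md".
    """
    if override_dir is not None:
        candidate = override_dir / filename
        if candidate.is_file():
            # An override exists for this specific file, so it wins.
            return candidate.read_text(encoding="utf-8")
    # No override directory, or no matching file in it: use the default.
    return (default_dir / filename).read_text(encoding="utf-8")
```

Because the check is made per file, an override directory that contains only `server.md` would leave every tool description at its default.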

Changelog

packages/datacommons-mcp/.env.sample
- Added DC_INSTRUCTIONS_DIR environment variable to allow specifying a custom directory for instruction files.
packages/datacommons-mcp/datacommons_mcp/app.py
- Added new file app.py to introduce the DCApp class, which centralizes application initialization, settings, client creation, and FastMCP instance management.
- Implemented logic within DCApp to load server instructions and register tools, supporting both default package resources and external overrides.
packages/datacommons-mcp/datacommons_mcp/data_models/settings.py
- Added instructions_dir field to DCSettings model to store the path for custom instruction files.
packages/datacommons-mcp/datacommons_mcp/instructions/server.md
- Added new file server.md containing the default system prompt for the DC MCP Server.
packages/datacommons-mcp/datacommons_mcp/instructions/tools/get_observations.md
- Added new file get_observations.md containing detailed markdown instructions for the get_observations tool.
packages/datacommons-mcp/datacommons_mcp/instructions/tools/search_indicators.md
- Added new file search_indicators.md containing detailed markdown instructions for the search_indicators tool.
packages/datacommons-mcp/datacommons_mcp/server.py
- Removed direct FastMCP initialization and client creation, now handled by the DCApp instance.
- Removed inline get_observations and search_indicators tool definitions, delegating their implementation to tools.py.
- Updated to import and use the global mcp instance from app.py.
- Registered tools using app.register_tool with references to the new tools.py implementations and their respective markdown instruction files.
packages/datacommons-mcp/datacommons_mcp/tools.py
- Added new file tools.py to house the concrete implementations of get_observations and search_indicators functions, separating them from the server.py module.
packages/datacommons-mcp/datacommons_mcp/utils.py
- Added read_external_content function to read file content from a specified external path.
- Added read_package_content function to read file content from Python package resources.
packages/datacommons-mcp/tests/conftest.py
- Added create_test_file pytest fixture to simplify the creation of temporary files for testing purposes.
packages/datacommons-mcp/tests/test_app.py
- Added new file test_app.py containing unit tests for the DCApp class, covering its initialization, instruction loading, and tool registration logic.
packages/datacommons-mcp/tests/test_utils.py
- Added new test cases for read_external_content and read_package_content utility functions.
uv.lock
- Updated datacommons-mcp package version from 1.1.6 to 1.1.7.
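
The registration flow listed in the changelog for app.py and server.py (a tool function paired with a markdown instruction file) might look something like the sketch below. `MiniApp` is a hypothetical stand-in for DCApp that does not use FastMCP, and the tool's parameters and the description text are invented for illustration.

```python
from typing import Callable, Dict


class MiniApp:
    """Hypothetical stand-in for DCApp: a registry of tools and descriptions."""

    def __init__(self, instructions: Dict[str, str]) -> None:
        # Maps instruction filename -> markdown text that was already loaded.
        self._instructions = instructions
        self.tools: Dict[str, dict] = {}

    def register_tool(self, func: Callable, instruction_file: str) -> None:
        """Register `func` under its own name with the matching description."""
        self.tools[func.__name__] = {
            "handler": func,
            "description": self._instructions[instruction_file],
        }


def get_observations(variable_dcid: str, place_dcid: str) -> dict:
    # Placeholder body; the real tool queries Data Commons.
    return {"variable": variable_dcid, "place": place_dcid}


app = MiniApp({"tools/get_observations.md": "Fetch observations for a variable and place."})
app.register_tool(get_observations, "tools/get_observations.md")
```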


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant refactoring by externalizing server and tool instructions into markdown files, making the system much more configurable, and improves architecture with the DCApp class for global state management. However, a critical security issue was identified where the Data Commons API key is logged in plain text during server initialization. This sensitive information must be excluded from logs. Additionally, there are suggestions to further improve the design by reducing module coupling and simplifying a utility function.

gemini-code-assist · 2026-02-13T18:13:34Z

packages/datacommons-mcp/datacommons_mcp/app.py

+            logger.info(
+                "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2)
+            )


The application logs the entire configuration object using self.settings.model_dump_json(indent=2), which includes the api_key (aliased from DC_API_KEY). This results in sensitive credentials being written to the application logs in plain text. An attacker with access to the logs could use this API key to make unauthorized requests to the Data Commons API or exhaust the account's quota.

Suggested change

-            logger.info(
-                "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2)
-            )
+            logger.info(
+                "Loaded DC settings:\n%s", self.settings.model_dump_json(indent=2, exclude={"api_key"})
+            )

References

Do not log sensitive information like API keys, even if not within an exception object, to prevent sensitive data exposure.

keyurva · 2026-02-13T20:41:27Z

Changed to not log the key but log an indicator (SET, NOT_SET) on whether one was provided or not.
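
The SET / NOT_SET approach mentioned in this reply could be sketched as below. This is an assumption-laden illustration rather than the merged code: `Settings` is a hypothetical dataclass standing in for DCSettings, and `loggable_settings` is an invented helper name.

```python
import json
import logging
from dataclasses import asdict, dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class Settings:
    """Hypothetical stand-in for DCSettings."""

    api_key: Optional[str] = None
    instructions_dir: Optional[str] = None


def loggable_settings(settings: Settings) -> str:
    """Serialize settings for logging, replacing the API key with an indicator."""
    data = asdict(settings)
    # Never emit the key itself; only record whether one was provided.
    data["api_key"] = "SET" if settings.api_key else "NOT_SET"
    return json.dumps(data, indent=2)


logger.info("Loaded DC settings:\n%s", loggable_settings(Settings(api_key="secret")))
```

Compared with excluding the field entirely, the indicator still tells an operator whether a key was picked up from the environment.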


gemini-code-assist · 2026-02-13T18:13:34Z

packages/datacommons-mcp/datacommons_mcp/tools.py

+
+from typing import TYPE_CHECKING
+
+from datacommons_mcp.app import app


Importing the global app instance here creates a tight coupling between tools.py and app.py. This can lead to circular dependencies (e.g., server.py -> tools.py -> app.py) which makes the code harder to test and maintain.

To improve this, consider using dependency injection. The client could be passed to the tool functions, for example, by creating a class to hold the tools and initializing it with the client in app.py. This would break the cycle and make the tool functions more self-contained and easier to test.

Agree in principle but since tools need to be pure functions, this coupling is intentional.
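
For reference, the dependency-injection alternative raised in this thread might look roughly like the sketch below. It is not what the PR does (the author kept the module-level import on purpose). `ObservationClient`, `Tools`, `FakeClient`, and the `fetch` method are all hypothetical names, and the real client's interface may differ.

```python
from typing import Any, Dict, Protocol


class ObservationClient(Protocol):
    """Hypothetical client interface; the real client's methods may differ."""

    def fetch(self, variable: str, place: str) -> Dict[str, Any]: ...


class Tools:
    """Holds tool methods and receives its client at construction time."""

    def __init__(self, client: ObservationClient) -> None:
        self._client = client

    def get_observations(self, variable: str, place: str) -> Dict[str, Any]:
        # No global `app` import: the client was injected above.
        return self._client.fetch(variable, place)


class FakeClient:
    """Test double that needs no network access and no global app object."""

    def fetch(self, variable: str, place: str) -> Dict[str, Any]:
        return {"variable": variable, "place": place, "status": "SUCCESS"}


tools = Tools(FakeClient())
```

The trade-off is that tools become bound methods rather than plain module-level functions, which is the constraint the author's reply points to.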

gemini-code-assist · 2026-02-13T18:13:34Z

packages/datacommons-mcp/datacommons_mcp/utils.py

+        parts = filename.split("/")
+        # Start at instructions package
+        resource = importlib.resources.files(package)
+
+        # Traverse down the path
+        for part in parts:
+            resource = resource.joinpath(part)


The logic for constructing the resource path can be simplified. The joinpath method of a Traversable object can handle path segments with / separators directly, so you don't need to split the filename and loop through the parts.

Suggested change

-        parts = filename.split("/")
-        # Start at instructions package
-        resource = importlib.resources.files(package)
-
-        # Traverse down the path
-        for part in parts:
-            resource = resource.joinpath(part)
+        resource = importlib.resources.files(package).joinpath(filename)
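
The two forms in this suggestion can be compared side by side with a small sketch. The function names `resolve_by_parts` and `resolve_direct` are invented here, and the standard-library `email` package is used only because it ships a nested module to resolve; the PR's own package is not assumed to be installed.

```python
import importlib.resources


def resolve_by_parts(package: str, filename: str):
    """Walk the path one segment at a time, as in the original code."""
    resource = importlib.resources.files(package)
    for part in filename.split("/"):
        resource = resource.joinpath(part)
    return resource


def resolve_direct(package: str, filename: str):
    """Pass the slash-separated path to joinpath in a single call."""
    return importlib.resources.files(package).joinpath(filename)


# Demonstration against the standard-library `email` package, which
# contains the nested module email/mime/text.py.
by_parts = resolve_by_parts("email", "mime/text.py")
direct = resolve_direct("email", "mime/text.py")
print(by_parts.is_file(), direct.is_file())
```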

clincoln8

Thanks Keyur!

clincoln8 · 2026-02-13T18:55:23Z

packages/datacommons-mcp/datacommons_mcp/data_models/settings.py

        description="Whether to use the legacy search-indicators endpoint (True) or the client library (False) for fetching indicators.",
    )

+    instructions_dir: str | None = Field(


Consider calling this either "tool_instructions_dir" or even "tool_descriptions_dir" throughout this PR (including the env var name) to be more explicit about what this represents.
"instructions_dir" is a little vague and could be interpreted as instructions for someone running the server.

After looking through the rest of the PR, I see that this dir is for both server instructions and tool descriptions, so potentially ignore this comment. Although I still think it's a little vague, I don't have a better suggestion 😅

keyurva · 2026-02-13T20:34:13Z

Agree, and I couldn't think of a better name myself! Going with this for now; we can update it if we think of anything better before cutting a prod release.

clincoln8 · 2026-02-13T18:56:42Z

packages/datacommons-mcp/datacommons_mcp/instructions/tools/search_indicators.md

+  },
+  "status": "SUCCESS"
+}
+


close json block

keyurva · 2026-02-13T20:35:16Z

Ah - thanks! Done.

clincoln8 · 2026-02-13T19:04:30Z

packages/datacommons-mcp/pyproject.toml


 [tool.setuptools]
-include-package-data = true
+include-package-data = false


can you add a comment for why this change?

keyurva · 2026-02-13T20:55:11Z

Added a comment. When I ran uv build locally, it was picking up certain gitignored files, and setting this to false prevented that.

clincoln8 · 2026-02-13T19:26:55Z

packages/datacommons-mcp/.env.sample

+# Directory containing custom instruction files (markdown)
+# Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md).
+# Files must match standard names (e.g., server.md, tools/get_observations.md).
+# DC_INSTRUCTIONS_DIR=/path/to/custom/instructions


Here's a suggestion for how I might reword this, but feel free to ignore!

Suggested change

-# Directory containing custom instruction files (markdown)
-# Use this to override default system prompts (server.md) and tool instructions (e.g., tools/get_observations.md).
-# Files must match standard names (e.g., server.md, tools/get_observations.md).
-# DC_INSTRUCTIONS_DIR=/path/to/custom/instructions
+# Path to directory containing markdown file overrides for server instructions and/or tool descriptions.
+# Supports partial overrides: only create files for the specific
+# instructions or tools you want to replace. The system will fall back to package defaults for any file not found here.
+#
+# Expected structure inside this directory:
+# - server.md
+# - tools/{tool_name}.md
+DC_INSTRUCTIONS_DIR=/path/to/custom/instructions

keyurva · 2026-02-13T20:52:26Z

This is perfect - thanks! Done.
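
To make the partial-override behavior in the reworded comment concrete, here is a small sketch that reports, for each expected file, whether a given directory would override it. `describe_overrides` is a hypothetical helper and not part of the PR; the file names are the ones mentioned in this thread.

```python
from pathlib import Path
from typing import Dict, Iterable

# File names taken from the expected structure described above.
EXPECTED_FILES = (
    "server.md",
    "tools/get_observations.md",
    "tools/search_indicators.md",
)


def describe_overrides(
    instructions_dir: Path, expected: Iterable[str] = EXPECTED_FILES
) -> Dict[str, str]:
    """Map each expected instruction file to 'override' or 'default'."""
    return {
        name: "override" if (instructions_dir / name).is_file() else "default"
        for name in expected
    }
```

A directory containing only `server.md` would report `override` for that file and `default` for both tool descriptions.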

clincoln8 · 2026-02-13T19:30:20Z

packages/datacommons-mcp/datacommons_mcp/server.py

-    # Dump the Pydantic model to a dictionary
-    return response.model_dump(exclude_none=True)
+# Register tools
+app.register_tool(tools.get_observations, "tools/get_observations.md")


thoughts on registering the tools inside tools.py instead of in server.py?
I don't feel strongly either way, just curious on your opinion here.

keyurva · 2026-02-13T20:44:47Z

It was a subjective call to keep the "wiring" code in server.py. Also, if we want to conditionally add tools in the future, server.py seemed like the better place to do it.

clincoln8 · 2026-02-13T19:36:20Z

packages/datacommons-mcp/datacommons_mcp/server.py

-    return response.model_dump(exclude_none=True)
+# Register tools
+app.register_tool(tools.get_observations, "tools/get_observations.md")
+app.register_tool(tools.search_indicators, "tools/search_indicators.md")


nit: consider moving the filename strings into constants
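
One way to act on this nit, and on the earlier point about conditional registration, is sketched below. The constant names, the `enable_search` flag, and the placeholder tool bodies are all hypothetical; only the two filename strings come from the diff.

```python
from typing import Callable, List, Tuple

# Instruction filenames as named constants instead of inline strings.
GET_OBSERVATIONS_MD = "tools/get_observations.md"
SEARCH_INDICATORS_MD = "tools/search_indicators.md"


def get_observations() -> str:
    return "observations"  # placeholder body


def search_indicators() -> str:
    return "indicators"  # placeholder body


def tools_to_register(enable_search: bool = True) -> List[Tuple[Callable, str]]:
    """Build the (function, instruction file) pairs to register."""
    pairs: List[Tuple[Callable, str]] = [(get_observations, GET_OBSERVATIONS_MD)]
    if enable_search:  # hypothetical flag illustrating conditional registration
        pairs.append((search_indicators, SEARCH_INDICATORS_MD))
    return pairs
```

Each pair could then be passed to `app.register_tool(func, filename)` in a loop inside server.py.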

keyurva

Thanks for the review!


keyurva added 7 commits February 12, 2026 10:31

Add DC_INSTRUCTIONS_DIR to DCSettings

9b2c58e

Add server and tool instruction markdown files

7055be9

Implement DCApp and external instruction loading

39c3dcd

Add unit tests

fae96f7

Put tool implementations in tools.py

3ace2d8

Refactor server.py to use DCApp and registered tools

30996b9

Fix lint errors

96f7e17

keyurva had a problem deploying to evals-and-secrets February 13, 2026 18:10 — with GitHub Actions Error

gemini-code-assist bot reviewed Feb 13, 2026

Update year in copyright text for new files

a396ca2

keyurva had a problem deploying to evals-and-secrets February 13, 2026 18:19 — with GitHub Actions Error

keyurva requested a review from clincoln8 February 13, 2026 18:22

Update server instructions

3fa075f

keyurva had a problem deploying to evals-and-secrets February 13, 2026 18:27 — with GitHub Actions Error

Include instruction markdown files in built package

f28b05c

keyurva had a problem deploying to evals-and-secrets February 13, 2026 18:44 — with GitHub Actions Error

clincoln8 approved these changes Feb 13, 2026

Address review comments

7972e39

keyurva had a problem deploying to evals-and-secrets February 13, 2026 20:56 — with GitHub Actions Error

keyurva commented Feb 13, 2026

keyurva merged commit b9372bc into datacommonsorg:main Feb 13, 2026
9 of 10 checks passed

keyurva deleted the instructions branch February 13, 2026 21:01


