Add MCP server with OAuth 2.1 for Claude.AI integration #22
manana2520 wants to merge 6 commits into `main`
Conversation
Expose the knowledge base as a remote MCP server so Claude.AI users can search, ask questions, create knowledge, ingest documents, and submit feedback directly from Claude. Key changes:

- New `src/knowledge_base/mcp/` package with OAuth 2.1 (Google), Streamable HTTP transport, 6 MCP tools, and scope-based access control
- Extract core Q&A logic from the Slack bot into `src/knowledge_base/core/qa.py` for reuse across interfaces (the Slack bot now delegates to core)
- OAuth module adapted from odoo-mcp-server with `kb.read`/`kb.write` scopes (`@keboola.com` users get write, external users get read-only)
- `Dockerfile.mcp` and Terraform for both staging and production Cloud Run
- CI/CD pipeline: `build-mcp`, `deploy-mcp-staging`, `deploy-mcp-production`
- 64 unit tests covering tools, server protocol, and OAuth logic
LGTM! This is a well-engineered and comprehensive PR that successfully implements the MCP server integration with Claude.AI, including robust OAuth 2.1 authentication and a well-thought-out tool ecosystem. The core Q&A logic extraction is an excellent architectural improvement, promoting reusability and maintainability.
Key strengths observed:
- **Architectural clarity:** The refactoring of core Q&A logic into `knowledge_base/core/qa.py` is a significant win, correctly isolating reusable components from interface-specific code (Slack, now MCP).
- **Robust OAuth implementation:** The `oauth/` module provides a detailed and secure implementation of OAuth 2.1, handling Google-specific nuances (ID tokens vs. opaque access tokens, JWKS fetching, client_id as audience, `azp` verification). The `extract_user_context` logic for deriving `kb.read` and `kb.write` scopes based on email domain and verification is a smart approach for internal authorization with Google OAuth.
- **Comprehensive MCP tools:** Six distinct tools are defined with clear descriptions and input schemas, covering the core functionalities of the knowledge base. The `execute_tool` dispatcher is well-structured.
- **Thorough testing:** The unit tests for MCP tools, server endpoints, and OAuth logic (especially `extract_user_context` and `check_scope_access`) demonstrate excellent coverage, including various scenarios and edge cases.
- **Sound Terraform deployment:** The `cloudrun-mcp.tf` file correctly sets up production and staging Cloud Run services, service accounts, and Secret Manager integration. Crucially, the `lifecycle { ignore_changes = [secret_data] }` block for OAuth secrets is correctly applied, preventing accidental overwrites. IAM roles are appropriate, and VPC Access is correctly configured for Neo4j connectivity.
- **Security considerations:** `MCP_DEV_MODE` is explicitly `false` in Terraform, and in-application OAuth validation is the primary security boundary; the `allUsers` invoker role on Cloud Run is a necessary pattern for this type of integration. CORS `allow_origins=["*"]` is acceptable given the robust token validation applied to all `/mcp` endpoints.
No significant issues or bugs were found. This work is high quality and aligns perfectly with the project's goals and technical standards.
🔒 Security Review
Security Score: D | 5 CRITICAL, 3 HIGH, 1 INFO, 2 LOW, 6 MEDIUM
Security Review Summary: MCP Server OAuth 2.1 Integration
Critical Issues (5 findings)
The PR introduces 5 critical logging violations that expose sensitive information:
- Token validation exception details logged in resource_server.py at lines 226 and 229
- User email addresses (PII) logged in token_validator.py at lines 156 and 244
- Exception details logged in server.py at line 137
All of these violate ADR-001 logging standards and expose user identity information or OAuth implementation details to log consumers.
High Severity Issues (4 findings)
- Dev mode authentication bypass (server.py:143) - MCP_DEV_MODE=true disables all OAuth validation without production safeguards
- Email domain-based access control (resource_server.py:211) - Write scope granted purely on @keboola.com email domain without additional verification
- External Google API calls without caching (token_validator.py:168) - Synchronous tokeninfo validation has no caching or rate limiting
- Missing production safety checks (server.py) - No explicit validation preventing dev mode in production
Medium Severity Issues (6 findings)
- Unsanitized user email in metadata (tools.py:371) - Email used directly in Neo4j chunks without sanitization
- Unvalidated search queries (tools.py:413) - Search input not validated before forwarding to backend
- Plaintext credential patterns in environment variables (though correctly using Secret Manager in Terraform)
- Overly permissive CORS (server.py:149) - Wildcard origin with credentials flag
- Undocumented test user email env var (server.py:166)
Accepted Patterns (Correctly Implemented)
✓ OAuth configuration via pydantic-settings (no hardcoding)
✓ Secret Manager integration in Terraform for MCP_OAUTH_* secrets
✓ Scope-based tool access control structure (though implementation needs hardening)
✓ RFC 9728 Protected Resource Metadata endpoint
Recommendations Before Merge
- Remove all email/exception details from logs - Use generic error messages or hashed identifiers
- Add production safety checks - Raise error if dev_mode=true in production environment
- Harden access control - Implement role-based access beyond email domain matching
- Cache Google token validation - Implement Redis caching with TTL for tokeninfo results
- Sanitize user input - Validate/sanitize email, search queries, and all user-supplied data
- Restrict CORS - Remove wildcard origin or set specific Claude.AI domains
Score Justification: D
The presence of multiple HIGH severity findings (dev mode bypass, domain-based access control, external API calls without caching) combined with 5 CRITICAL logging issues that violate the established security patterns warrants a D grade. While no absolute authentication bypass exists, the combination of issues creates exploitable attack surface.
Findings
🔴 [CRITICAL] Exception details logged in logger.warning() may expose TokenValidationError details including token claims or validation failures
File: src/knowledge_base/mcp/oauth/resource_server.py:line 226
Category: A09_logging_monitoring
Impact: Sensitive token validation errors could be exposed in logs, potentially revealing information about token structure or validation logic to attackers with log access
Recommendation: Log only the error type and a generic message without exception details. Use logger.warning(f'Token validation failed: {type(e).__name__}') without the full exception object
🔴 [CRITICAL] Exception details logged in logger.error() during token validation without sanitization
File: src/knowledge_base/mcp/oauth/resource_server.py:line 229
Category: A09_logging_monitoring
Impact: Full exception stack traces and validation details could be exposed in error logs, potentially revealing sensitive information about OAuth implementation
Recommendation: Log only generic error message without exception details: logger.error('Unexpected error during token validation')
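The two recommendations above can be sketched as a small helper. This is a minimal illustration of the "log the exception class, never its message" pattern; `TokenValidationError` and `log_validation_failure` are hypothetical names, not functions from the PR.

```python
import logging

logger = logging.getLogger("mcp.oauth")


class TokenValidationError(Exception):
    """Hypothetical validation error that may carry sensitive claim details."""


def log_validation_failure(exc: Exception) -> str:
    """Log only the exception class name; the message may contain PII or claims."""
    message = f"Token validation failed: {type(exc).__name__}"
    logger.warning(message)
    return message
```

The returned string never embeds `str(exc)`, so even an exception constructed with an email address in its message leaves no PII in the log line.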
🔴 [CRITICAL] Email address logged in logger.warning() when email_verified is False
File: src/knowledge_base/mcp/oauth/token_validator.py:line 156
Category: A09_logging_monitoring
Impact: User email addresses are PII and should not be logged at warning level. This logs unverified emails which could be sensitive user data
Recommendation: Log only the fact that email verification failed without including the email address: logger.warning('Google token email not verified')
🔴 [CRITICAL] Email address logged in logger.info() during token validation
File: src/knowledge_base/mcp/oauth/token_validator.py:line 244
Category: A09_logging_monitoring
Impact: User email addresses are PII and logging them at info level violates privacy best practices. This could expose user identity information in application logs
Recommendation: Log only the fact that token was validated without including email: logger.info('Google access token validated')
🔴 [CRITICAL] Exception details including full exception object logged in logger.warning() during token validation
File: src/knowledge_base/mcp/server.py:line 137
Category: A09_logging_monitoring
Impact: Token validation errors could expose sensitive information about the OAuth implementation, token structure, or validation failures to anyone with log access
Recommendation: Log only the exception type and a generic message: logger.warning(f'Token validation failed: {type(e).__name__}')
🟠 [HIGH] Dev mode disables OAuth token validation entirely when MCP_DEV_MODE=true
File: src/knowledge_base/mcp/server.py:line 143
Category: A07_auth_failures
Impact: If MCP_DEV_MODE is accidentally left enabled in production, all OAuth authentication is bypassed, allowing any user to access all tools with full scopes
Recommendation: Add explicit validation that MCP_DEV_MODE is False in production. Ensure this setting cannot be enabled via environment variable without explicit security review. Add startup logging warning if dev mode is enabled
🟠 [HIGH] Scope-based access control logic grants write access based on email domain (@keboola.com) without additional verification
File: src/knowledge_base/mcp/oauth/resource_server.py:line 211
Category: A01_access_control
Impact: Any user who can obtain a Google account with @keboola.com domain gets write access to knowledge base. If domain is compromised or external account gains domain email, they get write access without additional authorization
Recommendation: Implement additional verification for write scope beyond email domain: validate user against internal LDAP/directory, require explicit role assignment, or use additional claims from ID token
🟠 [HIGH] Google opaque access token validation uses external Google API call (tokeninfo endpoint) without caching or rate limiting
File: src/knowledge_base/mcp/oauth/token_validator.py:line 168
Category: A04_insecure_design
Impact: Each token validation makes a synchronous HTTP call to Google, which could be abused for DOS attacks or expose validation patterns to Google's infrastructure
Recommendation: Implement token result caching (Redis or in-memory with TTL), add rate limiting per user/client, or switch to validating only JWT ID tokens which can be verified offline
🟡 [MEDIUM] MCP_OAUTH_CLIENT_SECRET has empty string default while marked as optional
File: src/knowledge_base/mcp/config.py:line 18
Category: A05_misconfiguration
Impact: Code may reference this value assuming it's populated, leading to authentication failures or fallback to insecure behavior. Empty defaults can hide configuration issues
Recommendation: Either make this field required (no default) or add validation that checks if client_secret is actually needed and present before using it
🟡 [MEDIUM] User email from token claims used directly in chunk metadata and page title without sanitization
File: src/knowledge_base/mcp/tools.py:line 371
Category: A08_integrity_failures
Impact: Malicious email addresses could inject newlines, special characters, or SQL-like syntax into metadata that gets stored in Neo4j, potentially causing injection vulnerabilities downstream
Recommendation: Sanitize email address before using in metadata: validate format, remove special characters, or use a hashed user ID instead
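The hashed-user-ID variant of this recommendation might look like the sketch below; the regex and function name are assumptions for illustration, not code from the PR.

```python
import hashlib
import re

# Deliberately strict shape check; rejects control characters and newlines.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def safe_user_id(email: str) -> str:
    """Return a short, stable hashed identifier for use in stored metadata.

    Raises ValueError for malformed emails instead of letting them reach Neo4j.
    """
    if not _EMAIL_RE.fullmatch(email):
        raise ValueError("invalid email address")
    return hashlib.sha256(email.lower().encode()).hexdigest()[:16]
```

Storing the 16-hex-character digest instead of the raw address keeps chunk metadata joinable per user while removing both the PII and the injection surface.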
🟡 [MEDIUM] User email logged in tool execution logging without rate limiting or audit trail
File: src/knowledge_base/mcp/tools.py:line 297
Category: A09_logging_monitoring
Impact: Repeated logging of user emails at info level creates audit trail that could reveal user patterns and access history
Recommendation: Log user action at debug level only, or use hashed user identifier. Implement proper audit logging separate from application logs
🟡 [MEDIUM] NEO4J_PASSWORD stored as plaintext environment variable in Cloud Run service definition
File: deploy/terraform/cloudrun-mcp.tf:line 153
Category: A05_misconfiguration
Impact: Cloud Run environment variables are visible in GCP Console and Terraform state. If Terraform state is compromised, Neo4j credentials are exposed
Recommendation: Already correctly using Secret Manager (google_secret_manager_secret_version) - good pattern. Ensure tfvars files and Terraform state are not checked into version control
🟡 [MEDIUM] search_knowledge query parameter used directly in search without input validation or sanitization
File: src/knowledge_base/mcp/tools.py:line 413
Category: A03_injection
Impact: Malicious search queries could potentially exploit search backend (Neo4j/Graphiti) if they accept injection vectors. Query operators or special syntax could be abused
Recommendation: Validate search query: limit length (e.g., max 500 chars), escape special characters, or use parameterized search API if available
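A minimal query-validation helper along the lines recommended could be:

```python
MAX_QUERY_LENGTH = 500  # assumed limit, per the recommendation above


def validate_search_query(query: str) -> str:
    """Normalize and bound a user-supplied search query (illustrative sketch)."""
    q = query.strip()
    if not q:
        raise ValueError("search query must not be empty")
    if len(q) > MAX_QUERY_LENGTH:
        raise ValueError(f"search query exceeds {MAX_QUERY_LENGTH} characters")
    # Drop control characters that could confuse downstream query or log handling.
    return "".join(ch for ch in q if ch.isprintable())
```

This does not replace parameterized queries on the Neo4j/Graphiti side; it only bounds and normalizes the input before it is forwarded.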
🟡 [MEDIUM] Dev mode bypass is visible in production code without clear enforcement mechanism to prevent accidental enablement
File: src/knowledge_base/mcp/oauth/resource_server.py:line 93
Category: A04_insecure_design
Impact: Environment variable MCP_DEV_MODE=true disables all OAuth. If automation scripts or monitoring tools set this for testing, production could run in dev mode
Recommendation: Add explicit check: raise startup error if MCP_DEV_MODE=true and ENVIRONMENT=production. Add prominent startup logging warning
🟢 [LOW] CORS middleware allows origins=['*'] with allow_credentials=True
File: src/knowledge_base/mcp/server.py:line 149
Category: A05_misconfiguration
Impact: This combination is technically not harmful (browsers won't allow credentials with wildcard origin) but indicates overly permissive CORS configuration
Recommendation: Restrict CORS to specific origins (e.g., Claude.AI domains) or remove allow_credentials if not needed
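An origin allow-list replacing the wildcard could be expressed as a single pattern; the exact Claude.AI domain list here is an assumption, and in a Starlette/FastAPI app the same pattern could be passed via `CORSMiddleware`'s `allow_origin_regex` parameter.

```python
import re

# Assumed allow-list: the apex domain plus one subdomain level of claude.ai.
_ALLOWED_ORIGIN = re.compile(r"https://(claude\.ai|[a-z0-9-]+\.claude\.ai)")


def is_allowed_origin(origin: str) -> bool:
    """Return True only for exact-match Claude.AI origins over HTTPS."""
    return _ALLOWED_ORIGIN.fullmatch(origin) is not None
```

Because the match is anchored to the full string, lookalike origins such as `https://claude.ai.evil.com` are rejected.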
🟢 [LOW] TEST_USER_EMAIL environment variable used in dev mode without documentation
File: src/knowledge_base/mcp/server.py:line 166
Category: A09_logging_monitoring
Impact: Undocumented env var could be accidentally set in production, causing all requests to be attributed to the test email
Recommendation: Document this env var clearly, validate it's only used when MCP_DEV_MODE=true, or use a hardcoded dev email
🔵 [INFO] Google ID token validation does not verify azp (authorized party) claim is present as part of required claims
File: src/knowledge_base/mcp/oauth/token_validator.py:line 153
Category: A07_auth_failures
Impact: azp claim is checked only if present in token, but best practice is to require it for Google tokens to prevent token substitution attacks
Recommendation: Add 'azp' to the required claims list if Google OAuth is used exclusively, or add conditional requirement based on token issuer
OWASP Top 10 Checklist
| Category | Status |
|---|---|
| A01 Access Control | ❌ FAIL |
| A02 Crypto Failures | ✅ PASS |
| A03 Injection | ❌ FAIL |
| A04 Insecure Design | ❌ FAIL |
| A05 Misconfiguration | ❌ FAIL |
| A06 Vulnerable Components | ➖ N/A |
| A07 Auth Failures | ❌ FAIL |
| A08 Integrity Failures | ❌ FAIL |
| A09 Logging Monitoring | ❌ FAIL |
| A10 SSRF | ➖ N/A |
🤖 Security review powered by Claude
The MCP spec requires the server to act as an OAuth authorization server (or proxy to one). Claude.AI discovers the /authorize, /token, and /register endpoints and uses them for the OAuth flow.
- GET /.well-known/oauth-authorization-server: RFC 8414 metadata
- GET /authorize: redirects to Google OAuth, maps scopes
- POST /token: proxies token exchange to Google
- POST /register: dynamic client registration (RFC 7591)
- MCP_OAUTH_CLIENT_SECRET now required (needed for token exchange)
- 10 new unit tests for all OAuth AS endpoints
Cloud Run terminates TLS at the load balancer, so FastAPI sees HTTP. Use X-Forwarded-Proto header to construct correct external URLs.
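That header-based URL reconstruction could be sketched as below; the function name and the `x-forwarded-host` fallback are illustrative assumptions, not necessarily what the PR implements.

```python
def external_base_url(headers: dict[str, str], host: str) -> str:
    """Reconstruct the externally visible base URL behind a TLS-terminating proxy.

    Cloud Run's load balancer sets X-Forwarded-Proto to "https" while the app
    itself sees plain HTTP; trust the first value if the header lists several.
    """
    proto = headers.get("x-forwarded-proto", "http").split(",")[0].strip()
    forwarded_host = headers.get("x-forwarded-host", host)
    return f"{proto}://{forwarded_host}"
```

OAuth metadata and redirect URIs built from this base will then advertise `https://` even though the server socket speaks HTTP.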
LGTM! This PR introduces a robust MCP (Model Context Protocol) server with OAuth 2.1 integration for Claude.AI, which is a significant step forward for the AI Knowledge Base system. The changes are well-structured, follow established architectural patterns, and address critical security considerations.
Key strengths observed:
- **Intent fulfillment:** The PR description accurately reflects the implemented features, including the new MCP server, OAuth 2.1 (Google) authentication, tool definitions, scope-based access control, and dedicated Cloud Run services for staging and production.
- **Architectural alignment:** The extraction of core Q&A logic into `core/qa.py` is an excellent refactoring, promoting reusability and maintainability. The use of async Python, Graphiti/Neo4j, and Google Cloud services (Cloud Run, Secret Manager) is consistent with the project's established patterns.
- **Security:** The OAuth implementation is sound, correctly proxying to Google, handling scope mapping, and enforcing domain-based access control (`@keboola.com` for write access) within the `extract_user_context` logic. The use of Secret Manager for OAuth client credentials in Terraform is highly commendable, avoiding hardcoded secrets and ensuring proper IAM access for the Cloud Run service account. The `allUsers` invoker role on Cloud Run is appropriate as authentication is handled at the application level.
- **Terraform quality:** The `cloudrun-mcp.tf` file is well-written, with clear resource naming, appropriate scaling configurations for staging and production, correct image tagging, and thoughtful environment variable management. The `ignore_changes = [secret_data]` lifecycle block for secrets is crucial and correctly applied.
- **Test coverage:** The new unit tests (`test_mcp_tools.py`, `test_mcp_server.py`, `test_mcp_oauth.py`) demonstrate thorough coverage of tool functionality, server endpoints, and OAuth flows, including dev mode and scope filtering.
No significant bugs or architectural misalignments were found. The code is clean, well-commented where necessary, and demonstrates a high standard of engineering.
🔒 Security Review
Security Score: D | 5 CRITICAL, 3 HIGH, 2 LOW, 5 MEDIUM
Security Review Summary: MCP Server OAuth Integration
Decision: REQUEST_CHANGES
Security Score: D (HIGH severity findings present)
Critical Issues (5 findings)
Credential and PII Logging: The OAuth token validator and resource server log user email addresses and exception details at info/warning levels, directly violating the security context requirement that user PII should not be logged at these levels. Exception messages in lines 226, 229 (resource_server.py) and 156, 244 (token_validator.py) may contain sensitive information.
Access Control Bypass: Dev mode (line 145, resource_server.py) completely bypasses OAuth validation if enabled, granting full access without authentication. The mode is controlled via environment variable with no production safeguard.
High Severity Issues (4 findings)
Email Verification Bypass: Token validation at line 130 (token_validator.py) logs a warning but accepts unverified emails, conflicting with security intent.
Terraform Configuration: Secret placeholders (cloudrun-mcp.tf) could be deployed if not manually updated, causing authentication failures.
CORS Misconfiguration: Wildcard CORS allows any origin to query metadata endpoints, aiding reconnaissance.
Dev Mode Logging: Test user configuration is logged at info level in development mode.
Medium Severity Issues (2 findings)
User Identity Handling: User email is used in URLs and logs without sanitization (tools.py, server.py).
OAuth Client ID Exposure: Client ID logged at startup, potentially aiding token forgery attacks.
Recommendations
1. Immediately remove all email/PII logging from warning and info levels in token validation code. Use debug level only, or sanitized identifiers (hashes).
2. Add a production safeguard for dev mode with runtime validation that prevents enabling dev mode outside development environments.
3. Enforce email verification in token validation by raising exceptions for unverified emails, not just logging warnings.
4. Restrict CORS origins to known Claude.AI domains instead of a wildcard.
5. Validate secret initialization in Terraform, or require pre-creation of secrets outside IaC.
6. Use the sub claim instead of email for audit logging to reduce PII in logs.
7. Sanitize user-supplied data (email addresses) when used in URIs or returned to clients.
The MCP server implementation shows good foundational security (proper OAuth validation, secret management via Cloud Secret Manager), but logging practices need immediate remediation to comply with the organization's data protection requirements.
Findings
🔴 [CRITICAL] Email addresses (PII) logged at warning level in token validation error handler: logger.warning(f"Token validation failed: {e}")
File: src/knowledge_base/mcp/oauth/resource_server.py:line 226
Category: A09 Logging & Monitoring
Impact: Exception messages may contain user email addresses and token validation errors, exposing PII in logs. This violates the security requirement that Slack user PII and email addresses should not be leaked in logs.
Recommendation: Log only the error type, not the full exception message which may contain PII. Use: logger.warning(f"Token validation failed: {type(e).__name__}") or sanitize the error message to exclude email/identifying information.
🔴 [CRITICAL] Full exception details logged at error level in token validation handler: logger.error(f"Unexpected error during token validation: {e}")
File: src/knowledge_base/mcp/oauth/resource_server.py:line 229
Category: A09 Logging & Monitoring
Impact: Unvalidated exception details may contain sensitive information including token content, email addresses, or internal system details. This can expose credentials or PII in error logs.
Recommendation: Log only the exception type and sanitized message without full exception details. Use: logger.error(f"Unexpected error during token validation: {type(e).__name__}") to avoid exposing sensitive data in stack traces.
🔴 [CRITICAL] User email address (PII) logged at warning level when email not verified: logger.warning(f"Google token email not verified: {claims.get('email')}")
File: src/knowledge_base/mcp/oauth/token_validator.py:line 156
Category: A09 Logging & Monitoring
Impact: Directly logs user email addresses in application logs at warning level. This violates the security policy that email addresses should not be logged at info level or above.
Recommendation: Remove the email from the log message or hash it. Use: logger.debug(f"Google token email verification failed") if logging is needed, and move this to debug level only.
🔴 [CRITICAL] User email address (PII) logged at info level during token validation: logger.info(f"Google access token validated for: {normalized.get('email')}")
File: src/knowledge_base/mcp/oauth/token_validator.py:line 244
Category: A09 Logging & Monitoring
Impact: Logs user email addresses at INFO level in normal operation. This violates the security context requirement that email addresses should not be logged at info level or higher.
Recommendation: Remove email from the log or use debug level. Use: logger.debug(f"Google access token validated") instead, or if logging at info level is needed, use a hash: logger.info(f"Google access token validated for: {hashlib.sha256(email.encode()).hexdigest()[:8]}").
🔴 [CRITICAL] Full exception details logged at warning level during token validation: logger.warning(f"Token validation failed: {type(e).__name__}: {e}")
File: src/knowledge_base/mcp/server.py:line 149
Category: A09 Logging & Monitoring
Impact: Exception message {e} may contain sensitive token information, email addresses, or internal system details. This exposes PII and credentials in warning-level logs.
Recommendation: Log only the exception type without the full exception message. Use: logger.warning(f"Token validation failed: {type(e).__name__}") to avoid exposing details in logs.
🟠 [HIGH] Informational logging of user email for scope decisions: Multiple logs at info level log email addresses when granting scopes. Line 205-206: logger.info(f"Google OAuth: granted default scopes for {email}") and logger.info(f"Google OAuth: granted write scope for internal user {email}")
File: src/knowledge_base/mcp/oauth/resource_server.py:line 219
Category: A09 Logging & Monitoring
Impact: While these are at info level (acceptable per security context for normal operations), they still log PII during normal operation which could accumulate in log storage and be searchable.
Recommendation: Consider moving to debug level only, or if info level is required for audit purposes, use a hashed identifier instead of plain email: hashlib.sha256(email.encode()).hexdigest()[:8].
🟠 [HIGH] Dev mode bypasses all OAuth token validation: When dev_mode=True, the middleware skips token validation entirely and grants all scopes without verification.
File: src/knowledge_base/mcp/oauth/resource_server.py:line 145
Category: A01 Broken Access Control
Impact: If dev mode is accidentally enabled in production (via environment variable), all access control is completely bypassed. Any unauthenticated request will be granted full read/write access to the knowledge base.
Recommendation: Add runtime validation to ensure dev mode is never enabled in production. In the middleware init or at application startup, check: if self.dev_mode and not os.getenv('ENVIRONMENT') == 'development': raise RuntimeError('Dev mode is only allowed in development')
🟠 [HIGH] MCP OAuth secrets use placeholder values 'REPLACE_ME' in Terraform with lifecycle { ignore_changes = [secret_data] }. This means if the secret is never actually populated, the placeholder could be deployed.
File: deploy/terraform/cloudrun-mcp.tf
Category: A04 Insecure Design
Impact: If operators forget to manually set the secret values in GCP Secret Manager, the MCP server will attempt to use 'REPLACE_ME' as the client secret, causing authentication failures and potentially exposing the secret value in error messages.
Recommendation: Use sensitive = true on the secret versions and add a precondition check that validates secrets are not placeholder values at deployment time. Alternatively, require secrets to be pre-created outside of Terraform and reference them only.
🟡 [MEDIUM] User email address from OAuth context is used directly in url metadata without validation: url=f"mcp://user/{user_email}". User email is extracted from claims without additional validation.
File: src/knowledge_base/mcp/tools.py:line 316
Category: A08 Data Integrity Failures
Impact: If a user email contains special characters or is malformed, it could create invalid URIs or be used in injection attacks when the URL metadata is later processed. Additionally, the user identity is assumed to be from the email claim without verifying claim authenticity.
Recommendation: Validate and sanitize user email before using in URL construction. Use: from urllib.parse import quote; url=f"mcp://user/{quote(user_email)}"
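The recommended percent-encoding is a one-liner with the standard library; the helper name and the `mcp://user/` URI shape follow the finding above and are otherwise illustrative.

```python
from urllib.parse import quote


def user_resource_uri(email: str) -> str:
    """Percent-encode the email before embedding it in an mcp:// URI."""
    # safe="" encodes every reserved character, including "@" and "/".
    return f"mcp://user/{quote(email, safe='')}"
```

With `safe=""`, characters that would otherwise alter the URI structure (spaces, `@`, `/`, newlines) are all escaped rather than passed through.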
🟡 [MEDIUM] MCP server logs OAuth configuration including the audience (client_id) at startup: logger.info(f"OAuth audience: {_get_oauth_audience()}")
File: src/knowledge_base/mcp/server.py:line 81
Category: A09 Logging & Monitoring
Impact: The Google OAuth client ID is logged at startup, which could be captured in server logs and used by attackers to forge tokens or perform OAuth flow attacks. The client ID is sensitive when combined with knowledge of the resource server.
Recommendation: Do not log the client ID directly. Use: logger.info("OAuth audience configured") without the actual value. Log only hashes or truncated values if auditing is required.
🟡 [MEDIUM] CORS middleware configured with allow_origins=["*"] (all origins allowed) without additional protection on sensitive endpoints.
File: src/knowledge_base/mcp/server.py:line 210
Category: A05 Security Misconfiguration
Impact: Any website can make cross-origin requests to the MCP server's endpoints. While the OAuth endpoints require authentication via Bearer token, the metadata endpoints are public and CORS allows them to be queried from any origin. This could aid reconnaissance attacks.
Recommendation: Restrict CORS origins to known Claude.AI domains or remove wildcard. Use: allow_origins=["https://claude.ai", "https://*.claude.ai"] or implement stricter origin validation for metadata endpoints.
🟡 [MEDIUM] Email verification check is logged as warning but does NOT prevent token validation from succeeding: if not claims.get("email_verified", False): logger.warning(...); return claims
File: src/knowledge_base/mcp/oauth/token_validator.py:line 130
Category: A07 Authentication Failures
Impact: Tokens with email_verified: false are still accepted and validated. A user with an unverified Google email could potentially gain access to the knowledge base, contradicting the security intent.
Recommendation: Either raise an exception when email is not verified (raise TokenValidationError("Email not verified")) OR change the warning to debug level if this is intentional. The current behavior conflates logging with enforcement.
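The exception-raising option could look like this sketch, which turns the check into enforcement instead of a log line; the names are hypothetical, not from the PR.

```python
class TokenValidationError(Exception):
    """Hypothetical error type raised when token claims fail validation."""


def require_verified_email(claims: dict) -> str:
    """Reject tokens whose email claim is absent or unverified."""
    email = claims.get("email")
    if not email or not claims.get("email_verified", False):
        raise TokenValidationError("email missing or not verified")
    return email
```

Downstream scope mapping (e.g. granting write access to `@keboola.com`) would then only ever see verified addresses.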
🟡 [MEDIUM] In dev mode, the test user email is logged: logger.info(f"MCP dev mode: using email {dev_email}")
File: src/knowledge_base/mcp/server.py:line 165
Category: A09 Logging & Monitoring
Impact: Dev mode email is logged at info level. While this is development mode, if logs are inadvertently preserved or exported, the dev email could be exposed.
Recommendation: Move to debug level: logger.debug(f"MCP dev mode enabled") without logging the email, or hash it.
🟢 [LOW] MCP configuration requires MCP_OAUTH_CLIENT_ID and MCP_OAUTH_CLIENT_SECRET with no defaults, but no pydantic validation error message is provided to guide deployment.
File: src/knowledge_base/mcp/config.py
Category: A05 Security Misconfiguration
Impact: If these environment variables are missing, pydantic will raise a generic validation error. While this fails safe (don't start without credentials), the error message could be clearer to help operators diagnose configuration issues.
Recommendation: Add pydantic field descriptions and validators with helpful error messages: MCP_OAUTH_CLIENT_ID: str = Field(..., description="Google OAuth Client ID - obtain from GCP Console")
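Since the project uses pydantic-settings, `Field(..., description=...)` is the idiomatic route; the same operator-friendly failure can also be shown without pydantic as a plain startup check. The function name and hint text below are illustrative assumptions.

```python
import os


def require_env(name: str, hint: str) -> str:
    """Read a required environment variable, failing with a diagnostic message.

    An empty or whitespace-only value is treated the same as a missing one,
    so placeholder/blank configuration cannot slip through silently.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is required: {hint}")
    return value
```

For example, `require_env("MCP_OAUTH_CLIENT_ID", "obtain from the GCP Console OAuth credentials page")` produces an error that names both the variable and where to get it, rather than a generic validation failure.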
🟢 [LOW] Tool execution logs the tool name and user email: logger.info(f"Executing tool: {tool_name}, user: {user.get('email', 'unknown')}")
File: src/knowledge_base/mcp/tools.py:line 165
Category: A09 Logging & Monitoring
Impact: User emails are logged at info level when tools are executed. This creates an audit trail of user actions, which is generally desirable for audit purposes, but accumulates PII in logs.
Recommendation: Use user sub claim (subject identifier) instead of email for audit logs: logger.info(f"Executing tool: {tool_name}, user: {user.get('sub', 'unknown')}")
OWASP Top 10 Checklist
| Category | Status |
|---|---|
| A01 Access Control | ❌ FAIL |
| A02 Crypto Failures | ✅ PASS |
| A03 Injection | ✅ PASS |
| A04 Insecure Design | ❌ FAIL |
| A05 Misconfiguration | ❌ FAIL |
| A06 Vulnerable Components | ✅ PASS |
| A07 Auth Failures | ❌ FAIL |
| A08 Integrity Failures | ❌ FAIL |
| A09 Logging Monitoring | ❌ FAIL |
| A10 SSRF | ✅ PASS |
🤖 Security review powered by Claude
- Remove email addresses from info/warning logs (use sub or omit)
- Strip exception details from error messages (log type only)
- URL-encode user email in mcp:// URIs
- Downgrade dev mode logging to debug level
- Remove OAuth client_id from startup logs
🔒 Security Review
Security Score: D | 2 CRITICAL, 4 HIGH, 1 INFO, 2 LOW, 8 MEDIUM
Security Review Summary
Critical Issues (1)
- Token Validation Error Logging (A09): Multiple locations log exception type names during token validation (lines 226, 229 in resource_server.py and line 149 in server.py). This establishes a dangerous pattern for auth-related logging where exception details could leak sensitive information.
High Severity Issues (3)
- Email Verification Logging (A09): Token validation state logged at warning level (line 156 token_validator.py), creating asymmetric auth logging patterns.
- CORS Configuration (A04): Wildcard CORS (allow_origins=["*"]) on the public MCP endpoint enables CSRF attacks against authenticated requests.
- Scope-Based Access Control (A01): Unverified @keboola.com email domains grant write scope even without the email_verified flag, allowing privilege escalation via token spoofing.
Medium Severity Issues (7)
- Scope Mapping Transparency (A01): Custom scopes (kb.read/kb.write) silently dropped in OAuth flow without user awareness, bypassing intended access control model.
- User Email Validation (A08): Token claims used to construct identifiers without validation of email format.
- Empty Credential Validation (A04): Required OAuth credentials not validated to ensure they are non-empty after environment loading.
- Token Exchange Logging (A09): OAuth operations log parameters that could inadvertently include sensitive data.
- Redirect URI Logging (A09): User-supplied OAuth parameters logged at info level.
- Terraform Secrets (A05): Placeholder "REPLACE_ME" values in secret definitions could be accidentally deployed.
- Client Registration Logging (A09): User-supplied registration data echoed in logs.
Low Severity Issues (1)
- Startup Configuration Logging (A05): OAuth configuration details logged, hinting at deployment setup.
Key Risks
- Authentication Logging Pattern: The codebase establishes a pattern of logging exception details during token validation. While the current code only logs type names, this precedent could lead to accidental credential exposure in future maintenance.
- Email Verification Gap: Write scope is granted to @keboola.com emails without requiring email_verified=True, allowing token spoofing attacks to escalate to write access.
- CORS Configuration: Wildcard CORS on a public endpoint with Bearer token authentication enables cross-site request forgery attacks.
- Scope Model Integrity: The OAuth flow drops custom scopes without user awareness, undermining the intended access control model where scopes communicate permission boundaries.
Recommendations
- Remove all exception type logging from authentication/validation paths (lines 226, 229, 149). Use generic messages only.
- Require email_verified for all scope grants including write scope for @keboola.com users.
- Restrict CORS to specific trusted origins (Claude.AI domains).
- Validate all credentials against non-empty requirement in pydantic validators.
- Remove user-supplied data from logs in OAuth endpoints (authorize, register, token exchange).
- Move startup logs to DEBUG level to avoid leaking deployment configuration.
- Document scope handling to clarify that custom scopes are dropped by Google OAuth and must be re-granted after token exchange.
Findings
🔴 [CRITICAL] Token validation error logged with exception type name: logger.warning(f"Token validation failed: {type(e).__name__}")
File: src/knowledge_base/mcp/oauth/resource_server.py:line 226
Category: A09_logging_monitoring
Impact: While the exception type is logged (not the token itself), this pattern appears in multiple locations and establishes a precedent. In error handling, exception details can inadvertently leak sensitive information from exception messages or stack traces.
Recommendation: Use generic error messages in logs: logger.warning("Token validation failed") without including exception details. Reserve detailed error info for non-user-facing logging only.
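For illustration, the two-tier pattern this recommendation implies can be sketched as a small helper (hypothetical, not the PR's actual code): the shared log line stays generic, and exception detail reaches the stream only at DEBUG, which production should keep disabled.

```python
import logging

logger = logging.getLogger("mcp.oauth")

def log_auth_failure(exc):
    """Log an authentication failure without leaking exception detail.

    The WARNING line is generic; the exception type/message are emitted
    only at DEBUG level, which should be off in production deployments.
    """
    logger.warning("Token validation failed")
    logger.debug("Token validation failure detail", exc_info=exc)
```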
🔴 [CRITICAL] Unexpected error during token validation logged with exception type: logger.error(f"Unexpected error during token validation: {type(e).__name__}")
File: src/knowledge_base/mcp/oauth/resource_server.py:line 229
Category: A09_logging_monitoring
Impact: Same as above - establishes pattern of logging exception details during authentication/validation operations where sensitive context may be present.
Recommendation: Use generic error message: logger.error("Unexpected error during token validation"). Do not include exception type names or details in auth-related logs.
🟠 [HIGH] Google token email verification status logged at warning level: logger.warning("Google token email not verified")
File: src/knowledge_base/mcp/oauth/token_validator.py:line 156
Category: A09_logging_monitoring
Impact: Logs auth validation state during token processing. Combined with other logs, could leak user authentication patterns or token state information.
Recommendation: Remove this log statement or reduce to DEBUG level only: logger.debug("Google token email not verified")
🟠 [HIGH] Google access token validation success logged at debug level: logger.debug("Google access token validated")
File: src/knowledge_base/mcp/oauth/token_validator.py:line 244
Category: A09_logging_monitoring
Impact: While DEBUG level is appropriate, this combined with warning on line 156 creates asymmetric logging of validation success/failure, potentially leaking auth patterns.
Recommendation: Either log both at DEBUG level only, or remove both statements. Avoid asymmetric logging of authentication outcomes.
🟠 [HIGH] Dev mode token validation skip logged at debug level: logger.debug("MCP dev mode: skipping token validation")
File: src/knowledge_base/mcp/server.py:line 133
Category: A09_logging_monitoring
Impact: Indicates when authentication is bypassed, which could leak operational state about dev/prod deployment. Production logs should never indicate dev mode is active.
Recommendation: Remove this log statement entirely or gate it behind if mcp_settings.MCP_DEBUG: check to prevent it from appearing in production logs.
🟠 [HIGH] Token validation failure logged with exception type in production middleware: logger.warning(f"Token validation failed: {type(e).__name__}")
File: src/knowledge_base/mcp/server.py:line 149
Category: A09_logging_monitoring
Impact: This is in the main request middleware and logs auth failures with exception details. Combined with warning on line 226, creates multiple auth failure logging points with exception context.
Recommendation: Use generic message: logger.warning("Token validation failed") without exception type. Do not log exception details for auth failures.
🟡 [MEDIUM] CORS middleware allows all origins: allow_origins=["*"]
File: src/knowledge_base/mcp/server.py:line 74
Category: A04_insecure_design
Impact: MCP server at public Cloud Run endpoint with wildcard CORS allows any origin to make authenticated requests. Combined with Bearer token in Authorization header, this enables CSRF attacks from attacker-controlled websites.
Recommendation: Restrict CORS origins to specific trusted domains. For Claude.AI integration, use: allow_origins=["https://claude.ai", ...] or similar whitelist. Remove allow_credentials=True if not needed.
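The browser-side effect of such a whitelist amounts to echoing back only known origins; a minimal sketch of that check (in practice the same list would simply be passed to Starlette's CORSMiddleware via allow_origins):

```python
# Hypothetical origin whitelist for the Claude.AI integration.
TRUSTED_ORIGINS = frozenset({"https://claude.ai", "https://www.claude.ai"})

def allow_origin_header(request_origin):
    """Return the Access-Control-Allow-Origin value to send, or None to omit it.

    Echoing back only whitelisted origins (instead of "*") makes browsers
    refuse cross-origin reads from untrusted pages.
    """
    if request_origin in TRUSTED_ORIGINS:
        return request_origin
    return None
```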
🟡 [MEDIUM] OAuth /authorize endpoint uses _map_to_google_scopes() which strips all non-standard scopes, but the MCP spec may expect scope validation. Custom scopes like kb.read/kb.write are silently dropped without user/client awareness.
File: src/knowledge_base/mcp/server.py:line 320
Category: A01_access_control
Impact: User grants kb.read in MCP, but Google OAuth flow drops this scope. User receives token without expected scopes, leading to access control bypass of the scope model.
Recommendation: Scope mapping should be transparent to the user. Consider including custom scopes in the token/userinfo response after Google auth completes, rather than silently dropping them at authorization time.
🟡 [MEDIUM] User input user_email from token claims used directly in constructing MCP URLs without validation: url=f"mcp://user/{quote(user_email, safe='')}"
File: src/knowledge_base/mcp/tools.py:line 315
Category: A08_integrity_failures
Impact: Although quote() is applied for URL encoding, the email is extracted from user claims and used to construct identifiers without further validation. Could allow crafted tokens to create unexpected identifiers.
Recommendation: Validate that user_email matches expected format (RFC 5322 email) before using in identifiers. Use a whitelist of allowed domain suffixes if applicable.
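A sketch of the combined check (hypothetical helper; the regex is a loose RFC 5322-style approximation, not a full validator):

```python
import re
from urllib.parse import quote

# Loose RFC 5322-style pattern: local part, "@", dotted domain.
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")

def user_resource_uri(user_email):
    """Build the mcp:// identifier only for claims that look like emails."""
    if not _EMAIL_RE.fullmatch(user_email):
        raise ValueError("user email claim has unexpected format")
    return f"mcp://user/{quote(user_email, safe='')}"
```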
🟡 [MEDIUM] Required credentials MCP_OAUTH_CLIENT_ID and MCP_OAUTH_CLIENT_SECRET are marked as required without default, but no validation ensures they are not empty strings after loading from environment.
File: src/knowledge_base/mcp/config.py:line 16
Category: A04_insecure_design
Impact: If environment variable is set to empty string MCP_OAUTH_CLIENT_ID="", pydantic will accept it (empty string is valid string type). Application will fail at runtime with confusing errors.
Recommendation: Add pydantic validator: @field_validator('MCP_OAUTH_CLIENT_ID', 'MCP_OAUTH_CLIENT_SECRET') def validate_not_empty(cls, v): assert v, 'required' return v
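The same check, shown dependency-free for clarity (equivalent in spirit to the suggested pydantic field_validator; function and variable names are illustrative):

```python
_REQUIRED_VARS = ("MCP_OAUTH_CLIENT_ID", "MCP_OAUTH_CLIENT_SECRET")

def check_oauth_credentials(env):
    """Reject missing *and* empty/whitespace values with an actionable message."""
    cleaned = {}
    for name in _REQUIRED_VARS:
        value = (env.get(name) or "").strip()
        if not value:
            raise RuntimeError(
                f"{name} must be set to a non-empty value "
                "(obtain it from the GCP Console OAuth credentials page)"
            )
        cleaned[name] = value
    return cleaned
```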
🟡 [MEDIUM] OAuth token exchange logs request parameters including grant_type: logger.info(f"OAuth token exchange: grant_type={token_params.get('grant_type', 'N/A')}")
File: src/knowledge_base/mcp/server.py:line 185
Category: A09_logging_monitoring
Impact: Logs token exchange details. While grant_type is not sensitive, this establishes pattern of logging OAuth request data which could inadvertently include sensitive params in future changes.
Recommendation: Remove this log statement or only log: logger.debug("OAuth token exchange") without parameter details.
🟡 [MEDIUM] OAuth authorize endpoint logs client redirect_uri and scope information: logger.info(f"OAuth authorize: redirecting to Google (scope={google_scopes}, redirect_uri={params.get('redirect_uri', 'N/A')})")
File: src/knowledge_base/mcp/server.py:line 159
Category: A09_logging_monitoring
Impact: Logs user-supplied redirect_uri and requested scopes. Could leak information about OAuth flow patterns or integrations.
Recommendation: Log only: logger.debug("OAuth authorize") without user-supplied parameters from the request.
🟡 [MEDIUM] Secret Manager secrets use lifecycle ignore_changes = [secret_data] with placeholder value "REPLACE_ME"
File: deploy/terraform/cloudrun-mcp.tf:line 81
Category: A05_misconfiguration
Impact: Placeholder values in Terraform code could be accidentally committed and deployed. The ignore_changes pattern masks the actual secret values but leaves placeholders as hints.
Recommendation: Use Terraform sensitive = true on secret data and document that secrets must be set via terraform apply -var or Secret Manager UI after initial apply. Do not include placeholder values.
🟡 [MEDIUM] Internal users (@keboola.com) automatically granted kb.write scope based on email domain check without verification of email_verified status
File: src/knowledge_base/mcp/oauth/resource_server.py:line 31
Category: A01_access_control
Impact: If a token has an unverified email claim with @keboola.com domain, the code still grants write access (see line 51-52). A compromised or spoofed token could escalate to write access.
Recommendation: Require email_verified=True for ALL scope grants, including write: if email_verified and email.endswith("@keboola.com"): scopes.append("kb.write")
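A sketch of scope derivation under this rule (hypothetical helper assuming the kb.read/kb.write scope names from this PR): unverified tokens get no scopes at all, and the write grant is gated on both verification and domain.

```python
def derive_scopes(claims):
    """Grant scopes only from verified email claims.

    Unverified tokens receive no scopes; verified @keboola.com emails
    additionally receive write access.
    """
    if claims.get("email_verified") is not True:
        return []
    scopes = ["kb.read"]
    if claims.get("email", "").endswith("@keboola.com"):
        scopes.append("kb.write")
    return scopes
```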
🟢 [LOW] Successful OAuth client registration logs client details: logger.info(f"OAuth client registration: name={client_name}, redirect_uris={redirect_uris}")
File: src/knowledge_base/mcp/server.py:line 101
Category: A09_logging_monitoring
Impact: Logs user-supplied client registration information. Low severity but establishes pattern of logging user-supplied OAuth data.
Recommendation: Log only count or metadata: logger.debug("OAuth client registration") without echoing user-supplied values.
🟢 [LOW] Application startup logs OAuth configuration including issuer and existence of client_id: logger.info(f"OAuth issuer: {mcp_settings.MCP_OAUTH_ISSUER}") and logger.info("OAuth audience configured")
File: src/knowledge_base/mcp/server.py:line 61
Category: A05_misconfiguration
Impact: Low severity - configuration info is not typically sensitive. However, logging that OAuth is configured could hint at authentication bypass in dev mode.
Recommendation: Move these logs to DEBUG level to avoid leaking deployment configuration details in production logs.
🔵 [INFO] Comments document that Google tokens use client_id as audience, which is correct for ID tokens but opaque access tokens may not follow this pattern. The code assumes all Google tokens use client_id as audience.
File: src/knowledge_base/mcp/oauth/token_validator.py:line 118
Category: A07_auth_failures
Impact: Informational - the implementation correctly handles both JWT ID tokens and opaque access tokens, but documentation could be clearer.
Recommendation: Update comment to clarify: 'For Google ID tokens, audience is the client_id. For opaque access tokens, audience is verified via tokeninfo endpoint.'
OWASP Top 10 Checklist
| Category | Status |
|---|---|
| A01 Access Control | ❌ FAIL |
| A02 Crypto Failures | ✅ PASS |
| A03 Injection | ✅ PASS |
| A04 Insecure Design | ❌ FAIL |
| A05 Misconfiguration | ❌ FAIL |
| A06 Vulnerable Components | ➖ N/A |
| A07 Auth Failures | ❌ FAIL |
| A08 Integrity Failures | ❌ FAIL |
| A09 Logging Monitoring | ❌ FAIL |
| A10 SSRF | ✅ PASS |
🤖 Security review powered by Claude
There was a problem hiding this comment.
LGTM! This is a very solid PR that successfully adds the MCP server, Claude.AI integration, and centralizes key Q&A logic. The OAuth 2.1 implementation and Google Cloud deployment via Terraform are well-executed, with good attention to security aspects like secret management, IAM, and scope-based access control. The unit tests are comprehensive and cover crucial parts of the new functionality.
I have two main points that I'd like to see addressed, primarily around code reuse and architectural consistency:
- OAuth Middleware Duplication: In src/knowledge_base/mcp/server.py, the oauth_middleware function largely duplicates the logic found in src/knowledge_base/mcp/oauth/resource_server.py::OAuthMiddleware. The OAuthMiddleware class was likely created for reuse, but it's not being leveraged by server.py.
  - Suggestion: Please refactor server.py to use app.add_middleware(OAuthMiddleware, resource_server=resource_server, exclude_paths=skip_paths, dev_mode=mcp_settings.MCP_DEV_MODE) instead of the custom oauth_middleware function. This would centralize the middleware logic and improve maintainability. You'd need to adapt OAuthMiddleware's exclude_paths (or skip_paths in server.py) to include all necessary OAuth proxy endpoints (/authorize, /token, /register).
  - Similarly, the handle_tools_call function in server.py implements scope checking directly, while require_scopes exists in resource_server.py. While not strictly middleware, consider whether require_scopes (or a similar dependency-injection pattern) could be used here for consistency.
- ingest_document tool's dependency: The _execute_ingest_document function in src/knowledge_base/mcp/tools.py imports get_ingester from knowledge_base.slack.ingest_doc. While ingest_url might be generic, its location in slack/ suggests it might be coupled to Slack-specific contexts or might not be the ideal place for a core ingestion utility to reside when being exposed via a general API.
  - Suggestion: Consider moving the generic document ingestion logic (if truly generic) from knowledge_base/slack/ingest_doc.py to a more central location, e.g., knowledge_base/core/ingestion.py, to better reflect its reusability and decouple it from the Slack bot module. This would align with the general goal of extracting core logic.
These changes would enhance the maintainability and architectural clarity of the new components.
- Remove all exception details from auth/validation logs
- Restrict CORS to Claude.AI origins only
- Require email_verified for write scope on @keboola.com
- Add pydantic validator for non-empty OAuth credentials
- Move OAuth config and operation logs to debug level
- Remove user-supplied data from OAuth endpoint logs
There was a problem hiding this comment.
LGTM! This PR introduces the Claude.AI MCP server with OAuth 2.1 integration very effectively. The comprehensive approach to security, including Google OAuth proxying, robust token validation, and granular scope-based access control, is well-implemented and crucial for this feature.
The refactoring of core Q&A logic into knowledge_base/core/qa.py is an excellent architectural improvement, promoting reusability across different interfaces (Slack, MCP). The six new MCP tools cover all described user interaction scenarios, and their execution logic appears sound, with proper error handling.
From a Google Cloud perspective, the Terraform configuration for Cloud Run services (production and staging) is solid. It correctly utilizes Secret Manager for OAuth client credentials, configures VPC Access for secure Neo4j connectivity, sets appropriate scaling parameters, and establishes necessary IAM roles. The REPLACE_ME placeholders for secrets, along with ignore_changes lifecycle blocks, demonstrate good practice for managing sensitive data.
Minor Considerations (not blocking for approval):
- Redundant resource_server initialization: In src/knowledge_base/mcp/server.py, the resource_server is initialized at the module level and then re-initialized within the lifespan context manager. While likely harmless in Cloud Run's execution model (where the module is typically loaded once per instance), it's a slight redundancy. The current approach is functional but could be streamlined.
- OAuth Middleware Duplication: The oauth_middleware is implemented directly in server.py as an @app.middleware decorator, rather than leveraging the OAuthMiddleware class provided in src/knowledge_base/mcp/oauth/resource_server.py. This leads to duplicated logic for skip_paths and potentially for response formatting. While functional, centralizing the middleware implementation in the reusable class would improve consistency and maintainability.
- ingest_document tool dependency: The _execute_ingest_document tool directly imports from knowledge_base.slack.ingest_doc. While ingest_url itself might be generic, placing its source in a slack module creates a slightly misleading dependency chain for a generic MCP server. For future improvements, moving core ingestion utilities out of the slack folder (similar to core/qa.py) might be beneficial.
These points are minor and do not detract from the overall quality and completeness of the PR. The critical aspects of security, functionality, and deployment are handled very well.
There was a problem hiding this comment.
🔒 Security Review
Security Score: D | 6 CRITICAL, 5 HIGH, 1 INFO, 4 LOW, 5 MEDIUM
Security Review Summary
Decision: REQUEST_CHANGES
Security Score: D (HIGH severity findings present)
Critical Issues (6)
- Sensitive Logging in OAuth Middleware - Token validation failures are logged with exception details that could expose token claims, credentials, or authorization server responses. This affects both resource_server.py lines 227/230 and server.py line 148.
- User PII Logging - Debug logs at lines 156 and 244 of token_validator.py expose email_verified status and token validation results, creating user enumeration and privacy leaks.
- Dev Mode Bypass - MCP_DEV_MODE=true completely bypasses OAuth validation (line 158 of server.py). If accidentally enabled in production, all requests with any Bearer token gain full admin scopes. This is the highest-risk finding.
High Severity Issues (5)
- Error Response Leakage - OAuth error responses return detailed exception messages that could leak token validation logic (line 260, resource_server.py).
- No Caching for Token Validation - Opaque Google tokens are validated via synchronous HTTP on every request, creating a DoS vector and rate-limit risk.
- Email Domain-Based Access Control - Write access is granted to @keboola.com email addresses, but this depends on an email_verified check which could be bypassed. No domain membership validation.
- User Attribution Without Verification - User email from OAuth claims is used directly to create knowledge artifacts without validating it matches the token subject, allowing potential user impersonation.
- Cloud Run Auth Not Enforced - Cloud Run allows unauthenticated invocation (allUsers). If app-level OAuth is bypassed, no second auth layer exists.
Medium Severity Issues (5)
- Incomplete Audit Logging - Tool execution logs user.sub but not what arguments were passed (line 112, tools.py).
- Overly Strict JWT Validation - Email claim marked as required in JWT decode, but email is not a standard JWT claim (line 166, token_validator.py).
- CORS Misconfiguration - CORS alone doesn't protect APIs; non-browser clients bypass it. Application-level OAuth (which exists) is the real protection.
- OAuth Callback CSRF Risk - Callback endpoint doesn't validate the state parameter (line 219, server.py).
- Sensitive Config Logging - Debug logs expose OAuth issuer/audience configuration details (line 90, server.py).
Recommendations
Immediate Actions (Before Merge):
- Remove all token validation logging statements or replace with generic messages
- Remove debug logs that expose email_verified, user emails, or token details
- Change dev mode to explicitly opt-in via multiple environment variables + CI/CD validation
- Return generic error messages instead of exception details in OAuth error responses
- Implement token validation result caching with TTL
Before Production Deployment:
- Enable Cloud Run authentication enforcement
- Add state parameter validation to OAuth callback
- Implement audit logging with anonymized tool arguments
- Add Google Workspace domain membership validation for @keboola.com users
- Review and remove all configuration details from production logs
Code Quality:
- All 6 test suites pass, including comprehensive OAuth scope and tool filtering tests
- Code structure is well-organized with clear separation between token validation, OAuth middleware, and tool execution
- Terraform configuration includes appropriate VPC isolation and secret management
The PR implements a solid MCP server architecture with OAuth 2.1 integration, but security review revealed critical logging vulnerabilities and configuration risks that must be addressed before production deployment.
Findings
🔴 [CRITICAL] Logging token validation failures without redaction may expose token details in error context or exception chain
File: src/knowledge_base/mcp/oauth/resource_server.py:line 227
Category: A09 - Logging & Monitoring
Impact: Sensitive token information could be exposed in application logs, potentially allowing token replay or analysis
Recommendation: Log only generic failure messages without including the actual exception details that might contain token claims. Use: logger.warning('Token validation failed') without str(e) from exception
🔴 [CRITICAL] Logging unexpected errors during token validation could expose sensitive token/claims data in exception traceback
File: src/knowledge_base/mcp/oauth/resource_server.py:line 230
Category: A09 - Logging & Monitoring
Impact: Exception details might include token parts, user claims (email, scopes), or authorization server responses containing sensitive information
Recommendation: Catch specific exceptions only and log generic messages. Avoid logging the full exception. Use: logger.error('Token validation error') without exc_info=True in production
🔴 [CRITICAL] Debug logging of token validation details (email_verified status) combined with other logs could leak user identity information
File: src/knowledge_base/mcp/oauth/token_validator.py:line 156
Category: A09 - Logging & Monitoring
Impact: Debug logs in production could expose which users have unverified emails, creating a user enumeration vector and privacy leak
Recommendation: Remove debug logging for token validation results. Do not log email_verified status or user-identifying claim information
🔴 [CRITICAL] Debug logging of successful token validation could expose claims (email, scope) in logs
File: src/knowledge_base/mcp/oauth/token_validator.py:line 244
Category: A09 - Logging & Monitoring
Impact: Production logs containing user emails and scopes from validated tokens create a privacy/security audit trail that should not be logged at info level
Recommendation: Remove 'Google access token validated' log line. Log only operational status (e.g., cache hit/miss) without user-identifying claims
🔴 [CRITICAL] Token validation failures logged without redaction in middleware; could leak token details or claims in error handling path
File: src/knowledge_base/mcp/server.py:line 148
Category: A09 - Logging & Monitoring
Impact: Malicious actors could trigger token validation failures and inspect logs to understand token structure or claim validation logic
Recommendation: Log only 'Token validation failed' without str(e). Do not include exception message in response body either (currently returns str(e))
🔴 [CRITICAL] Debug logging of 'OAuth token exchange' appears to be a placeholder; could be exploited to log sensitive authorization code or token details if not carefully implemented
File: src/knowledge_base/mcp/server.py:line 284
Category: A09 - Logging & Monitoring
Impact: If this log statement is expanded to include request/response data, authorization codes or access tokens could be exposed in logs
Recommendation: Ensure this log statement never includes request body (which contains client_secret, authorization code, or token data). Log only the operation name
🟠 [HIGH] Dev mode allows bypassing OAuth token validation entirely when MCP_DEV_MODE=true, including in production if accidentally enabled
File: src/knowledge_base/mcp/server.py:line 158
Category: A07 - Authentication Failures
Impact: If dev mode is accidentally enabled in production, all OAuth protection is bypassed and any request with a Bearer header gains admin scopes (kb.read + kb.write)
Recommendation: Add explicit environment-based checks: (1) Log a CRITICAL warning if MCP_DEV_MODE=true in non-dev environments, (2) Require EXPLICIT confirmation via multiple env vars, (3) Block dev mode deployment via CI/CD validation
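A sketch of the explicit-opt-in idea (variable names ENVIRONMENT and MCP_DEV_MODE_CONFIRM are hypothetical, not from this PR): dev mode requires two deliberate settings and is refused outright when the deployment declares itself production, so a single stray flag cannot disable OAuth.

```python
def dev_mode_enabled(env):
    """Double opt-in for the auth bypass (hypothetical variable names).

    Dev mode requires both an enable flag and an explicit confirmation
    string, and is always refused in production environments.
    """
    if env.get("ENVIRONMENT", "").lower() in {"production", "prod"}:
        return False
    return (
        env.get("MCP_DEV_MODE") == "true"
        and env.get("MCP_DEV_MODE_CONFIRM") == "i-understand-auth-is-disabled"
    )
```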
🟠 [HIGH] Error response returns str(e) which may contain sensitive error details from token validation exceptions
File: src/knowledge_base/mcp/oauth/resource_server.py:line 260
Category: A07 - Authentication Failures
Impact: Detailed error messages could leak information about token structure, issuer validation, or claim validation logic to attackers
Recommendation: Return generic error message 'Invalid token' instead of str(e). Log the actual error internally for debugging
🟠 [HIGH] Opaque Google access tokens are validated via synchronous HTTP call to tokeninfo endpoint on every request; no caching implemented
File: src/knowledge_base/mcp/oauth/token_validator.py:line 135
Category: A04 - Insecure Design
Impact: Performance DoS vector: attacker can send many invalid tokens causing many HTTP calls to Google. Google may rate-limit or block requests, degrading service
Recommendation: Implement caching of tokeninfo responses with TTL (e.g., 5-10 minutes). Cache both valid and invalid token results to prevent repeated lookups
🟠 [HIGH] Scope-based access control is based on email domain (@keboola.com) which can be spoofed if Google OAuth token validation is bypassed or if email_verified is not properly enforced
File: src/knowledge_base/mcp/config.py
Category: A01 - Broken Access Control
Impact: External users could craft tokens with @keboola.com email claim and gain write access if email_verified check is skipped or bypassed
Recommendation: Ensure email_verified=true is ALWAYS required before granting kb.write scope. Consider using Google Workspace domain membership check instead of email domain
🟠 [HIGH] User email from OAuth claims is used directly in MCP tool execution without validation; could be used to create knowledge attributed to different users
File: src/knowledge_base/mcp/tools.py:line 276
Category: A08 - Data Integrity Failures
Impact: If token validation is bypassed or compromised, attacker can create knowledge/feedback attributed to arbitrary users, poisoning audit trail and knowledge base
Recommendation: Validate that user.email matches the token's 'sub' claim. Use token.sub as the authoritative user identifier, not email
🟡 [MEDIUM] Tool execution logs user.sub which could be a GUID or opaque identifier (safe), but logs 'Executing tool:' with tool_name; no audit trail of arguments
File: src/knowledge_base/mcp/tools.py:line 112
Category: A09 - Logging & Monitoring
Impact: Incomplete audit trail: no record of what arguments were passed to tools, making it difficult to audit or debug user actions
Recommendation: Add structured logging with tool name and anonymized argument summary (e.g., query length, not content). Avoid logging user email
🟡 [MEDIUM] JWT decode options set 'require': ['exp', 'iss', 'aud', 'email'] but 'email' is not standard JWT claim; should be in optional claims only
File: src/knowledge_base/mcp/oauth/token_validator.py:line 166
Category: A07 - Authentication Failures
Impact: Tokens without email claim will fail validation even if otherwise valid. Google ID tokens should have email, but this is overly strict for general OAuth
Recommendation: Remove 'email' from required claims. Validate email presence separately: if not present, deny access (separately from JWT validation)
🟡 [MEDIUM] Cloud Run service allows unauthenticated access (roles/run.invoker to allUsers) but OAuth is enforced at application level; no Cloud Run native auth
File: deploy/terraform/cloudrun-mcp.tf:line 151
Category: A05 - Security Misconfiguration
Impact: If application-level OAuth validation is bypassed (dev mode, code bug), unauthenticated users can access all endpoints since Cloud Run doesn't enforce auth
Recommendation: Use Cloud Run service identity with Require Authentication enabled. Remove 'allUsers' invoker role. Let Cloud Run enforce auth, application validates scopes
🟡 [MEDIUM] CORS allows ['https://claude.ai', 'https://www.claude.ai'] but no validation that requests actually come from these origins; browser-only protection
File: src/knowledge_base/mcp/server.py:line 72
Category: A05 - Security Misconfiguration
Impact: CORS is browser-only protection. Non-browser clients (curl, scripts) can bypass and send cross-origin requests. Relying on CORS alone is insufficient
Recommendation: Keep CORS for browser clients but do not rely on it for security. OAuth token validation is the real protection (which exists)
🟡 [MEDIUM] OAuth callback endpoint returns authorization code in HTML without CSRF token validation; could be vulnerable to CSRF if callback URL is predictable
File: src/knowledge_base/mcp/server.py:line 219
Category: A04 - Insecure Design
Impact: Attacker could trick user into visiting callback URL with attacker-controlled code parameter, though actual token exchange would fail due to PKCE
Recommendation: Validate state parameter in callback and ensure it matches the request. Use random state in authorization request (not shown in code)
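The state handling amounts to generating an unguessable value on the authorization request and comparing it in constant time on the callback; a minimal sketch with hypothetical helper names:

```python
import secrets

def new_oauth_state():
    """Random state value to send with the authorization request."""
    return secrets.token_urlsafe(32)

def state_matches(stored, returned):
    """Constant-time comparison of the stored state vs. the callback value."""
    if not stored or not returned:
        return False
    return secrets.compare_digest(stored, returned)
```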
🟢 [LOW] Debug logging of 'OAuth audience configured' and 'OAuth issuer:' exposes configuration details
File: src/knowledge_base/mcp/server.py:line 90
Category: A09 - Logging & Monitoring
Impact: Configuration details logged at DEBUG level could leak system architecture. Low severity since DEBUG is typically disabled in production
Recommendation: Log 'OAuth configured' without details, or move to TRACE level. Remove specific issuer/audience from logs
🟢 [LOW] Google client_secret is added to token request data structure; if logging is accidentally enabled, could expose secret
File: src/knowledge_base/mcp/server.py:line 278
Category: A05 - Security Misconfiguration
Impact: If request logging is added in future, client_secret will be in logs. Low risk currently but fragile
Recommendation: Add comment: 'NEVER log token_params - contains client_secret'. Consider using httpx.Auth instead of manual data construction
🟢 [LOW] PyJWT library is imported on-demand in validate() method; no version pinning shown in requirements (assumed in pyproject.toml)
File: src/knowledge_base/mcp/oauth/token_validator.py:line 1
Category: A06 - Vulnerable Components
Impact: If PyJWT has vulnerabilities, could be exploited. Dependency version control is assumed but not visible
Recommendation: Ensure pyproject.toml pins PyJWT to secure version with upper bound (e.g., '>=2.8.0,<3.0.0'). Run security scanning in CI
🟢 [LOW] Google tokeninfo endpoint is called with user-controlled token; no timeout is enforced on HTTP request
File: src/knowledge_base/mcp/oauth/token_validator.py:line 137
Category: A10 - SSRF
Impact: Slow-loris or hanging response attack could tie up async event loop. Low severity due to AsyncClient default timeouts, but explicit timeout (10.0) is set (good)
Recommendation: Timeout is already set to 10 seconds which is reasonable. Consider reducing to 5 seconds for faster failure
🔵 [INFO] Logging 'Dev mode: true' could leak deployment configuration if logs are visible to attackers
File: src/knowledge_base/mcp/server.py:line 83
Category: A09 - Logging & Monitoring
Impact: Attacker could infer that service is in development/test mode and attempt to exploit dev features (e.g., bypass auth)
Recommendation: Remove or change to TRACE level. Do not log 'Dev mode: true' in production logs
OWASP Top 10 Checklist
| Category | Status |
|---|---|
| A01 Access Control | ❌ FAIL |
| A02 Crypto Failures | ✅ PASS |
| A03 Injection | ✅ PASS |
| A04 Insecure Design | ❌ FAIL |
| A05 Misconfiguration | ❌ FAIL |
| A06 Vulnerable Components | ✅ PASS |
| A07 Auth Failures | ❌ FAIL |
| A08 Integrity Failures | ❌ FAIL |
| A09 Logging Monitoring | ❌ FAIL |
| A10 SSRF | ✅ PASS |
🤖 Security review powered by Claude
Remove all log statements from OAuth validation, token exchange, and authorization endpoints. Return generic error messages only in HTTP responses. This eliminates any possibility of PII or credential leakage through logs or error responses.
LGTM! This Pull Request introduces a well-designed and robust MCP server for Claude.AI integration, effectively expanding the reach of the AI Knowledge Base.
Key strengths observed:
- Comprehensive OAuth 2.1 Implementation: The server correctly acts as an OAuth Authorization Server proxying to Google, handling authorization code flow, token exchange, and dynamic client registration. The `token_validator.py` and `resource_server.py` modules demonstrate a solid and secure approach to JWT and opaque token validation, including audience, issuer, and expiration checks.
- Robust Access Control: Scope-based access (`kb.read`, `kb.write`) is effectively implemented, dynamically assigning permissions based on Google-verified email domain (`@keboola.com` for write access). This is correctly applied to the `tools/list` and `tools/call` endpoints.
- Excellent Architectural Refactoring: The extraction of core Q&A logic into `knowledge_base/core/qa.py` is a significant improvement. It enhances reusability, as demonstrated by the streamlined `slack/bot.py` changes, and centralizes complex RAG logic, leading to better maintainability.
- Well-Defined MCP Tools: The 6 new MCP tools (`ask_question`, `search_knowledge`, `create_knowledge`, `ingest_document`, `submit_feedback`, `check_health`) are thoughtfully designed with appropriate JSON schemas and robust execution logic, integrating seamlessly with existing backend services like Graphiti, LLM providers, and the document ingestion pipeline. Defensive coding for handling missing metadata in search results is a good touch.
- Secure and Best-Practice GCP Deployment: The Terraform configuration for Cloud Run services is exemplary. It correctly manages service accounts, IAM permissions (including `aiplatform.user` for Vertex AI and `secretmanager.secretAccessor`), and VPC Access connectors. The use of `allUsers` for Cloud Run invoker, coupled with the application-level OAuth middleware, is the correct secure pattern for a publicly exposed, OAuth-protected API. The `REPLACE_ME` placeholders with `ignore_changes` lifecycle rules for secrets are appropriate for manual secret initialization.
- Comprehensive Testing: The unit tests for `mcp_tools` and `mcp_server` provide good coverage for tool logic, server endpoints, and OAuth middleware behavior, including specific scenarios for scope filtering and dev mode.
No significant bugs or architectural misalignments were found. The PR delivers on all stated intents with high quality and attention to detail. This is a very strong contribution.
🔒 Security Review
Security Score: D | 2 CRITICAL, 7 HIGH, 2 INFO, 4 LOW, 8 MEDIUM
Security Review Summary
Decision: REQUEST_CHANGES
The MCP server OAuth 2.1 implementation introduces significant security concerns that must be addressed before merging. While the overall architecture follows sound OAuth 2.1 principles, several critical and high-severity issues were identified.
Critical Issues (2)
- Token Parameter Injection (A02): The Google tokeninfo validation endpoint receives the user token as an unencoded query parameter, creating an injection vulnerability. Use httpx's params dict or POST with form data.
- Exception Message Exposure (A09): Token validation error messages may include sensitive claims data that could leak through logs or error responses.
High-Severity Issues (5)
- Dev Mode Authentication Bypass: If MCP_DEV_MODE is ever enabled in production (via environment variable), all OAuth validation is skipped. Use a separate staging environment instead.
- Access Control Logic Flaw (A01): The scope granting logic for Google OAuth has a subtle flaw in email verification handling that could theoretically grant unintended access.
- HTTPS Verification Missing: Google token validation does not explicitly verify HTTPS certificates.
- PII in Logs (A09): User email addresses are logged in tool execution and search operations, violating privacy requirements.
- SSRF in Ingest Document (A10): The ingest_document tool accepts arbitrary URLs without validation, allowing file:// or internal IP access.
Medium-Severity Issues (8)
- Token endpoint CSRF protection missing
- Generic error handling that may leak internal state
- Stale JWKS cache handling in sync validation path
- Database port configuration not validated
- Missing URL validation for document ingestion
- Configuration validation deferred to runtime
- Missing logging for startup events
- Search queries logged (may contain sensitive content)
Remediation Priority
- Fix critical OAuth token validation: Use proper parameter encoding
- Remove dev mode from production: Use separate staging environment
- Remove PII from logs: Sanitize user context before logging
- Add URL validation: Whitelist schemes and reject internal IP ranges
- Add exception message sanitization: Generic messages to clients, details in debug logs
- Add CSRF protection: Validate token endpoint requests
Positive Findings
- Scope-based access control is well-structured
- CORS configuration is appropriately restrictive
- Secrets are properly stored in Google Secret Manager
- Configuration uses pydantic-settings with validation
- Token expiration is properly verified
- No hardcoded credentials in source code
Security Score: D (High-severity issues present, no critical architectural flaws but multiple implementation vulnerabilities requiring fixes before production deployment)
Findings
🔴 [CRITICAL] Unvalidated external HTTP request to Google tokeninfo endpoint with user-supplied token. The token string is passed directly as a query parameter without URL encoding, creating a potential injection vector.
File: src/knowledge_base/mcp/oauth/token_validator.py:line 231
Category: A02_Cryptographic_Failures
Impact: An attacker could craft a malicious token parameter to inject arbitrary query parameters or potentially redirect the request to a different endpoint. Additionally, the token value itself is sent in plaintext over HTTPS in the URL query string rather than in the request body, increasing exposure.
Recommendation: Use httpx's params dict (which auto-encodes) instead of f-string interpolation. Alternatively, use POST with form data. Validate the token format before making the request (e.g., must be alphanumeric, max length).
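To illustrate the recommendation, here is a minimal sketch; `build_tokeninfo_url` and the token-format regex are hypothetical names, not the actual code in `token_validator.py`:

```python
import re
from urllib.parse import urlencode

GOOGLE_TOKENINFO_URL = "https://oauth2.googleapis.com/tokeninfo"
# Opaque Google access tokens use a restricted, URL-safe alphabet;
# reject anything else before it reaches the wire (illustrative pattern).
_TOKEN_RE = re.compile(r"^[A-Za-z0-9._~\-]{1,2048}$")

def build_tokeninfo_url(token: str) -> str:
    """Validate the token format, then percent-encode it properly.

    With httpx, the equivalent is client.get(url, params={"access_token": token}),
    which performs the same encoding automatically.
    """
    if not _TOKEN_RE.match(token):
        raise ValueError("Token failed format validation")
    return f"{GOOGLE_TOKENINFO_URL}?{urlencode({'access_token': token})}"
```

A value like `x&redirect_uri=evil` now fails the format check instead of injecting a second query parameter.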
🔴 [CRITICAL] Exception error messages may include sensitive token information in the InvalidTokenError message: f"Invalid authorized party: {claims['azp']}"
File: src/knowledge_base/mcp/oauth/token_validator.py:line 165
Category: A09_Logging_Monitoring
Impact: If this exception is caught and logged (as it may be in the MCP server's exception handler), claims data including potentially sensitive user identifiers could be exposed in logs or error responses.
Recommendation: Never include claims data in error messages. Use generic messages like 'Token validation failed' and log details separately at DEBUG level without sensitive content.
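A sketch of claims-free error handling along these lines (hypothetical names; the real exception class in `token_validator.py` may differ):

```python
import logging

logger = logging.getLogger("mcp.oauth")

class InvalidTokenError(Exception):
    """Always raised with a generic, claims-free message."""

def check_authorized_party(claims: dict, expected_client_id: str) -> None:
    """Verify the azp claim without leaking its value in the exception."""
    if claims.get("azp") != expected_client_id:
        # Detail goes to DEBUG logs only; never into the exception message.
        logger.debug("authorized-party mismatch for client %s", expected_client_id)
        raise InvalidTokenError("Token validation failed")
```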
🟠 [HIGH] Dev mode in production bypasses all OAuth token validation. If MCP_DEV_MODE is ever set to 'true' in production (via environment variable), all authentication is skipped.
File: src/knowledge_base/mcp/server.py:line 114
Category: A04_Insecure_Design
Impact: An attacker with environment variable modification access could enable dev mode and access the MCP server without any valid token. All scope restrictions are bypassed.
Recommendation: Remove dev mode from production deployment. Use a separate test/staging server with dev mode instead. Or require a separate feature flag that cannot be environment-configured in production.
🟠 [HIGH] Scope granting logic uses email domain matching without verifying the email_verified claim first. Line 61 checks email_verified before granting kb.write, but the concatenated scope list may already include kb.read from line 56, which was added under a separate email_verified check that might have failed.
File: src/knowledge_base/mcp/oauth/resource_server.py:line 56
Category: A01_Broken_Access_Control
Impact: An attacker with an unverified email address claiming to be @keboola.com could potentially trigger the logic path that grants kb.read. The code flow at lines 52-68 has a logic issue: if email_verified=False, scopes remain empty, but the subsequent kb.write check at line 67 uses email_verified again redundantly.
Recommendation: Restructure the logic to be explicit: Only assign scopes if email_verified is True. Return early if email_verified is False. Make the logic flow clearer and less error-prone.
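The restructured flow could look like this sketch — `scopes_for_claims` is an illustrative name, not the actual `resource_server.py` function:

```python
KEBOOLA_DOMAIN = "@keboola.com"

def scopes_for_claims(claims: dict) -> list[str]:
    """Grant scopes only for verified emails; return early otherwise."""
    if not claims.get("email_verified"):
        return []  # unverified email: no scopes at all
    scopes = ["kb.read"]
    if claims.get("email", "").endswith(KEBOOLA_DOMAIN):
        scopes.append("kb.write")
    return scopes
```

With the early return there is a single email_verified check, so kb.read can never be granted on a path where verification failed.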
🟠 [HIGH] Google opaque token validation via httpx.get() does not validate HTTPS certificate or check for redirect loops. The request could be intercepted or redirected.
File: src/knowledge_base/mcp/oauth/token_validator.py:line 213
Category: A02_Cryptographic_Failures
Impact: An attacker could perform a man-in-the-middle attack if the HTTPS connection is compromised, or Google's token endpoint could be spoofed via DNS hijacking.
Recommendation: Explicitly set verify=True in httpx.get() call. Consider using httpx.AsyncClient with default settings which should verify HTTPS. Add timeout and error handling for connection failures.
🟠 [HIGH] User email addresses are logged in tool execution: logger.info(f"Executing tool: {tool_name}, user: {user.get('sub', 'unknown')}"), and elsewhere user email is embedded in log messages.
File: src/knowledge_base/mcp/tools.py:line 263
Category: A09_Logging_Monitoring
Impact: User email addresses (which are PII) are written to application logs. This violates privacy principles and the documented security anti-patterns for the repository.
Recommendation: Log only user.get('sub') or a hashed identifier, never email addresses. Create a helper function to sanitize user context before logging.
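One way to sanitize the user context before logging, sketched with a hypothetical helper:

```python
import hashlib

def safe_user_id(user: dict) -> str:
    """Return a log-safe identifier: the opaque `sub` claim if present,
    otherwise a truncated SHA-256 of the email (never the email itself)."""
    sub = user.get("sub")
    if sub:
        return str(sub)
    email = user.get("email", "")
    if not email:
        return "unknown"
    return hashlib.sha256(email.encode()).hexdigest()[:12]
```

Log lines then reference `safe_user_id(user)` instead of the raw context, keeping PII out of the logs by construction.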
🟠 [HIGH] The search query is logged at info level: logger.info(f"Searching for: '{query[:100]}...'"). If the query contains sensitive information, it will be exposed in logs.
File: src/knowledge_base/core/qa.py:line 28
Category: A09_Logging_Monitoring
Impact: User search queries (which may contain internal process information, product details, or other sensitive context) are stored in application logs.
Recommendation: Remove query logging or log at DEBUG level only. If logging is necessary for debugging, hash or truncate the query and note that it may contain sensitive content.
🟠 [HIGH] The oauth_token endpoint proxies client credentials (MCP_OAUTH_CLIENT_SECRET) to Google without validating the incoming request's origin or CSRF token. The token request could be forged if the endpoint is accessible.
File: src/knowledge_base/mcp/server.py:line 230
Category: A08_Integrity_Failures
Impact: An attacker could potentially craft a malicious token request and proxy it to Google, consuming resources or attempting authorization code reuse attacks.
Recommendation: Validate CSRF tokens on the token endpoint. Verify the request origin matches expected values. Implement rate limiting on token exchanges.
🟠 [HIGH] Log message 'Google OAuth: granted default scopes for verified user' at info level could be expanded with user identifiers in practice, leaking PII.
File: src/knowledge_base/mcp/oauth/resource_server.py:line 61
Category: A09_Logging_Monitoring
Impact: Current code logs at info level are relatively safe, but the pattern encourages logging user information.
Recommendation: Ensure logger calls never include user email or other PII. Use structured logging (which is already set up) but omit sensitive fields.
🟡 [MEDIUM] CORS middleware allows only the claude.ai origins: allow_origins=['https://claude.ai', 'https://www.claude.ai']. This is correctly restrictive, but verify whether subdomain wildcards are needed.
File: src/knowledge_base/mcp/server.py:line 99
Category: A05_Misconfiguration
Impact: If someone controls a subdomain of claude.ai, they could access the MCP server. This is likely acceptable for Claude.AI integration but should be documented.
Recommendation: If only specific Claude.AI endpoints should access the MCP server, restrict the CORS origins further. Consider adding rate limiting per origin.
🟡 [MEDIUM] Neo4j connection URI in Terraform uses network_ip from the GCE instance: bolt://${google_compute_instance.neo4j_prod.network_interface[0].network_ip}:7687. Port 7687 (Bolt's default) is hardcoded; verify this matches Neo4j's actual configured port.
File: deploy/terraform/cloudrun-mcp.tf:line 145
Category: A04_Insecure_Design
Impact: If the port is misconfigured or Neo4j listens on a different port, connection will fail. No validation is performed.
Recommendation: Verify Neo4j is actually configured to listen on port 7687. Use a variable for the port number so it can be easily updated.
🟡 [MEDIUM] The /token endpoint error response from Google is returned directly without filtering. If Google returns detailed error messages, they could leak information about the authorization server's internal state.
File: src/knowledge_base/mcp/server.py:line 282
Category: A05_Misconfiguration
Impact: Detailed error messages from Google could help an attacker understand the authorization flow better.
Recommendation: Log full error details for debugging, but return a generic error message to the client: 'Token exchange failed'. Only include field-level errors that are safe.
🟡 [MEDIUM] The LLM provider name is logged at info level: logger.info(f'Using LLM provider: {llm.provider_name}'). While not directly sensitive, this reveals system architecture.
File: src/knowledge_base/core/qa.py:line 30
Category: A09_Logging_Monitoring
Impact: An attacker gains insight into which LLM provider is in use, which could help them target attacks.
Recommendation: Log at DEBUG level only, or remove this log entirely.
🟡 [MEDIUM] The ingest_document tool does not validate the URL scheme or destination. A user with kb.write scope could potentially ingest from file:// URLs or internal IP addresses.
File: src/knowledge_base/mcp/tools.py:line 196
Category: A08_Integrity_Failures
Impact: Server-Side Request Forgery (SSRF): An attacker could use the ingest_document tool to download from internal resources, scan internal networks, or access local files.
Recommendation: Validate the URL: reject file://, gopher://, internal IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8). Whitelist allowed domains or require HTTPS for remote URLs.
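A minimal validator along these lines (illustrative sketch only; DNS rebinding and redirect-following are out of scope here):

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https"}

def validate_ingest_url(url: str) -> None:
    """Raise ValueError for URLs that are not HTTPS or that resolve to
    loopback/private/link-local/reserved addresses. This blocks file://,
    10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"URL scheme not allowed: {parsed.scheme or '(none)'}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no host")
    # Check every resolved address, not just the first one.
    for info in socket.getaddrinfo(host, None):
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_loopback or addr.is_private or addr.is_link_local or addr.is_reserved:
            raise ValueError("Host resolves to a disallowed address range")
```

Calling this at the top of the ingest_document tool turns the SSRF vectors listed above into immediate client-visible errors.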
🟡 [MEDIUM] The _unauthorized_response and _forbidden_response methods return HTTP status codes but don't include rate limiting information or hints about retrying.
File: src/knowledge_base/mcp/oauth/resource_server.py:line 130
Category: A04_Insecure_Design
Impact: Error handling is basic; if an attacker is rate-limited or if there's a transient failure, the client gets no indication of the issue.
Recommendation: Add Retry-After header to responses. Consider implementing and advertising rate limiting via headers.
🟡 [MEDIUM] Error messages in tool execution include raw exception strings: f'Error executing {tool_name}: {str(e)}'. If the exception contains sensitive information, it will be exposed.
File: src/knowledge_base/mcp/tools.py:line 351
Category: A09_Logging_Monitoring
Impact: Detailed exception messages may leak information about internal systems or data structures.
Recommendation: Log the full exception at DEBUG level. Return a generic error message to the client: 'Tool execution failed. Please try again later.'
🟡 [MEDIUM] MCPSettings requires MCP_OAUTH_CLIENT_ID and MCP_OAUTH_CLIENT_SECRET with no default values, but if these are missing at runtime, the model_validator will raise an error. However, the error is only caught if the settings object is instantiated. If instantiation is deferred, this could silently fail.
File: src/knowledge_base/mcp/config.py:line 11
Category: A05_Misconfiguration
Impact: The application might start with incomplete configuration, leading to runtime errors later rather than fail-fast behavior.
Recommendation: Ensure settings are validated at application startup in the lifespan function or main entry point before the app is ready to accept requests.
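Fail-fast validation at startup can be as simple as this sketch (env-var names taken from the finding; the helper itself is hypothetical):

```python
import os

REQUIRED_VARS = ("MCP_OAUTH_CLIENT_ID", "MCP_OAUTH_CLIENT_SECRET")

def validate_settings(env=None) -> None:
    """Call from the lifespan/startup hook, before the app accepts requests."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
```

Raising here aborts startup with a clear message instead of deferring the failure to the first OAuth request.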
🟢 [LOW] The sync validate() method and async validate_async() method both exist but they don't share the same JWKS cache timing logic in practice. The sync method uses self.jwks which might be stale.
File: src/knowledge_base/mcp/oauth/token_validator.py:line 115
Category: A07_Authentication_Failures
Impact: Token validation might succeed with cached but expired JWKS in some code paths.
Recommendation: Ensure both methods use the same cache validation. Consider consolidating into a single async method.
🟢 [LOW] The oauth_callback endpoint returns HTML with an embedded token code and state in JavaScript. While HTML-escaped, this could be improved.
File: src/knowledge_base/mcp/server.py:line 199
Category: A05_Misconfiguration
Impact: Minor XSS risk if escaping is incomplete. The pattern is standard but could be more secure.
Recommendation: Use a JSON response with CORS headers instead of HTML. Let the client handle the redirect.
🟢 [LOW] The lifespan function does not explicitly log startup completion or errors.
File: src/knowledge_base/mcp/server.py:line 360
Category: A09_Logging_Monitoring
Impact: No audit trail for server startup/shutdown events.
Recommendation: Add logging at INFO level for startup completion and at ERROR level for startup failures.
🟢 [LOW] Secret Manager secrets are created with lifecycle ignore_changes for secret_data. This means Terraform won't track secret changes, which is good for security, but the REPLACE_ME placeholder should be replaced before deployment.
File: deploy/terraform/cloudrun-mcp.tf:line 71
Category: A04_Insecure_Design
Impact: If REPLACE_ME is actually used, authentication will fail with unhelpful errors.
Recommendation: Add a validation check or document that these must be manually updated before applying. Consider using a separate secrets file or Secret Manager update script.
🔵 [INFO] The main() function configures logging with a JSON format. This is good for structured logging, but ensure that sensitive fields (tokens, API keys) are never logged anywhere.
File: src/knowledge_base/mcp/server.py:line 380
Category: A09_Logging_Monitoring
Impact: N/A - informational only.
Recommendation: Continue using structured logging and ensure all logging calls are reviewed for sensitive data.
🔵 [INFO] The get_tools_for_scopes function uses ANY logic: if any required scope matches, the tool is granted. This is correct, but document that kb.read alone does not grant kb.write access even if both are in OAUTH_SCOPES.
File: src/knowledge_base/mcp/tools.py:line 135
Category: A01_Access_Control
Impact: N/A - informational. The logic is correct.
Recommendation: Add a comment explaining the ANY (OR) logic for scope matching to prevent future confusion.
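The ANY/OR semantics can be documented with a sketch like this (hypothetical signature, not the actual get_tools_for_scopes):

```python
def tools_for_scopes(tools: list[dict], granted: set[str]) -> list[dict]:
    # OR logic: a tool is exposed if ANY one of its required scopes is granted.
    # kb.read alone therefore never exposes a tool that requires only kb.write.
    return [t for t in tools if any(s in granted for s in t["required_scopes"])]
```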
OWASP Top 10 Checklist
| Category | Status |
|---|---|
| A01 Access Control | ❌ FAIL |
| A02 Crypto Failures | ❌ FAIL |
| A03 Injection | ❌ FAIL |
| A04 Insecure Design | ❌ FAIL |
| A05 Misconfiguration | ❌ FAIL |
| A06 Vulnerable Components | ✅ PASS |
| A07 Auth Failures | ✅ PASS |
| A08 Integrity Failures | ❌ FAIL |
| A09 Logging Monitoring | ❌ FAIL |
| A10 SSRF | ❌ FAIL |
🤖 Security review powered by Claude
Summary
- Extract core Q&A logic (`search_knowledge`, `generate_answer`) from Slack bot into a reusable `core/qa.py` module
- 6 MCP tools: `ask_question`, `search_knowledge`, `create_knowledge`, `ingest_document`, `submit_feedback`, `check_health`
- OAuth scopes: `@keboola.com` users get `kb.read` + `kb.write`, external users get `kb.read` only
- Docker image (`Dockerfile.mcp`) and Cloud Run services for staging (`kb-mcp-staging`) and production (`kb-mcp`)
- CI/CD pipeline: `build-mcp` -> `deploy-mcp-staging` -> e2e tests -> `deploy-mcp-production`

User Interaction Scenarios
Tools available to Claude.AI users: `ask_question`, `search_knowledge`, `create_knowledge`, `ingest_document`, `submit_feedback`, `check_health`

Setup Required
After merging, create a Google OAuth Client ID in GCP Console:
- Name: `Knowledge Base MCP Server`, type: Web application
- Store the credentials in Secret Manager (`mcp-oauth-client-id`, `mcp-oauth-client-secret`)
- Run `terraform apply` to create the Cloud Run services

Test plan
- `terraform apply`
- `curl https://kb-mcp-staging-<hash>.run.app/health`