
@buremba commented Jan 11, 2026

Summary

Migrate from thread-scoped volumes and user-scoped credentials to a space-based multi-tenant architecture in which users and groups share a persistent workspace across all their threads.

Key changes:

  • Volumes: peerbot-workspace-{threadId} → peerbot-workspace-{spaceId}
  • MCP Credentials: mcp:credential:{userId}:{mcpId} → mcp:credential:{spaceId}:{mcpId}
  • Claude Credentials: claude:credential:{userId} → claude:credential:{spaceId}

Space ID format:

| Context | Space ID | Hash Input |
|---|---|---|
| DM/User | user-{hash8} | {platform}:user:{userId} |
| Group/Channel | group-{hash8} | {platform}:group:{channelId} |

Changes:

  • Add space-resolver module for consistent spaceId generation
  • Update all credential stores (MCP, Claude) to use spaceId
  • Update OAuth flows to include spaceId in state
  • Update deployment managers to use space-based volumes
  • Update worker token to include spaceId
  • Add WhatsApp platform adapter
  • Clean up unused files and dependencies

Test plan

  • Verify space resolution works correctly for DM and group contexts
  • Test OAuth flow stores credentials with correct spaceId
  • Verify volume naming uses spaceId
  • Test multi-user access to shared space credentials

Note

Implements space-scoped multi-tenancy and modernizes platform/infra tooling.

  • Multi-tenant refactor: propagate spaceId across core types and module interfaces; switch credential stores to claude:credential:{spaceId} and mcp:credential:{spaceId}:{mcpId}; include spaceId in worker tokens and OAuth state/flows; update MCP config/status to be space-scoped
  • Platform: add WhatsApp adapter and deps (@whiskeysockets/baileys, qrcode-terminal); docs reflect WhatsApp as primary platform
  • Infra/Helm: add Redis subchart and queue env wiring; expose HTTP proxy 8118; add startup/liveness/readiness probes, secrets checksum annotation; stricter NetworkPolicies enforcing gateway-proxied egress and worker isolation; namespace-scoped worker RBAC; optional metrics ServiceMonitor; values tuned (securityContext read-only, Gatekeeper stubs, WhatsApp/allowlist settings)
  • Runtime/dev: gateway Dockerfile switches to Bun install + Node tsx runner; sidecar-first dev flow, Makefile streamlined (build-worker, process mgmt); remove docker-compose.dev; logging fallback in core (USE_SIMPLE_LOGGER) and safer Sentry init
  • Docs/CLI: update AGENTS.md and template README; minor CLI compose tweaks

Written by Cursor Bugbot for commit 16b8723.

buremba and others added 2 commits November 9, 2025 01:22
- Remove build-only profile from worker service so it auto-builds on docker compose up
- Add worker image validation on gateway startup with helpful error messages
- Add upgrade instructions to README template (pull images, rebuild worker, update env vars)
- Fixes issue where users see 'No such image' error on first message after init

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Migrate from thread-scoped volumes and user-scoped credentials to
space-based multi-tenant architecture where users/groups share
persistent workspace across all their threads.

Key changes:
- Volumes: peerbot-workspace-{threadId} → peerbot-workspace-{spaceId}
- Credentials: mcp:credential:{userId}:{mcpId} → mcp:credential:{spaceId}:{mcpId}
- Claude credentials: claude:credential:{userId} → claude:credential:{spaceId}

Space ID format:
- DM/User context: user-{hash8} (hash of {platform}:user:{userId})
- Group/Channel context: group-{hash8} (hash of {platform}:group:{channelId})

Changes:
- Add space-resolver module for consistent spaceId generation
- Update all credential stores to use spaceId
- Update OAuth flows to include spaceId in state
- Update deployment managers to use space-based volumes
- Update worker token to include spaceId
- Add WhatsApp platform adapter
- Clean up unused files and dependencies
@claude

claude bot commented Jan 11, 2026

Claude finished @buremba's task.


PR Review: Multi-Tenant Space Architecture

I've completed my comprehensive review of this PR implementing the migration from thread-scoped to space-based multi-tenant architecture. Here's my detailed feedback:

Review Tasks:

  • Analyze core space architecture and resolver implementation
  • Review credential store migration (MCP and Claude)
  • Examine OAuth flow changes and security implications
  • Check deployment manager updates for volume naming
  • Review WhatsApp platform integration
  • Assess error handling and edge cases
  • Evaluate performance implications
  • Check for potential race conditions or concurrency issues
  • Review test coverage and suggest improvements
  • Provide structured feedback and recommendations

Strengths

1. Well-Designed Space Architecture

  • Clean abstraction: The space-resolver.ts provides a clear, consistent way to map platform contexts to space IDs
  • Good hashing strategy: Using SHA256 with 8-char truncation balances uniqueness with K8s label compatibility
  • Platform-aware logic: Different space ID generation for DM vs Group contexts is well thought out

2. Comprehensive Migration

  • Consistent patterns: All credential stores properly migrated from userId to spaceId keys
  • Backward compatibility: Fallback logic in deployment managers (spaceId || threadId) ensures smooth transition
  • OAuth flows updated: Both Claude and MCP OAuth properly include spaceId in state management

3. Security Considerations

  • Proper token validation: OAuth state tokens include expiry and validation in both Claude and MCP modules
  • Encrypted state: Using encrypt()/decrypt() for OAuth state tokens is good security practice
  • Platform isolation: Space IDs properly namespace credentials between platforms

⚠️ Issues & Concerns

1. Critical: No Migration Strategy for Existing Data

Severity: HIGH

The PR changes credential key patterns but provides no migration path for existing users:

  • Old pattern: claude:credential:{userId}, mcp:credential:{userId}:{mcpId}
  • New pattern: claude:credential:{spaceId}, mcp:credential:{spaceId}:{mcpId}

Impact: Existing users will lose all their OAuth credentials and need to re-authenticate.

Recommendation: Add a migration script or startup migration logic:

// Example migration logic needed
async function migrateCredentials(): Promise<void> {
  // 1. Find all old-format credentials
  // 2. For each userId, determine their spaceId(s)
  // 3. Copy credentials to new keys
  // 4. Clean up old keys after verification
}
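A minimal sketch of the eager variant, assuming a Map-like key-value store in place of Redis and assuming every legacy Claude credential belongs to a DM on one known platform (a group spaceId cannot be derived from a userId alone). The `user-{hash8}` derivation mirrors the format in the PR description; `KeyValueStore` and `migrateClaudeCredentials` are hypothetical names, and MCP keys would need the same treatment.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal store interface; a Redis client would play this role in practice.
interface KeyValueStore {
  keys(): Iterable<string>;
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

const LEGACY_PREFIX = "claude:credential:";
const SPACE_ID_PATTERN = /^(user|group)-[0-9a-f]{8}$/;

// Mirrors the PR's DM format: user-{first 8 hex chars of sha256("{platform}:user:{userId}")}.
function dmSpaceId(platform: string, userId: string): string {
  const hash = createHash("sha256").update(`${platform}:user:${userId}`).digest("hex");
  return `user-${hash.substring(0, 8)}`;
}

// Copies claude:credential:{userId} entries to claude:credential:{spaceId}.
// Legacy keys are left in place so they can be deleted after verification.
function migrateClaudeCredentials(store: KeyValueStore, platform: string): number {
  let copied = 0;
  // Snapshot the keys first because the loop writes new keys into the store.
  for (const key of Array.from(store.keys())) {
    if (!key.startsWith(LEGACY_PREFIX)) continue;
    const userId = key.substring(LEGACY_PREFIX.length);
    if (SPACE_ID_PATTERN.test(userId)) continue; // already space-scoped
    const value = store.get(key);
    const newKey = `${LEGACY_PREFIX}${dmSpaceId(platform, userId)}`;
    // Never overwrite a credential that already exists under the new key.
    if (value !== undefined && store.get(newKey) === undefined) {
      store.set(newKey, value);
      copied++;
    }
  }
  return copied;
}
```

Running it twice is safe: the second pass finds the new keys already populated and copies nothing.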

2. Potential Race Conditions in Volume Creation

File: packages/gateway/src/orchestration/impl/docker-deployment.ts:171-233

The volume creation logic has race condition mitigation but could be improved:

// Current approach handles 409 conflicts but has timing issues
if (createError.statusCode === 409) {
  logger.info(`Volume ${volumeName} was created by another thread`);
}

Recommendation: Use atomic operations or distributed locks for volume creation.
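A distributed lock needs shared infrastructure such as Redis; a smaller first step is to coalesce concurrent creates for the same volume name inside one gateway process so only one request reaches Docker. A minimal sketch, assuming a caller-supplied `createVolume` function (hypothetical) that rejects on failure. It does not protect against multiple gateway replicas racing each other.

```typescript
// Coalesces concurrent calls per key: callers asking for the same key while a
// creation is in flight share one promise instead of issuing parallel requests.
class SingleFlight<T> {
  private inFlight = new Map<string, Promise<T>>();

  run(key: string, fn: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) return existing;
    const promise = fn().finally(() => {
      // Drop the entry once settled so a later call (e.g. after a failure) retries.
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, promise);
    return promise;
  }
}

const volumeCreates = new SingleFlight<string>();

// Hypothetical helper: every create for one volume name goes through the same promise.
function ensureVolume(name: string, createVolume: (name: string) => Promise<void>): Promise<string> {
  return volumeCreates.run(name, async () => {
    await createVolume(name);
    return name;
  });
}
```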

3. Hash Collision Risk

File: packages/gateway/src/spaces/space-resolver.ts:19-21

Using only 8 characters of SHA256 creates collision risk:

  • 8 chars hex = 4.3 billion combinations
  • Birthday paradox: ~39% collision chance at √(4.3B) ≈ 65k spaces, reaching ~50% at about 77k

Recommendation: Consider increasing to 12-16 characters or add collision detection.
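The collision figures can be checked with the standard birthday approximation p ≈ 1 - exp(-n²/(2N)), where N = 16^k for k hex characters and n is the number of spaces. A small sketch; `collisionProbability` and `spacesForProbability` are illustrative names:

```typescript
// Birthday approximation: probability that at least two of n uniformly random IDs
// collide when IDs are drawn from 16^hexChars possible values.
function collisionProbability(n: number, hexChars: number): number {
  const space = Math.pow(16, hexChars);
  // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
  return -Math.expm1(-(n * n) / (2 * space));
}

// Inverse: number of IDs at which the collision probability reaches p.
function spacesForProbability(p: number, hexChars: number): number {
  const space = Math.pow(16, hexChars);
  return Math.sqrt(-2 * space * Math.log1p(-p));
}

for (const hexChars of [8, 12, 16]) {
  console.log(
    `${hexChars} hex chars: 1% at ~${Math.round(spacesForProbability(0.01, hexChars))}, ` +
      `50% at ~${Math.round(spacesForProbability(0.5, hexChars))} spaces`,
  );
}
```

For 8 hex characters this gives roughly 9,300 spaces at 1% and roughly 77,000 at 50%; each extra hex character multiplies both thresholds by 4.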

4. Missing Input Validation

File: packages/gateway/src/spaces/space-resolver.ts:30-46

The resolveSpace function lacks input validation:

export function resolveSpace(context: SpaceContext): ResolvedSpace {
  const { platform, userId, channelId, isGroup } = context;
  // No validation for required fields!

Recommendation: Add validation:

if (!platform || !userId || !channelId) {
  throw new Error("Missing required context fields");
}
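A fuller sketch of the validation idea, assuming the `SpaceContext` fields shown above and following the ID format from the PR description. Unlike the one-line check, it only requires `channelId` for group contexts, since DM space IDs are derived from `userId` alone. The `ResolvedSpace` shape here is hypothetical; the PR's actual type is not shown in the review.

```typescript
import { createHash } from "node:crypto";

interface SpaceContext {
  platform: string;
  userId: string;
  channelId?: string;
  isGroup: boolean;
}

// Hypothetical shape; the PR's ResolvedSpace may carry different fields.
interface ResolvedSpace {
  spaceId: string;
  type: "user" | "group";
}

function hash8(input: string): string {
  return createHash("sha256").update(input).digest("hex").substring(0, 8);
}

// Rejects missing, non-string, or blank values with a message naming the field.
function requireField(name: string, value: string | undefined): string {
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`resolveSpace: missing required field "${name}"`);
  }
  return value;
}

export function resolveSpace(context: SpaceContext): ResolvedSpace {
  const platform = requireField("platform", context.platform);
  const userId = requireField("userId", context.userId);
  if (context.isGroup) {
    // Group spaces are keyed by channel, so channelId is mandatory here.
    const channelId = requireField("channelId", context.channelId);
    return { spaceId: `group-${hash8(`${platform}:group:${channelId}`)}`, type: "group" };
  }
  return { spaceId: `user-${hash8(`${platform}:user:${userId}`)}`, type: "user" };
}
```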

5. Inconsistent Error Handling

File: packages/gateway/src/orchestration/impl/k8s-deployment.ts:274-297

RuntimeClass availability check clears the config on failure but doesn't properly propagate the decision:

// This silently degrades security without proper alerting
this.config.worker.runtimeClassName = undefined;

Recommendation: Add proper alerting for security degradation.
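One way to make the degradation explicit is to let configuration choose between failing closed and falling back loudly. A minimal sketch; the `requireRuntimeClass` option, the `Logger` interface, and the event name are hypothetical, and `gvisor` in the usage is only an example RuntimeClass name:

```typescript
interface Logger {
  warn(message: string, fields?: Record<string, unknown>): void;
}

interface WorkerRuntimeConfig {
  runtimeClassName?: string;
  // Hypothetical flag: when true, a missing RuntimeClass is a startup error.
  requireRuntimeClass: boolean;
}

// Returns the RuntimeClass to use, or undefined when falling back to the default runtime.
function resolveRuntimeClass(
  config: WorkerRuntimeConfig,
  isAvailable: boolean,
  logger: Logger,
): string | undefined {
  const name = config.runtimeClassName;
  if (!name || isAvailable) return name;
  if (config.requireRuntimeClass) {
    throw new Error(`RuntimeClass "${name}" is not available; refusing to start workers without it`);
  }
  // Structured warning with a stable event name so alerts can match on it.
  logger.warn("runtime_class_unavailable_fallback", { runtimeClassName: name, securityDegraded: true });
  return undefined;
}
```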


🚀 Performance Considerations

1. Positive: Shared Volumes

The space-based approach enables volume sharing across threads, reducing storage overhead and improving session continuity.

2. Concern: Increased Redis Key Lookups

With space-based credentials, the system may need additional lookups to resolve space from context. Consider caching frequently accessed space resolutions.


🔒 Security Assessment

1. Improved Multi-Tenancy

Space-based isolation is better than thread-based for security, preventing credential leakage between different contexts.

2. OAuth Security Looks Good

  • Proper token expiry (5 minutes)
  • CSRF protection with encrypted state
  • Platform-specific callback validation
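For reference, an encrypted, expiring OAuth state token that carries `spaceId` could look like the sketch below. It uses AES-256-GCM from `node:crypto` as a stand-in for the codebase's `encrypt()`/`decrypt()` helpers, whose algorithm is not shown here; the 5-minute TTL comes from the review, while the payload fields, token layout, and function names are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface OAuthState {
  spaceId: string;
  userId: string;
  expiresAt: number; // epoch milliseconds
}

const STATE_TTL_MS = 5 * 60 * 1000; // 5 minutes, per the review

function encodeState(spaceId: string, userId: string, key: Buffer, now = Date.now()): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const payload: OAuthState = { spaceId, userId, expiresAt: now + STATE_TTL_MS };
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(payload), "utf8"), cipher.final()]);
  // Layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64url");
}

function decodeState(token: string, key: Buffer, now = Date.now()): OAuthState {
  const raw = Buffer.from(token, "base64url");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  // final() throws if the token was tampered with or encrypted under another key.
  const plaintext = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
  const state = JSON.parse(plaintext) as OAuthState;
  if (now > state.expiresAt) throw new Error("OAuth state expired");
  return state;
}
```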

3. Volume Security

Docker volume permission fixing (chown 1001:1001) is appropriate for the worker user setup.


🧪 Test Coverage Issues

Critical Gap: No Tests for Core Migration Logic

The PR lacks tests for:

  • Space resolution logic (space-resolver.ts)
  • Credential migration paths
  • Volume naming changes
  • OAuth state management with spaceId

Recommendation: Add comprehensive tests:

// Example tests needed
describe('SpaceResolver', () => {
  it('should generate consistent space IDs for same context');
  it('should handle platform-specific group detection');
  it('should prevent hash collisions');
});

📝 Code Quality

1. Good Patterns

  • ✅ Consistent use of factory pattern for platforms
  • ✅ Clean separation of concerns
  • ✅ Good error handling hierarchy with BaseError

2. Areas for Improvement

  • WhatsApp Integration: Large files with complex logic could benefit from better decomposition
  • Magic Numbers: Hash length (8) and token expiry (5 min) should be configurable
  • Logging: Some operations lack structured logging for debugging

🔧 Recommendations

Immediate (Pre-Merge):

  1. Add credential migration logic - Critical for existing users
  2. Increase hash length to 12-16 chars to reduce collision risk
  3. Add input validation to resolveSpace()
  4. Add integration tests for space resolution

Post-Merge:

  1. Monitor hash collisions in production
  2. Add performance metrics for space resolution
  3. Consider caching for frequently accessed space mappings
  4. Add migration dashboard to track credential migration progress

🎯 Overall Assessment

This is a well-architected migration to a more scalable multi-tenant model. The space-based approach is conceptually sound and addresses real scalability issues. However, the lack of migration strategy for existing data is a critical blocker that must be addressed before merge.

Recommendation: Address the migration strategy and collision risk, then this PR will significantly improve the platform's multi-tenant capabilities.


@mesa-dot-dev

mesa-dot-dev bot commented Jan 11, 2026

Mesa Description

TL;DR

Migrates from thread-scoped volumes and user-scoped credentials to a space-based multi-tenant architecture where users/groups share persistent workspaces across all their threads.

What changed?

  • Added space-resolver module for consistent spaceId generation (format: user-{hash8} for DMs, group-{hash8} for groups)
  • Updated credential stores (MCP, Claude) to use spaceId instead of userId
  • Updated OAuth flows to include spaceId in state parameter
  • Updated deployment managers to use space-based volume naming (peerbot-workspace-{spaceId})
  • Updated worker token to include spaceId
  • Added WhatsApp platform adapter
  • Migrated volume naming from peerbot-workspace-{threadId} to peerbot-workspace-{spaceId}
  • Migrated MCP credentials from mcp:credential:{userId}:{mcpId} to mcp:credential:{spaceId}:{mcpId}
  • Migrated Claude credentials from claude:credential:{userId} to claude:credential:{spaceId}
  • Cleaned up unused files and dependencies

Description generated by Mesa.

@mesa-dot-dev bot left a comment

Performed full review of 9c33352...16b8723

Analysis

  1. Hash Collision Risk: The 8-character SHA256 space ID (32 bits of entropy) creates significant collision probability at scale (~1% at roughly 9,300 spaces, ~50% at roughly 77,000), potentially causing unauthorized workspace sharing, credential leakage, and privacy violations. Increase hash length to 12-16 characters or implement collision detection.

  2. Unauthorized Credential Sharing: Group space design allows ALL members to use ANY member's authenticated credentials without consent, violating OAuth provider terms and creating security vulnerabilities when group membership changes. Implement user-scoped credentials or explicit consent flows.

  3. Missing Volume Lifecycle Management: Space PVCs are never deleted, causing unbounded storage growth, orphaned resources, and potential compliance issues. Implement cleanup policies based on activity.

  4. Broken Backwards Compatibility: No migration logic exists for user-scoped credentials, forcing users to re-authenticate without explanation and losing existing credentials. Implement credential migration or dual-lookup during transition.

  5. Brittle Platform Detection: Space resolution relies on string prefix/suffix matching for platform detection without proper validation or error handling for unexpected formats.


// Don't throw - deployment deletion should succeed even if PVC cleanup fails
}
}
// NOTE: Space PVCs are NOT deleted on deployment deletion


Medium: Unbounded PVC Growth

Space PVCs are intentionally never deleted (for persistence across threads), but this creates operational and compliance concerns:

Issues:

  1. Unbounded Storage Growth: Every unique space creates a PVC that persists indefinitely
  2. Orphaned Resources: Inactive spaces continue consuming storage resources
  3. Data Retention: May violate data retention policies (GDPR, etc.) if user data persists after account deletion
  4. Cost: Cloud storage costs grow linearly with number of unique spaces ever created

Recommendations:

  1. Implement PVC cleanup policy based on last access timestamp
  2. Add monitoring for PVC count and total storage usage
  3. Create an admin API/tool to list and clean up inactive space PVCs
  4. Document data retention policy and cleanup procedures
  5. Consider adding PVC expiration annotations that can be extended on access

Example cleanup policy: Delete PVCs not accessed in 90 days (configurable)
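The 90-day policy could start as a pure selection step over PVC metadata, kept separate from the Kubernetes calls that list and delete claims. A minimal sketch, assuming the gateway stamps an ISO-8601 last-access annotation on each space PVC whenever it is mounted; the annotation key `peerbot.io/last-accessed-at` and `selectIdlePvcs` are hypothetical.

```typescript
// Minimal slice of PVC metadata needed for the decision (mirrors metadata.name / metadata.annotations).
interface PvcMeta {
  name: string;
  annotations?: Record<string, string>;
}

// Hypothetical annotation the gateway would refresh on every mount of a space volume.
const LAST_ACCESS_ANNOTATION = "peerbot.io/last-accessed-at";
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns names of PVCs whose last recorded access is older than maxIdleDays.
// PVCs without a parseable timestamp are kept, so missing metadata never causes deletion.
function selectIdlePvcs(pvcs: PvcMeta[], now: Date, maxIdleDays = 90): string[] {
  const cutoff = now.getTime() - maxIdleDays * DAY_MS;
  return pvcs
    .filter((pvc) => {
      const stamp = pvc.annotations?.[LAST_ACCESS_ANNOTATION];
      if (!stamp) return false;
      const lastAccess = Date.parse(stamp);
      return !Number.isNaN(lastAccess) && lastAccess < cutoff;
    })
    .map((pvc) => pvc.name);
}
```

Keeping the decision pure makes the policy easy to unit-test and to run in a dry-run mode that only reports candidates before anything is deleted.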

Agent: 🏛 Architecture

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: buremba/peerbot#91
File: packages/gateway/src/orchestration/impl/k8s-deployment.ts#L713
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described above.

// Generate worker authentication token with platform info
// Check both top-level teamId (WhatsApp) and platformMetadata.teamId (Slack)
const teamId = messageData.teamId || platformMetadata?.teamId;
const spaceId = messageData.spaceId || threadId; // Fall back to threadId for backwards compatibility


High: Missing Credential Migration Path

This fallback to threadId provides backwards compatibility for deployment naming, but there's no corresponding migration logic for existing credentials.

Problem:
Existing credentials stored as:

  • claude:credential:{userId}
  • mcp:credential:{userId}:{mcpId}

Will NOT be accessible after this deployment because the new code looks for:

  • claude:credential:{spaceId}
  • mcp:credential:{spaceId}:{mcpId}

Impact:

  • All users will lose access to their existing OAuth credentials
  • Users must re-authenticate without explanation
  • Poor user experience during rollout

Recommendations:

  1. Implement credential migration script that runs during deployment
  2. Add dual-lookup fallback: check spaceId first, fall back to userId for transition period
  3. Add migration timeline and deprecation warnings
  4. Consider adding a migration flag to track which credentials have been migrated
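Recommendation 2 can be sketched as a read-through lookup: try the space-scoped key, fall back to the legacy user-scoped key, and copy a hit forward so later reads take the fast path. A minimal sketch, assuming a synchronous Map-like store in place of Redis; `CredentialStore` and `getClaudeCredential` are hypothetical names, and the key formats are the ones listed above. The fallback is deliberately limited to DM (`user-`) spaces so that one member's credential is never silently promoted into a group space.

```typescript
// Hypothetical minimal store; a Redis client would be async in practice.
interface CredentialStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

function getClaudeCredential(store: CredentialStore, spaceId: string, userId: string): string | undefined {
  const spaceKey = `claude:credential:${spaceId}`;
  const current = store.get(spaceKey);
  if (current !== undefined) return current;

  // Only DM spaces fall back: copying a member's credential into a group space
  // would share it with every member of that group.
  if (!spaceId.startsWith("user-")) return undefined;

  const legacy = store.get(`claude:credential:${userId}`);
  if (legacy !== undefined) {
    store.set(spaceKey, legacy); // copy forward; the legacy key stays until migration is verified
  }
  return legacy;
}
```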

Agent: 🏛 Architecture

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: buremba/peerbot#91
File: packages/gateway/src/orchestration/base-deployment-manager.ts#L225
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described above.

* Uses first 8 chars of SHA256 for uniqueness with K8s label compatibility.
*/
export function hashPlatformId(id: string): string {
return createHash("sha256").update(id).digest("hex").substring(0, 8);

High

Critical: Hash Collision Risk

Using only 8 characters (32 bits) of SHA256 creates a realistic collision risk. By the birthday paradox, collision probability reaches ~1% at around 9,300 spaces and ~50% at around 77,000 spaces.

For a production system with thousands of users across multiple platforms, collisions could cause:

  • Different users/groups sharing the same workspace volume
  • Credential leakage between unrelated spaces
  • Data privacy violations

Recommendations:

  1. Increase hash length to at least 12-16 characters (48-64 bits)
  2. Implement collision detection and retry with salt
  3. Add monitoring/alerting for hash collisions
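  4. Consider including a timestamp component to reduce collision probability (this makes space IDs non-deterministic, so the context-to-spaceId mapping would then have to be persisted)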

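Recommendations 2 and 3 both require remembering which input produced each ID, because a truncated hash alone cannot reveal a collision. A minimal in-memory sketch; `SpaceIdRegistry` is hypothetical, a real deployment would persist both maps (for example in Redis), and the `#attempt` salt suffix is only one possible retry scheme.

```typescript
import { createHash } from "node:crypto";

function shortHash(input: string, hexChars: number): string {
  return createHash("sha256").update(input).digest("hex").substring(0, hexChars);
}

// Remembers which input produced each ID so collisions are detected instead of silently merged.
class SpaceIdRegistry {
  private readonly idToInput = new Map<string, string>();
  private readonly inputToId = new Map<string, string>();
  private readonly hexChars: number;
  private readonly maxAttempts: number;
  collisions = 0; // counter to feed monitoring/alerting

  constructor(hexChars = 8, maxAttempts = 5) {
    this.hexChars = hexChars;
    this.maxAttempts = maxAttempts;
  }

  resolve(prefix: "user" | "group", input: string): string {
    const key = `${prefix}|${input}`;
    const known = this.inputToId.get(key);
    if (known) return known; // stable answer for inputs seen before
    for (let attempt = 0; attempt < this.maxAttempts; attempt++) {
      // Attempt 0 is the plain hash; later attempts append a salt suffix.
      const salted = attempt === 0 ? input : `${input}#${attempt}`;
      const id = `${prefix}-${shortHash(salted, this.hexChars)}`;
      if (!this.idToInput.has(id)) {
        this.idToInput.set(id, key);
        this.inputToId.set(key, id);
        return id;
      }
      this.collisions++;
    }
    throw new Error(`No unique ${prefix} space ID after ${this.maxAttempts} attempts`);
  }
}
```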
Agent: 🏛 Architecture

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: buremba/peerbot#91
File: packages/gateway/src/spaces/space-resolver.ts#L20
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described above.

* Detect if context represents a group/channel based on platform heuristics.
* Use when isGroup is not explicitly available.
*/
export function isGroupContext(platform: string, channelId: string): boolean {


Medium: Brittle Platform Detection Heuristics

Using string prefix/suffix matching for group detection is fragile:

Risks:

  1. Slack/WhatsApp could change their ID format in the future
  2. No validation for unexpected formats
  3. Silent failures could cause incorrect space resolution (users in groups treated as DMs or vice versa)
  4. Other platforms may not follow predictable patterns

Specific Concerns:

  • Slack: What about private channels, threads, or new channel types?
  • WhatsApp: What about broadcast lists or status updates?
  • Future platforms: No extensibility mechanism

Recommendations:

  1. Add validation and error logging for unexpected channel ID formats
  2. Make platform detection configurable/extensible
  3. Add explicit isGroup parameter to the context (don't rely on heuristics)
  4. Return an error/warning when heuristic detection is uncertain
  5. Add unit tests for various channel ID formats
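Recommendations 2 to 4 can be combined: trust an explicit `isGroup` when the adapter supplies one, keep per-platform heuristics in a registry, and return `unknown` with a warning instead of guessing. A sketch under stated assumptions: the Slack prefixes (`C`/`G` for channels, `D` for DMs) and WhatsApp JID suffixes (`@g.us` for groups, `@s.whatsapp.net` for individual chats) reflect commonly seen formats, not the PR's actual rules, and `classifyContext` is a hypothetical name.

```typescript
type ContextKind = "group" | "dm" | "unknown";

interface ClassifyInput {
  platform: string;
  channelId: string;
  isGroup?: boolean; // explicit flag from the platform adapter, preferred when present
}

type Heuristic = (channelId: string) => ContextKind;

// Registry keeps detection extensible: a new platform adds an entry instead of editing a switch.
const heuristics: Record<string, Heuristic> = {
  slack: (id) => (/^[CG][A-Z0-9]+$/.test(id) ? "group" : /^D[A-Z0-9]+$/.test(id) ? "dm" : "unknown"),
  whatsapp: (id) => (id.endsWith("@g.us") ? "group" : id.endsWith("@s.whatsapp.net") ? "dm" : "unknown"),
};

function classifyContext(input: ClassifyInput, warn: (msg: string) => void = console.warn): ContextKind {
  if (typeof input.isGroup === "boolean") return input.isGroup ? "group" : "dm";
  const heuristic = heuristics[input.platform];
  const kind = heuristic ? heuristic(input.channelId) : "unknown";
  if (kind === "unknown") {
    // Surface uncertainty (e.g. broadcast lists, new ID formats) instead of silently guessing.
    warn(`Unrecognized channel ID format for platform "${input.platform}": ${input.channelId}`);
  }
  return kind;
}
```

Callers can then refuse to resolve a space for `unknown` contexts rather than risk placing a group member in a DM space or vice versa.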

Agent: 🏛 Architecture

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: buremba/peerbot#91
File: packages/gateway/src/spaces/space-resolver.ts#L52
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described above.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

https://github.com/buremba/peerbot/blob/16b8723b218d3fd3bc4af0b83cc1600030350b9c/packages/gateway/src/orchestration/message-consumer.ts#L124-L125
P1: Check credentials by spaceId instead of userId

Claude credentials are now keyed by spaceId (claude:credential:{spaceId}), but the orchestrator still calls hasCredentials(data.userId) here; because spaceIds are hashed (user-…/group-…), this check will always fail even when a space is already authenticated, so every message will trigger the auth prompt and existing credentials are never reused. Use data.spaceId (with a threadId fallback if needed) for the credential lookup.


Comment on lines +522 to +526
platformMetadata: {
jid: context.chatJid,
senderJid: context.senderJid,
senderE164: context.senderE164,
senderName: context.senderName,


P1: Include spaceId in WhatsApp metadata for auth flow

The WhatsApp auth adapter derives the space from platformMetadata.spaceId and otherwise falls back to channelId; this payload’s platformMetadata omits spaceId, so OAuth state/credentials are stored under the raw chat JID instead of the hashed spaceId. Subsequent lookups use the hashed spaceId and won’t find those credentials, resulting in an auth loop for WhatsApp chats. Add spaceId to platformMetadata (or recompute it in the adapter) so storage and lookup keys match.


Comment on lines +248 to 252
// Use spaceId for volume naming (shared across threads in same space)
// Fall back to threadId for backwards compatibility
const spaceId = messageData?.spaceId || threadId;

// Determine if running in Docker and resolve project paths


P2: Space-scoped volumes still yield thread-scoped workdirs

Docker deployments now mount a space-scoped volume at /workspace, but the worker still derives its working directory from THREAD_ID/sessionKey (see WorkspaceManager), so each new thread in the same space will use a fresh /workspace/<threadId> subdir and won’t see files from previous threads. That undermines the stated goal of a shared workspace across threads; consider switching the worker workspace directory to spaceId when SPACE_ID is provided.
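The suggested fix amounts to choosing the worker's workspace subdirectory from `SPACE_ID` when it is set and falling back to `THREAD_ID` otherwise. A minimal sketch; `resolveWorkspaceDir` is hypothetical, and the `/workspace` mount point and environment variable names come from the comment above rather than from the worker's actual `WorkspaceManager` code.

```typescript
import * as path from "node:path";

const WORKSPACE_ROOT = "/workspace";

// Prefer the space-scoped directory so threads in the same space share files;
// fall back to the thread-scoped directory for workers started without SPACE_ID.
function resolveWorkspaceDir(env: Record<string, string | undefined>): string {
  const id = env.SPACE_ID || env.THREAD_ID;
  if (!id) throw new Error("Neither SPACE_ID nor THREAD_ID is set");
  // Reject path separators and dot segments so an ID cannot escape the workspace root.
  if (id.includes("/") || id.includes("\\") || id === "." || id === "..") {
    throw new Error(`Invalid workspace ID: ${id}`);
  }
  return path.posix.join(WORKSPACE_ROOT, id);
}
```

In the worker this would be called as `resolveWorkspaceDir(process.env)`.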


@buremba merged commit abc195f into main on Jan 12, 2026
21 of 26 checks passed