
poc: Embedded OpenFGA for fine-grained authorization #335

Draft
edsonmichaque wants to merge 9 commits into main from poc/embedded-openfga-fine-grained-authz


@edsonmichaque edsonmichaque commented Mar 8, 2026

Summary

Proof-of-concept for embedded OpenFGA as the fine-grained authorization backend for AI Studio. Replaces the current boolean flags (IsAdmin) and SQL JOIN-based group filtering with a unified relationship graph (Zanzibar-style ReBAC).

  • Embedded engine: In-process OpenFGA server with in-memory datastore — no external service dependency
  • Abstract interface: Authorizer interface with domain-neutral terminology (Subject/Relation/Resource), fully decoupled from OpenFGA specifics
  • Clean package split: authz/ for interface + middleware (no OpenFGA imports), authz/openfga/ for the implementation
  • Incremental sync: Syncer translates service-layer mutations into Grant/Revoke calls in real-time — no stale permissions
  • Full authorization model: Covers system roles, groups, catalogues (LLM/Data/Tool), resources (LLMs, Datasources, Tools), apps (ownership + sharing), chats, plugins, plugin resources (composite IDs), submissions (UGC), and filters
  • FullSync: Startup sync reads 12 GORM tables and populates the relationship graph
  • Feature toggle: OPENFGA_ENABLED env var — when disabled, NoopAuthorizer defers to legacy auth
  • Shadow mode: Middleware that runs new auth alongside legacy, logs discrepancies, never blocks
  • Enforcement middleware: RequireSystemAdmin, RequireSSOAdmin, RequireRelation, RequireCanUse, RequireCanAdmin
  • Security: Colon injection prevention in composite IDs via ValidateID() and first-colon parsing
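
A minimal sketch of the colon-injection guard, assuming resource strings take the form `type:id` and that type names never contain a colon. `ValidateID` and `ParseResourceID` echo names used in this PR, but these signatures, the error messages, and the `PluginResourceID` helper with its `plugin_resource:<plugin>:<name>` layout are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ValidateID rejects empty IDs and IDs containing ':' so that a
// caller-supplied value cannot smuggle a second "type:id" segment into a
// resource string. Hypothetical signature.
func ValidateID(id string) error {
	if id == "" {
		return errors.New("authz: empty id")
	}
	if strings.Contains(id, ":") {
		return fmt.Errorf("authz: id %q contains ':'", id)
	}
	return nil
}

// PluginResourceID builds a composite plugin_resource ID from two parts,
// validating each part first. The "plugin_resource:<plugin>:<name>" layout
// is an assumption for this sketch.
func PluginResourceID(pluginID, name string) (string, error) {
	for _, part := range []string{pluginID, name} {
		if err := ValidateID(part); err != nil {
			return "", err
		}
	}
	return "plugin_resource:" + pluginID + ":" + name, nil
}

// ParseResourceID splits on the FIRST colon: the type never contains ':',
// so everything after it (including a composite ID's own colon) is the ID.
func ParseResourceID(resource string) (resourceType, id string, err error) {
	i := strings.IndexByte(resource, ':')
	if i <= 0 || i == len(resource)-1 {
		return "", "", fmt.Errorf("authz: malformed resource %q", resource)
	}
	return resource[:i], resource[i+1:], nil
}
```

Rejecting colons in every individual part is what makes first-colon parsing safe: only the composite builder, never user input, can introduce the extra separator.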

Authorization model access chain

user:10 --member--> group:1 --assigned_group--> catalogue:1 --parent_catalogue--> llm:5
∴ user:10 can_use llm:5
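
To make the chain concrete, here is a toy evaluator that hard-codes the two rules the chain implies: an llm's `can_use` comes from any `parent_catalogue`, and a catalogue's `can_use` comes from membership in any `assigned_group`. It is only an illustration under those assumptions; it does not reflect how OpenFGA actually resolves the model, and it has no cycle detection.

```go
package main

import "strings"

// tuple records "object#relation@subject", e.g. group:1#member@user:10.
type tuple struct{ object, relation, subject string }

type graph []tuple

// subjects returns every subject related to object via relation.
func (g graph) subjects(object, relation string) []string {
	var out []string
	for _, t := range g {
		if t.object == object && t.relation == relation {
			out = append(out, t.subject)
		}
	}
	return out
}

// canUse hard-codes two assumed rewrite rules:
//
//	llm:X       can_use <- can_use on any parent_catalogue of X
//	catalogue:X can_use <- member of any assigned_group of X
func (g graph) canUse(user, object string) bool {
	switch {
	case strings.HasPrefix(object, "llm:"):
		for _, cat := range g.subjects(object, "parent_catalogue") {
			if g.canUse(user, cat) {
				return true
			}
		}
	case strings.HasPrefix(object, "catalogue:"):
		for _, grp := range g.subjects(object, "assigned_group") {
			for _, m := range g.subjects(grp, "member") {
				if m == user {
					return true
				}
			}
		}
	}
	return false
}
```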

Package structure

authz/                          # Abstract — no OpenFGA imports
  authz.go                      # Authorizer interface, Relationship type, helpers
  noop.go                       # NoopAuthorizer (legacy passthrough)
  syncer.go                     # Incremental sync (service mutations → Grant/Revoke)
  middleware.go                 # Gin enforcement middleware
  shadow.go                     # Shadow mode (log discrepancies, don't block)
authz/openfga/                  # Implementation — OpenFGA-specific
  store.go                      # Embedded OpenFGA Store implementing Authorizer
  sync.go                       # Database-to-graph full sync (startup)
  config.go                     # Factory (NewFromEnv) with feature toggle
  model.fga                     # Authorization model DSL (fully commented)

Syncer — incremental permission updates

The Syncer keeps the authorization store in sync after startup by providing
methods for every permission-relevant mutation in the service layer:

Category | Methods
User lifecycle | OnUserCreated, OnUserUpdated, OnUserDeleted
Group membership | OnUserAddedToGroup, OnUserRemovedFromGroup, OnGroupMembersReplaced
Catalogue assignments | OnCatalogueAssignedToGroup, OnCatalogueRemovedFromGroup
Resource-catalogue | OnResourceAddedToCatalogue, OnResourceRemovedFromCatalogue
Ownership | OnOwnershipSet, OnOwnershipChanged
App sharing | OnAppShared, OnAppUnshared
Chat groups | OnChatGroupAssigned, OnChatGroupRemoved
Submissions | OnSubmissionCreated, OnReviewerAssigned
Plugins | OnPluginInstalled, OnPluginResourceGroupAssigned, OnPluginResourceGroupRemoved

All methods are no-ops when the authorizer is disabled.

Migration plan (6 phases)

  1. Infrastructure — embedded engine, model, sync, tests
  2. Shadow mode — run alongside legacy, log discrepancies
  3. Replace admin checks
  4. Replace resource access (catalogue-based filtering)
  5. Fine-grained permissions (app sharing, submission review)
  6. New capabilities (delegated admin, field-level access)

Test plan

  • 46 tests passing across authz/ and authz/openfga/
  • Full relationship chain tests (user -> group -> catalogue -> resource)
  • Ownership tests (datasource, app)
  • App sharing (direct user, group members)
  • Submission access (submitter vs reviewer)
  • Plugin installer permissions
  • Composite ID handling (plugin resources)
  • Grant/Revoke/GrantAndRevoke atomicity
  • NoopAuthorizer always allows
  • Security: colon injection rejected, empty IDs rejected
  • Syncer: incremental grant/revoke for all mutation types
  • Syncer: diff-based group membership replacement
  • Syncer: no-op when authorizer is disabled

Add pkg/authz/ package with embedded OpenFGA server for fine-grained
relationship-based authorization. This replaces the current boolean/JOIN-based
access control model with a Zanzibar-style authorization system.

Phase 1 delivers infrastructure with no behavioral changes:

- model.fga: Authorization model with 15 types covering all resource relationships
  (system, user, group, catalogue, data_catalogue, tool_catalogue, llm, datasource,
  tool, app, chat, plugin, plugin_resource, submission, filter)
- store.go: Embedded in-memory OpenFGA server with Check, ListObjects, WriteTuples
- sync.go: FullSync scans 12 GORM tables to populate all relationship tuples
- middleware.go: Gin middleware (RequireSystemAdmin, RequireCanUse, RequireRelation)
- 18 integration tests covering every relationship path, running in 57ms in total

Addresses roadmap items from features/UserManagement.md:
- Fine-grained permissions within groups (read, write, admin)
- Group hierarchies for nested permissions
- Resource ownership models
- App sharing (features/Apps.md)

probelabs bot commented Mar 8, 2026

This pull request introduces a proof-of-concept for a new fine-grained authorization system using an embedded OpenFGA engine. It replaces the current authorization mechanism, which relies on boolean flags like IsAdmin and SQL-based group filtering, with a modern, flexible Relationship-Based Access Control (ReBAC) model.

The system is designed to be decoupled and incrementally adoptable. An Authorizer interface abstracts the underlying implementation, and a NoopAuthorizer is used when the feature is disabled via the OPENFGA_ENABLED environment variable, ensuring no impact on existing functionality.

Key components include:

  • An in-process OpenFGA server with an in-memory store, avoiding external service dependencies.
  • A declarative authorization model (model.fga) that defines all relationships and permissions for resources like users, groups, catalogues, and apps.
  • A scalable FullSync process that populates the authorization graph from 12 GORM tables on application startup.
  • A Syncer component with methods to incrementally update permissions in real-time as database mutations occur.
  • A "shadow mode" middleware that allows running the new authorization checks alongside the legacy system to log and compare results without blocking requests, facilitating a safe migration.
  • A suite of new Gin middleware for enforcing permissions (e.g., RequireRelation).
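
The never-block contract of shadow mode reduces to a small decision function. This sketch is framework-agnostic rather than Gin middleware, and `ShadowDecide` and `Discrepancy` are hypothetical names: it always returns the legacy decision and reports when the new check errors or disagrees.

```go
package main

// Discrepancy is what shadow mode would log (hypothetical type).
type Discrepancy struct {
	Subject, Relation, Resource string
	Legacy, Authz               bool
	Err                         error
}

// ShadowDecide runs the new check, reports any error or disagreement, and
// always returns the legacy decision, so the new system can never change
// the outcome of a request while in shadow mode.
func ShadowDecide(
	legacy bool,
	subject, relation, resource string,
	check func(subject, relation, resource string) (bool, error),
	report func(Discrepancy),
) bool {
	got, err := check(subject, relation, resource)
	if err != nil || got != legacy {
		report(Discrepancy{
			Subject: subject, Relation: relation, Resource: resource,
			Legacy: legacy, Authz: got, Err: err,
		})
	}
	return legacy
}
```

In a Gin handler chain, the legacy decision would come from the existing check and `c.Next()` would run regardless of what the new check says.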

Files Changed Analysis

The changes are substantial, with over 2,800 lines of new code, but are well-encapsulated within a new authz/ package. This indicates a cohesive, modular design.

  • authz/: This new top-level package contains the core, implementation-agnostic logic. It defines the Authorizer interface (authz.go), Gin middleware for enforcement (middleware.go), the Syncer for real-time updates (syncer.go), and the shadow mode logic (shadow.go).
  • authz/openfga/: This sub-package holds the concrete OpenFGA implementation. It includes the store logic (store.go), the startup database synchronization (sync.go), the authorization model itself (model.fga), and configuration (config.go).
  • *_test.go: A significant portion of the new code consists of a comprehensive test suite covering the authorization model, store logic, middleware, and the syncer.
  • services/: Modifications to user_service.go and group_service.go demonstrate how the AuthzSyncer is integrated to keep permissions up-to-date after database writes.
  • go.mod & go.sum: These files were modified to include the openfga/openfga dependency and its transitive dependencies.

Architecture & Impact Assessment

  • What this PR accomplishes
    This PR establishes the complete architectural foundation for migrating AI Studio to a scalable, fine-grained authorization system. It provides all the necessary components—the engine, the model, data synchronization, and an incremental adoption path—to move away from a rigid, hardcoded permission model.

  • Key technical changes introduced

    1. Embedded Authorization Engine: Integrates OpenFGA as an in-process library, providing low-latency authorization checks without the operational overhead of a separate microservice.
    2. Declarative Authorization Model: Access control logic is externalized into a human-readable .fga file, making it easier to understand, audit, and modify permissions without changing application code.
    3. Hybrid Data Synchronization: Implements a two-part sync strategy: a FullSync at startup for baseline consistency and an event-driven Syncer for real-time updates to prevent permission staleness.
    4. Phased Migration Tooling: The feature toggle and shadow mode are critical architectural components that de-risk the migration by allowing for validation in a production environment before full cutover.
  • Affected system components

    • Authorization: A new, parallel authorization system is introduced, which is disabled by default.
    • Application Startup: The FullSync process adds a new step to the application's startup sequence. It will issue a significant number of read queries across 12 tables, increasing the initial load on the database.
    • Service Layer: Business logic that modifies permissions (e.g., adding a user to a group) is now responsible for calling the AuthzSyncer to keep the authorization store consistent.

Component Interaction Diagram

graph TD
    subgraph "Startup"
        DB[(GORM DB)] -->|Reads 12 tables in batches| FullSync("authz/openfga.FullSync")
        FullSync -->|Populates graph| FGAStore{Embedded OpenFGA Store}
    end

    subgraph "Runtime"
        ServiceLayer[Service Layer] -->|e.g., user added to group| DBWrite("1. DB Write")
        ServiceLayer -->|Calls syncer method| Syncer("2. authz.Syncer")
        Syncer -->|Grants/revokes relationship| FGAStore
    end

    subgraph "API Request"
        Request[HTTP Request] --> Middleware("authz.Middleware")
        Middleware -->|"Check(user, can_use, llm:5)"| Authorizer("authz.Authorizer")
        Authorizer -->|Evaluates against model & data| FGAStore
        FGAStore -->|Allowed/Denied| Authorizer
        Authorizer --> Middleware
        Middleware -->|"Abort or c.Next()"| APIHandler
    end

Scope Discovery & Context Expansion

  • Broader Scope: This change is foundational and will eventually touch every part of the application that has any concept of permissions. The authz/syncer.go file serves as a clear blueprint of all permission-altering business logic that will need to be instrumented (e.g., user creation, group management, app sharing).

  • Context Expansion & Next Steps: The immediate next steps after merging this foundational work would be:

    1. Integrate the Syncer: A codebase-wide effort is needed to identify all locations where database mutations matching the Syncer methods occur and add the appropriate calls.
    2. Deploy in Shadow Mode: The shadow.go middleware should be applied to existing API endpoints. This will allow the team to monitor logs for discrepancies between the legacy and new authorization systems, providing confidence in the model's correctness before making it active.
    3. Begin Enforcement: Once shadow mode shows consistent results, a phased rollout can begin by replacing legacy permission checks with the new RequireRelation middleware on a per-endpoint basis.
Metadata
  • Review Effort: 5 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-03-08T13:34:56.198Z | Triggered by: pr_updated | Commit: 68ded61

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs bot commented Mar 8, 2026

Architecture Issues (5)

Severity Location Issue
🔴 Critical services/user_service.go:292
The call to `OnGroupMembersReplaced` incorrectly uses `user.ID` as the `groupID` parameter. The purpose of `OnGroupMembersReplaced` is to sync the list of users within a specific group. Passing a user ID will result in writing incorrect authorization relationships, attempting to add members to a 'group' that is actually a user.
💡 Suggestion: The logic for syncing a user's group memberships upon update is flawed. Instead of calling `OnGroupMembersReplaced`, you should calculate the groups added and removed for the user and then call `OnUserAddedToGroup` and `OnUserRemovedFromGroup` for each respective group change. The call to `OnGroupMembersReplaced` should be removed from the `UpdateUser` function.
🟠 Error services/user_service.go:157
The `AuthzSyncer` is consistently called with `context.Background()` instead of the request context. This is an anti-pattern that breaks context propagation, losing critical information like request deadlines, cancellation signals, and distributed tracing spans. The `Authorizer` interface methods are designed to accept a context, and this should be passed down from the service entry point.
💡 Suggestion: Refactor the service methods (e.g., `CreateUser`, `UpdateUser`) to accept a `context.Context` parameter. This context should then be passed down to all subsequent calls, including the `AuthzSyncer` methods. This change needs to be applied to all calls to the syncer across `services/user_service.go` and `services/group_service.go`.
🟠 Error authz/syncer.go:249-251
The syncer's internal helper methods (`grant`, `revoke`, `grantAndRevoke`) log errors from the authorizer but do not return them, so a failed update to the authorization store is invisible to the calling service. The application database and the authorization store can then diverge, and incorrect permissions may be enforced.
💡 Suggestion: The helper methods should return the error from the `authz.Grant/Revoke` calls. The public `Syncer` methods should, in turn, return these errors to the service layer. The service layer must then handle these errors appropriately, for instance, by failing the entire operation and returning an HTTP 500 error to prevent data inconsistency.
🟡 Warning authz/openfga/sync.go:134
The `syncGroupMemberRels` and generic `syncJoinTable` functions use `OFFSET`-based pagination to read from join tables during the full sync. For large tables, this method of pagination is inefficient and can lead to progressively slower queries and long application startup times, as the database has to scan and discard a growing number of rows for each subsequent page.
💡 Suggestion: Refactor the sync logic for join tables to use keyset pagination instead of offset. If the join tables have a unique, ordered primary key, you can paginate using a `WHERE id > ?` clause. If not, consider an alternative strategy, such as iterating through the primary resource (e.g., all groups) and fetching their associated relations (e.g., users) in batches.
🟡 Warning authz/openfga/store.go:142
The `ListResources` and `ListResourcesByName` functions rely on the underlying OpenFGA `ListObjects` API, which is not paginated and has a server-side limit on the number of results returned (typically 1000). This presents a scalability risk for any feature that needs to list all resources a user can access if that number can exceed the limit. The result set will be incomplete without any indication of that to the caller.
💡 Suggestion: While this is a limitation of the underlying dependency, it's an important architectural constraint to acknowledge. Add documentation to the `ListResources` and `ListResourcesByName` interface methods in `authz/authz.go` to explicitly state that the results are bounded and the API is not paginated. This will ensure that developers using the interface are aware of this limitation and can design features to avoid unbounded listing.

Performance Issues (2)

Severity Location Issue
🟠 Error authz/openfga/store.go:127-141
The `ListResourcesByName` function, which is used to list resources a user has access to, does not handle pagination. It makes a single call to the OpenFGA `ListObjects` API and returns only the first page of results. The OpenFGA backend has a default page size limit (e.g., 1000). This will lead to incorrect, truncated results for any user with access to more resources than the page size, causing critical bugs in features that rely on listing all accessible items.
💡 Suggestion: Modify `ListResourcesByName` to handle pagination. This involves calling the `ListObjects` API in a loop, passing the `continuation_token` from each response to the next request, and accumulating the results until an empty token is returned. Be mindful of potential memory usage if a user can access a very large number of resources.
🟡 Warning authz/openfga/sync.go:170-198
The generic `syncJoinTable` function, and `syncGroupMemberRels` (line 135), use offset-based pagination to read from join tables during the initial data sync. This method becomes progressively slower on large tables as the offset increases, because the database must scan and discard all rows up to the offset. This can significantly slow down application startup time.
💡 Suggestion: Replace offset-based pagination with keyset pagination (the "seek method"). For join tables, which often have composite primary keys, this can be implemented by ordering by the primary key columns and using the last seen values in the `WHERE` clause (e.g., `WHERE (col_a, col_b) > (?, ?) ORDER BY col_a, col_b`). This requires a suitable index on the columns being used for ordering.
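
Keyset pagination over a composite key can be demonstrated without a database. The `nextPage` function below is a hypothetical in-memory stand-in for the row-value query in its comment, which PostgreSQL, MySQL and SQLite generally accept; a binary search plays the role of the index seek, so each page costs the same however deep into the table it starts.

```go
package main

import "sort"

// pair is one row of a join table keyed by (group_id, user_id).
type pair struct{ GroupID, UserID uint }

// less orders rows by (group_id, user_id), matching the ORDER BY below.
func less(a, b pair) bool {
	if a.GroupID != b.GroupID {
		return a.GroupID < b.GroupID
	}
	return a.UserID < b.UserID
}

// nextPage mimics
//
//	SELECT group_id, user_id FROM user_groups
//	WHERE (group_id, user_id) > (?, ?)
//	ORDER BY group_id, user_id LIMIT ?
//
// over rows already sorted by (group_id, user_id). A nil cursor starts at
// the beginning; no rows are scanned and discarded as with OFFSET.
func nextPage(sorted []pair, after *pair, limit int) []pair {
	start := 0
	if after != nil {
		start = sort.Search(len(sorted), func(i int) bool { return less(*after, sorted[i]) })
	}
	end := start + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[start:end]
}
```

A caller keeps the last row of each page as the cursor for the next call and stops at the first empty page.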

Quality Issues (2)

Severity Location Issue
🟠 Error services/user_service.go:158-161
Throughout the service layer integrations in `user_service.go` and `group_service.go`, calls to the `AuthzSyncer` are made using `context.Background()`. This breaks the chain of context propagation from the incoming HTTP request. As a result, request-scoped information like trace IDs, deadlines, and cancellation signals are lost. This negatively impacts observability, debugging, and the system's ability to gracefully handle request cancellations or timeouts.
💡 Suggestion: Modify the service methods in `user_service.go` and `group_service.go` to accept a `context.Context` as their first argument. This context should be passed down from the API handlers (e.g., `c.Request.Context()`) and then used in all subsequent calls to the `AuthzSyncer`. For example, change `func (s *Service) CreateUser(dto UserDTO) (*models.User, error)` to `func (s *Service) CreateUser(ctx context.Context, dto UserDTO) (*models.User, error)` and call the syncer with `s.AuthzSyncer.OnUserCreated(ctx, ...)`.
🟡 Warning authz/openfga/sync.go:136-173
The full database sync for several tables (e.g., `user_groups`, `group_catalogues`) uses offset-based pagination (`Offset(offset).Limit(syncBatchSize)`). This method can lead to poor performance and high database load during application startup, as the database has to scan and discard `offset` rows for each batch. For large tables, this will significantly increase startup time.
💡 Suggestion: Refactor the sync functions for join tables (`syncGroupMemberRels` and the generic `syncJoinTable`) to use keyset pagination instead of offset-based pagination. For tables with a composite primary key, you can order by both columns (e.g., `ORDER BY group_id, user_id`) and use the last seen pair in the `WHERE` clause of the next query to fetch the next page efficiently.

Last updated: 2026-03-08T13:34:20.116Z | Triggered by: pr_updated | Commit: 68ded61


…tecture

- Add OPENFGA_ENABLED env var toggle (disabled by default, legacy auth used)
- Add NoopAuthorizer that always allows when OpenFGA is disabled
- Add NewFromEnv() factory that creates Store or NoopAuthorizer based on config
- Add ListObjectsStr() for types with non-numeric composite IDs (plugin_resource)
- Change ListObjects() to return error (not silent skip) on non-numeric IDs
- Add shadow mode middleware (ShadowCheckAdmin, ShadowCheckResource,
  ShadowCheckOwnership) that logs discrepancies without blocking requests
- Add Enabled() to Authorizer interface for runtime feature detection
- 27 tests passing (was 18)
- Rename Tuple → Relationship (Subject/Relation/Resource fields)
- Rename WriteTuples/DeleteTuples → Grant/Revoke/GrantAndRevoke
- Rename ListObjects → ListResources, ListObjectsStr → ListResourcesByName
- Rename CheckStr → CheckByName, ResourceStr → ResourceByName
- Rename helper functions: UserStr → SubjectUser, GroupStr → SubjectGroup,
  ObjectStr → ResourceID, ParseObjectID → ParseResourceNumericID,
  ParseObjectStr → ParseResourceID
- Add ResourceByName() with colon validation to prevent injection
- Add validateID() security check in plugin resource sync
- Use first-colon delimiter in ParseResourceID (not last) for safety
- Remove OpenFGA references from comments, logs, and variable names
- Add detailed comments to model.fga for every type and relation
- Middleware uses domain-neutral resourceType parameter naming
- Shadow mode logs use "authz" instead of "openfga" in field names

Split the authorization package into abstract interface and implementation:

- authz/ — Authorizer interface, Relationship type, helpers, NoopAuthorizer,
  middleware, shadow mode. No OpenFGA imports.
- authz/openfga/ — Store implementation, model.fga, sync, config (NewFromEnv).
  All OpenFGA-specific code isolated here.

This allows swapping the authorization backend without touching the interface
or middleware. Consumers import authz for the interface and authz/openfga
only at the composition root.
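
A hypothetical sketch of such a composition-root factory, in the spirit of `NewFromEnv`. Only the `OPENFGA_ENABLED` variable and the Store/NoopAuthorizer choice come from the PR; treating an unset value as disabled and an unparsable one as a startup error are assumptions, and `fgaStore` stands in for the real embedded store.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Authorizer is trimmed to the one method this sketch needs.
type Authorizer interface{ Enabled() bool }

type NoopAuthorizer struct{}

func (NoopAuthorizer) Enabled() bool { return false }

// fgaStore stands in for the embedded OpenFGA store (hypothetical).
type fgaStore struct{}

func (*fgaStore) Enabled() bool { return true }

// newFromLookup is the testable core: getenv is injected so tests need not
// touch the real process environment.
func newFromLookup(getenv func(string) string) (Authorizer, error) {
	v := getenv("OPENFGA_ENABLED")
	if v == "" {
		return NoopAuthorizer{}, nil // default: legacy auth only
	}
	on, err := strconv.ParseBool(v)
	if err != nil {
		return nil, fmt.Errorf("OPENFGA_ENABLED=%q: %w", v, err)
	}
	if !on {
		return NoopAuthorizer{}, nil
	}
	return &fgaStore{}, nil // real code would start the embedded server here
}

// NewFromEnv is the only place that knows which backend is in use.
func NewFromEnv() (Authorizer, error) { return newFromLookup(os.Getenv) }
```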

Syncer translates service-layer mutations into Grant/Revoke calls,
keeping the authorization store in sync with the database after startup.

Methods cover all mutation points:
- User lifecycle: OnUserCreated, OnUserUpdated, OnUserDeleted
- Group membership: OnUserAddedToGroup, OnUserRemovedFromGroup, OnGroupMembersReplaced
- Catalogue assignments: OnCatalogueAssignedToGroup, OnCatalogueRemovedFromGroup
- Resource-catalogue: OnResourceAddedToCatalogue, OnResourceRemovedFromCatalogue
- Ownership: OnOwnershipSet, OnOwnershipChanged
- App sharing: OnAppShared, OnAppUnshared
- Chat groups: OnChatGroupAssigned, OnChatGroupRemoved
- Submissions: OnSubmissionCreated, OnReviewerAssigned
- Plugins: OnPluginInstalled, OnPluginResourceGroupAssigned/Removed

All methods are no-ops when the authorizer is disabled.

Remove implementation-specific details from the abstract authz package:
- Package doc no longer references OpenFGA or OPENFGA_ENABLED
- Remove SubjectGroupMembers (#member is OpenFGA userset syntax)
- Decouple middleware and shadow mode from models.User via UserIDFromContext
- Remove domain-specific middleware (RequireSystemAdmin, RequireSSOAdmin,
  RequireCanUse, RequireCanAdmin) — keep only generic RequireRelation
- Parameterize all shadow check functions instead of hardcoding relations

FullSync:
- Stream database rows in batches (1000 rows) using keyset pagination
- Grant relationships incrementally per batch instead of accumulating all
- Consolidate 3 user queries in collectSystemRels into a single query
- Extract reusable syncJoinTable/syncOwnerTable helpers to reduce duplication
- Peak memory usage is now constant regardless of database size

ListResources pagination:
- Add ListResourcesPage and ListResourcesByNamePage to Authorizer interface
- OpenFGA ListObjects has no native cursor pagination, so implementation
  fetches all results, sorts for stable ordering, and applies client-side
  windowing with offset-based tokens
- Results from ListResources/ListResourcesByName remain bounded by the
  server's configured OPENFGA_LIST_OBJECTS_MAX_RESULTS (default 1000)
- Add 16 missing Syncer tests: OnUserRemovedFromGroup, OnCatalogueRemovedFromGroup,
  OnResourceAddedToCatalogue, OnResourceRemovedFromCatalogue, OnOwnershipSet,
  OnAppShared, OnAppUnshared, OnChatGroupAssigned, OnChatGroupRemoved,
  OnSubmissionCreated, OnReviewerAssigned, OnPluginInstalled,
  OnPluginResourceGroupAssigned, OnPluginResourceGroupRemoved,
  OnUserUpdated SSO promotion/demotion
- Add 7 middleware tests: RequireRelation allowed/denied/no-user/invalid-id/error,
  RequireRelationByName allowed/denied
- Add 6 shadow mode tests: ShadowCheck never-blocks/disabled/no-user,
  ShadowCheckResource never-blocks, ShadowCheckOwnership never-blocks/stores-result
- Add noop pagination test

All 21 Syncer methods now have test coverage (75 total tests).
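
The client-side windowing described in the ListResources pagination notes above can be sketched as a pure function over a complete result set. The name `listPage` and the decimal-offset token format are assumptions; the approach (sort, then slice with an offset token) follows the commit message, and it only yields stable pages if the underlying set does not change between calls.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// listPage windows a full ListObjects-style result. The token is the
// decimal offset of the next item; "" means "start" on input and "no more
// pages" on output.
func listPage(all []string, pageSize int, token string) ([]string, string, error) {
	if pageSize <= 0 {
		return nil, "", fmt.Errorf("page size must be positive, got %d", pageSize)
	}
	offset := 0
	if token != "" {
		n, err := strconv.Atoi(token)
		if err != nil || n < 0 {
			return nil, "", fmt.Errorf("invalid page token %q", token)
		}
		offset = n
	}
	sorted := append([]string(nil), all...) // copy: don't reorder the caller's slice
	sort.Strings(sorted)
	if offset >= len(sorted) {
		return nil, "", nil
	}
	end := offset + pageSize
	if end >= len(sorted) {
		return sorted[offset:], "", nil
	}
	return sorted[offset:end], strconv.Itoa(end), nil
}
```

Because every call still fetches and sorts the full result, this bounds response size but not backend work, which stays capped by the server's max-results setting.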

Integration:
- Add Authz and AuthzSyncer fields to services.Service
- Add SetAuthorizer() method (follows existing setter pattern)
- Close authorization backend in Service.Cleanup()

User service hooks:
- CreateUser: sync system membership, admin/SSO roles, group memberships
- UpdateUser: detect admin/SSO promotion/demotion, sync group changes
- DeleteUser: revoke all system-level roles
- UpdateGroupUsers: sync group membership replacements
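
The promotion/demotion detection in UpdateUser amounts to comparing each role flag before and after the write. In this sketch the `admin` and `sso_admin` relation names, the `system:1` resource ID and the `RoleEdits` type are all hypothetical:

```go
package main

import "fmt"

// Relationship mirrors the PR's Subject/Relation/Resource triple.
type Relationship struct{ Subject, Relation, Resource string }

// systemResource is an assumed singleton ID for the model's "system" type.
const systemResource = "system:1"

// RoleEdits is the set of writes one UpdateUser call implies.
type RoleEdits struct{ Grant, Revoke []Relationship }

// planRoleEdits emits a grant when a flag goes from false to true
// (promotion) and a revoke when it goes from true to false (demotion).
func planRoleEdits(userID uint, wasAdmin, isAdmin, wasSSO, isSSO bool) RoleEdits {
	subject := fmt.Sprintf("user:%d", userID)
	var e RoleEdits
	flags := []struct {
		relation      string
		before, after bool
	}{
		{"admin", wasAdmin, isAdmin},
		{"sso_admin", wasSSO, isSSO},
	}
	for _, f := range flags {
		rel := Relationship{subject, f.relation, systemResource}
		switch {
		case !f.before && f.after:
			e.Grant = append(e.Grant, rel)
		case f.before && !f.after:
			e.Revoke = append(e.Revoke, rel)
		}
	}
	return e
}
```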

Group service hooks:
- AddUserToGroup / RemoveUserFromGroup
- AddCatalogue/DataCatalogue/ToolCatalogue to group (and remove)

Also:
- Add SubjectGroupMembers to openfga package (OpenFGA-specific #member
  userset syntax needed for group-based app sharing)
- Sort results in ListResourcesByNamePage for deterministic pagination
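
For readers unfamiliar with the syntax, a userset such as `group:3#member` denotes every member of group 3, so a single relationship can share an app with a whole group. The helper below mirrors the `SubjectGroupMembers` name from this commit; its signature, the `viewer` relation and `shareAppWithGroup` are assumptions:

```go
package main

import "fmt"

// Relationship mirrors the PR's Subject/Relation/Resource triple.
type Relationship struct{ Subject, Relation, Resource string }

// SubjectGroupMembers returns the userset "group:<id>#member", meaning
// "every member of group <id>" (hypothetical signature).
func SubjectGroupMembers(groupID uint) string {
	return fmt.Sprintf("group:%d#member", groupID)
}

// shareAppWithGroup builds the relationship app:<app>#viewer@group:<g>#member.
// The "viewer" relation name is assumed for illustration.
func shareAppWithGroup(appID, groupID uint) Relationship {
	return Relationship{SubjectGroupMembers(groupID), "viewer", fmt.Sprintf("app:%d", appID)}
}
```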
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
