Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata form 7: Access control and deletion behaviour #1540

Merged
merged 29 commits into from
Sep 30, 2024

Conversation

SofiaSazonova
Copy link
Contributor

@SofiaSazonova SofiaSazonova commented Sep 12, 2024

Feature or Bugfix

  • Feature

Detail

  • When entity is deleted all attached MF are deleted
  • When Org or Env is deleted MF with visibility in this org|env is deleted. And all connected attached MFs
  • Resource policies: CREATE_METADATA_FORM (create MF with visibility inside this entity), ATTACH_METADATA_FORM
  • Migrations for entity owners to have all permissions
  • Triggers to delete dependencies for MF deletion and entity deletion

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@SofiaSazonova SofiaSazonova linked an issue Sep 12, 2024 that may be closed by this pull request
@SofiaSazonova SofiaSazonova marked this pull request as ready for review September 12, 2024 13:02
Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did initial review before testing functionality - requesting some changes

@noah-paige
Copy link
Contributor

With the new way of handling deletes of metadata forms with deletes of resources a couple things come to mind:

  • (1) We still have a dependency between MF module and other resources (i.e. metadata form tables must exist for org, env, dataset deletes to work) but since we always create all tables in DB (and I do not think we plan to change this behavior in the foreseeable future) I think this is okay

  • (2) Right now when we create a new form or attach a new form we do not create any permissions for the user/group on the form. I

    • Instead we protect backend resolvers for metadata forms by either the MetadataFormAccessService.can_perform() method (which checks if owner or not of the visibility entity) OR checking if has group has ATTACH_METADATA_FORM or CREATE_METADATA_FORM on the resource itself
    • I see the addition of some new permissions like UPDATE_METADATA_FORM_FIELD, EDIT_METADATA_FORM, etc. that are not currently used but if we do plan on assigning these permissions on the form moving forward this would have to also be cleaned up in DB trigger (just calling out the complexity of these triggers could expand)

What if we had a generic EventSystem class (in core or base?):

class DataallEventSystem:
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event: str, callback: Callable[..., None]) -> None:
        if event not in self._subscribers:
            self._subscribers[event] = []
        self._subscribers[event].append(callback)

    def publish(self, event: str, *args: Any, **kwargs: Any) -> None:
        if event in self._subscribers:
            for callback in self._subscribers[event]:
                callback(*args, **kwargs)

And then each MF Form resource could publish on delete:

class EnviornmentService:
    def __init__(self, event_system: DataallEventSystem):
        self.event_system = event_system
    
    ...

    def delete_environment(self, uri: str) -> None:
        self.event_system.publish("environment_deleted", uri)

And then MetadataFromService and AttachedMetadataFormService could import event system and subscribe:

class MetadataFormService:
    @classmethod
    def subscribe(cls, event_system: DataallEventSystem) -> None:
        event_system.subscribe("environment_deleted", cls.on_delete_env)
        event_system.subscribe("organization_deleted", cls.on_delete_org)
        event_system.subscribe("dataset_deleted", cls.on_delete_dataset)

    @classmethod
    def add_dependency(self, cls):
        self.dependencies.add(cls)

    def on_delete_env(self, env_uri):
        # Delete MF Forms and Attached Forms for env_uri
    ...

And lastly __init__.py for MF Module Interface would call:

MetadataFormService.subscribe(DataallEventSystem)
AttachedMetadataFormService(DataallEventSystem)

Not sure if this pub/sub approach is best but this way we can have multiple publishers and multiple subscribers (similar to EnvironmentResourceManager but not dedicated to just Environments and more flexible)

Just thinking out loud here - not sure if this is the right approach for the ask
cc @SofiaSazonova @dlpzx @petrkalos

Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments - after some minor fixes testing and functionality look good

Left a big comment on design of the delete triggers but on second thought it may not be worth the effort to implement it the way I propose unless we really see a need for future work on MF forms or elsewhere

Triggers seem easy enough to create / delete so easily reversible too if needed

@dlpzx
Copy link
Contributor

dlpzx commented Sep 16, 2024

With the new way of handling deletes of metadata forms with deletes of resources a couple things come to mind:

* (1) We still have a dependency between MF module and other resources (i.e. metadata form tables must exist for org, env, dataset deletes to work) but since we always create all tables in DB (and I do not think we plan to change this behavior in the foreseeable future) I think this is okay

* (2) Right now when we create a new form or attach a new form we do not create any permissions for the user/group on the form. I
  
  * Instead we protect backend resolvers for metadata forms by either the `MetadataFormAccessService.can_perform()` method (which checks if owner or not of the visibility entity) OR checking if has group has ATTACH_METADATA_FORM or CREATE_METADATA_FORM on the resource itself
  * I see the addition of some new permissions like UPDATE_METADATA_FORM_FIELD, EDIT_METADATA_FORM, etc. that are not currently used but if we do plan on assigning these permissions on the form moving forward this would have to also be cleaned up in DB trigger (just calling out the complexity of these triggers could expand)

What if we had a generic EventSystem class (in core or base?):

class DataallEventSystem:
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event: str, callback: Callable[..., None]) -> None:
        if event not in self._subscribers:
            self._subscribers[event] = []
        self._subscribers[event].append(callback)

    def publish(self, event: str, *args: Any, **kwargs: Any) -> None:
        if event in self._subscribers:
            for callback in self._subscribers[event]:
                callback(*args, **kwargs)

And then each MF Form resource could publish on delete:

class EnviornmentService:
    def __init__(self, event_system: DataallEventSystem):
        self.event_system = event_system
    
    ...

    def delete_environment(self, uri: str) -> None:
        self.event_system.publish("environment_deleted", uri)

And then MetadataFromService and AttachedMetadataFormService could import event system and subscribe:

class MetadataFormService:
    @classmethod
    def subscribe(cls, event_system: DataallEventSystem) -> None:
        event_system.subscribe("environment_deleted", cls.on_delete_env)
        event_system.subscribe("organization_deleted", cls.on_delete_org)
        event_system.subscribe("dataset_deleted", cls.on_delete_dataset)

    @classmethod
    def add_dependency(self, cls):
        self.dependencies.add(cls)

    def on_delete_env(self, env_uri):
        # Delete MF Forms and Attached Forms for env_uri
    ...

And lastly __init__.py for MF Module Interface would call:

MetadataFormService.subscribe(DataallEventSystem)
AttachedMetadataFormService(DataallEventSystem)

Not sure if this pub/sub approach is best but this way we can have multiple publishers and multiple subscribers (similar to EnvironmentResourceManager but not dedicated to just Environments and more flexible)

Just thinking out loud here - not sure if this is the right approach for the ask cc @SofiaSazonova @dlpzx @petrkalos
Answering to the downsides of using database triggers:
(1) I do not think it will be a problem for our own development.
(2) If triggers go further than deletes on a dependent table - if a delete implements business logic outside of database rules (foreign keys) I think we should avoid them. The main reason is that they can be difficult to debug.
(3) Additional concern - As a developer I might find it difficult to understand how metadata forms are deleted, it would not occur to me to look into each migration script for business logic code. Could we move the specific trigger definition to the metadata forms module? In addition, for the case we need to update a trigger, if we have the trigger code in the module we could track the history more easily (using git) than searching inside the migration scripts.

Finally, regarding @noah-paige pub/sub model vs current db triggers; db triggers are good enough for MF now. We should revisit the decision:

  • if MF needs to be applied to new/unexpected resources (e.g. customer external modules)
  • if the db triggers need to implement a complex logic or need to be updated often
    In those cases I think the pub/sub model is a great candidate, it would be great if we could substitute other interfaces we are currently using for module communication (Environments - Resources, Datasets - Shares) and have a generic way to handle intermodule event dependencies

@SofiaSazonova
Copy link
Contributor Author

@noah-paige @dlpzx
I like the idea of pub/sub model. We can revisit EnvironmentResources as well to unify the idea of intermodule communications.
But I don't think, we should include it into MF Prs. It's quite a big work on it's own, that requires a lot of tests as well.
I tend to keep triggers for now and then revisit it with new mechanism when it's ready.

@noah-paige
Copy link
Contributor

One additional bug found during testing (thanks to @rbernotas) - when a user a part of the admin group tries to view a Metadata Tab for a resource, for instance a data.all Organization, the use receives an error 'NoneType' object is not iterable back from listEntityMetadataForm

  • This error stems from L144 at dataall/backend/dataall/modules/metadata_forms/db/metadata_form_repository.py (i.e. orgs = list(set(user_org_uris).intersection(set(entity_orgs_uris))))
  • It is because user_org_uris and user_env_uris are set to None by get_user_admin_status_orgs_and_envs_() for admin users
  • I think we need to do something similar to what we did for entity_org_uris and entity_envs_uris and set the following:
        user_orgs_uris =  user_orgs_uris or []
         user_envs_uris = user_envs_uris or []

in query_entity_metadata_forms

@noah-paige
Copy link
Contributor

Thanks for the latest changes @SofiaSazonova. 3 pending items I am seeing that I think need to be resolved for this PR:

  • (1) Downgrade Migration Script introduces errors (link)
  • (2) DA Admins can not attach Org or Env visibility filters to an Entity (link)
  • (3) Too many Forms showing as available when listing for an entity because of query filters (link)

++ 1 nit comment on triggers migration (link)

Afterwards I can re-test and think should be good for this PR

Additionally - there are some additional enhancements or fixes that we may want to pick up for MF Forms. I will add to the open issue #1553

@SofiaSazonova
Copy link
Contributor Author

@noah-paige

  1. Downgrades: I added fixes for this migration and redshift one (discussed it with @dlpzx )
  2. I rolled back the queries as they were a couple iterations ago. May be it's not short, but it works as it should. It's also not bulky. That solves both problems with visibility, you mentioned
  3. About MF permissions: I just want to have all triggers in sam place, so I left it there.

@noah-paige
Copy link
Contributor

Thanks for the latest changes @SofiaSazonova. 3 pending items I am seeing that I think need to be resolved for this PR:

* (1) Downgrade Migration Script introduces errors ([link](https://github.com/data-dot-all/dataall/pull/1540#discussion_r1772025624))

* (2) DA Admins can not attach Org or Env visibility filters to an Entity ([link](https://github.com/data-dot-all/dataall/pull/1540#discussion_r1772011954))

* (3) Too many Forms showing as available when listing for an entity because of query filters ([link](https://github.com/data-dot-all/dataall/pull/1540/files/f63d539a44716527fbaf533865a970402475e2cc..7a04e7d3572af0d5e4a493b63e0f094cf65754de#r1759456519))

++ 1 nit comment on triggers migration (link)

Afterwards I can re-test and think should be good for this PR

Additionally - there are some additional enhancements or fixes that we may want to pick up for MF Forms. I will add to the open issue #1553

@SofiaSazonova - latest testing

  • (1) Bug in Downgrade (link)
  • (2) Fixed
  • (3) Fixed
  • ++ sounds good on keeping the trigger in migration

@SofiaSazonova
Copy link
Contributor Author

@noah-paige I hope, this is the last one and I haven't missed smth now.

Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes look good - tested the migration script upgrade/downgrade and UI functionality on AWS Deployment

thanks @SofiaSazonova

@SofiaSazonova SofiaSazonova merged commit 1e2c388 into data-dot-all:main Sep 30, 2024
9 checks passed
SofiaSazonova pushed a commit that referenced this pull request Oct 1, 2024
…1601)

### Feature or Bugfix
- Bugfix

### Detail
In `backend/migrations/versions/075d344ae2cc_mf_triggers.py` the
database routines do not specify the database schema. As a result, when
the routines are executed they fail to find the table.

In this PR we read the schema name that matches the environment variable
ENVNAME when defining the database triggers.

### Relates
- #1540

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@SofiaSazonova SofiaSazonova deleted the metadata-form-7 branch October 3, 2024 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metadata Forms: access control and deletion behaviour
3 participants