Remove policies-updater ECS task (#1046)
### Feature or Bugfix
- Bugfix

### Detail
In data.all there are 7 ECS tasks:
Tasks currently being used:
- cdkproxy -- on demand - it deploys CDK stacks in Environment accounts
- share-manager -- on demand - it executes sharing actions
- stacks-updater -- on schedule every 1 day - it updates the environment and
dataset stacks

Tasks that need to be reviewed:
- subscriptions -- on schedule every 15 mins - it tries to poll messages
from the subscriptions queue. The queue is empty and nothing posts
messages to it, so subscriptions can be considered legacy at the moment.
- catalog-indexer -- on schedule every 6 hours - it reads all active
items from RDS and indexes them in the Catalog. It does not look for
deleted items.
- tables-syncer -- on schedule every 15 mins - it reads all active
datasets and, using boto3, reads the Glue tables in each dataset's
database. It syncs those Glue tables with the tables registered in
data.all, upserts them in OpenSearch and grants Lake Formation (LF)
permissions.
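As a rough illustration of the tables-syncer comparison step, the sketch below diffs the Glue table names of one dataset against the table names registered in data.all using plain set operations. `plan_table_sync` and `TableSyncPlan` are hypothetical names, not data.all code; the real task works on Glue and RDS objects rather than strings, and how it treats tables that disappeared from Glue is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TableSyncPlan:
    """Hypothetical result of comparing Glue tables with registered tables."""
    to_register: frozenset[str]         # in Glue, not yet registered in data.all
    already_registered: frozenset[str]  # in both; candidates for an upsert
    missing_from_glue: frozenset[str]   # registered, but no longer in Glue


def plan_table_sync(glue_tables, registered_tables) -> TableSyncPlan:
    """Compare two collections of table names (illustrative only)."""
    glue = frozenset(glue_tables)
    registered = frozenset(registered_tables)
    return TableSyncPlan(
        to_register=glue - registered,
        already_registered=glue & registered,
        missing_from_glue=registered - glue,
    )


plan = plan_table_sync(["orders", "customers", "events"], ["orders", "legacy"])
print(sorted(plan.to_register))         # ['customers', 'events']
print(sorted(plan.already_registered))  # ['orders']
print(sorted(plan.missing_from_glue))   # ['legacy']
```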

Tasks that currently are not used and need to be removed:
- policies-updater -- on schedule every 15 mins - it reapplies shares on
imported buckets. It is a legacy of folder sharing based on bucket
policies, and it uses the generic ecs-tasks-role.
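The schedules above translate into very different run counts. The sketch below parses EventBridge `rate()` expressions (the form passed to `schedule_expression` in the deleted CDK code) and reports runs per day, which shows that policies-updater ran 96 times a day. Only the policies-updater expression appears verbatim in the diff; the others are assumed from the task descriptions, and `parse_rate`/`runs_per_day` are illustrative helpers, not data.all code.

```python
import re
from datetime import timedelta

# EventBridge rate() units: singular form when the value is 1, plural otherwise.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}
_RATE_RE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def parse_rate(expression: str) -> timedelta:
    """Convert a rate expression such as 'rate(15 minutes)' into a timedelta."""
    match = _RATE_RE.match(expression)
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    singular = unit.rstrip("s")
    expected_unit = singular if value == 1 else _UNITS[singular]
    if unit != expected_unit:
        raise ValueError(f"use {expected_unit!r} with value {value}")
    return timedelta(**{_UNITS[singular]: value})


def runs_per_day(expression: str) -> float:
    """How many times a task on this schedule runs in 24 hours."""
    return timedelta(days=1) / parse_rate(expression)


# Assumed expressions, based on the task descriptions above.
schedules = {
    "policies-updater": "rate(15 minutes)",
    "tables-syncer": "rate(15 minutes)",
    "subscriptions": "rate(15 minutes)",
    "catalog-indexer": "rate(6 hours)",
    "stacks-updater": "rate(1 day)",
}
for task, expr in schedules.items():
    print(f"{task}: {runs_per_day(expr):g} runs/day")
```

Validating the singular/plural rule locally catches expressions such as `rate(1 days)`, which EventBridge rejects.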

This PR removes the policies-updater task.

Tested in AWS:
- [X] CICD pipeline succeeds
- [X] The ECS CloudFormation stack update deletes the policies-updater
task and its log group. All other tasks remain untouched
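To repeat the task check outside the console, one option is to list the ACTIVE ECS task definition families of the deployment and look for the removed task. This is a sketch that assumes the family names contain the `task_id` from the diff (`<resource_prefix>-<envname>-policies-updater`); `list_active_families` and `leftover_families` are hypothetical helpers. The client is passed in, so a boto3 client created with `boto3.client("ecs")` can be used directly.

```python
from typing import Iterable, List


def list_active_families(ecs_client, family_prefix: str) -> List[str]:
    """Return every ACTIVE task definition family that starts with family_prefix.

    Follows nextToken pagination of the ECS ListTaskDefinitionFamilies API.
    """
    families: List[str] = []
    kwargs = {"familyPrefix": family_prefix, "status": "ACTIVE"}
    while True:
        response = ecs_client.list_task_definition_families(**kwargs)
        families.extend(response.get("families", []))
        token = response.get("nextToken")
        if not token:
            return families
        kwargs["nextToken"] = token


def leftover_families(families: Iterable[str], removed_task: str = "policies-updater") -> List[str]:
    """Families whose name still mentions the removed task (assumed naming scheme)."""
    return [family for family in families if removed_task in family]
```

After a successful deployment, `leftover_families(list_active_families(client, "<resource_prefix>-<envname>"))` would be expected to return an empty list.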

### Relates
- <URL or Ticket>

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP Top 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes fetching data from storage outside the application (e.g. a
database, an S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx authored Feb 8, 2024
1 parent e1e9e08 commit 752e824
Showing 3 changed files with 0 additions and 306 deletions.
`backend/dataall/modules/datasets/tasks/bucket_policy_updater.py`: 0 additions, 172 deletions

This file was deleted.

`deploy/stacks/container.py`: 0 additions, 23 deletions

```diff
@@ -205,7 +205,6 @@ def __init__(
 
         self.add_catalog_indexer_task()
         self.add_sync_dataset_table_task()
-        self.add_bucket_policy_updater_task()
         self.add_subscription_task()
         self.add_share_management_task()
 
@@ -302,28 +301,6 @@ def add_subscription_task(self):
         )
         self.ecs_task_definitions_families.append(subscriptions_task.task_definition.family)
 
-    @run_if(["modules.datasets.active"])
-    def add_bucket_policy_updater_task(self):
-        update_bucket_policies_task, update_bucket_task_def = self.set_scheduled_task(
-            cluster=self.ecs_cluster,
-            command=['python3.9', '-m', 'dataall.modules.datasets.tasks.bucket_policy_updater'],
-            container_id=f'container',
-            ecr_repository=self._ecr_repository,
-            environment=self._create_env('DEBUG'),
-            image_tag=self._cdkproxy_image_tag,
-            log_group=self.create_log_group(
-                self._envname, self._resource_prefix, log_group_name='policies-updater'
-            ),
-            schedule_expression=Schedule.expression('rate(15 minutes)'),
-            scheduled_task_id=f'{self._resource_prefix}-{self._envname}-policies-updater-schedule',
-            task_id=f'{self._resource_prefix}-{self._envname}-policies-updater',
-            task_role=self.task_role,
-            vpc=self._vpc,
-            security_group=self.scheduled_tasks_sg,
-            prod_sizing=self._prod_sizing,
-        )
-        self.ecs_task_definitions_families.append(update_bucket_policies_task.task_definition.family)
-
     @run_if(["modules.datasets.active"])
     def add_sync_dataset_table_task(self):
         sync_tables_task, sync_tables_task_def = self.set_scheduled_task(
```
`tests/modules/datasets/tasks/test_dataset_policies.py`: 0 additions, 111 deletions

This file was deleted.
