Add docs on log retention and allowed origins, fix cdkproxy docs #1554

Merged
merged 2 commits into from
Sep 17, 2024
17 changes: 4 additions & 13 deletions pages/code.md
@@ -171,21 +171,12 @@ The data.all `base.api` package contains the `gql` sub-package to support GraphQ

#### cdkproxy
This package contains the code associated with the deployment of CDK stacks that correspond to data.all resources.
`cdkproxy` is a package that exposes a REST API to run registered cloudformation stacks using AWS CDK. It is deployed as a docker container running on AWS ECS.
`cdkproxy` is a package that runs registered CloudFormation stacks using AWS CDK. It is bundled as a Docker image and run as an AWS ECS task, which is triggered by infrastructure as code (IaC) operations in data.all (e.g. CRUD operations on data.all resources).

When a data.all resource is created, the API sends an HTTP request
to the docker service and the code runs the appropriate stack using `cdk` cli.
When an API request is made to create a data.all resource, such as a new dataset, the data.all backend sends a message to an SQS queue; the message is read off the queue asynchronously and a new `cdkproxy` ECS task is started.
The task uses a `cdk` CLI wrapper to register infrastructure and manage CDK commands, and runs the appropriate stack with the `cdk` CLI to deploy the IaC of the corresponding data.all resource.
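
A minimal sketch of the queue-to-task hand-off described above, assuming a message body of the form `{"stackUri": ...}`, a Fargate launch type, and hypothetical cluster, task definition, and container names (`dataall-cluster`, `dataall-cdkproxy`, `cdkproxy`). A stdlib `queue.Queue` stands in for SQS, and the code only builds the keyword arguments that would be passed to boto3's `ecs.run_task`; it is an illustration, not data.all's actual handler.

```python
import json
import queue

# Hypothetical names for illustration; a real deployment would read these
# from the data.all infrastructure configuration.
CLUSTER = "dataall-cluster"
TASK_DEFINITION = "dataall-cdkproxy"
CONTAINER_NAME = "cdkproxy"


def build_run_task_params(message_body: str) -> dict:
    """Turn one queued message into keyword arguments for ECS RunTask.

    The {"stackUri": ...} message shape is an assumption, not data.all's format.
    """
    payload = json.loads(message_body)
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        # Fargate tasks also need networkConfiguration (subnets, security
        # groups); it is omitted here to keep the sketch short.
        "launchType": "FARGATE",
        "overrides": {
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    # Pass the stack to deploy to the container as an env var.
                    "environment": [
                        {"name": "stackUri", "value": payload["stackUri"]}
                    ],
                }
            ]
        },
    }


def drain(messages: queue.Queue) -> list:
    """Read every pending message and collect one RunTask request per message."""
    requests = []
    while True:
        try:
            body = messages.get_nowait()
        except queue.Empty:
            break
        requests.append(build_run_task_params(body))
    return requests


if __name__ == "__main__":
    pending = queue.Queue()
    pending.put(json.dumps({"stackUri": "stack-123"}))
    for params in drain(pending):
        # With AWS credentials this would be: boto3.client("ecs").run_task(**params)
        print(params["overrides"]["containerOverrides"][0]["environment"])
```

Going through the queue lets the API call return immediately while the potentially long-running CDK deployment happens in its own task.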

These stacks are deployed with the `cdk` cli wrapper
The API itself consists of 4 actions/paths:

- GET / : checks if the server is running
- POST /stack/{stackid} : creates or updates the stack
- DELETE /stack/{stackid} : deletes the stack
- GET /stack/{stackid} : returns stack status

The webserver is running on docker, using Python's [FASTAPI](https://fastapi.tiangolo.com/)
web framework and running using [uvicorn](https://www.uvicorn.org/) ASGI server.
For local data.all deployments, a web server runs in Docker using Python's [FastAPI](https://fastapi.tiangolo.com/) web framework and the [uvicorn](https://www.uvicorn.org/) ASGI server. Subsequently, data.all sends POST API requests to the `cdkproxy` web server to start the data.all infrastructure task.
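
For the local setup, the request data.all sends can be sketched with the standard library. This assumes the local web server exposes the `POST /stack/{stackid}` route listed in the previous version of this section, and uses a hypothetical `http://localhost:8080` base URL and a hypothetical `build_stack_request` helper; the request is built but not sent.

```python
import urllib.request


def build_stack_request(base_url: str, stack_id: str) -> urllib.request.Request:
    """Build (without sending) a POST request asking cdkproxy to deploy a stack."""
    # Strip a trailing slash so the joined URL never contains "//stack".
    url = f"{base_url.rstrip('/')}/stack/{stack_id}"
    # An empty body is enough here; the stack id travels in the path.
    return urllib.request.Request(url, data=b"", method="POST")


req = build_stack_request("http://localhost:8080/", "stack-123")
print(req.get_method(), req.full_url)
# Sending it requires a running cdkproxy container:
# urllib.request.urlopen(req)
```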

### core/ <a name="core"></a>
Core contains those functionalities that are indispensable to run data.all. Customization of the core should be limited
7 changes: 7 additions & 0 deletions pages/deploy/deploy_aws.md
@@ -170,6 +170,7 @@ of our repository. Open it, you should be seen something like:
"repository_source": "string_VERSION_CONTROL_SERVICE|(codecommit, codestar_connection) DEFAULT=codecommit",
"repo_string": "string_REPOSITORY_IN_GITHUB_OWNER/REPOSITORY|DEFAULT=awslabs/aws-dataall, REQUIRED if repository_source=codestar_connection",
"repo_connection_arn": "string_CODESTAR_SOURCE_CONNECTION_ARN_FOR_GITHUB_arn:aws:codestar-connections:region:account-id:connection/connection-id|DEFAULT=None, REQUIRED if repository_source=codestar_connection",
"log_retention_duration": "string_LOG_RETENTION_DURATION|DEFAULT=TWO_YEARS",
"DeploymentEnvironments": [
{
"envname": "string_ENVIRONMENT_NAME|REQUIRED",
@@ -193,6 +194,7 @@ of our repository. Open it, you should be seen something like:
"enable_cw_canaries": "boolean_SET_CLOUDWATCH_CANARIES_FOR_FRONTEND_TESTING|DEFAULT=false",
"shared_dashboards_sessions": "string_TYPE_SESSION_SHARED_DASHBOARDS|(reader, anonymous) DEFAULT=anonymous",
"enable_pivot_role_auto_create": "boolean_ENABLE_PIVOT_ROLE_AUTO_CREATE_IN_ENVIRONMENT|DEFAULT=false",
"allowed_origins": "string_TYPE_DOMAIN_ORIGIN|DEFAULT=*",
"enable_update_dataall_stacks_in_cicd_pipeline": "boolean_ENABLE_UPDATE_DATAALL_STACKS_IN_CICD_PIPELINE|DEFAULT=false",
"enable_opensearch_serverless": "boolean_USE_OPENSEARCH_SERVERLESS|DEFAULT=false",
"cognito_user_session_timeout_inmins": "integer_COGNITO_USER_SESSION_TIMEOUT_INMINS|DEFAULT=43200",
@@ -235,6 +237,8 @@ and find 2 examples of cdk.json files.
| source | Optional | The version control source for the repository. It can take 2 values 'codecommit' or 'codestar_connection'. (default: 'codecommit') |
| repo_string | Optional | The repository path as string. Required if source='codestar_connection' (default: 'awslabs/aws-dataall') |
| repo_connection_arn | Optional | The ARN of the CodeStar connection connecting with the source repository. Required if source='codestar_connection' (default: None) |
| log_retention_duration | Optional | The CloudWatch log retention period for all data.all compute log groups (e.g. Lambda functions and ECS tasks), VPC flow logs, and API activity logs. Specify it as a string matching the name of one of the members of the AWS CDK `RetentionDays` enum, e.g. `SIX_YEARS` (default: `TWO_YEARS`) |
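
The value is looked up by enum member name rather than given as a number of days. A minimal validation sketch, assuming a local `RetentionDays` enum that mirrors only a subset of the AWS CDK `aws_logs.RetentionDays` members (the CDK enum has more) and a hypothetical `parse_log_retention` helper:

```python
from enum import Enum


class RetentionDays(Enum):
    """Local stand-in for a subset of aws_cdk.aws_logs.RetentionDays (values in days)."""

    ONE_WEEK = 7
    ONE_MONTH = 30
    SIX_MONTHS = 180
    ONE_YEAR = 365
    TWO_YEARS = 731
    FIVE_YEARS = 1827
    SIX_YEARS = 2192


def parse_log_retention(value: str = "TWO_YEARS") -> RetentionDays:
    """Resolve a log_retention_duration string to an enum member by name."""
    try:
        return RetentionDays[value]
    except KeyError:
        valid = ", ".join(member.name for member in RetentionDays)
        raise ValueError(
            f"log_retention_duration must be one of: {valid}; got {value!r}"
        ) from None


print(parse_log_retention().value)             # documented default
print(parse_log_retention("SIX_YEARS").value)  # value used in the example cdk.json
```

With aws-cdk-lib installed, the same lookup by name should work against the real enum (`aws_logs.RetentionDays[value]`), since the CDK Python bindings expose it as a standard `enum.Enum`.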

| **Deployment environments Parameters** | **Optional/Required** | **Definition** |
| ---------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| envname | REQUIRED | The name of the deployment environment (e.g. dev, qa, prod,...). It must be in lower case without any special characters. |
@@ -258,6 +262,7 @@ and find 2 examples of cdk.json files.
| cognito_user_session_timeout_inmins | Optional | The number of minutes to set the refresh token validity time for user sessions in Cognito before a user must re-login to the data.all UI (default: 43200, i.e. 30 days) |
| reauth_config | Optional | A dictionary containing a list of API operations that require a user to re-authenticate before proceeding (`reauth_apis`) and a time to live (`ttl`) for how long a user's re-auth session is valid to perform re-auth APIs before having to re-authenticate again |
| custom_auth | Optional | A dictionary containing a set of parameters to set up an external IdP (authentication and authorization) in data.all. Custom Auth Configuration: `provider`, `url`, `redirect_url`, `client_id`, `response_types`, `scopes`, `jwks_url`, `claims_mapping` (nested dictionary containing the configuration: `user_id`, `email`). All the configurations are required if setting up data.all with an external OIDC-supported IdP |
| allowed_origins | Optional | The origin returned in the `Access-Control-Allow-Origin` response header of backend responses (default: `'*'`) |
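
How a backend handler might apply `allowed_origins` can be sketched as follows. This assumes a Lambda proxy-style response dictionary (`statusCode`, `headers`, `body`) and a hypothetical `make_response` helper; it is not data.all's actual handler code.

```python
import json


def make_response(body: dict, allowed_origins: str = "*", status: int = 200) -> dict:
    """Build a proxy-style response carrying the configured CORS origin."""
    return {
        "statusCode": status,
        "headers": {
            # "*" lets pages from any origin read the response; a specific
            # origin such as "https://example.com" restricts reads to that site.
            "Access-Control-Allow-Origin": allowed_origins,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


print(make_response({"ok": True})["headers"]["Access-Control-Allow-Origin"])
print(make_response({"ok": True}, "https://example.com")["headers"])
```

Setting a specific origin, as the example cdk.json does with `https://example.com`, means browsers only let pages served from that origin read backend responses.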

**Example 1**: Basic deployment: this is an example of a minimum configured cdk.json file.

@@ -300,6 +305,7 @@ deploy to 2 deployments accounts.
"git_release": true,
"quality_gate": false,
"resource_prefix": "da",
"log_retention_duration": "SIX_YEARS",
"DeploymentEnvironments": [
{
"envname": "dev",
@@ -332,6 +338,7 @@ deploy to 2 deployments accounts.
"enable_update_dataall_stacks_in_cicd_pipeline": true,
"enable_opensearch_serverless": true,
"cognito_user_session_timeout_inmins": 240,
"allowed_origins": "https://example.com",
"reauth_config": {
"reauth_apis": ["CreateDataset", "ImportDataset", "deleteDataset"],
"ttl": 10
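
The two parameters added in this change sit at different levels of cdk.json: `log_retention_duration` is a top-level setting, while `allowed_origins` is set per entry in `DeploymentEnvironments`. A minimal sketch of resolving both with their documented defaults, assuming the parameters have already been loaded into a dictionary and using a hypothetical `resolve_settings` helper and sample fragment:

```python
import json

# Documented defaults for the parameters added in this change.
DEFAULT_LOG_RETENTION = "TWO_YEARS"
DEFAULT_ALLOWED_ORIGINS = "*"


def resolve_settings(context: dict) -> dict:
    """Return the top-level retention setting and each environment's origin."""
    origins = {
        env["envname"]: env.get("allowed_origins", DEFAULT_ALLOWED_ORIGINS)
        for env in context.get("DeploymentEnvironments", [])
    }
    return {
        "log_retention_duration": context.get(
            "log_retention_duration", DEFAULT_LOG_RETENTION
        ),
        "allowed_origins": origins,
    }


# Hypothetical fragment in the shape of the example cdk.json files.
sample = json.loads("""
{
  "log_retention_duration": "SIX_YEARS",
  "DeploymentEnvironments": [
    {"envname": "dev"},
    {"envname": "prod", "allowed_origins": "https://example.com"}
  ]
}
""")
print(resolve_settings(sample))
```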