diff --git a/website/versioned_docs/version-v18.3.5/README.md b/website/versioned_docs/version-v18.3.5/README.md
new file mode 100644
index 00000000000..e5dab16cdae
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/README.md
@@ -0,0 +1,36 @@
+---
+id: cumulus-docs-readme
+slug: /
+title: Introduction
+hide_title: false
+---
+
+The Cumulus project addresses the need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.
+
+This documentation includes guidelines, examples, and source code docs. It is accessible at <https://nasa.github.io/cumulus>.
+
+---
+
+## Navigating the Cumulus Docs
+
+### Get To Know Cumulus
+
+* Getting Started - [here](getting-started.md) - If you are new to Cumulus, we suggest that you begin with this section to help you understand and work in the environment.
+* General Cumulus Documentation - [here](README.md) <- you're here
+
+### Cumulus Reference Docs
+
+* Cumulus API Documentation - [here](https://nasa.github.io/cumulus-api)
+* Cumulus Developer Documentation - [here](https://github.com/nasa/cumulus) - READMEs throughout the main repository.
+* Data Cookbooks - [here](data-cookbooks/about-cookbooks.md)
+
+### Auxiliary Guides
+
+* Integrator Guide - [here](integrator-guide/about-int-guide.md)
+* Operator Docs - [here](operator-docs/about-operator-docs.md)
+
+---
+
+## Contributing
+
+Please refer to <https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md> for more information. We thank you in advance.
diff --git a/website/versioned_docs/version-v18.3.5/adding-a-task.md b/website/versioned_docs/version-v18.3.5/adding-a-task.md
new file mode 100644
index 00000000000..b814bf7fc63
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/adding-a-task.md
@@ -0,0 +1,19 @@
+---
+id: adding-a-task
+title: Contributing a Task
+hide_title: false
+---
+
+We're tracking reusable Cumulus tasks [in this list](tasks.md) and, if you've got one you'd like to share with others, you can add it!
+
+Right now we're focused on tasks distributed via npm, but we are open to including others. For now, the script that pulls the data for each package only supports npm.
+
+## The tasks.md file is generated in the build process
+
+The tasks list in docs/tasks.md is generated from the list of task package names in the tasks folder.
+
+:::caution
+
+Do not edit the docs/tasks.md file directly.
+
+:::
diff --git a/website/versioned_docs/version-v18.3.5/api.md b/website/versioned_docs/version-v18.3.5/api.md
new file mode 100644
index 00000000000..7706b2a6208
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/api.md
@@ -0,0 +1,7 @@
+---
+id: api
+title: Cumulus API
+hide_title: false
+---
+
+Read the Cumulus API documentation at [https://nasa.github.io/cumulus-api](https://nasa.github.io/cumulus-api)
diff --git a/website/versioned_docs/version-v18.3.5/architecture.md b/website/versioned_docs/version-v18.3.5/architecture.md
new file mode 100644
index 00000000000..a1ab0b944ad
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/architecture.md
@@ -0,0 +1,72 @@
+---
+id: architecture
+title: Architecture
+hide_title: false
+---
+
+## Architecture
+
+Below, find a diagram with the components that comprise an instance of Cumulus.
+
+![Architecture diagram of a Cumulus deployment](assets/cumulus-arch-diagram-2023.png)
+
+This diagram details all of the major architectural components of a Cumulus deployment.
+
+While the diagram can feel complex, it can be broken down into several major components:
+
+### Data Distribution
+
+End Users can access data via Cumulus's `distribution` submodule, which includes ASF's [thin egress application](https://github.com/asfadmin/thin-egress-app). This provides authenticated data egress, temporary S3 links, and other statistics features.
+
+#### Data search
+
+End user exposure of Cumulus's holdings is expected to be provided by an external service.
+
+For NASA use, this is assumed to be [CMR](https://cmr.earthdata.nasa.gov/search/) in this diagram.
+
+### Data ingest
+
+#### Workflows
+
+The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS [Step Function](https://aws.amazon.com/step-functions/) workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured [Cumulus message](./workflows/cumulus-task-message-flow), utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the [Data Persistence](#data-persistence) components.
+
+Workflows can optionally report granule metadata to [CMR](https://cmr.earthdata.nasa.gov/search/), and workflow steps can report metrics information to a shared SNS topic, which can be subscribed to for near real-time granule, execution, and PDR status. This could be used for metrics reporting with an external ELK stack, for example.
+
+#### Data persistence
+
+Cumulus entity state data is stored in a [PostgreSQL](https://www.postgresql.org/)-compatible database and is exported to an Elasticsearch instance, which provides non-authoritative query/state data for the API and other applications that require more complex queries.
+
+#### Data discovery
+
+Discovering data for ingest is handled via workflow step components using Cumulus `provider` and `collection` configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS, and more.
+
+#### Database
+
+Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency, Cumulus provides data replication to an Elasticsearch instance.
+
+##### PostgreSQL Database Schema Diagram
+
+![ERD of the Cumulus Database](assets/db_schema/relationships.real.large.png)
+
+### Maintenance
+
+System maintenance personnel have access to manage ingest and various portions of Cumulus via an [AWS API Gateway](https://aws.amazon.com/api-gateway/), as well as the operator [dashboard](https://github.com/nasa/cumulus-dashboard).
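+
+For reference, the Cumulus message mentioned in the Workflows section above is a JSON envelope passed between workflow steps. The sketch below is a simplified illustration only, assuming the standard top-level keys (`cumulus_meta`, `meta`, `payload`, and `exception`); the specific values shown (execution name, state machine ARN, bucket, collection, and provider) are placeholders, and the actual contents vary by deployment and workflow. See the [Cumulus message](./workflows/cumulus-task-message-flow) documentation for the full structure.
+
+```json
+{
+  "cumulus_meta": {
+    "execution_name": "example-execution-name",
+    "state_machine": "arn:aws:states:us-east-1:111111111111:stateMachine:prefix-IngestGranule",
+    "system_bucket": "prefix-internal"
+  },
+  "meta": {
+    "collection": { "name": "MOD09GQ", "version": "006" },
+    "provider": { "id": "example-provider", "protocol": "s3", "host": "example-provider-bucket" }
+  },
+  "payload": {
+    "granules": []
+  },
+  "exception": "None"
+}
+```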
+
+## Deployment Structure
+
+Cumulus is deployed via [Terraform](https://www.terraform.io/) and is organized internally into two separate top-level modules, as well as several external modules.
+
+### Cumulus
+
+The [Cumulus module](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus), which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the `Data Persistence` portion of this diagram.
+
+### Data persistence
+
+The [data persistence](https://github.com/nasa/cumulus/tree/master/tf-modules/data-persistence) module provides the `Data Persistence` portion of the diagram.
+
+### Other modules
+
+Other modules are provided as artifacts on the [release](https://github.com/nasa/cumulus/releases) page for users configuring their own deployments; they contain extracted subcomponents of the [cumulus](#cumulus) module. For more on these components, see the [components documentation](deployment/components).
+
+For more on the specific structure, examples of use, and how to deploy, please see the [deployment](deployment) docs as well as the [cumulus-template-deploy](https://github.com/nasa/cumulus-template-deploy) repo.
diff --git a/website/versioned_docs/version-v18.3.5/assets/APIGateway-Delete-Stage.png b/website/versioned_docs/version-v18.3.5/assets/APIGateway-Delete-Stage.png
new file mode 100644
index 00000000000..311f49694c7
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/APIGateway-Delete-Stage.png differ
diff --git a/website/versioned_docs/version-v18.3.5/assets/AWS-Cross-account-log-delivery-and-metrics.png b/website/versioned_docs/version-v18.3.5/assets/AWS-Cross-account-log-delivery-and-metrics.png
new file mode 100644
index 00000000000..b49096afb4d
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/AWS-Cross-account-log-delivery-and-metrics.png differ
diff --git a/website/versioned_docs/version-v18.3.5/assets/CD_run_bulk_modal.png b/website/versioned_docs/version-v18.3.5/assets/CD_run_bulk_modal.png
new file mode 100644
index 00000000000..0dbb9f8ecd9
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/CD_run_bulk_modal.png differ
diff --git a/website/versioned_docs/version-v18.3.5/assets/KinesisLambdaTriggerConfiguration.png b/website/versioned_docs/version-v18.3.5/assets/KinesisLambdaTriggerConfiguration.png
new file mode 100644
index 00000000000..addffacd25c
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/KinesisLambdaTriggerConfiguration.png differ
diff --git a/website/versioned_docs/version-v18.3.5/assets/add_lifecycle_rule.png b/website/versioned_docs/version-v18.3.5/assets/add_lifecycle_rule.png
new file mode 100644
index 00000000000..6ba762d3c91
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/add_lifecycle_rule.png differ
diff --git a/website/versioned_docs/version-v18.3.5/assets/add_rule.png b/website/versioned_docs/version-v18.3.5/assets/add_rule.png
new file mode 100644
index 00000000000..d32150e0018
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/add_rule.png differ
diff --git a/website/versioned_docs/version-v18.3.5/assets/aws_bucket_console_example.png b/website/versioned_docs/version-v18.3.5/assets/aws_bucket_console_example.png
new file mode 100644
index 00000000000..74d2bda6202
Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/aws_bucket_console_example.png differ
diff --git
a/website/versioned_docs/version-v18.3.5/assets/browse_processing_1.png b/website/versioned_docs/version-v18.3.5/assets/browse_processing_1.png new file mode 100755 index 00000000000..ba82850ee7b Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/browse_processing_1.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/browse_processing_2.png b/website/versioned_docs/version-v18.3.5/assets/browse_processing_2.png new file mode 100755 index 00000000000..a98e7b6b812 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/browse_processing_2.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/browse_processing_3.png b/website/versioned_docs/version-v18.3.5/assets/browse_processing_3.png new file mode 100755 index 00000000000..80a84f2172b Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/browse_processing_3.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/browse_processing_4.png b/website/versioned_docs/version-v18.3.5/assets/browse_processing_4.png new file mode 100755 index 00000000000..160a635a109 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/browse_processing_4.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/browse_processing_5.png b/website/versioned_docs/version-v18.3.5/assets/browse_processing_5.png new file mode 100755 index 00000000000..abfe5ecec01 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/browse_processing_5.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/bulk-granules-modal.png b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-modal.png new file mode 100644 index 00000000000..3478396d9e5 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-modal.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/bulk-granules-query-1.png b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-query-1.png new file mode 100644 index 00000000000..97b30e09dd6 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-query-1.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/bulk-granules-query-2.png b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-query-2.png new file mode 100644 index 00000000000..1a859f4a40c Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-query-2.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/bulk-granules-submitted.png b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-submitted.png new file mode 100644 index 00000000000..b74c626172d Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/bulk-granules-submitted.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_collection.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_collection.png new file mode 100644 index 00000000000..e5bdcedbc93 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_collection.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_collection_filled.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_collection_filled.png new file mode 100644 index 00000000000..0f18e536638 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_collection_filled.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_collection_overview.png 
b/website/versioned_docs/version-v18.3.5/assets/cd_add_collection_overview.png new file mode 100644 index 00000000000..62bfb638382 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_collection_overview.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_discover_rule_form.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_discover_rule_form.png new file mode 100644 index 00000000000..dce7912d9d8 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_discover_rule_form.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_provider_form.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_provider_form.png new file mode 100644 index 00000000000..b23ab45cce4 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_provider_form.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_rule.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule.png new file mode 100644 index 00000000000..95cc96be350 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_filled.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_filled.png new file mode 100644 index 00000000000..c7c137d6f5a Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_filled.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_form_blank.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_form_blank.png new file mode 100644 index 00000000000..53586800592 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_form_blank.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_overview.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_overview.png new file mode 100644 index 00000000000..0b78afe5e6e Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_rule_overview.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_add_s3_provider_form.png b/website/versioned_docs/version-v18.3.5/assets/cd_add_s3_provider_form.png new file mode 100644 index 00000000000..0894635897e Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_add_s3_provider_form.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_bulk_delete.png b/website/versioned_docs/version-v18.3.5/assets/cd_bulk_delete.png new file mode 100644 index 00000000000..1c5c813fb83 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_bulk_delete.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_collection.png b/website/versioned_docs/version-v18.3.5/assets/cd_collection.png new file mode 100644 index 00000000000..2d8980cc866 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_collection.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_collections_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_collections_page.png new file mode 100644 index 00000000000..0426e01414e Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_collections_page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_execute_publish.png b/website/versioned_docs/version-v18.3.5/assets/cd_execute_publish.png new file mode 100644 index 
00000000000..aa7b6b8882a Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_execute_publish.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_execute_updateconstraints.png b/website/versioned_docs/version-v18.3.5/assets/cd_execute_updateconstraints.png new file mode 100644 index 00000000000..fa9f06a50f7 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_execute_updateconstraints.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_executions_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_executions_page.png new file mode 100644 index 00000000000..a9041913e6f Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_executions_page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_granules.png b/website/versioned_docs/version-v18.3.5/assets/cd_granules.png new file mode 100644 index 00000000000..3a446598185 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_granules.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_granules_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_granules_page.png new file mode 100644 index 00000000000..e06b566670d Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_granules_page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_operations_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_operations_page.png new file mode 100644 index 00000000000..a3a23821da5 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_operations_page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_provider_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_provider_page.png new file mode 100644 index 00000000000..95c1ab833ab Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_provider_page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_reingest_bulk.png b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_bulk.png new file mode 100644 index 00000000000..75d3f001e42 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_bulk.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_reingest_collection.png b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_collection.png new file mode 100644 index 00000000000..3ee4991ee6f Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_collection.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_reingest_granule_modal.png b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_granule_modal.png new file mode 100644 index 00000000000..4c90432192c Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_granule_modal.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_reingest_modal_bulk.png b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_modal_bulk.png new file mode 100644 index 00000000000..a936abba630 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_reingest_modal_bulk.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_rules_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_rules_page.png new file mode 100644 index 00000000000..68aac3ad42d Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_rules_page.png 
differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_run_bulk_granules.png b/website/versioned_docs/version-v18.3.5/assets/cd_run_bulk_granules.png new file mode 100644 index 00000000000..1bba53801a9 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_run_bulk_granules.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_running_granules.png b/website/versioned_docs/version-v18.3.5/assets/cd_running_granules.png new file mode 100644 index 00000000000..77dcf4adbf4 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_running_granules.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_workflow_json.png b/website/versioned_docs/version-v18.3.5/assets/cd_workflow_json.png new file mode 100644 index 00000000000..3320bd21519 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_workflow_json.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cd_workflow_page.png b/website/versioned_docs/version-v18.3.5/assets/cd_workflow_page.png new file mode 100644 index 00000000000..d2f9d51a490 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cd_workflow_page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cloudwatch-retention.png b/website/versioned_docs/version-v18.3.5/assets/cloudwatch-retention.png new file mode 100644 index 00000000000..d4c13e621a0 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cloudwatch-retention.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cnm_create_kinesis_stream.jpg b/website/versioned_docs/version-v18.3.5/assets/cnm_create_kinesis_stream.jpg new file mode 100644 index 00000000000..6437fbcd444 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cnm_create_kinesis_stream.jpg differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cnm_create_kinesis_stream_V2.jpg b/website/versioned_docs/version-v18.3.5/assets/cnm_create_kinesis_stream_V2.jpg new file mode 100644 index 00000000000..03c6dfc551e Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cnm_create_kinesis_stream_V2.jpg differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cnm_success_example.png b/website/versioned_docs/version-v18.3.5/assets/cnm_success_example.png new file mode 100644 index 00000000000..74dfbc8c85f Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cnm_success_example.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/configure-release-branch-test.png b/website/versioned_docs/version-v18.3.5/assets/configure-release-branch-test.png new file mode 100644 index 00000000000..d85286d5710 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/configure-release-branch-test.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cumulus-arch-diagram-2023.png b/website/versioned_docs/version-v18.3.5/assets/cumulus-arch-diagram-2023.png new file mode 100644 index 00000000000..6927b87da56 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cumulus-arch-diagram-2023.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cumulus-arch-diagram.png b/website/versioned_docs/version-v18.3.5/assets/cumulus-arch-diagram.png new file mode 100644 index 00000000000..5a1aacbcdc5 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cumulus-arch-diagram.png differ diff --git 
a/website/versioned_docs/version-v18.3.5/assets/cumulus-task-message-flow.png b/website/versioned_docs/version-v18.3.5/assets/cumulus-task-message-flow.png new file mode 100644 index 00000000000..4f939650b7f Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cumulus-task-message-flow.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/cumulus_configuration_and_message_schema_diagram.png b/website/versioned_docs/version-v18.3.5/assets/cumulus_configuration_and_message_schema_diagram.png new file mode 100644 index 00000000000..90d7f631e36 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/cumulus_configuration_and_message_schema_diagram.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/db_schema/relationships.real.compact.png b/website/versioned_docs/version-v18.3.5/assets/db_schema/relationships.real.compact.png new file mode 100644 index 00000000000..eace03582b0 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/db_schema/relationships.real.compact.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/db_schema/relationships.real.large.png b/website/versioned_docs/version-v18.3.5/assets/db_schema/relationships.real.large.png new file mode 100644 index 00000000000..820a7f8642f Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/db_schema/relationships.real.large.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/gibs_ingest_granules_workflow.png b/website/versioned_docs/version-v18.3.5/assets/gibs_ingest_granules_workflow.png new file mode 100644 index 00000000000..1ebbc6a357f Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/gibs_ingest_granules_workflow.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/granule_not_found_report.png b/website/versioned_docs/version-v18.3.5/assets/granule_not_found_report.png new file mode 100644 index 00000000000..9d01d907435 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/granule_not_found_report.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/hello_world_workflow.png b/website/versioned_docs/version-v18.3.5/assets/hello_world_workflow.png new file mode 100644 index 00000000000..4aa5a54d30c Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/hello_world_workflow.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/icon_note.svg b/website/versioned_docs/version-v18.3.5/assets/icon_note.svg new file mode 100644 index 00000000000..16593039812 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/assets/icon_note.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/website/versioned_docs/version-v18.3.5/assets/icon_warning.svg b/website/versioned_docs/version-v18.3.5/assets/icon_warning.svg new file mode 100644 index 00000000000..2ab53271b42 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/assets/icon_warning.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/website/versioned_docs/version-v18.3.5/assets/ingest_diagram.png b/website/versioned_docs/version-v18.3.5/assets/ingest_diagram.png new file mode 100644 index 00000000000..59fc5ffe112 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/ingest_diagram.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/ingest_diagram_gibs.png b/website/versioned_docs/version-v18.3.5/assets/ingest_diagram_gibs.png new file mode 100644 index 
00000000000..01b02ebb224 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/ingest_diagram_gibs.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/interface_diagram.png b/website/versioned_docs/version-v18.3.5/assets/interface_diagram.png new file mode 100644 index 00000000000..133421f36da Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/interface_diagram.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/interfaces.svg b/website/versioned_docs/version-v18.3.5/assets/interfaces.svg new file mode 100644 index 00000000000..1c726762efb --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/assets/interfaces.svg @@ -0,0 +1,3 @@ + + +
+[SVG diagram markup omitted — text labels in the diagram: SNS message; Kinesis stream; SQS queue for input messages; Scheduled rule (via Cloudwatch); One-time rule (invoked via API); messageConsumer Lambda; SQS queue for executions; Cumulus workflow execution; reportExecutions, reportGranules, and reportPdrs SNS topics; RDS Execution, Granule, and PDR records; SfSqsEventToDbRecords Lambda; SQS queue for reporting; "Workflow is triggered and queued"; "Workflow executes"; "Workflow data is reported"]
\ No newline at end of file diff --git a/website/versioned_docs/version-v18.3.5/assets/inventory_report.png b/website/versioned_docs/version-v18.3.5/assets/inventory_report.png new file mode 100644 index 00000000000..3023a23ab36 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/inventory_report.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kibana-create-index-pattern-1.png b/website/versioned_docs/version-v18.3.5/assets/kibana-create-index-pattern-1.png new file mode 100644 index 00000000000..c17c745b722 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kibana-create-index-pattern-1.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kibana-create-index-pattern-2.png b/website/versioned_docs/version-v18.3.5/assets/kibana-create-index-pattern-2.png new file mode 100644 index 00000000000..aac2a14dd45 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kibana-create-index-pattern-2.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kibana-discover-page.png b/website/versioned_docs/version-v18.3.5/assets/kibana-discover-page.png new file mode 100644 index 00000000000..2af6c9500a8 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kibana-discover-page.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kibana-discover-query.png b/website/versioned_docs/version-v18.3.5/assets/kibana-discover-query.png new file mode 100644 index 00000000000..b7318763a2a Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kibana-discover-query.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kibana-inspect-query.png b/website/versioned_docs/version-v18.3.5/assets/kibana-inspect-query.png new file mode 100644 index 00000000000..a583df81ef6 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kibana-inspect-query.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kibana-inspect-request.png b/website/versioned_docs/version-v18.3.5/assets/kibana-inspect-request.png new file mode 100644 index 00000000000..97150104a5e Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kibana-inspect-request.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/kinesis-workflow.png b/website/versioned_docs/version-v18.3.5/assets/kinesis-workflow.png new file mode 100644 index 00000000000..4b4f05966d7 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/kinesis-workflow.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/lifecycle_1.png b/website/versioned_docs/version-v18.3.5/assets/lifecycle_1.png new file mode 100644 index 00000000000..fc5a1a2da0d Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/lifecycle_1.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/lifecycle_2.png b/website/versioned_docs/version-v18.3.5/assets/lifecycle_2.png new file mode 100644 index 00000000000..a8f58ef41b7 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/lifecycle_2.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/lifecycle_4.png b/website/versioned_docs/version-v18.3.5/assets/lifecycle_4.png new file mode 100644 index 00000000000..d20ba324fed Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/lifecycle_4.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/lifecycle_5.png 
b/website/versioned_docs/version-v18.3.5/assets/lifecycle_5.png new file mode 100644 index 00000000000..1520ce82861 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/lifecycle_5.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/new_execution.png b/website/versioned_docs/version-v18.3.5/assets/new_execution.png new file mode 100644 index 00000000000..56ad3319405 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/new_execution.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/pgadmin_create_server.png b/website/versioned_docs/version-v18.3.5/assets/pgadmin_create_server.png new file mode 100644 index 00000000000..076f6f30493 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/pgadmin_create_server.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/pgadmin_query_tool.png b/website/versioned_docs/version-v18.3.5/assets/pgadmin_query_tool.png new file mode 100644 index 00000000000..83e31f9a7db Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/pgadmin_query_tool.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/pgadmin_retrieve_btn.png b/website/versioned_docs/version-v18.3.5/assets/pgadmin_retrieve_btn.png new file mode 100644 index 00000000000..86b610c1f76 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/pgadmin_retrieve_btn.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/pgadmin_retrieve_values.png b/website/versioned_docs/version-v18.3.5/assets/pgadmin_retrieve_values.png new file mode 100644 index 00000000000..179ea550e33 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/pgadmin_retrieve_values.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/pgadmin_server_connection.png b/website/versioned_docs/version-v18.3.5/assets/pgadmin_server_connection.png new file mode 100644 index 00000000000..958c5ca7977 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/pgadmin_server_connection.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/pgadmin_ssh_config.png b/website/versioned_docs/version-v18.3.5/assets/pgadmin_ssh_config.png new file mode 100644 index 00000000000..94a5eee5570 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/pgadmin_ssh_config.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/queue-workflow.png b/website/versioned_docs/version-v18.3.5/assets/queue-workflow.png new file mode 100644 index 00000000000..8b65e87f460 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/queue-workflow.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/queued-execution-throttling.png b/website/versioned_docs/version-v18.3.5/assets/queued-execution-throttling.png new file mode 100644 index 00000000000..28ea238c4c6 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/queued-execution-throttling.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/raw/interfaces.drawio b/website/versioned_docs/version-v18.3.5/assets/raw/interfaces.drawio new file mode 100644 index 00000000000..ecfc0a04278 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/assets/raw/interfaces.drawio @@ -0,0 +1 @@ 
+7V1dd6I8Hv80nrN7oQcSIOGyre289ll3nN3p7M0cKqkyRbAQqj6ffhMBgSRWZgYQTzsXowkQ8Pd/fwkdwKvl5l3krBa3oUv8AdDczQCOBwAYFsTsg89s0xmAgJbOzCPPTef0YmLq/U2yyfy0xHNJXDmRhqFPvVV1chYGAZnRypwTReG6etpD6FfvunLmRJqYzhxfnv3muXSRzmJTK+bfE2++yO+sa9mRpZOfnE3EC8cN16UpeD2AV1EY0vTbcnNFfI5ejkt63c2Bo/sHi0hA61xgfrucP6K/Pr57WHxcRNf2KridDLNVnh0/yX5w9rB0myMQhUngEr6INoCX64VHyXTlzPjRNSM6m1vQpc9GOvsqP1R+BxJRsilNZQ/5joRLQqMtOyU7OtQty0wv2uYzmpGBuC5owKmQTi7KBNBgdqqTUX6+v0MBDvuS4aPG6mZC119/jqffby//Hq+/+D8/f/1P41g9eL5/FfphxMZBGJA9fBJWCkQPwpdfkWNnIxk6M2ffMnK21gBwSiazGgbOdeLF7tzmOM4QUbN0CTXLVqCGcQOo+bd4NtzcTscucv63CaznT7fvh6grdvtT7CSOUwgrhFCBXY5649jh49iRwL3g9qFAo8JWJeAYONH2jg9Gmgbzie8c9RHKh+NNRoV0tC2PJiTy2O8iUTZ5EHbiVoyRDHpZhhXMmM9FxHeo91w1YSqQsztMQo89SYWmWpWmEAurxGESzUh2YdnqSGsBjBhuurTeCGrFP1BdnTrRnFBp9R0v7OH4fYUEjrNHmFDfC8jV3q3QyhzCB1xGPOYofHbuiT8JY496YcCO3YeUhsvSCRe+N+cHaMgl0slGM0Z8zhFlTmMuworff7mZc3dq5KxjOIqDWJDeAYBj+wIZ/BJ2ouuxpdqQbMOGgmQDWSsia6Qww8Vs49YEKohn+ZSTx3uuENF6SrhzdfkQBnQY71zLC3YCMFabHUT5cfZtzj+nf02540bimDuF2aLsGXfrpqdIfMLApAIRaRQ+EoEeCuUrMYLIL0vPdfltlIq9qvpdL2JcmvIfk0xOm92vzvxpYDTFEiY0qiyBc8VeYgndtGWOAEZL/FDDTvZHmB/ZY8SeQqBvTMyxbVmgMR4JrjUyDNnRwQpb3ZZzmAd6LcjzpwxsvkZEnOV5iXTZG2lHlg1BvVsmzNmjLM0IKaS5La+3RpDVH2mOn3pjmk3Lqmma8ysbJ53sVU3/zQ3qU0LYBM+9RHyJYJXQwszGPRW/diTO0kT9a5imQuaADjqUOeOcZG7mh4m7duhsIYseMu0r/rvbFT3dEokIIVYQEYNOxc+UxW/GyJP4jEQsJGSf7OMfz57DPq72IP7zNcmfAYWwVrdVhNN1RWzdmvTJ2bF/BWRIvSUpkc0LnhkJOCFTAl5MPrwqylm6VaUc0ORQVDcV8taep6JyXPuYoAOaXcUO2brK5kBVOh2DluCzz8no+M7y3nVOFLXpQI7a0Mm9PV3lqafxFdcAtQM3KWrLHENG9ThZEu4yfk7RL6K39Aa9Dt/KWvDFnO8vcYIh+Y+WaaksmKayYA1k5NSsoErJ9VMT4ip6ho7qakLQRHlMDV8N/7tUrZj5Thx7sypiZOPRu9L3tDphZqOiOMEH29JALE3sax1DbaRp1qBS7NAtc9BkuSOtIQwO54YyHNJqQA1T3Jv6CRTrHczkVlf5hfqJKawllc2aq5ao0ZUji57wJ6jyp2E1W447yp/oXPnT0KQYFiM0QoKFqM+kulgwxBoaGahbPq3RZ9A5n2pVFkXHNGgtZnwpc3qGzKiLytK0GuNDhDWxEtk2G55VQep0KWxdTMZAWw7puw5q5MYRVQ6bbMgs4QR5Vdlr3UJVggGkzJ6Zilaf9rIwNfIIfYg99h2jLydhIFJpWtRA7VxdsqmRw+qRrqJk9eMhCWap6J1EbSFNLLzZcARkQnZde5OzMVfJMvETXv1eh9Hjg8+cnpLmekWKCyEgVd3YlIJoQO8ygZzLWtf+6kFMj7uYGfv2xXW0Rc/R1H/Xc8RilA31bqMXUCOJVpcbCg74XjpyIHrZRyqjfWzyvXys+UAlo/7xSKVn7Ca7rKZtiNFF/VhFw8JqliGt1jbTNZF6PBDqHs0V5qor76reqy+sw5dZVhFw/z471k7i9I0bARb8ccP67bgZAKG+atgdq7/z2O4AbFDFSbXZQV0/QK35ETV2O/TIiT+R5w6ZRw7EnKeFRhhL9Cs5h833tKtJKEexEVmFEb0uZRq0tD+d4c5U8Ktx36FuS9VvC4x0Rdt5LpvdtO2ovPdfq39D60Dj8rciYNu1L9PIm89JtOv/cQL+/y4f5Z5tSRxaTSWlTCw0t+iaLgd2Fuiyh9k6vEWlSdZIY3neQfvauQAbuqgjlGwAO20Qs1TOdfNs4DrUyRVFajQKxXAf/dFdXj1nwTrKxcj3RnbDVWdVXnJW3o+5Q8na2Z4mX8sMhJjYydsnTpetRU3viG8r2DHldjNQu2GqtYAHnUusKOyVhNbpoXuLFWvFilCKFU1ct+W2xTgRHYoT30VOkPjkFUeJkqoysLZvaDpZlIjPq8DaI3kDyvp45xKXew+SxE3cKC672blr/CZ/ueeFcQ/kT5Wl6Z+rYAp96bYBFMCpevpb8xTweTT1i8hZpqHqSeoYuxpltR5iB/PY8oTIyZHRlzHXp9dF44oWkVkYua9HudpCbQ5hueak57tIulGrchiRkinzQ9+INDSQXFjqlki27HumRJqMv7wRaMgcEeXe75ZIpHylpGwWeHfBNBsyL3MRzsPA8a+LWQGo4pzPIXf6dxT7SSjdZvA5CQ0HDfWMCa0PR/st7hcxouMf28fx3e2lfxc9xBZRtv8cft+mTM7afRN1KfPiU/bchA+h2FdsAEVDgtKIa6CJqEmN3lm9xueUu7uBIYe+hl4/9G0iR60moSpyamJ/9/Rh+hRfPzPMvobj+y87K6SMpJu42ZGt42da3mpsRzkwxayLKmJvazO5MdlcJR+eHs2x9222MT/Mtfc3dV4W3FZH4kFEy6Yqvrj55AfXk/8u7n88/QzuPhpakr9vti+tgYbYWYrErZi1OwMl+7J/YWDzjYEvQdt7OwykTQbYQMpNBq29XUCJ31m9Z+VkuxFB0Xq9Jx9Q1g/aM8NK8in8UNWOxDQn7QXzMzBeTe3rAUjuDNSwSuRa25KoJlkN5/fsdnK8pJvL5vHFYKov9hGKu8RNUf3Wto/7VOh+KVNYqjn7qIySa6Tp+6Pfuacv6XdwjeAlPqbf/+iPYui6sGvT1nSlcgcy04lvvm8s+XNWr908HeWETS6WJuc3OiVbjbfhvJFNIhvU5Ib9hsjGhsVfUUrVavHHqOD1/wE= \ No newline at end of file diff --git a/website/versioned_docs/version-v18.3.5/assets/rds_cluster_update.jpg b/website/versioned_docs/version-v18.3.5/assets/rds_cluster_update.jpg new file mode 
100644 index 00000000000..0d2489a2658 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/rds_cluster_update.jpg differ diff --git a/website/versioned_docs/version-v18.3.5/assets/rec_reports_overview.png b/website/versioned_docs/version-v18.3.5/assets/rec_reports_overview.png new file mode 100644 index 00000000000..0a531785afa Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/rec_reports_overview.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/rerun_execution.png b/website/versioned_docs/version-v18.3.5/assets/rerun_execution.png new file mode 100644 index 00000000000..6eda09912ea Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/rerun_execution.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/schedule-workflows.png b/website/versioned_docs/version-v18.3.5/assets/schedule-workflows.png new file mode 100644 index 00000000000..669aefed2fd Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/schedule-workflows.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/secrets_manager_update.jpg b/website/versioned_docs/version-v18.3.5/assets/secrets_manager_update.jpg new file mode 100644 index 00000000000..867bf41e508 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/secrets_manager_update.jpg differ diff --git a/website/versioned_docs/version-v18.3.5/assets/sips-discover-and-queue-pdrs-execution.png b/website/versioned_docs/version-v18.3.5/assets/sips-discover-and-queue-pdrs-execution.png new file mode 100755 index 00000000000..b5fd347a3fd Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/sips-discover-and-queue-pdrs-execution.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/sips-ingest-granule.png b/website/versioned_docs/version-v18.3.5/assets/sips-ingest-granule.png new file mode 100755 index 00000000000..b84e026eaee Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/sips-ingest-granule.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/sips-parse-pdr.png b/website/versioned_docs/version-v18.3.5/assets/sips-parse-pdr.png new file mode 100755 index 00000000000..5e89882f7d7 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/sips-parse-pdr.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/sips-provider.png b/website/versioned_docs/version-v18.3.5/assets/sips-provider.png new file mode 100755 index 00000000000..ad5bad2bed1 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/sips-provider.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/sqs_message_attribute.png b/website/versioned_docs/version-v18.3.5/assets/sqs_message_attribute.png new file mode 100644 index 00000000000..69bf9c5187d Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/sqs_message_attribute.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/workflow-fail.png b/website/versioned_docs/version-v18.3.5/assets/workflow-fail.png new file mode 100644 index 00000000000..45d72a0b983 Binary files /dev/null and b/website/versioned_docs/version-v18.3.5/assets/workflow-fail.png differ diff --git a/website/versioned_docs/version-v18.3.5/assets/workflow_reporting_diagram.png b/website/versioned_docs/version-v18.3.5/assets/workflow_reporting_diagram.png new file mode 100644 index 00000000000..b29dd30b5e8 Binary files /dev/null and 
b/website/versioned_docs/version-v18.3.5/assets/workflow_reporting_diagram.png differ
diff --git a/website/versioned_docs/version-v18.3.5/configuration/cloudwatch-retention.md b/website/versioned_docs/version-v18.3.5/configuration/cloudwatch-retention.md
new file mode 100644
index 00000000000..ee29f47b7aa
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/configuration/cloudwatch-retention.md
@@ -0,0 +1,106 @@
+---
+id: cloudwatch-retention
+title: Cloudwatch Retention
+hide_title: false
+---
+
+Our lambdas dump logs to [AWS CloudWatch](https://aws.amazon.com/cloudwatch/). By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.
+
+## aws-cli
+
+In addition to getting your aws-cli [set up](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html), there are two values you'll need to acquire.
+
+1. `log-group-name`: the name of the log group whose retention policy (retention time) you'd like to change. We'll use `/aws/lambda/KinesisInboundLogger` in our examples.
+2. `retention-in-days`: the number of days you'd like to retain the logs in the specified log group. There is a list of possible values available in the [aws logs documentation](https://docs.aws.amazon.com/cli/latest/reference/logs/put-retention-policy.html).
+
+For example, if we wanted to set log retention to 30 days on our `KinesisInboundLogger` lambda, we would write:
+
+```bash
+aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30
+```
+
+:::note more about the aws-cli log command
+
+The aws-cli log command that we're using is explained in detail [here](https://docs.aws.amazon.com/cli/latest/reference/logs/put-retention-policy.html).
+
+:::
+
+## AWS Management Console
+
+Changing the log retention policy in the AWS Management Console is a fairly simple process:
+
+1. Navigate to the CloudWatch service in the AWS Management Console.
+2. Click on the `Logs` entry on the sidebar.
+3. Find the Log Group whose retention policy you're interested in changing.
+4. Click on the value in the `Expire Events After` column.
+5. Enter/Select the number of days you'd like to retain logs in that log group for.
+
+![Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs](../assets/cloudwatch-retention.png)
+
+## Terraform
+
+Cumulus modules create cloudwatch log groups and manage log retention for a subset of lambdas and tasks. These log groups have a default log retention time, but there are two optional variables which can be set to change the default retention period for all or specific Cumulus-managed cloudwatch log groups through deployment. For cloudwatch log groups which are not managed by Cumulus modules, the retention period is indefinite (`Never Expire`) in AWS; cloudwatch log configurations for all Cumulus lambdas and tasks will be added in a future release.
+
+There are optional variables that can be set during deployment of cumulus modules to configure the retention period (in days) of cloudwatch log groups for lambdas and tasks which the `cumulus`, `cumulus_distribution`, and `cumulus_ecs_service` modules support (using the `cumulus` module as an example):
+
+```tf
+module "cumulus" {
+  # ... other variables
+  default_log_retention_days = var.default_log_retention_days
+  cloudwatch_log_retention_periods = var.cloudwatch_log_retention_periods
+}
+```
+
+By setting the below variables in `terraform.tfvars` and deploying, the cloudwatch log groups will be instantiated or updated with the new retention value.
+
+### default_log_retention_days
+
+The variable `default_log_retention_days` can be configured in order to set the default log retention for all cloudwatch log groups managed by Cumulus in case a custom value isn't used. The log groups will use this value for their retention; if this value is not set either, the retention will default to 30 days. For example, if a user would like the log groups of the Cumulus module to have a retention period of one year, deploy the respective modules with the variable set as in the example below.
+
+#### Example
+
+```tf
+default_log_retention_days = 365
+```
+
+### cloudwatch_log_retention_periods
+
+The retention period (in days) of cloudwatch log groups for specific lambdas and tasks can be set during deployment using the `cloudwatch_log_retention_periods` terraform map variable. In order to configure these values for the respective cloudwatch log groups, uncomment the `cloudwatch_log_retention_periods` variable and add the retention values for the groups whose retention you want to change. The following keys are supported, corresponding to the lambda/task name (e.g. the log group "/aws/lambda/prefix-DiscoverPdrs" is configured via the key "DiscoverPdrs"):
+
+- ApiEndpoints
+- AsyncOperationEcsLogs
+- DiscoverPdrs
+- DistributionApiEndpoints
+- EcsLogs
+- granuleFilesCacheUpdater
+- HyraxMetadataUpdates
+- ParsePdr
+- PostToCmr
+- PrivateApiLambda
+- publishExecutions
+- publishGranules
+- publishPdrs
+- QueuePdrs
+- QueueWorkflow
+- replaySqsMessages
+- SyncGranule
+- UpdateCmrAccessConstraints
+
+:::note
+
+`EcsLogs` is used for all cumulus_ecs_service task cloudwatch log groups.
+
+:::
+
+#### Example
+
+```tf
+cloudwatch_log_retention_periods = {
+  ParsePdr = 365
+}
+```
+
+The retention periods are the number of days you'd like to retain the logs in the specified log groups. There is a list of possible values available in the [aws logs documentation](https://docs.aws.amazon.com/cli/latest/reference/logs/put-retention-policy.html).
diff --git a/website/versioned_docs/version-v18.3.5/configuration/collection-storage-best-practices.md b/website/versioned_docs/version-v18.3.5/configuration/collection-storage-best-practices.md
new file mode 100644
index 00000000000..7deac366dfa
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/configuration/collection-storage-best-practices.md
@@ -0,0 +1,184 @@
+---
+id: collection-storage-best-practices
+title: Collection Cost Tracking and Storage Best Practices
+hide_title: false
+---
+
+Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.
+
+## Calculating Storage By Collection
+
+### By bucket
+
+Usage by bucket can be obtained in your [AWS Billing Dashboard](https://console.aws.amazon.com/billing/home) via an [S3 Usage Report](https://docs.aws.amazon.com/AmazonS3/latest/dev/aws-usage-report.html).
You can download your usage report for a period of time and review your storage and requests at the bucket level. + +Bucket metrics can also be found in the [AWS CloudWatch Metrics Console](https://console.aws.amazon.com/cloudwatch/home#metricsV2:graph=~();namespace=~'AWS*2fS3) (also see [Using Amazon CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html)). + +Navigate to `Storage Metrics` and select the `BucketName` for all buckets you are interested in. The available metrics are `BucketSizeInBytes` and `NumberOfObjects`. + +In the `Graphed metrics` tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats. + +Alternatively you can query CloudWatch using the CLI. + +This command will return the average number of bytes in the bucket `test-bucket` for 7/31/2019: + +```bash +aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage +``` + +The result looks like: + +```json +{ + "Datapoints": [ + { + "Timestamp": "2019-07-31T00:00:00Z", + "Average": 150996467959.0, + "Unit": "Bytes" + } + ], + "Label": "BucketSizeBytes" +} +``` + +### By key prefix + +AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder `example-folder` in bucket `sample-bucket`: + +`aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'` + +Note that this can be a long-running operation for large buckets. + +## Calculating Cost By Collection + +### NASA NGAP Environment + +If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the `Financials` section of your account information. This is calculated on a monthly basis. + +There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket. + +### Outside of NGAP + +You can enabled [S3 Cost Allocation Tags](https://docs.aws.amazon.com/AmazonS3/latest/dev/CostAllocTagging.html) and tag your buckets. From there, you can view the cost breakdown in your [AWS Billing Dashboard](https://console.aws.amazon.com/billing/home) via the Cost Explorer. Cost Allocation Tagging is available at the bucket level. + +There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket. + +## Storage Configuration + +Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the [deployment process](../deployment/#create-s3-buckets). + +In your Cumulus [collection configuration](data-management-types#collections), you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket. + +Note that in the collection configuration, the `bucket` field is the key to the `buckets` variable in the deployment's `.tfvars` file. 
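+
+As a sketch of that mapping, the `buckets` variable in `terraform.tfvars` is a map whose keys are what collection file configurations reference in their `bucket` fields; the bucket names below are deployment-specific placeholders, not required values.
+
+```tf
+# Example buckets map; the keys (internal, private, protected, public) are
+# what collection configurations reference, and the names are placeholders.
+buckets = {
+  internal = {
+    name = "my-prefix-internal"
+    type = "internal"
+  }
+  private = {
+    name = "my-prefix-private"
+    type = "private"
+  }
+  protected = {
+    name = "my-prefix-protected"
+    type = "protected"
+  }
+  public = {
+    name = "my-prefix-public"
+    type = "public"
+  }
+}
+```
+
+With a configuration like this, a collection file entry with `"bucket": "protected"` would resolve to the bucket named `my-prefix-protected`.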
+ +### Organizing By Bucket + +You can specify separate groups of buckets for each collection, which could look like the example below. + +```json +{ + "name": "MOD09GQ", + "version": "006", + "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "files": [ + { + "bucket": "MOD09GQ-006-protected", + "regex": "^.*\\.hdf$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" + }, + { + "bucket": "MOD09GQ-006-private", + "regex": "^.*\\.hdf\\.met$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met" + }, + { + "bucket": "MOD09GQ-006-protected", + "regex": "^.*\\.cmr\\.xml$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml" + }, + { + "bucket": "MOD09GQ-006-public", + "regex": "^*\\.jpg$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg" + } + ] +} +``` + +Additional collections would go to different buckets. + +### Organizing by Key Prefix + +Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the `url_path` in the collection configuration. In this simplified collection configuration example, the `url_path` field is set at the top level so that all files go to a path prefixed with the collection name and version. + +```json +{ + "name": "MOD09GQ", + "version": "006", + "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "files": [ + { + "bucket": "protected", + "regex": "^.*\\.hdf$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" + }, + { + "bucket": "private", + "regex": "^.*\\.hdf\\.met$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met" + }, + { + "bucket": "protected", + "regex": "^.*\\.cmr\\.xml$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml" + }, + { + "bucket": "public", + "regex": "^*\\.jpg$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg" + } + ] +} +``` + +In this case, the path to all the files would be: `MOD09GQ___006/` in their respective buckets. + +The `url_path` can be overidden directly on the file configuration. The example below produces the same result. 
+ +```json +{ + "name": "MOD09GQ", + "version": "006", + "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "files": [ + { + "bucket": "protected", + "regex": "^.*\\.hdf$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}" + }, + { + "bucket": "private", + "regex": "^.*\\.hdf\\.met$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}" + }, + { + "bucket": "protected-2", + "regex": "^.*\\.cmr\\.xml$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}" + }, + { + "bucket": "public", + "regex": "^*\\.jpg$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}" + } + ] +} +``` diff --git a/website/versioned_docs/version-v18.3.5/configuration/data-management-types.md b/website/versioned_docs/version-v18.3.5/configuration/data-management-types.md new file mode 100644 index 00000000000..bfbec593963 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/configuration/data-management-types.md @@ -0,0 +1,305 @@ +--- +id: data-management-types +title: Cumulus Data Management Types +hide_title: false +--- + +## What Are The Cumulus Data Management Types + +- `Collections`: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest. +- `Granules`: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules. +- `Providers`: Providers generate and distribute input data that Cumulus obtains and sends to workflows. +- `Rules`: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow. +- `Workflows`: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data. +- `Executions`: Executions are records of a workflow. +- `Reconciliation Reports`: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts. + +## Interaction + +- **Providers** tell Cumulus where to get new data - i.e. S3, HTTPS +- **Collections** tell Cumulus where to store the data files +- **Rules** tell Cumulus when to trigger a workflow execution and tie providers and collections together + +## Managing Data Management Types + +The following are created via the dashboard or API: + +- **Providers** +- **Collections** +- **Rules** +- **Reconciliation reports** + +**Granules** are created by workflow executions and then can be managed via the dashboard or API. + + An **execution** record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API. + +**Workflows** are created and managed via the Cumulus deployment. 
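+
+To illustrate how rules tie a provider and a collection to a workflow (see the Interaction section above), below is a sketch of a minimal rule definition. The rule, provider, collection, and workflow names are placeholders, and additional fields (for example, `rule.value` for scheduled, Kinesis, SNS, or SQS rules) may be required depending on the rule type; consult the rule schema referenced in the next section for the authoritative list of attributes.
+
+```json
+{
+  "name": "MOD09GQ_006_one_time_ingest",
+  "workflow": "IngestGranule",
+  "provider": "example-provider",
+  "collection": {
+    "name": "MOD09GQ",
+    "version": "006"
+  },
+  "rule": {
+    "type": "onetime"
+  },
+  "state": "ENABLED"
+}
+```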
+
+## Configuration Fields
+
+### Schemas
+
+Looking at our API schema [definitions](https://github.com/nasa/cumulus/tree/master/packages/api/lib/schemas.js) can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for these different concepts will be referenced throughout this document.
+
+:::note
+
+The schemas are _extremely_ useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.
+
+:::
+
+### Providers
+
+- [Provider schema](https://github.com/nasa/cumulus/tree/master/packages/api/lib/schemas.js) (`module.exports.provider`)
+- [Provider API](https://nasa.github.io/cumulus-api/?language=Python#list-providers)
+- [Sample provider configurations](https://github.com/nasa/cumulus/tree/master/example/data/providers)
+
+:::note
+
+- While _connection_ configuration is defined here, settings that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
+- There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a **per-workflow** basis. For example, see the [`httpListTimeout` configuration on the `discover-granules` task](https://github.com/nasa/cumulus/blob/master/tasks/discover-granules/schemas/config.json#L84).
+
+:::
+
+#### Provider Configuration
+
+The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:
+
+ Configuration by provider type + +##### S3 + +|Key |Type |Required|Description| +|:---:|:----|:------:|-----------| +|id|string|Yes|Unique identifier for the provider| +|globalConnectionLimit|integer|No|Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited | +|maxDownloadTime|integer|No|Maximum download time in seconds for all granule files on a sync granule task. The timeout is used together with globalConnectionLimit to limit concurrent downloads. | +|protocol|string|Yes|The protocol for this provider. Must be `s3` for this provider type. | +|host|string|Yes|S3 Bucket to pull data from | + +##### http + +|Key |Type |Required|Description| +|:---:|:----|:------:|-----------| +|id|string|Yes|Unique identifier for the provider| +|globalConnectionLimit|integer|No|Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited | +|maxDownloadTime|integer|No|Maximum download time in seconds for all granule files on a sync granule task. The timeout is used together with globalConnectionLimit to limit concurrent downloads. | +|protocol|string|Yes|The protocol for this provider. Must be `http` for this provider type | +|host|string|Yes|The host to pull data from (e.g. `nasa.gov`) +|username|string|No|Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a `Basic` auth header if needed for authentication | +|password|string|_Only if username is specified_|Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a `Basic` auth header if needed for authentication | +|port|integer|No|Port to connect to the provider on. Defaults to `80`| +|allowedRedirects|string[]|No|Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port. +|certificateUri|string|No|SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate + +##### https + +|Key |Type |Required|Description| +|:---:|:----|:------:|-----------| +|id|string|Yes|Unique identifier for the provider| +|globalConnectionLimit|integer|No|Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited | +|maxDownloadTime|integer|No|Maximum download time in seconds for all granule files on a sync granule task. The timeout is used together with globalConnectionLimit to limit concurrent downloads. | +|protocol|string|Yes|The protocol for this provider. Must be `https` for this provider type | +|host|string|Yes|The host to pull data from (e.g. `nasa.gov`) | +|username|string|No|Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a `Basic` auth header if needed for authentication | +|password|string|_Only if username is specified_|Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a `Basic` auth header if needed for authentication | +|port|integer|No|Port to connect to the provider on. Defaults to `443` | +|allowedRedirects|string[]|No|Only hosts in this list will have the provider username/password forwarded for authentication. 
Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
+|certificateUri|string|No|SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
+
+##### ftp
+
+|Key |Type |Required|Description|
+|:---:|:----|:------:|-----------|
+|id|string|Yes|Unique identifier for the provider|
+|globalConnectionLimit|integer|No|Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited |
+|maxDownloadTime|integer|No|Maximum download time in seconds for all granule files on a sync granule task. The timeout is used together with globalConnectionLimit to limit concurrent downloads. |
+|protocol|string|Yes|The protocol for this provider. Must be `ftp` for this provider type |
+|host|string|Yes|The FTP host to pull data from (e.g. `nasa.gov`) |
+|username|string|No|Username to use to connect to the FTP server. Cumulus encrypts this using KMS. Defaults to `anonymous` if not defined |
+|password|string|No|Password to use to connect to the FTP server. Cumulus encrypts this using KMS. Defaults to `password` if not defined |
+|port|integer|No|Port to connect to the provider on. Defaults to `21`|
+
+##### sftp
+
+|Key |Type |Required|Description|
+|:---:|:----|:------:|-----------|
+|id|string|Yes|Unique identifier for the provider|
+|globalConnectionLimit|integer|No|Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited |
+|maxDownloadTime|integer|No|Maximum download time in seconds for all granule files on a sync granule task. The timeout is used together with globalConnectionLimit to limit concurrent downloads. |
+|protocol|string|Yes|The protocol for this provider. Must be `sftp` for this provider type |
+|host|string|Yes|The SFTP host to pull data from (e.g. `nasa.gov`) |
+|username|string|No|Username to use to connect to the SFTP server.|
+|password|string|No|Password to use to connect to the SFTP server. |
+|port|integer|No|Port to connect to the provider on. Defaults to `22` |
+|privateKey|string|No|Filename assumed to be in s3://bucketInternal/stackName/crypto |
+|cmKeyId|string|No|AWS KMS Customer Master Key ARN or alias |
+
+
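+
+As a concrete (hypothetical) illustration of the keys above, an HTTPS provider definition might look roughly like the following; every value here is a placeholder rather than a real provider:
+
+```json
+{
+  "id": "EXAMPLE_HTTPS_PROVIDER",
+  "protocol": "https",
+  "host": "data.example.gov",
+  "port": 443,
+  "globalConnectionLimit": 10,
+  "maxDownloadTime": 300,
+  "username": "example-user",
+  "password": "example-password",
+  "allowedRedirects": ["urs.earthdata.nasa.gov"]
+}
+```
+
+The `username` and `password` values are encrypted by Cumulus using KMS once the provider record is created.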
+ +### Collections + +- [Collection schema](https://github.com/nasa/cumulus/tree/master/packages/api/lib/schemas.js) (`module.exports.collection`) +- [Collection API](https://nasa.github.io/cumulus-api/?language=Python#list-collections) +- [Sample collection configurations](https://github.com/nasa/cumulus/tree/master/example/data/collections) + +
+
+ Break down of [s3_MOD09GQ_006.json](https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
+
+|Key |Value |Required |Description|
+|:---:|:-----:|:--------:|-----------|
+|name |`"MOD09GQ"`|Yes|The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard|
+|version|`"006"`|Yes|A version tag for the collection|
+|granuleId|`"^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$"`|Yes|The regular expression used to validate the granule ID extracted from filenames according to the `granuleIdExtraction`|
+|granuleIdExtraction|`"(MOD09GQ\\..*)(\\.hdf\|\\.cmr\|_ndvi\\.jpg)"`|Yes|The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.|
+|sampleFileName|`"MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"`|Yes|An example filename belonging to this collection|
+|files|List of files defined [here](#files-object)|Yes|Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)|
+|dataType|`"MOD09GQ"`|No|Can be specified, but this value will default to the collection_name if not|
+|duplicateHandling|`"replace"`|No|("replace"\|"version"\|"skip") determines granule duplicate handling scheme|
+|ignoreFilesConfigForDiscovery|`false` (default)|No|By default, during discovery only files that match one of the regular expressions in this collection's `files` attribute (see above) are ingested. Setting this to `true` will ignore the `files` attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching `granuleIdExtraction`) will be ingested even when they don't match a regular expression in the `files` attribute at _discovery_ time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)|
+|process|`"modis"`|No|Example options for this are found in the ChooseProcess step definition in [the IngestAndPublish workflow definition](https://github.com/nasa/cumulus/tree/master/example/cumulus-tf/ingest_and_publish_granule_workflow.tf)|
+|meta|Object of MetaData for the collection|No|MetaData for the collection. This metadata will be available to workflows for this collection via the [Cumulus Message Adapter](workflows/input_output.md).|
+|url_path|`"{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"`|No|Filename without extension|
+
+#### files-object
+
+|Key |Value |Required |Description|
+|:---:|:-----:|:--------:|-----------|
+|regex|`"^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"`|Yes|Regular expression used to identify the file|
+|sampleFileName|`"MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"`|Yes|Filename used to validate the provided regex|
+|type|`"data"`|No|Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task|
+|bucket|`"internal"`|Yes|Name of the bucket where the file will be stored|
+|url_path|`"${collectionShortName}/{substring(file.fileName, 0, 3)}"`|No|Folder used to save the granule in the bucket. Defaults to the collection `url_path`|
+|checksumFor|`"^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"`|No|If this is a checksum file, set `checksumFor` to the `regex` of the target file.|
+
+
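+
+As an illustration of `checksumFor`, a hypothetical pair of `files` entries for a data file and its checksum sidecar (the `.md5` extension here is an assumption for the example) could look like:
+
+```json
+[
+  {
+    "bucket": "protected",
+    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$",
+    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
+    "type": "data"
+  },
+  {
+    "bucket": "private",
+    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.md5$",
+    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.md5",
+    "checksumFor": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"
+  }
+]
+```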
+
+### Rules
+
+- [Rule schema](https://github.com/nasa/cumulus/tree/master/packages/api/lib/schemas.js) (`module.exports.rule`)
+- [Rule API](https://nasa.github.io/cumulus-api/?language=Python#list-rules)
+- [Sample Kinesis rule](https://github.com/nasa/cumulus/blob/master/example/data/rules/L2_HR_PIXC_kinesisRule.json)
+- [Sample SNS rule](https://github.com/nasa/cumulus/blob/master/example/spec/parallel/testAPI/snsRuleDef.json)
+- [Sample SQS rule](https://github.com/nasa/cumulus/blob/master/example/spec/parallel/testAPI/data/rules/sqs/MOD09GQ_006_sqsRule.json)
+
+Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by events in [Kinesis](data-cookbooks/cnm-workflow.md), SNS messages, or SQS messages.
+
+
+Rule configuration + +|Key |Value |Required|Description| +|:---:|:-----:|:------:|-----------| +|name|`"L2_HR_PIXC_kinesisRule"`|Yes|Name of the rule. This is the name under which the rule will be listed on the dashboard| +|workflow|`"CNMExampleWorkflow"`|Yes|Name of the workflow to be run. A list of available workflows can be found on the Workflows page| +|provider|`"PODAAC_SWOT"`|No|Configured provider's ID. This can be found on the Providers dashboard page| +|collection|`` collection object shown [below](#collection-object)|Yes|Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page| +|payload|``|No|The payload to be passed to the workflow| +|meta|`` of MetaData for the rule|No|MetaData for the rule. This metadata will be available to workflows for this rule via the [Cumulus Message Adapter](workflows/input_output.md). +|rule|`` rule type and associated values - discussed [below](#rule-object)|Yes|Object defining the type and subsequent attributes of the rule| +|state|`"ENABLED"`|No|("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to `"ENABLED"`.| +|queueUrl|`https://sqs.us-east-1.amazonaws.com/1234567890/queue-name`|No|URL for SQS queue that will be used to schedule workflows for this rule +|tags|`["kinesis", "podaac"]`|No|An array of strings that can be used to simplify search| + +#### collection-object + +|Key |Value |Required|Description| +|:---:|:-----:|:------:|-----------| +|name|`"L2_HR_PIXC"`|Yes|Name of a collection defined/configured in the Collections dashboard page| +|version|`"000"`|Yes|Version number of a collection defined/configured in the Collections dashboard page| + +#### meta-object + +|Key |Value |Required|Description| +|:---:|:-----:|:------:|-----------| +|retries|`3`|No|Number of retries on errors, for sqs-type rule only. Defaults to 3.| +|visibilityTimeout|`900`|No|VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.| + +#### rule-object + +|Key|Value|Required|Description| +|:---:|:-----:|:------:|-----------| +|type|`"kinesis"`|Yes|("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired| +|value|` Object`|Depends|Discussion of valid values is [below](#rule-value)| + +#### rule-value + +The `rule - value` entry depends on the type of run: + +- If this is a onetime rule this can be left blank. [Example](data-cookbooks/hello-world.md/#execution) +- If this is a scheduled rule this field must hold a valid [cron-type expression or rate expression](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html). +- If this is a kinesis rule, this must be a configured `${Kinesis_stream_ARN}`. [Example](data-cookbooks/cnm-workflow.md#rule-configuration) +- If this is an sns rule, this must be an existing `${SNS_Topic_Arn}`. [Example](https://github.com/nasa/cumulus/blob/master/example/spec/parallel/testAPI/snsRuleDef.json) +- If this is an sqs rule, this must be an existing `${SQS_QueueUrl}` that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. [Example](https://github.com/nasa/cumulus/blob/master/example/spec/parallel/testAPI/data/rules/sqs/MOD09GQ_006_sqsRule.json) + +#### sqs-type rule features + +- When an SQS rule is triggered, the SQS message remains on the queue. 
+
+- The SQS message is not processed multiple times in parallel when the visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow, with padding; a longer timeout reduces the chance of parallel processing.
+- The SQS message visibility timeout can be overridden by the rule.
+- Upon successful workflow execution, the SQS message is removed from the queue.
+- Upon failed execution(s), the workflow is run 3 times by default, or the configured number of times (`meta.retries`).
+- Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
+- After the configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
+
+
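+
+Putting the rule, collection, and meta fields together, a sketch of an SQS rule definition could look like the following; the queue URL, workflow, and provider names are placeholders for this example:
+
+```json
+{
+  "name": "MOD09GQ_006_sqs_example",
+  "workflow": "ExampleIngestWorkflow",
+  "provider": "EXAMPLE_PROVIDER",
+  "collection": {
+    "name": "MOD09GQ",
+    "version": "006"
+  },
+  "rule": {
+    "type": "sqs",
+    "value": "https://sqs.us-east-1.amazonaws.com/123456789012/example-ingest-queue"
+  },
+  "meta": {
+    "retries": 3,
+    "visibilityTimeout": 900
+  },
+  "state": "ENABLED"
+}
+```
+
+Remember that the SQS queue referenced in `rule.value` must have a dead-letter queue configured, as noted above.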
+ +## Configuration Via Cumulus Dashboard + +### Create A Provider + +- In the Cumulus dashboard, go to the `Provider` page. + +![Screenshot of Create Provider form](../assets/cd_provider_page.png) + +- Click on `Add Provider`. +- Fill in the form and then submit it. + +![Screenshot of Create Provider form](../assets/cd_add_provider_form.png) + +### Create A Collection + +- Go to the `Collections` page. + +![Screenshot of the Collections page](../assets/cd_collections_page.png) + +- Click on `Add Collection`. +- Copy and paste or fill in the collection JSON object form. + +![Screenshot of Add Collection form](../assets/cd_add_collection.png) + +- Once you submit the form, you should be able to verify that your new collection is in the list. + +### Create A Rule + +1. Go To Rules Page + + +- Go to the Cumulus dashboard, click on `Rules` in the navigation. +- Click `Add Rule`. + +![Screenshot of Rules page](../assets/cd_rules_page.png) + +2. Complete Form + +- Fill out the template form. + + +![Screenshot of a Rules template for adding a new rule](../assets/cd_add_rule_form_blank.png) + +For more details regarding the field definitions and required information go to [Data Cookbooks](https://nasa.github.io/cumulus/docs/data-cookbooks/setup#rules). + +:::note state field conditional + +If the state field is left blank, it defaults to `false`. + +::: + +#### Rule Examples + +- A rule form with completed required fields: + +![Screenshot of a completed rule form](../assets/cd_add_rule_filled.png) + +- A successfully added Rule: + +![Screenshot of created rule](../assets/cd_add_rule_overview.png) diff --git a/website/versioned_docs/version-v18.3.5/configuration/lifecycle-policies.md b/website/versioned_docs/version-v18.3.5/configuration/lifecycle-policies.md new file mode 100644 index 00000000000..37e952ff2e4 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/configuration/lifecycle-policies.md @@ -0,0 +1,151 @@ +--- +id: lifecycle-policies +title: Setting S3 Lifecycle Policies +hide_title: false +--- + +This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document. + +## Requirements + +* The AWS CLI installed and configured (if you wish to run the CLI example). See [AWS's guide to setting up the AWS CLI](https://docs.aws.amazon.com/AmazonS3/latest/dev/setup-aws-cli.html) for more on this. Please ensure the AWS CLI is in your shell path. +* You will need a S3 bucket on AWS. ***You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning***. +* An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies. + +## Examples + +### Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy + +This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console. + +### Command Line + +:::caution + +Please ensure you have the AWS CLI installed and configured for access prior to attempting this example. 
+
+:::
+
+#### Create policy
+
+From any directory you choose, open an editor and add the following to a file named `exampleRule.json`:
+
+```json
+{
+  "Rules": [
+    {
+      "Status": "Enabled",
+      "Filter": {
+        "Prefix": ""
+      },
+      "Transitions": [
+        {
+          "Days": 90,
+          "StorageClass": "STANDARD_IA"
+        }
+      ],
+      "NoncurrentVersionTransitions": [
+        {
+          "NoncurrentDays": 90,
+          "StorageClass": "STANDARD_IA"
+        }
+      ],
+      "ID": "90DayS3IAExample"
+    }
+  ]
+}
+```
+
+#### Set policy
+
+On the command line run the following command (substituting the bucket you're working with in place of `yourBucketNameHere`):
+
+```bash
+aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json
+```
+
+#### Verify policy has been set
+
+To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):
+
+```bash
+$ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
+{
+    "Rules": [
+        {
+            "Status": "Enabled",
+            "Filter": {
+                "Prefix": ""
+            },
+            "Transitions": [
+                {
+                    "Days": 90,
+                    "StorageClass": "STANDARD_IA"
+                }
+            ],
+            "NoncurrentVersionTransitions": [
+                {
+                    "NoncurrentDays": 90,
+                    "StorageClass": "STANDARD_IA"
+                }
+            ],
+            "ID": "90DayS3IAExample"
+        }
+    ]
+}
+```
+
+You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.
+
+### Management Console
+
+#### Create Policy
+
+To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):
+
+`https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview`
+
+You should see a screen similar to:
+
+![Screenshot of AWS console for an S3 bucket](../assets/aws_bucket_console_example.png)
+
+Click the "Management" tab, then the lifecycle button, and press `+ Add lifecycle rule`:
+
+![Screenshot of "Management" tab of AWS console for an S3 bucket](../assets/add_lifecycle_rule.png)
+
+Give the rule a name (e.g. '90DayRule'), leaving the filter blank:
+
+![Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console](../assets/lifecycle_1.png)
+
+Click `next`, and mark `Current Version` and `Previous Versions`.
+
+Then for each, click `+ Add transition` and select `Transition to Standard-IA after` for the `Object creation` field, and set `90` for the `Days after creation`/`Days after objects become noncurrent` field. Your screen should look similar to:
+
+![Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console](../assets/lifecycle_2.png)
+
+Click `next`, then next past the `Configure expiration` screen (we won't be setting this), and on the fourth page, click `Save`:
+
+![Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console](../assets/lifecycle_4.png)
+
+You should now see that you have a rule configured for your bucket:
+
+![Screenshot of lifecycle rule appearing in the "Management" tab of AWS console for an S3 bucket](../assets/lifecycle_5.png)
+
+You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.
+
+## Additional Information
+
+This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.
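+
+If your buckets are managed with Terraform (as is common for Cumulus deployments), the same 90-day rule can also be expressed in code rather than set by hand. The sketch below uses the AWS provider's `aws_s3_bucket_lifecycle_configuration` resource; it is not part of any Cumulus module, and the bucket name is a placeholder:
+
+```hcl
+resource "aws_s3_bucket_lifecycle_configuration" "ninety_day_s3ia" {
+  bucket = "yourBucketNameHere"
+
+  rule {
+    id     = "90DayS3IAExample"
+    status = "Enabled"
+
+    # An empty filter applies the rule to every object in the bucket
+    filter {}
+
+    transition {
+      days          = 90
+      storage_class = "STANDARD_IA"
+    }
+
+    noncurrent_version_transition {
+      noncurrent_days = 90
+      storage_class   = "STANDARD_IA"
+    }
+  }
+}
+```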
+ +### Strategy Overview + +For a discussion of overall recommended strategy, please review the [Methodology for Data Lifecycle Management](https://wiki.earthdata.nasa.gov/display/CUMULUS/Methodology+for+Data+Lifecycle+Management) on the EarthData wiki. + +### AWS Documentation + +The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see: + +* [AWS - Guide on setting bucket lifecycle policies via the management Console](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html) +* [AWS - Guide on setting bucket lifecycle policies using the AWS CLI](https://docs.aws.amazon.com/AmazonS3/latest/dev/set-lifecycle-cli.html) +* [AWS - Object Lifecycle Management](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html) +* [AWS - Lifecycle Configuration Examples](https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-configuration-examples.html) diff --git a/website/versioned_docs/version-v18.3.5/configuration/monitoring.md b/website/versioned_docs/version-v18.3.5/configuration/monitoring.md new file mode 100644 index 00000000000..0d6c3d4eed0 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/configuration/monitoring.md @@ -0,0 +1,79 @@ +--- +id: monitoring-readme +title: Monitoring Best Practices +hide_title: false +--- + +This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues. + +## Cumulus-provided resources and integrations for monitoring + +Cumulus provides a number set of resources that are useful for monitoring the system and its operation. + +### Cumulus Dashboard + +The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted [on Github](https://github.com/nasa/cumulus-dashboard/) and includes instructions on how to deploy and link it into your core Cumulus deployment. + +The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the [documentation](https://github.com/nasa/cumulus-dashboard/blob/master/README.md). + +### Cumulus-provided AWS resources + +Cumulus sets up CloudWatch log groups for all Core-provided tasks. + +#### Monitoring Lambda Functions + +Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups. + +#### Monitoring ECS services + +Each deployed `cumulus_ecs_service` module also includes a CloudWatch log group for the processes running on ECS. + +#### Monitoring workflows + +For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the [dead letter queue documentation](features/lambda_dead_letter_queue.md) + +## AWS recommendations + +AWS has a number of recommendations on system monitoring. 
Rather than reproduce those here and risk providing outdated guidance, we've collected the following links to the AWS documentation on monitoring recommendations and best practices for the services used in Cumulus:
+
+- [EC2 Monitoring](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring_ec2.html)
+- [Lambda Monitoring](https://docs.aws.amazon.com/lambda/latest/dg/lambda-monitoring.html)
+- [CloudWatch documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/index.html)
+
+## Example: Setting up email notifications for CloudWatch logs
+
+Cumulus does not provide out-of-the-box support for email notifications at this time.
+However, setting up email notifications on AWS is fairly straightforward: the operative components are an [AWS SNS topic and a subscribed email address](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SetupSNS.html).
+
+In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a [Lambda Function subscribed to the log group](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#LambdaFunctionExample), that will receive, filter, and forward these messages to the SNS topic.
+
+As a very simple example, we could create a function that filters CloudWatch logs created by the `@cumulus/logger` package and sends email notifications for `error` and `fatal` log levels, adapting the example linked above:
+
+```js
+const zlib = require('zlib');
+const { SNS } = require('@aws-sdk/client-sns');
+const { promisify } = require('util');
+
+const gunzip = promisify(zlib.gunzip);
+const sns = new SNS();
+
+exports.handler = async (event) => {
+  const payload = Buffer.from(event.awslogs.data, 'base64');
+  const decompressedData = await gunzip(payload);
+  const logData = JSON.parse(decompressedData.toString('ascii'));
+  return await Promise.all(logData.logEvents.map(async (logEvent) => {
+    const logMessage = JSON.parse(logEvent.message);
+    if (['error', 'fatal'].includes(logMessage.level)) {
+      // The v3 SDK client returns a promise directly, so no .promise() call is needed
+      return sns.publish({
+        TopicArn: process.env.EmailReportingTopicArn,
+        Message: logEvent.message
+      });
+    }
+    return Promise.resolve();
+  }));
+};
+```
+
+After creating the SNS topic, we can deploy this code as a lambda function, [following the setup steps from Amazon](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#LambdaFunctionExample). Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the `--environment` option on `aws lambda create-function`.
+
+You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
diff --git a/website/versioned_docs/version-v18.3.5/configuration/server_access_logging.md b/website/versioned_docs/version-v18.3.5/configuration/server_access_logging.md
new file mode 100644
index 00000000000..f8081c95994
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/configuration/server_access_logging.md
@@ -0,0 +1,38 @@
+---
+id: server_access_logging
+title: S3 Server Access Logging
+hide_title: false
+---
+
+## Via AWS Console
+
+[Enable server access logging for an S3 bucket][howtologging]
+
+## Via [AWS Command Line Interface][cli]
+
+1.
Create a `logging.json` file with these contents, replacing `` with your stack's internal bucket name, and `` with the name of your cumulus stack. + + ```json + { + "LoggingEnabled": { + "TargetBucket": "", + "TargetPrefix": "/ems-distribution/s3-server-access-logs/" + } + } + ``` + +2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket. + + ```sh + aws s3api put-bucket-logging --bucket --bucket-logging-status file://logging.json + ``` + +3. Verify the logging policy exists on your buckets. + + ```sh + aws s3api get-bucket-logging --bucket + ``` + +[cli]: https://aws.amazon.com/cli/ "Amazon command line interface" +[howtologging]: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/server-access-logging.html "Amazon Console Instructions" +[awslogging]: https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html "Amazon S3 Server Access Logging" diff --git a/website/versioned_docs/version-v18.3.5/configuration/task-configuration.md b/website/versioned_docs/version-v18.3.5/configuration/task-configuration.md new file mode 100644 index 00000000000..dd1944eda56 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/configuration/task-configuration.md @@ -0,0 +1,93 @@ +--- +id: task-configuration +title: Configuration of Tasks +hide_title: false +--- + +The `cumulus` module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables: + +## cmr_search_client_config + +Configuration parameters for CMR search client for cumulus archive module tasks in the form: + +```hcl +_report_cmr_limit = +_report_cmr_page_size = + type = map(string) +``` + +More information about cmr limit and cmr page_size can be found from [@cumulus/cmr-client](https://github.com/nasa/cumulus/blob/master/packages/cmr-client/src/searchConcept.ts) and [CMR Search API document](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#query-parameters). 
+ +Currently the following values are supported: + +- create_reconciliation_report_cmr_limit +- create_reconciliation_report_cmr_page_size + +### Example + +```tf +cmr_search_client_config = { + create_reconciliation_report_cmr_limit = 2500 + create_reconciliation_report_cmr_page_size = 250 +} +``` + +## elasticsearch_client_config + +Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form: + +```hcl +_es_scroll_duration = +_es_scroll_size = + type = map(string) +``` + +Currently the following values are supported: + +- create_reconciliation_report_es_scroll_duration +- create_reconciliation_report_es_scroll_size + +### Example + +```tf +elasticsearch_client_config = { + create_reconciliation_report_es_scroll_duration = "15m" + create_reconciliation_report_es_scroll_size = 2000 +} +``` + +## lambda_timeouts + +An optional configurable map of timeouts (in seconds) for cumulus lambdas in the form: + +```hcl +lambda_timeouts = { + = +} +``` + +### Example + +```tf +lambda_timeouts = { + sqsMessageRemover = 300 +} +``` + +## lambda_memory_sizes + +An optional configurable map of memory sizes (in MBs) for cumulus lambdas in the form: + +```hcl +lambda_memory_sizes = { + = +} +``` + +### Example + +```tf +lambda_memory_sizes = { + SyncGranule = 1024 +} +``` diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/about-cookbooks.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/about-cookbooks.md new file mode 100644 index 00000000000..6b0e841cd19 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/about-cookbooks.md @@ -0,0 +1,27 @@ +--- +id: about-cookbooks +title: About Cookbooks +hide_title: false +--- + +## Introduction + +The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, the following data cookbooks should serve to help unify an institution/user group on a set of terms. + +## Setup + +The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit [Cumulus data management types](../configuration/data-management-types) for information on how to configure Cumulus data management types. + +## Adding a page + +As shown in detail in the "Add a New Page and Sidebars" section in [Cumulus Docs: How To's](../docs-how-to.md), you can add a new page to the data cookbook by creating a markdown (`.md`) file in the `docs/data-cookbooks` directory. The new page can then be linked to the sidebar by adding it to the `Data-Cookbooks` object in the `website/sidebar.json` file as `data-cookbooks/${id}`. + +## More about workflows + +[Workflow general information](workflows/README.md) + +[Input & Output](workflows/input_output.md) + +[Developing Workflow Tasks](workflows/developing-workflow-tasks.md) + +[Workflow Configuration How-to's](workflows/workflow-configuration-how-to.md) diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/browse-generation.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/browse-generation.md new file mode 100644 index 00000000000..9c2a83a7468 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/browse-generation.md @@ -0,0 +1,600 @@ +--- +id: browse-generation +title: Ingest Browse Generation +hide_title: false +--- + +This entry documents how to setup a workflow that utilizes Cumulus's built-in granule file type configuration such that on ingest the browse data is exported to CMR. 
+ +We will discuss how to run a processing workflow against an inbound granule that has data but no browse generated. The workflow will generate a browse file and add the appropriate output values to the Cumulus message so that the built-in post-to-cmr task will publish the data appropriately. + +## Sections + +- [Prerequisites](#prerequisites) +- [Configure Cumulus](#configure-cumulus) +- [Configure Ingest](#configure-ingest) +- [Run Workflows](#run-workflows) +- [Build Processing Lambda](#build-processing-lambda) + +## Prerequisites + +### Cumulus + +This entry assumes you have a deployed instance of Cumulus v1.16.0 or later, and a working dashboard following the instructions in the [deployment documentation](../deployment). This entry also assumes you have some knowledge of how to configure Collections, Providers and Rules and basic Cumulus operation. + +Prior to working through this entry, you should be somewhat familiar with the [Hello World](hello-world) example the [Workflows](../workflows) section of the documentation, and [building Cumulus lambdas](../workflows/lambda). + +You should also review the [Data Cookbooks Setup](about-cookbooks#setup) portion of the documentation as it contains useful information on the inter-task message schema expectations. + +This entry will utilize the [dashboard application](https://github.com/nasa/cumulus-dashboard). You will need to have a dashboard deployed as described in the [Cumulus deployment documentation](../deployment) to follow the instructions in this example. + +If you'd prefer to _not_ utilize a running dashboard to add Collections, Providers and trigger Rules, you can set the Collection/Provider and Rule via the API, however in that instance you should be very familiar with the [Cumulus API](https://nasa.github.io/cumulus-api/) before attempting the example in this entry. + +### Common Metadata Repository + +You should be familiar with the [Common Metadata Repository](https://earthdata.nasa.gov/about/science-system-description/eosdis-components/common-metadata-repository) and already be set up as a provider with configured collections and credentials to ingest data into CMR. You should know what the collection name and version number are. + +### Source Data + +You should have data available for Cumulus to ingest in an S3 bucket that matches with CMR if you'd like to push a record to CMR UAT. + +For the purposes of this entry, we will be using a pre-configured MOD09GQ version 006 CMR collection. If you'd prefer to utilize the example processing code, using mocked up data files matching the file naming convention will suffice, so long as you also have a matching collection setup in CMR. + +If you'd prefer to ingest another data type, you will need to generate a processing lambda (see [Build Processing Lambda](#build-processing-lambda) below). + +--- + +## Configure Cumulus + +### CMR + +To run this example with successful exports to CMR you'll need to make sure the CMR configuration keys for the `cumulus` terraform module are configured per that module's [variables.tf file](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf). + +### Workflows + +#### Summary + +For this example, you are going to be adding two workflows to your Cumulus deployment. + +- DiscoverGranulesBrowseExample + + This workflow will run the `DiscoverGranules` task, targeting the S3 bucket/folder mentioned in the prerequisites. 
The output of that task will be passed into QueueGranules, which will trigger the second workflow for each granule to be ingested. The example presented here will be a single granule with a .hdf data file and a .met metadata file only, however your setup may result in more granules, or different files. + +- CookbookBrowseExample + + This workflow will be triggered for each granule in the previous workflow. It will utilize the SyncGranule task, which brings the files into a staging location in the Cumulus buckets. + + The output from this task will be passed into the `ProcessingStep` step , which in this example will utilize the `FakeProcessingLambda` task we provide for testing/as an example in Core, however to use your own data you will need to write a lambda that generates the appropriate CMR metadata file and accepts and returns appropriate task inputs and outputs. + + From that task we will utilize a core task `FilesToGranules` that will transform the processing output event.input list/config.InputGranules into an array of Cumulus [granules](https://github.com/nasa/cumulus/blob/master/packages/api/lib/schemas.js) objects. + + Using the generated granules list, we will utilize the core task `MoveGranules` to move the granules to the target buckets as defined in the collection configuration. That task will transfer the files to their final storage location and update the CMR metadata files and the granules list as output. + + That output will be used in the `PostToCmr` task combined with the previously generated CMR file to export the granule metadata to CMR. + +#### Workflow Configuration + +Copy the following workflow deployment files to your deployment's main directory (including the referenced `.asl.json` files): + +- [`browse_example.tf` (`DiscoverGranulesBrowseExample` workflow)](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/browse_example.tf) +- [`cookbook_browse_example_workflow.tf` (`CookbookBrowseExample` workflow)](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/cookbook_browse_example_workflow.tf) + +:::note + +You should update the `source =` line to match the current Cumulus `workflow` module release artifact to the version of Cumulus you're deploying: + +```hcl +source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip" +``` + +::: + +A few things to keep in mind about tasks in the workflow being added: + +:::info helpful snippets: interpolated values + +In the snippets below, `${post_to_cmr_task_arn}` and `${fake_processing_task_arn}` are interpolated values referring to Terraform resources. See the example deployment code for the [`CookbookBrowseExample` workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/cookbook_browse_example_workflow.tf). 
+ +::: + +- The CMR step in CookbookBrowseExample: + +```json + "CmrStep": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "bucket": "{$.meta.buckets.internal.name}", + "stack": "{$.meta.stack}", + "cmr": "{$.meta.cmr}", + "launchpad": "{$.meta.launchpad}", + "input_granules": "{$.meta.input_granules}", + "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}" + } + } + }, + "Type": "Task", + "Resource": "${post_to_cmr_task_arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "ResultPath": "$.exception", + "Next": "WorkflowFailed" + } + ], + "End": true + } +``` + +Note that, in the task, the `CmrStep.Parameters.cma.task_config.cmr` key will contain the values you configured in the `cmr` configuration section above. + +- The Processing step in CookbookBrowseExample: + +```json +"ProcessingStep": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "bucket": "{$.meta.buckets.internal.name}", + "collection": "{$.meta.collection}", + "cmrMetadataFormat": "{$.meta.cmrMetadataFormat}", + "additionalUrls": "{$.meta.additionalUrls}", + "generateFakeBrowse": true, + "cumulus_message": { + "outputs": [ + { + "source": "{$.granules}", + "destination": "{$.meta.input_granules}" + }, + { + "source": "{$.files}", + "destination": "{$.payload}" + } + ] + } + } + } + }, + "Type": "Task", + "Resource": "${fake_processing_task_arn}", + "Catch": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "ResultPath": "$.exception", + "Next": "WorkflowFailed" + } + ], + "Retry": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "IntervalSeconds": 2, + "MaxAttempts": 3 + } + ], + "Next": "FilesToGranulesStep" +}, +``` + +If you're not ingesting mock data matching the example, or would like to use modify the example to ingest your own data please see the [build-lambda](#build-lambda) section below. You will need to configure a different lambda entry for your lambda and utilize it in place of the `Resource` defined in the example workflow. + +:::note + +`FakeProcessing` is the core provided browse/CMR generation Lambda we're using for the example in this entry. + +::: + +#### Lambdas + +All lambdas utilized in this example are provided in a standard deployment of Cumulus and require no additional configuration. + +#### Redeploy + +Once you've configured your CMR credentials, updated your workflow configuration, and updated your lambda configuration you should be able to redeploy your cumulus instance by running the following commands: + +**`terraform init`** + +You should expect to see output similar to: + +```sh +$ terraform init +Initializing modules... +Downloading https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip for cookbook_browse_example_workflow... +- cookbook_browse_example_workflow in .terraform/modules/cookbook_browse_example_workflow +Downloading https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip for discover_granules_browse_example_workflow... +- discover_granules_browse_example_workflow in .terraform/modules/discover_granules_browse_example_workflow + +Initializing the backend... + +Initializing provider plugins... + +Terraform has been successfully initialized! + +You may now begin working with Terraform. 
Try running "terraform plan" to see +any changes that are required for your infrastructure. All Terraform commands +should now work. + +If you ever set or change modules or backend configuration for Terraform, +rerun this command to reinitialize your working directory. If you forget, other +commands will detect it and remind you to do so if necessary. +``` + +**`terraform apply`** + +You should expect to see output similar to the following truncated example: + +```bash + +$ terraform apply +module.cumulus.module.archive.null_resource.rsa_keys: Refreshing state... [id=xxxxxxxxx] +data.terraform_remote_state.data_persistence: Refreshing state... +module.cumulus.module.archive.aws_cloudwatch_event_rule.daily_execution_payload_cleanup: Refreshing state... [id=xxxx] + +.... + +An execution plan has been generated and is shown below. +Resource actions are indicated with the following symbols: + + create + ~ update in-place +-/+ destroy and then create replacement + <= read (data resources) + +Terraform will perform the following actions: +{...} +Plan: 15 to add, 3 to change, 1 to destroy. + +Do you want to perform these actions? + Terraform will perform the actions described above. + Only 'yes' will be accepted to approve. + + Enter a value: yes + +{...} + +Apply complete! Resources: 15 added, 3 changed, 1 destroyed. +Releasing state lock. This may take a few moments... + +Outputs: + +archive_api_redirect_uri = {URL} +archive_api_uri = {URL} +distribution_redirect_uri = {URL} +distribution_url = {URL} +s3_credentials_redirect_uri = {URL} +``` + +--- + +## Configure Ingest + +Now that Cumulus has been updated updated with the new workflows and code, we will use the Cumulus dashboard to configure an ingest collection, provider and rule so that we can trigger the configured workflow. + +### Add Collection + +Navigate to the 'Collection' tab on the interface and add a collection. + +```json +{ + "name": "MOD09GQ", + "version": "006", + "process": "modis", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}", + "duplicateHandling": "replace", + "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", + "granuleIdExtraction": "(MOD09GQ\\..*)(\\.hdf|\\.cmr|_ndvi\\.jpg|\\.jpg)", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "files": [ + { + "bucket": "protected", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "type": "data", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}" + }, + { + "bucket": "private", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.met$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met", + "type": "metadata" + }, + { + "bucket": "protected-2", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.cmr\\.xml$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml" + }, + { + "bucket": "protected", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.jpg$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.jpg" + } + ] +} +``` + +:::note + +Even though our initial discover granules ingest brings in only the .hdf and .met files we've staged, we still configure the other possible file types for this collection's granules. 
+ +::: + +### Add Provider + +Next navigate to the Provider tab and create a provider with the following values, using whatever name you wish, and the bucket the data was staged to as the host: + +```shell +Name: +Protocol: S3 +Host: {{data_source_bucket}} +``` + +### Add Rule + +Once you have your provider added, go to the Rules tab and add a rule with the +following values (using whatever name you wish, populating the workflow and +provider keys with the previously entered values) Note that you need to set the +"provider_path" to the path on your bucket (e.g. "/data") that you've staged +your mock/test data.: + +```json +{ + "name": "TestBrowseGeneration", + "workflow": "DiscoverGranulesBrowseExample", + "provider": "{{provider_from_previous_step}}", + "collection": { + "name": "MOD09GQ", + "version": "006" + }, + "meta": { + "provider_path": "{{path_to_data}}" + }, + "rule": { + "type": "onetime" + }, + "state": "ENABLED", + "updatedAt": 1553053438767 +} +``` + +--- + +## Run Workflows + +Once you've configured the Collection and Provider and added a onetime rule with an `ENABLED` state, you're ready to trigger your rule, and watch the ingest workflows process. + +Go to the Rules tab, click the rule you just created: + +![Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard](../assets/browse_processing_1.png) + +Then click the gear in the upper right corner and click "Rerun": + +![Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard](../assets/browse_processing_2.png) + +Tab over to executions and you should see the `DiscoverGranulesBrowseExample` workflow run, succeed, and then moments later the `CookbookBrowseExample` should run and succeed. + +![Screenshot of page listing executions in the Cumulus dashboard](../assets/browse_processing_3.png) + +### Results + +You can verify your data has ingested by clicking the successful workflow entry: + +![Screenshot of individual entry from table listing executions in the Cumulus dashboard](../assets/browse_processing_4.png) + +Select "Show Output" on the next page + +![Screenshot of "Show output" button from individual execution page in the Cumulus dashboard](../assets/browse_processing_5.png) + +and you should see in the payload from the workflow something similar to: + +```json +"payload": { + "process": "modis", + "granules": [ + { + "files": [ + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "type": "data", + "bucket": "cumulus-test-sandbox-protected", + "path": "data", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}", + "size": 1908635 + }, + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "type": "metadata", + "bucket": "cumulus-test-sandbox-private", + "path": "data", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}", + "size": 21708 + }, + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg", + "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg", + "type": "browse", + "bucket": "cumulus-test-sandbox-protected", + "path": "data", + "url_path": 
"{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}", + "size": 1908635 + }, + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml", + "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml", + "type": "metadata", + "bucket": "cumulus-test-sandbox-protected-2", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}" + } + ], + "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS", + "cmrConceptId": "G1222231611-CUMULUS", + "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606", + "cmrMetadataFormat": "echo10", + "dataType": "MOD09GQ", + "version": "006", + "published": true + } + ] +} +``` + +You can verify the granules exist within your cumulus instance (search using the Granules interface, check the S3 buckets, etc) and validate that the above CMR entry + +--- + +## Build Processing Lambda + +This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task. + +To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file. + +You will then need to add the lambda to your Cumulus deployment as a `aws_lambda_function` Terraform resource. + +The discussion below outlines requirements for this lambda. + +### Inputs + +The incoming message to the task defined in the `ProcessingStep` as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter): + +#### Configuration + +- event.config.bucket -- the name of the bucket configured in `terraform.tfvars` as your `internal` bucket. + +- event.config.collection -- The full collection object we will configure in the [Configure Ingest](#configure-ingest) section. You can view the expected collection schema in the docs [here](../configuration/data-management-types.md) or in the source code [on github](https://github.com/nasa/cumulus/blob/master/packages/api/lib/schemas.js). You need this as available input _and_ output so you can update as needed. + +`event.config.additionalUrls`, `generateFakeBrowse` and `event.config.cmrMetadataFormat` from the example can be ignored as they're configuration flags for the provided example script. + +#### Payload + +The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed [here](https://github.com/nasa/cumulus/blob/master/tasks/move-granules/schemas/output.json). + +In our example, the payload would look like the following. **Note**: The types are set per-file based on what we configured in our collection, and were initially added as part of the `DiscoverGranules` step in the `DiscoverGranulesBrowseExample` workflow. 
+ +```json + "payload": { + "process": "modis", + "granules": [ + { + "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606", + "dataType": "MOD09GQ", + "version": "006", + "files": [ + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "bucket": "cumulus-test-sandbox-internal", + "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "size": 1908635 + }, + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "bucket": "cumulus-test-sandbox-internal", + "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "size": 21708 + } + ] + } + ] + } +``` + +### Generating Browse Imagery + +The provided example script used in the example goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasksf. + +The processing lambda you construct will need to do the following: + +- Create a browse image file based on the input data, and stage it to a location accessible to both this task and the `FilesToGranules` and `MoveGranules` tasks in a S3 bucket. +- Add the browse file to the input granule files, making sure to set the granule file's type to `browse`. +- Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by `FilesToGranules` as output from the task. + +### Generating/updating CMR metadata + +If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the `FilesToGranules` file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it. + +Both downstream tasks `MoveGranules`, `UpdateGranulesCmrMetadataFileLinks`, and `PostToCmr` expect a valid CMR file to be available if you want to export to CMR. + +### Expected Outputs for processing task/tasks + +In the above example, the critical portion of the output to `FilesToGranules` is the payload and meta.input_granules. + +In the example provided, the processing task is setup to return an object with the keys "files" and "granules". In the cumulus_message configuration, the outputs are mapped in the configuration to the payload, granules to meta.input_granules: + +```json + "task_config": { + "inputGranules": "{$.meta.input_granules}", + "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}" + } +``` + +Their expected values from the example above may be useful in constructing a processing task: + +#### payload + +The payload includes a full list of files to be 'moved' into the cumulus archive. The `FilesToGranules` task will take this list, merge it with the information from `InputGranules`, then pass that list to the `MoveGranules` task. The `MoveGranules` task will then move the files to their targets. The `UpdateGranulesCmrMetadataFileLinks` task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags. 
+ +In the provided example, a payload being passed to the `FilesToGranules` task should be expected to look like: + +```json + "payload": [ + "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg", + "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml" + ] +``` + +This list is the list of granules `FilesToGranules` will act upon to add/merge with the input_granules object. + +The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/`MoveGranules` task's roles have access and the filename matches the collection configuration. + +#### input_granules + +The `FilesToGranules` task utilizes the incoming payload to chose which files to move, but pulls all other metadata from meta.input_granules. As such, the output payload in the example would look like: + +```json +"input_granules": [ + { + "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606", + "dataType": "MOD09GQ", + "version": "006", + "files": [ + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "bucket": "cumulus-test-sandbox-internal", + "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "size": 1908635 + }, + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "bucket": "cumulus-test-sandbox-internal", + "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met", + "size": 21708 + }, + { + "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg", + "bucket": "cumulus-test-sandbox-internal", + "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg" + } + ] + } +], +``` diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/choice-states.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/choice-states.md new file mode 100644 index 00000000000..3182d3f7086 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/choice-states.md @@ -0,0 +1,58 @@ +--- +id: choice-states +title: Choice States +hide_title: false +--- + +Cumulus supports AWS Step Function [`Choice`](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-choice-state.html) states. A `Choice` state enables branching logic in Cumulus workflows. + +`Choice` state definitions include a list of `Choice Rule`s. Each `Choice Rule` defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review [the AWS docs](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-choice-state.html). + +If the comparison evaluates to `true`, the `Next` state is followed. + +## Example + +In [examples/cumulus-tf/parse_pdr_workflow.tf](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/parse_pdr_workflow.tf) the `ParsePdr` workflow uses a `Choice` state, `CheckAgainChoice`, to terminate the workflow once `meta.isPdrFinished: true` is returned by the `CheckStatus` state. 
+ +The `CheckAgainChoice` state definition requires an input object of the following structure: + +```json +{ + "meta": { + "isPdrFinished": false + } +} +``` + +Given the above input to the `CheckAgainChoice` state, the workflow would transition to the `PdrStatusReport` state. + +```json +"CheckAgainChoice": { + "Type": "Choice", + "Choices": [ + { + "Variable": "$.meta.isPdrFinished", + "BooleanEquals": false, + "Next": "PdrStatusReport" + }, + { + "Variable": "$.meta.isPdrFinished", + "BooleanEquals": true, + "Next": "WorkflowSucceeded" + } + ], + "Default": "WorkflowSucceeded" +} +``` + +## Advanced: Loops in Cumulus Workflows + +Understanding the complete `ParsePdr` workflow is not necessary to understanding how `Choice` states work, but `ParsePdr` provides an example of how `Choice` states can be used to create a loop in a Cumulus workflow. + +In the complete `ParsePdr` workflow definition, the state `QueueGranules` is followed by `CheckStatus`. From `CheckStatus` a loop starts: Given `CheckStatus` returns `meta.isPdrFinished: false`, `CheckStatus` is followed by `CheckAgainChoice` is followed by `PdrStatusReport` is followed by `WaitForSomeTime`, which returns to `CheckStatus`. Once `CheckStatus` returns `meta.isPdrFinished: true`, `CheckAgainChoice` proceeds to `WorkflowSucceeded`. + +![Execution graph of SIPS ParsePdr workflow in AWS Step Functions console](../assets/sips-parse-pdr.png) + +## Further documentation + +For complete details on `Choice` state configuration options, see [the Choice state documentation](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-choice-state.html). diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/cnm-workflow.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/cnm-workflow.md new file mode 100644 index 00000000000..55f20910d04 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/cnm-workflow.md @@ -0,0 +1,642 @@ +--- +id: cnm-workflow +title: CNM Workflow +hide_title: false +--- + +This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus. + +Prior to working through this entry you should be familiar with the [Cloud Notification Mechanism](https://github.com/podaac/cloud-notification-message-schema). + +## Sections + +- [Prerequisites](#prerequisites) +- [Configure the Workflow](#configure-the-workflow) +- [Execute the Workflow](#execute-the-workflow) +- [Verify Results](#verify-results) +- [Kinesis Record Error Handling](#kinesis-record-error-handling) + +--- + +## Prerequisites + +### Cumulus + +This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the [`cumulus` terraform module](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus) sourced from the [release page](https://github.com/nasa/cumulus/releases). + +### AWS CLI + +This entry assumes you have the [AWS CLI](https://aws.amazon.com/cli/) installed and configured. If you do not, please take a moment to review the documentation - particularly the [examples relevant to Kinesis](https://docs.aws.amazon.com/streams/latest/dev/fundamental-stream.html) - and install it now. + +### Kinesis + +This entry assumes you already have two [Kinesis](https://aws.amazon.com/kinesis/) data steams created for use as CNM notification and response data streams. 
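
One way to create the two streams, if you manage infrastructure with Terraform alongside your Cumulus deployment, is sketched below. This is illustrative only and not part of the Cumulus modules: the stream names are assumptions, and server-side encryption is enabled per the note later in this section.

```hcl
# Illustrative only: two single-shard streams for CNM notifications and responses.
# Stream names are examples; use names appropriate to your deployment.
resource "aws_kinesis_stream" "cnm_notification" {
  name            = "${var.prefix}-cnmNotificationStream"
  shard_count     = 1
  encryption_type = "KMS"
  kms_key_id      = "alias/aws/kinesis"
}

resource "aws_kinesis_stream" "cnm_response" {
  name            = "${var.prefix}-cnmResponseStream"
  shard_count     = 1
  encryption_type = "KMS"
  kms_key_id      = "alias/aws/kinesis"
}
```

Whether you create the streams this way or through the console, the rest of this entry only assumes that both streams exist and are reachable by your deployment.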

If you prefer to set the streams up through the AWS console, take a moment to review the [Kinesis documentation](https://aws.amazon.com/documentation/kinesis/) and set up two basic single-shard streams for this example:

Using the "Create Data Stream" button on the [Kinesis Dashboard](https://console.aws.amazon.com/kinesis/home), work through the dialogue to create streams similar to the following example; see the [Kinesis Stream for Ingest docs](../operator-docs/kinesis-stream-for-ingest.md) for details.

![Screenshot of AWS console page for creating a Kinesis stream](../assets/cnm_create_kinesis_stream_V2.jpg)

Please bear in mind that if you create the Kinesis streams with a dashboard user, your `{{prefix}}-lambda-processing` IAM role will need permissions to write to the response stream for this workflow to succeed. If you are using the `cumulus` top-level module for your deployment, this should be set properly.

If not, the most straightforward approach is to attach the `AmazonKinesisFullAccess` policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

:::note
Ensure that the data streams created have server-side encryption enabled.
:::

For more information on how this process works and how to develop a process that will add records to a stream, read the [Kinesis documentation](https://aws.amazon.com/documentation/kinesis/) and the [developer guide](https://docs.aws.amazon.com/streams/latest/dev/introduction.html).

### Source Data

This entry will run the SyncGranule task against a single target data file. To that end, it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

### Collection and Provider

Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the [Source Data](#source-data) section.

This can be done via the [Cumulus Dashboard](https://github.com/nasa/cumulus-dashboard) if installed or the [API](../api.md). It is strongly recommended to use the dashboard if possible.

---

## Configure the Workflow

Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to set up the example workflow.

The following steps are required to set up your Cumulus instance to run the example workflow:

### Example CNM Workflow

In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

The following [workflow definition](workflows/README.md) should be added to a new `.tf` workflow resource (e.g. `cnm_workflow.tf`) in your deployment directory. For the complete CNM workflow example, see [examples/cumulus-tf/cnm_workflow.tf](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/cnm_workflow.tf).

Add the following to the new Terraform file in your deployment directory, making these updates:

- Set the `response-endpoint` key in the `CnmResponse` task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section.
- Update the `source` key of the workflow module to match the Cumulus release associated with your deployment.

```hcl
module "cnm_workflow" {
  source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

  prefix          = var.prefix
  name            = "CNMExampleWorkflow"
  workflow_config = module.cumulus.workflow_config
  system_bucket   = var.system_bucket

  # The full state machine JSON is omitted here; see the complete definition in
  # the linked examples/cumulus-tf/cnm_workflow.tf.
  state_machine_definition = <<JSON
  { ... }
JSON
}
```

This workflow definition relies on the following CNM-specific tasks:

:::note

To utilize these tasks you need to ensure you have a compatible CMA layer. See the [deployment instructions](../deployment/README.md#deploy-cumulus-message-adapter-layer) for more details on how to deploy a CMA layer.

:::

- `CNMToCMA`
- `CnmResponse`

Below is a description of each of these tasks:

#### CNMToCMA

`CNMToCMA` is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message _or_ include a translation task like this one.

You can also manipulate the data sent to downstream tasks using `task_config` for various states in your workflow resource configuration. Read more about how to configure data on the [Workflow Input & Output](https://nasa.github.io/cumulus/docs/workflows/input_output) page.

#### CnmResponse

The `CnmResponse` Lambda generates a CNM response message and puts it on the `response-endpoint` Kinesis stream.

You can read more about the expected schema of a `CnmResponse` record in the [Cloud Notification Mechanism schema repository](https://github.com/podaac/cloud-notification-message-schema#response-message-fields).

#### Additional Tasks

Lastly, this entry also makes use of the `SyncGranule` task from the [`cumulus` module](https://github.com/nasa/cumulus/releases).

### Redeploy

Once the above configuration changes have been made, redeploy your stack.

Please refer to `Update Cumulus resources` in the [deployment documentation](../deployment/upgrade.md#update-cumulus-resources) if you are unfamiliar with redeployment.

### Rule Configuration

Cumulus includes a `messageConsumer` Lambda function ([message-consumer](https://github.com/nasa/cumulus/blob/master/packages/api/lambdas/message-consumer.js)). Cumulus kinesis-type rules create the [event source mappings](https://docs.aws.amazon.com/lambda/latest/dg/API_CreateEventSourceMapping.html) between Kinesis streams and the `messageConsumer` Lambda. The `messageConsumer` Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the `messageConsumer` triggers workflows associated with the enabled kinesis-type rules.
+ +To add a rule via the dashboard (if you'd like to use the API, see the docs [here](https://nasa.github.io/cumulus-api/#create-rule)), navigate to the `Rules` page and click `Add a rule`, then configure the new rule using the following template (substituting correct values for parameters denoted by `${}`): + +```json +{ + "collection": { + "name": "L2_HR_PIXC", + "version": "000" + }, + "name": "L2_HR_PIXC_kinesisRule", + "provider": "PODAAC_SWOT", + "rule": { + "type": "kinesis", + "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}" + }, + "state": "ENABLED", + "workflow": "CNMExampleWorkflow" +} +``` + +:::note + +- The rule's `value` attribute value must match the Amazon Resource Name [ARN](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream. +- The collection and provider should match the collection and provider you setup in the [`Prerequisites`](#prerequisites) section. + +::: + +Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview. + +--- + +## Execute the Workflow + +Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute. + +### How to Trigger the Workflow + +To trigger matching workflows, you will need to put a record on the Kinesis stream that the [message-consumer](https://github.com/nasa/cumulus/blob/master/packages/api/lambdas/message-consumer.js) Lambda will recognize as a matching event. Most importantly, it should include a `collection` name that matches a valid collection. + +For the purpose of this example, the easiest way to accomplish this is using the [AWS CLI](https://aws.amazon.com/cli/). + +#### Create Record JSON + +Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid [Cloud Notification Mechanism](https://github.com/podaac/cloud-notification-message-schema#cumulus-sns-schema) message. + +:::note + +This example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would. + +::: + +The following values (denoted by `${}` in the sample below) should be replaced to match values we've previously configured: + +- `TEST_DATA_FILE_NAME`: The filename of the test data that is available in the S3 (or other) provider we created earlier. +- `TEST_DATA_URI`: The full S3 path to the test data (e.g. s3://bucket-name/path/granule) +- `COLLECTION`: The collection name defined in the prerequisites for this product + +```json +{ + "product": { + "files": [ + { + "checksumType": "md5", + "name": "${TEST_DATA_FILE_NAME}", + "checksum": "bogus_checksum_value", + "uri": "${TEST_DATA_URI}", + "type": "data", + "size": 12345678 + } + ], + "name": "${TEST_DATA_FILE_NAME}", + "dataVersion": "006" + }, + "identifier ": "testIdentifier123456", + "collection": "${COLLECTION}", + "provider": "TestProvider", + "version": "001", + "submissionTime": "2017-09-30T03:42:29.791198" +} +``` + +#### Add Record to Kinesis Data Stream + +Using the JSON file you created, push it to the Kinesis notification stream: + +```bash +aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json +``` + +:::note + +The above command uses the stream name, _not_ the ARN. 
+ +::: + +The command should return output similar to: + +```json +{ + "ShardId": "shardId-000000000000", + "SequenceNumber": "42356659532578640215890215117033555573986830588739321858" +} +``` + +This command will put a record containing the JSON from the `--data` flag onto the Kinesis data stream. The `messageConsumer` Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the `CNMExampleWorkflow` workflow as defined by the rule previously configured. + +You can view the current running executions on the `Executions` dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. + +### Verify Workflow Execution + +As detailed above, once the record is added to the Kinesis data stream, the `messageConsumer` Lambda will trigger the `CNMExampleWorkflow` . + +#### TranslateMessage + +`TranslateMessage` (which corresponds to the `CNMToCMA` Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a `meta.cnm` key (as well as the payload) to store the original message. + +:::info + +For more on the Message Adapter, please see [the Message Flow documentation](workflows/cumulus-task-message-flow.md). + +::: + +An example of what is happening in the `CNMToCMA` Lambda is as follows: + +Example Input Payload: + +```json +"payload": { + "identifier ": "testIdentifier123456", + "product": { + "files": [ + { + "checksumType": "md5", + "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "checksum": "bogus_checksum_value", + "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "type": "data", + "size": 12345678 + } + ], + "name": "TestGranuleUR", + "dataVersion": "006" + }, + "version": "123456", + "collection": "MOD09GQ", + "provider": "TestProvider", + "submissionTime": "2017-09-30T03:42:29.791198" +} +``` + +Example Output Payload: + +```json + "payload": { + "cnm": { + "identifier ": "testIdentifier123456", + "product": { + "files": [ + { + "checksumType": "md5", + "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "checksum": "bogus_checksum_value", + "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "type": "data", + "size": 12345678 + } + ], + "name": "TestGranuleUR", + "dataVersion": "006" + }, + "version": "123456", + "collection": "MOD09GQ", + "provider": "TestProvider", + "submissionTime": "2017-09-30T03:42:29.791198", + "receivedTime": "2017-09-30T03:42:31.634552" + }, + "output": { + "granules": [ + { + "granuleId": "TestGranuleUR", + "files": [ + { + "path": "some-bucket/data", + "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "bucket": "some-bucket", + "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf", + "size": 12345678 + } + ] + } + ] + } + } +``` + +#### SyncGranules + +This Lambda will take the files listed in the payload and move them to `s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}`. + +#### CnmResponse + +Assuming a successful execution of the workflow, this task will recover the `meta.cnm` key from the CMA output, and add a "SUCCESS" record to the notification Kinesis stream. + +If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead. 
+ +The data written to the `response-endpoint` should adhere to the [Response Message Fields](https://github.com/podaac/cloud-notification-message-schema#cumulus-sns-schema) schema. + +**Example CNM Success Response**: + +```json +{ + "provider": "PODAAC_SWOT", + "collection": "SWOT_Prod_l2:1", + "processCompleteTime": "2017-09-30T03:45:29.791198", + "submissionTime": "2017-09-30T03:42:29.791198", + "receivedTime": "2017-09-30T03:42:31.634552", + "identifier": "1234-abcd-efg0-9876", + "response": { + "status": "SUCCESS" + } +} +``` + +**Example CNM Error Response**: + +```json +{ + "provider": "PODAAC_SWOT", + "collection": "SWOT_Prod_l2:1", + "processCompleteTime": "2017-09-30T03:45:29.791198", + "submissionTime": "2017-09-30T03:42:29.791198", + "receivedTime": "2017-09-30T03:42:31.634552", + "identifier": "1234-abcd-efg0-9876", + "response": { + "status": "FAILURE", + "errorCode": "PROCESSING_ERROR", + "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value." + } +} +``` + +Note the `CnmResponse` state defined in the `.tf` workflow definition above configures `$.exception` to be passed to the `CnmResponse` Lambda keyed under `config.WorkflowException`. This is required for the `CnmResponse` code to deliver a failure response. + +To test the failure scenario, send a record missing the `product.name` key. + +--- + +## Verify results + +### Check for successful execution on the dashboard + +Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard: + +![Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard](../assets/cnm_success_example.png) + +### Check the test granule has been delivered to S3 staging + +The test granule identified in the Kinesis record should be moved to the deployment's private staging area. + +### Check for Kinesis records + +A `SUCCESS` notification should be present on the `response-endpoint` Kinesis stream. 
+ +You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis [Basic Stream Operations](https://docs.aws.amazon.com/streams/latest/dev/fundamental-stream.html) is useful to review before proceeding): + +Get a shard iterator (substituting your stream name as appropriate): + +```bash +aws kinesis get-shard-iterator \ + --shard-id shardId-000000000000 \ + --shard-iterator-type LATEST \ + --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME +``` + +which should result in an output to: + +```json +{ + "ShardIterator": "VeryLongString==" +} +``` + +- Re-trigger the workflow by using the `put-record` command from +- As the workflow completes, use the output from the `get-shard-iterator` command to request data from the stream: + +```bash +aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE +``` + +This should result in output similar to: + +```json +{ + "Records": [ + { + "SequenceNumber": "49586720336541656798369548102057798835250389930873978882", + "ApproximateArrivalTimestamp": 1532664689.128, + "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9", + "PartitionKey": "1" + }, + { + "SequenceNumber": "49586720336541656798369548102059007761070005796999266306", + "ApproximateArrivalTimestamp": 1532664707.149, + "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9", + "PartitionKey": "1" + } + ], + "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==", + "MillisBehindLatest": 0 +} +``` + +Note the data encoding is not human readable and would need to be parsed/converted to be interpretable. There are many options to build a Kineis consumer such as the [KCL](https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html). + +For purposes of validating the workflow, it may be simpler to locate the workflow in the [Step Function Management Console](https://console.aws.amazon.com/states/home) and assert the expected output is similar to the below examples. + +**Successful CNM Response Object Example:** + +```json +{ + "cnmResponse": { + "provider": "TestProvider", + "collection": "MOD09GQ", + "version": "123456", + "processCompleteTime": "2017-09-30T03:45:29.791198", + "submissionTime": "2017-09-30T03:42:29.791198", + "receivedTime": "2017-09-30T03:42:31.634552", + "identifier ": "testIdentifier123456", + "response": { + "status": "SUCCESS" + } + } +} +``` + +--- + +## Kinesis Record Error Handling + +### messageConsumer + +The default Kinesis stream processing in the Cumulus system is configured for record error tolerance. + +When the `messageConsumer` fails to process a record, the failure is captured and the record is published to the `kinesisFallback` SNS Topic. The `kinesisFallback` SNS topic broadcasts the record and a subscribed copy of the `messageConsumer` Lambda named `kinesisFallback` consumes these failures. 

At this point, the [normal Lambda asynchronous invocation retry behavior](https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html) will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a [dead letter queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html). Cumulus' dead letter queue is an SQS Queue named `kinesisFailure`. Operators can use this queue to inspect failed records.

This system ensures that when the `messageConsumer` fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus' control.

The Kinesis error handling system (the `kinesisFallback` SNS topic, `messageConsumer` Lambda, and `kinesisFailure` SQS queue) comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue `{{prefix}}-kinesisFailure` in the [Simple Queue Service (SQS) console](https://console.aws.amazon.com/sqs/home). Select your queue, and under the `Queue Actions` tab, choose `View/Delete Messages`. `Start polling` for messages and you will see records that failed to process through the `messageConsumer`.

Note that these are only records that failed while processing records from Kinesis streams. Workflow failures are handled differently.

### Kinesis Stream logging

#### Notification Stream messages

Cumulus includes two Lambdas (`KinesisInboundEventLogger` and `KinesisOutboundEventLogger`) that utilize the same code to take a Kinesis record event as input, deserialize the data field, and output the modified event to the logs.

When a `kinesis` rule is created, in addition to the `messageConsumer` event mapping, an event mapping is created to trigger `KinesisInboundEventLogger` to record a log of the inbound record, allowing for analysis in case of unexpected failure.

#### Response Stream messages

Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the `KinesisOutboundEventLogger` Lambda that targets your `response-endpoint`. You can do this in the Lambda management page for `KinesisOutboundEventLogger`: add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

![Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda](../assets/KinesisLambdaTriggerConfiguration.png)

Once this is done, all records sent to the `response-endpoint` will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see [creating an event source mapping](https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-eventsourcemapping).
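
If your deployment is managed in Terraform, the same Kinesis trigger can be expressed as an event source mapping resource rather than configured in the console. The sketch below is an assumption-laden illustration: the function name presumes the deployed logger follows a `{{prefix}}-KinesisOutboundEventLogger` naming pattern, and `event_source_arn` should point at whatever stream serves as your `response-endpoint` (here it reuses the example stream resource from the prerequisites sketch).

```hcl
# Illustrative sketch: trigger KinesisOutboundEventLogger from the CNM response stream
# so every outbound record is logged to CloudWatch. The function name and stream
# reference are assumptions for this example, not values required by Cumulus.
resource "aws_lambda_event_source_mapping" "outbound_event_logger_response" {
  event_source_arn  = aws_kinesis_stream.cnm_response.arn
  function_name     = "${var.prefix}-KinesisOutboundEventLogger"
  starting_position = "LATEST"
  batch_size        = 10
}
```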
diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/error-handling.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/error-handling.md new file mode 100644 index 00000000000..695f1d7257a --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/error-handling.md @@ -0,0 +1,265 @@ +--- +id: error-handling +title: Error Handling in Workflows +hide_title: false +--- + +Cumulus Workflow error handling is configurable via AWS Step Function +definitions, which enable users to configure what the state machine does next +when an exception is thrown. Read more in the AWS docs: +[How Step Functions Works: Error Handling](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html). + +Cumulus Workflow Tasks _should_ throw errors and rely on the state machine logic +to handle the error state. Errors and exceptions thrown in Cumulus Workflow +Tasks using the Cumulus Message Adapter (CMA) are caught and rethrown by the CMA +libraries _unless_ the error name contains `WorkflowError`. + +The former (tasks which throw errors which are not `WorkflowError`s) is the +expected behavior. However a `WorkflowError` can be used to handle errors that +should _not_ result in task failure. + +## Workflow Configuration + +Some best practices for error handling in Cumulus Workflows are: + +- States should include a `Catch` configuration object which defines the + `ResultPath` to be `$.exception`. This passes along the entire Cumulus message + to the next state with the addition of the `Error` and `Cause` details of the + thrown error in the `exception` key. Excluding this `Catch` configuration + means that any execution records saved for your failed workflows will not + include details about the exceptions. +- States may be configured to `Retry` a task on specified failures to handle + transient issues, such as those arising from resource allocation throttling, + instead of failing the entire workflow. Cumulus supports the AWS retry + configuration outlined + [here](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html) + and an example is provided in the `HelloWorld` step of the `RetryPassWorkflow` + workflow defined in the Cumulus repository's + [example workflow `retry_pass_workflow`](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/retry_pass_workflow.tf). +- Tasks downstream of failed tasks should understand how to pass along + exceptions if required. If a task throws an error which is caught by the + workflow configuration and passed to another state which also uses the CMA, + the CMA overrides the exception key to `"None"` so the exception will not be + passed to downstream tasks after the next state. This works if the exception + is not needed in downstream tasks. If the exception is needed in downstream + tasks, you need to re-attach the exception to the Cumulus message by setting + the `ResultPath` to be `$.exception` for the task where the error is initially + caught. In the example below, `CnmResponseFail` catches and re-attaches the + error to the message. +- If multiple downstream tasks should run after a workflow task has thrown an + error, you can create a separate failure branch of your workflow by chaining + tasks that catch and re-attach the error as described above. +- Tasks that are lambdas should be configured to retry in the event of a Lambda + Service Exception. 
See + [this documentation](https://docs.aws.amazon.com/step-functions/latest/dg/bp-lambda-serviceexception.html) + on configuring your workflow to handle transient lambda errors. + +**Example `state machine definition`:** + +```json +{ + "Comment": "Tests Workflow from Kinesis Stream", + "StartAt": "TranslateMessage", + "States": { + "TranslateMessage": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "cumulus_message": { + "outputs": [ + { + "source": "{$.cnm}", + "destination": "{$.meta.cnm}" + }, + { + "source": "{$}", + "destination": "{$.payload}" + } + ] + } + } + } + }, + "Type": "Task", + "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": ["States.ALL"], + "ResultPath": "$.exception", + "Next": "CnmResponseFail" + } + ], + "Next": "SyncGranule" + }, + "SyncGranule": { + "Parameters": { + "cma": { + "event.$": "$", + "ReplaceConfig": { + "Path": "$.payload", + "TargetPath": "$.payload" + }, + "task_config": { + "provider": "{$.meta.provider}", + "buckets": "{$.meta.buckets}", + "collection": "{$.meta.collection}", + "downloadBucket": "{$.meta.buckets.private.name}", + "stack": "{$.meta.stack}", + "cumulus_message": { + "outputs": [ + { + "source": "{$.granules}", + "destination": "{$.meta.input_granules}" + }, + { + "source": "{$}", + "destination": "{$.payload}" + } + ] + } + } + } + }, + "Type": "Task", + "Resource": "${module.cumulus.sync_granule_task.task_arn}", + "Retry": [ + { + "ErrorEquals": ["States.ALL"], + "IntervalSeconds": 10, + "MaxAttempts": 3 + } + ], + "Catch": [ + { + "ErrorEquals": ["States.ALL"], + "ResultPath": "$.exception", + "Next": "CnmResponseFail" + } + ], + "Next": "CnmResponse" + }, + "CnmResponse": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "OriginalCNM": "{$.meta.cnm}", + "CNMResponseStream": "{$.meta.cnmResponseStream}", + "region": "us-east-1", + "WorkflowException": "{$.exception}", + "cumulus_message": { + "outputs": [ + { + "source": "{$}", + "destination": "{$.meta.cnmResponse}" + }, + { + "source": "{$}", + "destination": "{$.payload}" + } + ] + } + } + } + }, + "Type": "Task", + "Resource": "${aws_lambda_function.cnm_response_task.arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": ["States.ALL"], + "ResultPath": "$.exception", + "Next": "WorkflowFailed" + } + ], + "Next": "WorkflowSucceeded" + }, + "CnmResponseFail": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "OriginalCNM": "{$.meta.cnm}", + "CNMResponseStream": "{$.meta.cnmResponseStream}", + "region": "us-east-1", + "WorkflowException": "{$.exception}", + "cumulus_message": { + "outputs": [ + { + "source": "{$}", + "destination": "{$.meta.cnmResponse}" + }, + { + "source": "{$}", + "destination": "{$.payload}" + } + ] + } + } + } + }, + "Type": "Task", + "Resource": "${aws_lambda_function.cnm_response_task.arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": ["States.ALL"], + "ResultPath": 
"$.exception", + "Next": "WorkflowFailed" + } + ], + "Next": "WorkflowFailed" + }, + "WorkflowSucceeded": { + "Type": "Succeed" + }, + "WorkflowFailed": { + "Type": "Fail", + "Cause": "Workflow failed" + } + } +} +``` + +The above results in a workflow which is visualized in the diagram below: + +![Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures](../assets/kinesis-workflow.png) + +## Summary + +Error handling should (mostly) be the domain of workflow configuration. diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/hello-world.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/hello-world.md new file mode 100644 index 00000000000..57fc532638d --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/hello-world.md @@ -0,0 +1,92 @@ +--- +id: hello-world +title: HelloWorld Workflow +hide_title: false +--- + +Example task meant to be a sanity check/introduction to the Cumulus workflows. + +## Pre-Deployment Configuration + +### Workflow Configuration + +A workflow definition can be found in the [template repository hello_world_workflow module](https://github.com/nasa/cumulus-template-deploy/blob/master/cumulus-tf/hello_world_workflow.tf). + +```json +{ + "Comment": "Returns Hello World", + "StartAt": "HelloWorld", + "States": { + "HelloWorld": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "buckets": "{$.meta.buckets}", + "provider": "{$.meta.provider}", + "collection": "{$.meta.collection}" + } + } + }, + "Type": "Task", + "Resource": "${module.cumulus.hello_world_task.task_arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "End": true + } + } +} +``` + +Workflow **error-handling** can be configured as discussed in the [Error-Handling](error-handling.md) cookbook. + +### Task Configuration + +The HelloWorld task is provided for you as part of the `cumulus` terraform module, no configuration is needed. + +If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at [`cumulus/tf-modules/ingest/hello-world-task.tf`](https://github.com/nasa/cumulus/tf-modules/ingest/hello-world-task.tf). The Lambda source code is located in the Cumulus source code at ['cumulus/tasks/hello-world'](https://github.com/nasa/cumulus/tasks/hello-world/). + +## Execution + +We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow. + +Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the `Rules` page and click `Add a rule`. 
+ +```json +{ + "collection": { # collection values can be configured and found on the Collections page + "name": "${collection_name}", + "version": "${collection_version}" + }, + "name": "helloworld_rule", + "provider": "${provider}", # found on the Providers page + "rule": { + "type": "onetime" + }, + "state": "ENABLED", + "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page +} +``` + +![Screenshot of AWS Step Function execution graph for the HelloWorld workflow](../assets/hello_world_workflow.png) +_Executed workflow as seen in AWS Console_ + +### Output/Results + +The `Executions` page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here. + +To get some deeper information on the execution, click on the value in the `Name` column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events. + +## Summary + +Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard. diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/ingest-notifications.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/ingest-notifications.md new file mode 100644 index 00000000000..5b0cae8ba99 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/ingest-notifications.md @@ -0,0 +1,149 @@ +--- +id: ingest-notifications +title: Ingest Notification in Workflows +hide_title: false +--- + +On deployment, an [SQS queue](https://aws.amazon.com/sqs/) and three [SNS topics](https://aws.amazon.com/sns/), one for executions, granules, and PDRs, are created and used for handling notification messages related to the workflow. + +The ingest notification reporting SQS queue is populated via a [Cloudwatch rule for any Step Function execution state transitions](https://docs.aws.amazon.com/step-functions/latest/dg/cw-events.html). The `sfEventSqsToDbRecords` Lambda consumes this queue. The queue and Lambda are included in the `cumulus` module and the Cloudwatch rule in the `workflow` module and are included by default in a Cumulus deployment. + +The `sfEventSqsToDbRecords` Lambda function reads from the `sfEventSqsToDbRecordsInputQueue` queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure). + +![Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions](../assets/interfaces.svg) + +## Sending SQS messages to report status + +### Publishing granule/PDR reports directly to the SQS queue + +If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. 
Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and the status of those granules/PDRs being visible on the Cumulus dashboard. The queue does have certain expectations of the message format: it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the [`cumulus` module](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus) for your deployment, you can get the queue URL (and the topic ARNs) by adding them to `outputs.tf` for your Terraform deployment [as in our example deployment](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/outputs.tf):

```hcl
output "stepfunction_event_reporter_queue_url" {
  value = module.cumulus.stepfunction_event_reporter_queue_url
}

output "report_executions_sns_topic_arn" {
  value = module.cumulus.report_executions_sns_topic_arn
}

output "report_granules_sns_topic_arn" {
  value = module.cumulus.report_granules_sns_topic_arn
}

output "report_pdrs_sns_topic_arn" {
  value = module.cumulus.report_pdrs_sns_topic_arn
}
```

Then, when you run `terraform apply`, you should see the queue URL and topic ARNs printed to your console:

```bash
Outputs:
...
stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic
```

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a [Cloudwatch Step Function event](https://docs.aws.amazon.com/step-functions/latest/dg/cw-events.html) containing a Cumulus message. For `SUCCEEDED` events, the Cumulus message is expected to be in `detail.output`. For all other event statuses, a Cumulus Message is expected in `detail.input`. The Cumulus Message populating these fields **MUST** be a JSON string, not an object. **Messages that do not conform to the schemas will fail to be created as records**.

If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

- `/aws/lambda/<prefix>-sfEventSqsToDbRecords`

### In a workflow

As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so **you should not include a workflow step to publish the initial or final status of your workflows**.

However, if you want to report your ingest status at any point **during a workflow execution**, you can add a workflow step using the `SfSqsReport` Lambda. In the following example from [`cumulus-tf/parse_pdr_workflow.tf`](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/parse_pdr_workflow.tf), the `ParsePdr` workflow is configured to use the `SfSqsReport` Lambda, primarily to update the PDR ingestion status.

:::info

`${sf_sqs_report_task_arn}` is an interpolated value referring to a Terraform resource. See the example deployment code for the [`ParsePdr` workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/parse_pdr_workflow.tf).
+ +::: + +```json + "PdrStatusReport": { + "Parameters": { + "cma": { + "event.$": "$", + "ReplaceConfig": { + "FullMessage": true + }, + "task_config": { + "cumulus_message": { + "input": "{$}" + } + } + } + }, + "ResultPath": null, + "Type": "Task", + "Resource": "${sf_sqs_report_task_arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "ResultPath": "$.exception", + "Next": "WorkflowFailed" + } + ], + "Next": "WaitForSomeTime" + }, +``` + +## Subscribing additional listeners to SNS topics + +Additional listeners to SNS topics can be configured in a `.tf` file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (`test_lambda`) to receive messages from the `report_executions` SNS topic. To subscribe to the `report_granules` or `report_pdrs` SNS topics instead, simply replace `report_executions` in the code block below with either of those values. + +```hcl +resource "aws_lambda_function" "test_lambda" { + function_name = "${var.prefix}-testLambda" + filename = "./testLambda.zip" + source_code_hash = filebase64sha256("./testLambda.zip") + handler = "index.handler" + role = module.cumulus.lambda_processing_role_arn + runtime = "nodejs10.x" +} + +resource "aws_sns_topic_subscription" "test_lambda" { + topic_arn = module.cumulus.report_executions_sns_topic_arn + protocol = "lambda" + endpoint = aws_lambda_function.test_lambda.arn +} + +resource "aws_lambda_permission" "test_lambda" { + action = "lambda:InvokeFunction" + function_name = aws_lambda_function.test_lambda.arn + principal = "sns.amazonaws.com" + source_arn = module.cumulus.report_executions_sns_topic_arn +} +``` + +### SNS message format + +Subscribers to the SNS topics can expect to find the published message in the [SNS event](https://docs.aws.amazon.com/lambda/latest/dg/eventsources.html#eventsources-sns) at `Records[0].Sns.Message`. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the `record` property and the event type as the `event` property. + +The ingest notification record of the execution, granule, or PDR should conform to the [data model schema for the given record type](https://github.com/nasa/cumulus/tree/master/packages/api/lib/schemas.js). + +## Summary + +Workflows can be configured to send SQS messages at any point using the `sf-sqs-report` task. + +Additional listeners can be easily configured to trigger when messages are sent to the SNS topics. diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/queue-post-to-cmr.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/queue-post-to-cmr.md new file mode 100644 index 00000000000..39c213c91cb --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/queue-post-to-cmr.md @@ -0,0 +1,132 @@ +--- +id: queue-post-to-cmr +title: Queue PostToCmr +hide_title: false +--- + +In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow setup. + +## Overview + +The general concept is that the last task of the ingest workflow will be `QueueWorkflow`, which queues the publish workflow. 
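
Extending the subscription example above, an SQS queue can be attached to one of the report topics in the same way. The following sketch is illustrative (the queue name is an assumption); it subscribes a queue to the granules topic and grants the topic permission to deliver messages to it:

```hcl
# Illustrative sketch: subscribe an SQS queue to the report granules SNS topic.
# The queue name is an example; the topic ARN comes from the cumulus module outputs.
resource "aws_sqs_queue" "report_granules_subscriber" {
  name = "${var.prefix}-reportGranulesSubscriberQueue"
}

resource "aws_sns_topic_subscription" "report_granules_sqs" {
  topic_arn = module.cumulus.report_granules_sns_topic_arn
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.report_granules_subscriber.arn
}

resource "aws_sqs_queue_policy" "report_granules_subscriber" {
  queue_url = aws_sqs_queue.report_granules_subscriber.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.report_granules_subscriber.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = module.cumulus.report_granules_sns_topic_arn }
      }
    }]
  })
}
```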
The publish workflow contains the `PostToCmr` task and if a CMR error occurs during `PostToCmr`, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the `QueueWorkflow` task again in the publish workflow. The following diagram demonstrates this queueing process. + +![Diagram of workflow queueing](../assets/queue-workflow.png) + +## Ingest Workflow + +The last step should be the `QueuePublishWorkflow` step. It should be configured with a queueUrl and workflow. In this case, the `queueUrl` is a [throttled queue](./throttling-queued-executions). Any `queueUrl` can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. `PublishWorkflow`). + +```json + "QueuePublishWorkflowStep": { + "Parameters": { + "cma": { + "event.$": "$", + "ReplaceConfig": { + "FullMessage": true + }, + "task_config": { + "internalBucket": "{$.meta.buckets.internal.name}", + "stackName": "{$.meta.stack}", + "workflow": "{$.meta.workflow}", + "queueUrl": "${start_sf_queue_url}", + "provider": "{$.meta.provider}", + "collection": "{$.meta.collection}", + "childWorkflowMeta": { "file_etags": "{$.meta.file_etags}", "staticValue": "aStaticValue" } + } + } + }, + "Type": "Task", + "Resource": "${queue_workflow_task_arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "ResultPath": "$.exception", + "Next": "WorkflowFailed" + } + ], + "End": true + }, +``` + +## Publish Workflow + +Configure the Catch section of your `PostToCmr` task to proceed to `QueueWorkflow` if a `CMRInternalError` is caught. Any other error will cause the workflow to fail. + +```json + "Catch": [ + { + "ErrorEquals": [ + "CMRInternalError" + ], + "Next": "RequeueWorkflow" + }, + { + "ErrorEquals": [ + "States.ALL" + ], + "Next": "WorkflowFailed", + "ResultPath": "$.exception" + } + ], +``` + +Then, configure the `QueueWorkflow` task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error. 
+ +```json +{ + "RequeueWorkflow": { + "Parameters": { + "cma": { + "event.$": "$", + "task_config": { + "buckets": "{$.meta.buckets}", + "distribution_endpoint": "{$.meta.distribution_endpoint}", + "workflow": "PublishGranuleQueue", + "queueUrl": "${start_sf_queue_url}", + "provider": "{$.meta.provider}", + "collection": "{$.meta.collection}", + "childWorkflowMeta": { "file_etags": "{$.meta.file_etags}", "staticValue": "aStaticValue" } + } + } + }, + "Type": "Task", + "Resource": "${queue_workflow_task_arn}", + "Catch": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "Next": "WorkflowFailed", + "ResultPath": "$.exception" + } + ], + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "End": true + } +} + ``` diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/run-tasks-in-lambda-or-docker.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/run-tasks-in-lambda-or-docker.md new file mode 100644 index 00000000000..85ee07b0791 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/run-tasks-in-lambda-or-docker.md @@ -0,0 +1,159 @@ +--- +id: run-tasks-in-lambda-or-docker +title: Run Step Function Tasks in AWS Lambda or Docker +hide_title: false +--- + +## Overview + +[AWS Step Function Tasks](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-tasks.html) can run tasks on [AWS Lambda](https://aws.amazon.com/lambda/) or on [AWS Elastic Container Service (ECS)](https://aws.amazon.com/ecs/) as a Docker container. + +Lambda provides serverless architecture, providing the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type. + +## When to use Lambda + +You should use AWS Lambda whenever all of the following are true: + +- The task runs on one of the supported [Lambda Runtimes](https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html). At time of this writing, supported runtimes include versions of python, Java, Ruby, node.js, Go and .NET. +- The lambda package is less than 50 MB in size, zipped. +- The task consumes less than each of the following resources: + - 3008 MB memory allocation + - 512 MB disk storage (must be written to `/tmp`) + - 15 minutes of execution time + +:::info + +See [this page](https://docs.aws.amazon.com/lambda/latest/dg/limits.html) for a complete and up-to-date list of AWS Lambda limits. + +::: + +If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with [`cumulus-ecs-task`](https://github.com/nasa/cumulus-ecs-task). + +## Step Function Activities and `cumulus-ecs-task` + +[Step Function Activities](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-activities.html) enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the [`cumulus-ecs-task`](https://github.com/nasa/cumulus-ecs-task) activity worker. The `cumulus-ecs-task` worker runs as a Docker container on the Cumulus ECS cluster. 
+ +The `cumulus-ecs-task` container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see `--lambdaArn` in the example below). This ARN argument is defined at deployment time. The `cumulus-ecs-task` worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment. + +## Example: Replacing AWS Lambda with a Docker container run on ECS + +This example will use an already-defined workflow from the `cumulus` module that includes the [`QueueGranules` task](https://github.com/nasa/cumulus/blob/master/tf-modules/ingest/queue-granules-task.tf) in its configuration. + +The following example is an excerpt from the [Discover Granules workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/discover_granules_workflow.asl.json) containing the step definition for the `QueueGranules` step: + +:::note interpolated values + +`${ingest_granule_workflow_name}` and `${queue_granules_task_arn}` are interpolated values that refer to Terraform resources. See the example deployment code for the [Discover Granules workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/discover_granules_workflow.tf). + +::: + +```json + "QueueGranules": { + "Parameters": { + "cma": { + "event.$": "$", + "ReplaceConfig": { + "FullMessage": true + }, + "task_config": { + "provider": "{$.meta.provider}", + "internalBucket": "{$.meta.buckets.internal.name}", + "stackName": "{$.meta.stack}", + "granuleIngestWorkflow": "${ingest_granule_workflow_name}", + "queueUrl": "{$.meta.queues.startSF}" + } + } + }, + "Type": "Task", + "Resource": "${queue_granules_task_arn}", + "Retry": [ + { + "ErrorEquals": [ + "Lambda.ServiceException", + "Lambda.AWSLambdaException", + "Lambda.SdkClientException" + ], + "IntervalSeconds": 2, + "MaxAttempts": 6, + "BackoffRate": 2 + } + ], + "Catch": [ + { + "ErrorEquals": [ + "States.ALL" + ], + "ResultPath": "$.exception", + "Next": "WorkflowFailed" + } + ], + "End": true + }, +``` + +Given it has been discovered this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your terraform deployment (by either adding a new `.tf` file or updating an existing one): + +- A `aws_sfn_activity` resource: + +```hcl +resource "aws_sfn_activity" "queue_granules" { + name = "${var.prefix}-QueueGranules" +} +``` + +- An instance of the `cumulus_ecs_service` module (found on the [Cumulus releases page](https://github.com/nasa/cumulus/releases) configured to provide the `QueueGranules` task: + +```hcl + +module "queue_granules_service" { + source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip" + + prefix = var.prefix + name = "QueueGranules" + + cluster_arn = module.cumulus.ecs_cluster_arn + desired_count = 1 + image = "cumuluss/cumulus-ecs-task:1.9.0" + + cpu = 400 + memory_reservation = 700 + + environment = { + AWS_DEFAULT_REGION = data.aws_region.current.name + } + command = [ + "cumulus-ecs-task", + "--activityArn", + aws_sfn_activity.queue_granules.id, + "--lambdaArn", + module.cumulus.queue_granules_task.task_arn, + "--lastModified", + module.cumulus.queue_granules_task.last_modified_date + ] + alarms = { + MemoryUtilizationHigh = { + comparison_operator = "GreaterThanThreshold" + evaluation_periods = 1 + metric_name = "MemoryUtilization" + statistic = "SampleCount" + threshold = 75 + } + } +} +``` + +:::note + 
+If you have updated the code for the Lambda specified by `--lambdaArn`, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.
+
+:::
+
+- An updated [Discover Granules workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/discover_granules_workflow.asl.json) that utilizes the new resource. The `Resource` key in the `QueueGranules` step has been updated to:
+
+  `"Resource": "${aws_sfn_activity.queue_granules.id}"`
+
+If you then run this workflow in place of the `DiscoverGranules` workflow, the `QueueGranules` step will run as an ECS task instead of a Lambda function.
+
+## Final note
+
+Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration, via the [AWS Service Integrations](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html) page.
diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/sips-workflow.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/sips-workflow.md
new file mode 100644
index 00000000000..153e2fc19a4
--- /dev/null
+++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/sips-workflow.md
@@ -0,0 +1,142 @@
+---
+id: sips-workflow
+title: Science Investigator-led Processing Systems (SIPS)
+hide_title: false
+---
+
+The Cumulus ingest workflow supports the SIPS workflow. In the following document, we'll discuss what a SIPS workflow is and how to set one up in a Cumulus instance.
+
+In this document, we assume that you already have a provider endpoint configured with some data ready to ingest. We'll be using an S3 provider and ingesting from a MOD09GQ collection.
+
+## Setup
+
+### Provider
+
+We need to have a [provider](../configuration/data-management-types.md#providers) from which data can be ingested. Our provider is an S3 provider hosted in the `cumulus-test-internal` bucket.
+
+![Screenshot of Cumulus dashboard screen for configuring an S3 provider](../assets/sips-provider.png)
+
+### Collection
+
+We need to build a collection. Details on collections can be found
+[here](../configuration/data-management-types.md#collections). The following collection will have
+`MOD09GQ` as the collection name and `006` as the version.
+ +```json +{ + "name": "MOD09GQ", + "version": "006", + "process": "modis", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "granuleIdExtraction": "(MOD09GQ\\..*)(\\.hdf|\\.cmr|_ndvi\\.jpg)", + "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", + "files": [ + { + "bucket": "protected", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}" + }, + { + "bucket": "private", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.met$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met" + }, + { + "bucket": "protected-2", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.cmr\\.xml$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml" + }, + { + "bucket": "public", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}_ndvi\\.jpg$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg" + } + ], + "duplicateHandling": "replace", + "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}", +} +``` + +### Rule + +Finally, let's create a [rule](../configuration/data-management-types.md#rules). In this example +we're just going to create a `onetime` throw-away rule that will be easy to test +with. This rule will kick off the `DiscoverAndQueuePdrs` workflow, which is the +beginning of a Cumulus SIPS workflow: + +![Screenshot of a Cumulus rule configuration](../assets/add_rule.png) + +:::note + +A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar. + +::: + +## DiscoverAndQueuePdrs Workflow + +This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the `cumulus` terraform module for use in your workflows: + +1. DiscoverPdrs - [source](https://github.com/nasa/cumulus/tree/master/tasks/discover-pdrs) +2. QueuePdrs - [source](https://github.com/nasa/cumulus/tree/master/tasks/queue-pdrs) + +![Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console](../assets/sips-discover-and-queue-pdrs-execution.png) + +_An example workflow module configuration can be viewed in the Cumulus source for the [discover_and_queue_pdrs_workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/discover_and_queue_pdrs_workflow.tf)._ + +:::note + +To use this example workflow module as a template for a new workflow in your deployment the `source` key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our [release](https://github.com/nasa/cumulus/releases) page, as all of the provided Cumulus workflows are internally self-referential. + +::: + +## ParsePdr Workflow + +The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. 
If one of those fails, the ParsePdr workflow will fail. **NOTE** that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR. + +The lambdas below are included in the `cumulus` terraform module for use in your workflows: + +1. ParsePdr - [source](https://github.com/nasa/cumulus/tree/master/tasks/parse-pdr) +2. QueueGranules - [source](https://github.com/nasa/cumulus/tree/master/tasks/queue-granules) +3. CheckStatus - [source](https://github.com/nasa/cumulus/tree/master/tasks/pdr-status-check) + +![Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console](../assets/sips-parse-pdr.png) + +_An example workflow module configuration can be viewed in the Cumulus source for the [parse_pdr_workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/parse_pdr_workflow.tf)._ + +:::note + +To use this example workflow module as a template for a new workflow in your deployment the `source` key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our [release](https://github.com/nasa/cumulus/releases) page, as all of the provided Cumulus workflows are internally self-referential. + +::: + +## IngestGranule Workflow + +The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR. + +The lambdas below are included in the `cumulus` terraform module for use in your workflows: + +1. SyncGranule - [source](https://github.com/nasa/cumulus/tree/master/tasks/sync-granule). +2. CmrStep - [source](https://github.com/nasa/cumulus/tree/master/tasks/post-to-cmr) + +Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step. + +:::tip + +Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations). + +::: + +![Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console](../assets/sips-ingest-granule.png) + +_An example workflow module configuration can be viewed in the Cumulus source for the [ingest_and_publish_granule_workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/ingest_and_publish_granule_workflow.tf)._ + +:::note + +To use this example workflow module as a template for a new workflow in your deployment the `source` key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our [release](https://github.com/nasa/cumulus/releases) page, as all of the provided Cumulus workflows are internally self-referential. + +::: + +## Summary + +In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR. 
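+
+For reference, the `onetime` rule shown in the dashboard screenshot above could also be expressed as JSON. This is a minimal sketch that assumes the provider and collection configured in this document; the rule name and provider id are illustrative:
+
+```json
+{
+  "name": "sips_test_rule",
+  "workflow": "DiscoverAndQueuePdrs",
+  "provider": "s3_provider",
+  "collection": {
+    "name": "MOD09GQ",
+    "version": "006"
+  },
+  "rule": {
+    "type": "onetime"
+  },
+  "state": "ENABLED"
+}
+```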
diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/throttling-queued-executions.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/throttling-queued-executions.md new file mode 100644 index 00000000000..36c38990ee5 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/throttling-queued-executions.md @@ -0,0 +1,200 @@ +--- +id: throttling-queued-executions +title: Throttling queued executions +hide_title: false +--- + +In this entry, we will walk through how to create an SQS queue for scheduling executions which will be used to limit those executions to a maximum concurrency. And we will see how to configure our Cumulus workflows/rules to use this queue. + +We will also review the architecture of this feature and highlight some implementation notes. + +Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider. + +## Implementing the queue + +### Create and deploy the queue + +#### Add a new queue + +In a `.tf` file for your [Cumulus deployment](./../deployment/#deploy-the-cumulus-instance), add a new SQS queue: + +```hcl +resource "aws_sqs_queue" "background_job_queue" { + name = "${var.prefix}-backgroundJobQueue" + receive_wait_time_seconds = 20 + visibility_timeout_seconds = 60 +} +``` + +#### Set maximum executions for the queue + +Define the `throttled_queues` variable for the `cumulus` module in your [Cumulus deployment](./../deployment/#deploy-the-cumulus-instance) to specify the maximum concurrent executions for the queue. + +```hcl +module "cumulus" { + # ... other variables + + throttled_queues = [{ + url = aws_sqs_queue.background_job_queue.id, + execution_limit = 5 + }] +} +``` + +#### Setup consumer for the queue + +Add the `sqs2sfThrottle` Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis. + +:::caution + +You **must use the `sqs2sfThrottle` Lambda as the consumer for any queue with a queue execution limit** or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready. + +::: + +`aws_sqs_queue.background_job_queue.id` refers to the [queue resource defined above](#add-a-new-queue). + +```hcl +resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" { + schedule_expression = "rate(1 minute)" +} + +resource "aws_cloudwatch_event_target" "background_job_queue_watcher" { + rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name + arn = module.cumulus.sqs2sfThrottle_lambda_function_arn + input = jsonencode({ + messageLimit = 500 + queueUrl = aws_sqs_queue.background_job_queue.id + timeLimit = 60 + }) +} + +resource "aws_lambda_permission" "background_job_queue_watcher" { + action = "lambda:InvokeFunction" + function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn + principal = "events.amazonaws.com" + source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn +} +``` + +#### Re-deploy your Cumulus application + +Follow the instructions to [re-deploy your Cumulus application](./../deployment/upgrade-readme#update-cumulus-resources). 
After you have re-deployed, your workflow template will be updated to the include information about the queue (the output below is partial output from an expected workflow template): + +```json +{ + "cumulus_meta": { + "queueExecutionLimits": { + "": 5 + } + } +} +``` + +### Integrate your queue with workflows and/or rules + +#### Integrate queue with queuing steps in workflows + +For any workflows using `QueueGranules` or `QueuePdrs` that you want to use your new queue, update the Cumulus configuration of those steps in your workflows. + +As seen in this partial configuration for a `QueueGranules` step, update the `queueUrl` to reference the new throttled queue: + +:::note ingest_granule_workflow_name + +`${ingest_granule_workflow_name}` is an interpolated value referring to a Terraform resource. See the example deployment code for the [`DiscoverGranules` workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/discover_granules_workflow.tf). + +::: + +```json +{ + "QueueGranules": { + "Parameters": { + "cma": { + "event.$": "$", + "ReplaceConfig": { + "FullMessage": true + }, + "task_config": { + "queueUrl": "${aws_sqs_queue.background_job_queue.id}", + "provider": "{$.meta.provider}", + "internalBucket": "{$.meta.buckets.internal.name}", + "stackName": "{$.meta.stack}", + "granuleIngestWorkflow": "${ingest_granule_workflow_name}" + } + } + } + } +} +``` + +Similarly, for a `QueuePdrs` step: + +:::note parse_pdr_workflow_name + +`${parse_pdr_workflow_name}` is an interpolated value referring to a Terraform resource. See the example deployment code for the [`DiscoverPdrs` workflow](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/discover_and_queue_pdrs_workflow.tf). + +::: + +```json +{ + "QueuePdrs": { + "Parameters": { + "cma": { + "event.$": "$", + "ReplaceConfig": { + "FullMessage": true + }, + "task_config": { + "queueUrl": "${aws_sqs_queue.background_job_queue.id}", + "provider": "{$.meta.provider}", + "collection": "{$.meta.collection}", + "internalBucket": "{$.meta.buckets.internal.name}", + "stackName": "{$.meta.stack}", + "parsePdrWorkflow": "${parse_pdr_workflow_name}" + } + } + } + } +} +``` + +After making these changes, [re-deploy your Cumulus application](./../deployment/upgrade-readme#update-cumulus-resources) for the execution throttling to take effect on workflow executions queued by these workflows. + +#### Create/update a rule to use your new queue + +Create or update a rule definition to include a `queueUrl` property that refers to your new queue: + +```json +{ + "name": "s3_provider_rule", + "workflow": "DiscoverAndQueuePdrs", + "provider": "s3_provider", + "collection": { + "name": "MOD09GQ", + "version": "006" + }, + "rule": { + "type": "onetime" + }, + "state": "ENABLED", + "queueUrl": "" // configure rule to use your queue URL +} +``` + +After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue. + +## Architecture + +![Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit](../assets/queued-execution-throttling.png) + +Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. 
The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the `sqs2sfThrottle` Lambda attempts to increment the semaphore and responds as follows: + +- If the increment operation is successful, then the count was not at the maximum and an execution is started +- If the increment operation fails, then the count was already at the maximum so no execution is started + +## Final notes + +Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting: + +- The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started. +- If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently **will be applied to all of the executions scheduled across all of those workflows/rules**. +- If you are scheduling the same workflow both via a queue with a `maxExecutions` value and a queue without a `maxExecutions` value, **only the executions scheduled via the queue with the `maxExecutions` value will be limited to the maximum**. diff --git a/website/versioned_docs/version-v18.3.5/data-cookbooks/tracking-files.md b/website/versioned_docs/version-v18.3.5/data-cookbooks/tracking-files.md new file mode 100644 index 00000000000..2ba43b03ef2 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/data-cookbooks/tracking-files.md @@ -0,0 +1,91 @@ +--- +id: tracking-files +title: Tracking Ancillary Files +hide_title: false +--- + +## Contents + +- [Contents](#contents) + - [Introduction](#introduction) + - [File types](#file-types) + - [File Type Configuration](#file-type-configuration) + - [CMR Metadata](#cmr-metadata) + - [Common Use Cases](#common-use-cases) + +### Introduction + +This document covers setting up ingest to track primary and ancillary files under various file types, which will carry through to the CMR and granule record. +Cumulus has a number of options for tracking files in granule records, and in CMR metadata under certain metadata elements or with specified file types. +We will cover Cumulus file types, file type configuration, effects on CMR metadata, and some common use case examples. + +### File types + +Cumulus follows the Cloud Notification Mechanism (CNM) file type conventions. Under this schema, there are four data types: + +- `data` +- `browse` +- `metadata` +- `qa` + +In Cumulus, these data types are mapped to the `Type` attribute on `RelatedURL`s for UMM-G metadata, or used to map +resources to one of `OnlineAccessURL`, `OnlineResource` or `AssociatedBrowseImages` for ECHO10 XML metadata. + +### File Type Configuration + +File types for each file in a granule can be configured in a number of different ways, depending on the ingest type and workflow. +For more information, see the [ancillary metadata](../features/ancillary_metadata) documentation. + +### CMR Metadata + +When updating granule CMR metadata, the `UpdateGranulesCmrMetadataFileLinks` task will add the external facing URLs to the CMR metadata file based on the file type. +The table below shows how the CNM data types map to CMR metadata updates. Non-CNM file types are handled as 'data' file types. +The UMM-G column reflects the `RelatedURL`'s `Type` derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element. 
+ +|CNM Type |UMM-G `RelatedUrl.Type` |ECHO10 Location | +| ------ | ------ | ------ | +| `ancillary` | `'VIEW RELATED INFORMATION'` | `OnlineResource` | +| `data` | `'GET DATA'`(HTTPS URL) or `'GET DATA VIA DIRECT ACCESS'`(S3 URI) | `OnlineAccessURL` | +| `browse` | `'GET RELATED VISUALIZATION'` | `AssociatedBrowseImage` | +| `linkage` | `'EXTENDED METADATA'` | `OnlineResource` | +| `metadata` | `'EXTENDED METADATA'` | `OnlineResource` | +| `qa` | `'EXTENDED METADATA'` | `OnlineResource` | + +### Common Use Cases + +This section briefly documents some common use cases and the recommended configuration for the file. +The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. +The other two cases covered in the [ancillary metadata](../features/ancillary_metadata) documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here. + +Configuring browse imagery: + +```json +{ + "bucket": "public", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg", + "type": "browse" +} +``` + +Configuring a documentation entry: + +```json +{ + "bucket": "protected", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf", + "type": "metadata" +} +``` + +Configuring other associated files (use types `metadata` or `qa` as appropriate): + +```json +{ + "bucket": "protected", + "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$", + "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt", + "type": "qa" +} +``` diff --git a/website/versioned_docs/version-v18.3.5/deployment/README.md b/website/versioned_docs/version-v18.3.5/deployment/README.md new file mode 100644 index 00000000000..82f152dfdc7 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/README.md @@ -0,0 +1,670 @@ +--- +id: deployment-readme +title: How to Deploy Cumulus +hide_title: false +--- + +## Overview + +This is a guide for deploying a new instance of Cumulus. + +This document assumes familiarity with Terraform. If you are not comfortable +working with Terraform, the following links should bring you up to speed: + +- [Introduction to Terraform](https://www.terraform.io/intro/index.html) +- [Getting Started with Terraform and Amazon Web Services (AWS)](https://learn.hashicorp.com/terraform/?track=getting-started#getting-started) +- [Terraform Configuration Language](https://www.terraform.io/docs/configuration/index.html) + +The process involves: + +- Creating [AWS S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html) +- Configuring a VPC, if necessary +- Configuring an Earthdata application, if necessary +- Creating/configuring a [PostgreSQL compatible database](../deployment/postgres_database_deployment), and an AWS Secrets Manager secret to allow database access +- Creating a Lambda layer for the [Cumulus Message Adapter (CMA)](./../workflows/input_output.md#cumulus-message-adapter) +- Creating resources for your Terraform backend +- Using [Terraform](https://www.terraform.io) to deploy resources to AWS + +:::info + +Please note that internal and sensitive information is not in this public resource and you may have to visit our [Cumulus wiki](https://wiki.earthdata.nasa.gov/display/CUMULUS/Deployment) for NGAP access steps and other credentials. 
+ +::: + +--- + +## Requirements + +### Linux/MacOS Software Requirements + +- git +- zip +- AWS CLI - [AWS Command Line Interface](https://aws.amazon.com/cli/) +- [Terraform](https://www.terraform.io) + +### Install Terraform + +It is recommended to keep a consistent version of Terraform as you deploy. Once your state files are migrated to a higher version, they are not always backwards compatible so integrators should pin their Terraform version. This is easily accomplished using the Terraform Version Manager [(tfenv)](https://github.com/tfutils/tfenv). If you have a Continuous Integration (CI) environment (or any other machine) that you are using to deploy the same stack, **you should pin your version across those machines as well**, otherwise you will run into errors trying to re-deploy from your local machine. + +If you are using a Mac and [Homebrew](https://brew.sh), installing tfenv is +as simple as: + +```shell +brew update +brew install tfenv +``` + +For other cases, installation instructions are available to follow along [here](https://github.com/tfutils/tfenv#installation). + +```shell + $ tfenv install 1.5.3 +[INFO] Installing Terraform v1.5.3 +... +[INFO] Switching completed + +$ tfenv use 1.5.3 +[INFO] Switching to v1.5.3 +... +[INFO] Switching completed +``` + +It is recommended to stay on the Cumulus Core TF version which can be found [here](https://github.com/nasa/cumulus/blob/master/example/.tfversion). Any changes to that will be noted in the [release notes](https://github.com/nasa/cumulus/releases). + +To verify your Terraform version, run: + +```shell +$ terraform --version +Terraform v1.5.3 +``` + +### Credentials + +- [CMR](https://earthdata.nasa.gov/about/science-system-description/eosdis-components/common-metadata-repository) username and password. CMR credentials must be provided if you are exporting metadata to CMR with Earthdata Login authentication. +- [NASA Launchpad](https://launchpad.nasa.gov). Launchpad credentials must be provided if you are using Launchpad authentication to export metadata to CMR or to authenticate with the Cumulus API. For more information on how to authenticate go to [Launchpad Authentication](https://wiki.earthdata.nasa.gov/display/CUMULUS/Launchpad+Authentication). +- [Earthdata Login](https://earthdata.nasa.gov/about/science-system-description/eosdis-components/earthdata-login) username and password. User must have the ability to administer and/or create applications in URS. It's recommended to obtain an account in the test environment (UAT). + +### Needed Git Repositories + +- [Cumulus Deployment Template](https://github.com/nasa/cumulus-template-deploy) +- [Cumulus Dashboard](https://github.com/nasa/cumulus-dashboard) + +--- + +## Prepare Deployment Repository + +:::info existing configured repo + + If you already are working with an existing repository that is configured appropriately for the version of Cumulus you intend to deploy or update, skip to [Prepare AWS Configuration.](#prepare-aws-configuration). + +::: + +Clone the [`cumulus-template-deploy`](https://github.com/nasa/cumulus-template-deploy) repo and name appropriately for your organization: + +```bash + git clone https://github.com/nasa/cumulus-template-deploy +``` + +We will return to [configuring this repo and using it for deployment below](#deploy-the-cumulus-instance). + +
+ Optional: Create a new repository + + [Create a new repository](https://help.github.com/articles/creating-a-new-repository/) on Github so that you can add your workflows and other modules to source control: + +```bash + git remote set-url origin https://github.com/nasa/ + git push origin master +``` + +You can then [add/commit](https://help.github.com/articles/adding-a-file-to-a-repository-using-the-command-line/) changes as needed. + +:::caution Update Your Gitignore File + +If you are pushing your deployment code to a git repo, make sure to add `terraform.tf` and `terraform.tfvars` to `.gitignore`, **as these files will contain sensitive data related to your AWS account**. + +::: + +
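+
+As a minimal illustration, assuming the default file names from the template repository, you can append these entries from the root of your cloned deployment directory:
+
+```shell
+# Keep the sensitive Terraform configuration files out of source control
+echo "terraform.tf" >> .gitignore
+echo "terraform.tfvars" >> .gitignore
+```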
+ +--- + +## Prepare AWS Configuration + +### Set Access Keys + +You need to make some AWS information available to your environment. If you don't already have the access key and secret access key of an AWS user with IAM Create-User permissions, you must [create access keys](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html) for such a user with IAM Create-User permissions, then export the access keys: + +```bash + export AWS_ACCESS_KEY_ID= + export AWS_SECRET_ACCESS_KEY= + export AWS_REGION= +``` + +If you don't want to set environment variables, [access keys can be stored locally via the AWS CLI.](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) + +### Create S3 Buckets + +See [creating S3 buckets](deployment/create_bucket.md) for more information on how to create a bucket. + +The following S3 bucket should be created (replacing `` with whatever you'd like, generally your organization/DAAC's name): + +- `-internal` + +You can create additional S3 buckets based on the needs of your workflows. + +These buckets do not need any non-default permissions to function with Cumulus; however, your local security requirements may vary. + +:::caution naming S3 buckets + +S3 bucket object names are global and must be unique across all accounts/locations/etc. + +::: + +### VPC, Subnets, and Security Group + +Cumulus supports operation within a VPC, but you will need to separately create: + +- VPC +- Subnet +- Security group +- VPC endpoints for the various services used by Cumulus if you wish to route traffic through the VPC + +These resources only need to be created once per AWS account and their IDs will be used to configure your Terraform deployment. + +#### Elasticsearch in a VPC + +Amazon Elasticsearch Service (ES) does not use a VPC Endpoint. To use ES within a VPC, before deploying run: + +```shell +aws iam create-service-linked-role --aws-service-name es.amazonaws.com +``` + +This operation only needs to be done once per account, but it must be done for both NGAP and regular AWS environments. + +### Look Up ECS-optimized AMI (DEPRECATED) + +:::info + +This step is unnecessary if you using the latest changes in the [`cumulus-template-deploy` repo which will automatically determine the AMI ID for you +based on your `deploy_to_ngap` variable](https://github.com/nasa/cumulus-template-deploy/commit/8472e2f3a7185d77bb68bf9e0f21a92a91b0cba9). + +::: + +Look up the recommended machine image ID for the Linux version and AWS region of your deployment. See [Linux Amazon ECS-optimized AMIs docs](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html#ecs-optimized-ami-linux). The image ID, beginning with `ami-`, will be assigned to the `ecs_cluster_instance_image_id` variable for the [cumulus-tf module](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf). + +### Set Up EC2 Key Pair (Optional) + +The key pair will be used to SSH into your EC2 instance(s). It is recommended to [create or import a key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) and specify it in your Cumulus deployment. + +This can also be done post-deployment by redeploying your Cumulus instance. + +--- + +## Configure Earthdata Application + +The Cumulus stack can authenticate with [Earthdata Login](https://urs.earthdata.nasa.gov/documentation). If you want to use this functionality, you must create and register a new Earthdata application. 
Use the [User Acceptance Tools (UAT) site](https://uat.urs.earthdata.nasa.gov) unless you intend use a different URS environment (which will require updating the `urs_url` value shown below). + +Follow the directions on [how to register an application](https://wiki.earthdata.nasa.gov/display/EL/How+To+Register+An+Application). Use any url for the `Redirect URL`, it will be deleted in a later step. Also note the password in Step 3 and client ID in Step 4 use these to replace `urs_client_id` and `urs_client_password` in the `terraform.tfvars` for the `cumulus-tf` module shown below. + +--- + +## Create Resources for Terraform State + +:::info + +If you're re-deploying an existing Cumulus configuration you should skip to [Deploy the Cumulus instance](#deploy-the-cumulus-instance), as these values should already be configured. + +::: + +The state of the Terraform deployment is stored in S3. In the following examples, it will be assumed that state is being stored in a bucket called `my-tf-state`. You can also use an existing bucket, if desired. + +### Create the State Bucket + +```shell +aws s3api create-bucket --bucket my-tf-state +``` + +:::tip + +In order to help prevent loss of state information, **it is strongly recommended that versioning be enabled on the state bucket**. + +::: + +```shell +aws s3api put-bucket-versioning \ + --bucket my-tf-state \ + --versioning-configuration Status=Enabled +``` + +:::danger important: terraform state + +In order to reduce your risk of the corruption or loss of your Terraform state file, or otherwise corrupt your Cumulus deployment, please see the [Terraform Best Practices](terraform-best-practices.md) guide. + +However, unfortunately, if your state information does become lost or corrupt, then deployment (via `terraform apply`) will have unpredictable results, including possible loss of data and loss of deployed resources. + +::: + +### Create the Locks Table + +Terraform uses a lock stored in DynamoDB in order to prevent multiple simultaneous updates. In the following examples, that table will be called `my-tf-locks`. + +```shell +$ aws dynamodb create-table \ + --table-name my-tf-locks \ + --attribute-definitions AttributeName=LockID,AttributeType=S \ + --key-schema AttributeName=LockID,KeyType=HASH \ + --billing-mode PAY_PER_REQUEST \ + --region us-east-1 +``` + +--- + +## Configure the PostgreSQL Database + +Cumulus requires a [PostgreSQL compatible database](../deployment/postgres-database-deployment.md) cluster deployed to AWS. We suggest utilizing [RDS](https://docs.aws.amazon.com/rds/index.html). For further guidance about what type of RDS database to use, please [see the guide on choosing and configuring your RDS database](./choosing_configuring_rds.md). + +Cumulus provides a default [template and RDS cluster module](../deployment/postgres-database-deployment.md) utilizing Aurora Serverless. + +However, Core intentionally provides a "bring your own" approach, and any well-planned cluster setup should work, given the following: + +- Appropriate testing/evaluation is given to ensure the database capacity will scale and the database deployment will allow access to Cumulus's internal components. Core provides for security-group oriented permissions management via the `rds_security_group` configuration parameter. +- The database is configured such that its endpoint is accessible from the VPC and subnets configured for the Core deployment. 
+- An AWS Secrets Manager secret exists that has the following format: + +```json +{ + "database": "databaseName", + "host": "xxx", + "password": "defaultPassword", + "port": 5432, + "username": "xxx" +} +``` + +- `database` -- the PostgreSQL database used by the configured user +- `host` -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com +- `password` -- the database password +- `port` -- The database connection port, should always be 5432 +- `username` -- the database username + +This secret should provide access to a PostgreSQL database provisioned on the cluster. + +To configure Cumulus you will need: + +- The AWS Secrets Manager ARN for the _user_ Core will write with (e.g. `arn:aws:secretsmanager:AWS-REGION:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA` ) for use in configuring `rds_user_access_secret_arn`. +- (Optional) The security group ID that provides access to the cluster to configure `rds_security_group`. + +--- + +## Deploy the Cumulus Instance + +A typical Cumulus deployment is broken into two +[Terraform root modules](https://www.terraform.io/docs/configuration/modules.html): +[`data-persistence`](https://github.com/nasa/cumulus/tree/master/tf-modules/data-persistence) and [`cumulus`](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus). + +The `data-persistence` module should be deployed first. This module creates the Elasticsearch domain, DynamoDB tables, RDS database tables, and performs any structural updates needed on the RDS tables via migrations. During the RDS migration, duplicate tables will be deployed by the `data-persistence` module in both DynamoDB and the RDS database. The `cumulus` module deploys the rest of Cumulus: distribution, API, ingest, workflows, etc. The `cumulus` module depends on the resources created in the `data-persistence` deployment. + +Each of these modules have to be deployed independently and require their own Terraform backend, variable, and output settings. The template deploy repo that was cloned previously already contains the scaffolding of the necessary files for the deployment of each module: `data-persistence-tf` deploys the `data-persistence` module and `cumulus-tf` deploys the `cumulus` module. For reference on the files that are included, see the [documentation on adding components to a Terraform deployment](components.md#adding-components-to-your-terraform-deployment). + +### Troubleshooting + +:::tip + +Please see our [troubleshooting documentation for any issues with your deployment](../troubleshooting/troubleshooting-deployment) when performing the upcoming steps. + +::: + +### Configure and Deploy the `data-persistence-tf` Root Module + +These steps should be executed in the `data-persistence-tf` directory of the template deploy repo that you previously cloned. Run the following to copy the example files. + +```shell +cd data-persistence-tf/ +cp terraform.tf.example terraform.tf +cp terraform.tfvars.example terraform.tfvars +``` + +In `terraform.tf`, configure the remote state settings by substituting the appropriate values for: + +- `bucket` +- `dynamodb_table` +- `PREFIX` (whatever prefix you've chosen for your deployment) + +Fill in the appropriate values in `terraform.tfvars`. See the [`data-persistence` module variable definitions](https://github.com/nasa/cumulus/blob/master/tf-modules/data-persistence/variables.tf) for more detail on each variable. 
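+
+For illustration only, a filled-in `terraform.tf` for this module generally ends up looking something like the following, reusing the state bucket and locks table created earlier; the `key` path and region shown here are placeholders, not required values:
+
+```hcl
+terraform {
+  backend "s3" {
+    region         = "us-east-1"                                    # region of your state bucket
+    bucket         = "my-tf-state"                                  # state bucket created earlier
+    key            = "PREFIX/data-persistence/terraform.tfstate"    # illustrative state key path
+    dynamodb_table = "my-tf-locks"                                  # locks table created earlier
+  }
+}
+```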
+ +Consider [the size of your Elasticsearch cluster](#elasticsearch) when configuring `data-persistence`. + +:::tip + +Elasticsearch is optional and can be disabled using `include_elasticsearch = false` in your `terraform.tfvars`. Your Cumulus Dashboard and endpoints querying Elasticsearch will not work without Elasticsearch. + +::: + +:::note reminder + +If you are including `subnet_ids` in your `terraform.tfvars`, Elasticsearch will need a service-linked role to deploy successfully. Follow the [instructions above](#elasticsearch-in-a-vpc) to create the service-linked role if you haven't already. + +::: + +#### Initialize Terraform + +Run `terraform init`[^3] + +You should see an output like: + +```shell +* provider.aws: version = "~> 2.32" + +Terraform has been successfully initialized! +``` + +#### Deploy + +Run `terraform apply` to deploy your data persistence resources. Then type `yes` when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see an output like: + +```shell +Apply complete! Resources: 16 added, 0 changed, 0 destroyed. + +Outputs: + +dynamo_tables = { + "access_tokens" = { + "arn" = "arn:aws:dynamodb:us-east-1:12345:table/prefix-AccessTokensTable" + "name" = "prefix-AccessTokensTable" + } + # ... more tables ... +} +elasticsearch_alarms = [ + { + "arn" = "arn:aws:cloudwatch:us-east-1:12345:alarm:prefix-es-vpc-NodesLowAlarm" + "name" = "prefix-es-vpc-NodesLowAlarm" + }, + # ... more alarms ... +] +elasticsearch_domain_arn = arn:aws:es:us-east-1:12345:domain/prefix-es-vpc +elasticsearch_hostname = vpc-prefix-es-vpc-abcdef.us-east-1.es.amazonaws.com +elasticsearch_security_group_id = sg-12345 +``` + +Your data persistence resources are now deployed. + +### Deploy the Cumulus Message Adapter Layer (DEPRECATED) + +:::info + +This step is unnecessary if you using the latest changes in the [`cumulus-template-deploy` repo which will automatically download the Cumulus Message Adapter and create the layer for you based on your `cumulus_message_adapter_version` variable](https://github.com/nasa/cumulus-template-deploy/commit/8472e2f3a7185d77bb68bf9e0f21a92a91b0cba9). + +::: + +The [Cumulus Message Adapter (CMA)](./../workflows/input_output.md#cumulus-message-adapter) is necessary for interpreting the input and output of Cumulus workflow steps. The CMA is now integrated with Cumulus workflow steps as a Lambda layer. + +To deploy a CMA layer to your account: + +1. Go to the [CMA releases page](https://github.com/nasa/cumulus-message-adapter/releases) and download the `cumulus-message-adapter.zip` for the desired release +2. Use the AWS CLI to publish your layer: + +```shell +$ aws lambda publish-layer-version \ + --layer-name prefix-CMA-layer \ + --region us-east-1 \ + --zip-file fileb:///path/to/cumulus-message-adapter.zip +{ + ... more output ... + "LayerVersionArn": "arn:aws:lambda:us-east-1:1234567890:layer:prefix-CMA-layer:1", + ... more output ... +} +``` + +Make sure to copy the `LayerVersionArn` of the deployed layer, as it will be used to configure the `cumulus-tf` deployment in the next step. + +### Configure and Deploy the `cumulus-tf` Root Module + +These steps should be executed in the `cumulus-tf` directory of the template repo that was cloned previously. 
+ +```shell +cd cumulus-tf/ +cp terraform.tf.example terraform.tf +cp terraform.tfvars.example terraform.tfvars +``` + +In `terraform.tf`, configure the remote state settings by substituting the appropriate values for: + +- `bucket` +- `dynamodb_table` +- `PREFIX` (whatever prefix you've chosen for your deployment) + +Fill in the appropriate values in `terraform.tfvars`. See the [Cumulus module variable definitions](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf) for more detail on each variable. + +Notes on specific variables: + +- **`deploy_to_ngap`**: This variable controls the provisioning of certain resources and policies that are specific to an NGAP environment. **If you are deploying to NGAP, you must set this variable to `true`.** +- **`prefix`**: The value should be the same as the `prefix` from the data-persistence deployment. +- **`data_persistence_remote_state_config`**: This object should contain the remote state values that you configured in `data-persistence-tf/terraform.tf`. These settings allow `cumulus-tf` to determine the names of the resources created in `data-persistence-tf`. +- **`rds_security_group`**: The ID of the security group used to allow access to the PostgreSQL database +- **`rds_user_access_secret_arn`**: The ARN for the Secrets Manager secret that provides database access information +- **`cumulus_message_adapter_version`**: The version number (e.g. `1.3.0`) of the [Cumulus Message Adapter](https://github.com/nasa/cumulus-message-adapter/releases) to deploy +- **`key_name` (optional)**: The name of your key pair from [setting up your key pair](#set-up-ec2-key-pair-optional). Adding your `key_name` sets the EC2 keypair +for deployment's EC2 instances and allows you to connect to them via [SSH/SSM](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-sessions-start.html). + +Consider [the sizing of your Cumulus instance](#cumulus-instance-sizing) when configuring your variables. + +### Choose a Distribution API + +#### Default Configuration + +If you are deploying from the Cumulus Deployment Template or a configuration based on that repo, the Thin Egress App (TEA) distribution app will be used by default. + +#### Configuration Options + +Cumulus can be configured to use either TEA or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the [Deployment Template](https://github.com/nasa/cumulus-template-deploy). + +:::note + +If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use +the new API gateway. + +::: + +#### Configure the Thin Egress App + +TEA can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps [in the TEA documentation](./thin_egress_app) to configure distribution in your `cumulus-tf` deployment. + +#### Configure the Cumulus Distribution API (Optional) + +If you would prefer to use the Cumulus Distribution API, which supports [AWS Cognito authentication](https://aws.amazon.com/cognito/), follow [these steps](./cumulus_distribution) to configure distribution in your `cumulus-tf` deployment. + +### Initialize Terraform + +Follow the [above instructions to initialize Terraform](#initialize-terraform) using `terraform init`[^3]. + +### Deploy + +Run `terraform apply` to deploy the resources. 
Type `yes` when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this: + +```shell +Apply complete! Resources: 292 added, 0 changed, 0 destroyed. + +Outputs: + +archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token +archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/ +distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login +distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/ +``` + +:::caution + +Cumulus deploys API Gateways for the Archive and Distribution APIs. In production environments these must be behind CloudFront distributions using HTTPS connections. + +::: + +### Update Earthdata Application + +Add the two redirect URLs to your EarthData login application by doing the following: + +1. Login to URS +2. Under My Applications -> Application Administration -> use the edit icon of your application +3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment + - e.g. `archive_api_redirect_uri = https://.execute-api.us-east-1.amazonaws.com/dev/token` +4. Also add the Distribution url + - e.g. `distribution_redirect_uri = https://.execute-api.us-east-1.amazonaws.com/dev/login`[^1] +5. You may delete the placeholder url you used to create the application + +If you've lost track of the needed redirect URIs, they can be located on the [API Gateway](https://console.aws.amazon.com/apigateway). Once there, select `-archive` and/or `-thin-egress-app-EgressGateway`, `Dashboard` and utilizing the base URL at the top of the page that is accompanied by the text `Invoke this API at:`. Make sure to append `/token` for the archive URL and `/login` to the thin egress app URL. + +:::caution + +In production environments, the API Gateway URLs must be replaced with CloudFront distributions using HTTPS connections to ensure Data In Transit compliance. + +::: + +--- + +## Deploy Cumulus Dashboard + +### Dashboard Requirements + +:::info what you will need + +The requirements are similar to the [Cumulus stack deployment requirements](#requirements). The installation instructions below include a step that will install/use the required node version referenced in the `.nvmrc` file in the Dashboard repository. + +::: + +- git +- [node 12.18](https://nodejs.org/en/) (use [nvm](https://github.com/creationix/nvm) to upgrade/downgrade) +- [npm](https://www.npmjs.com/get-npm) +- zip +- AWS CLI - [AWS Command Line Interface](https://aws.amazon.com/cli/) +- python + +### Prepare AWS + +**Create S3 Bucket for Dashboard:** + +- Create it, e.g. `-dashboard`. Use the command line or console as you did when [preparing AWS configuration](#prepare-aws-configuration). 
+- Configure the bucket to host a website: + - AWS S3 console: Select `-dashboard` bucket then, "Properties" -> "Static Website Hosting", point to `index.html` + - CLI: `aws s3 website s3://-dashboard --index-document index.html` +- The bucket's url will be `http://-dashboard.s3-website-.amazonaws.com` or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint" +- Ensure the bucket's access permissions allow your deployment user access to write to the bucket + +### Install Dashboard + +To install the Cumulus Dashboard, clone the [repository](https://github.com/nasa/cumulus-dashboard) into the root `deploy` directory and install dependencies with `npm install`: + +```bash + git clone https://github.com/nasa/cumulus-dashboard + cd cumulus-dashboard + nvm use + npm install +``` + +If you do not have the correct version of node installed, replace `nvm use` with `nvm install $(cat .nvmrc)` in the above example. + +#### Dashboard Versioning + +By default, the `master` branch will be used for Dashboard deployments. The `master` branch of the repository contains the most recent stable release of the Cumulus Dashboard. + +If you want to test unreleased changes to the Dashboard, use the `develop` branch. + +Each [release/version of the Dashboard](https://github.com/nasa/cumulus-dashboard/releases) will have [a tag in the Dashboard repo](https://github.com/nasa/cumulus-dashboard/tags). Release/version numbers will use semantic versioning (major/minor/patch). + +To checkout and install a specific version of the Dashboard: + +```bash + git fetch --tags + git checkout # e.g. v1.2.0 + nvm use + npm install +``` + +If you do not have the correct version of node installed, replace `nvm use` with `nvm install $(cat .nvmrc)` in the above example. + +### Building the Dashboard + +:::caution + +These environment variables are available during the build: `APIROOT`, `DAAC_NAME`, `STAGE`, `HIDE_PDR`. Any of these can be set on the command line to override the values contained in `config.js` when running the build below. + +::: + +To configure your dashboard for deployment, set the `APIROOT` environment variable to your app's API root.[^2] + +Build your dashboard from the Cumulus Dashboard repository root directory, `cumulus-dashboard`: + +```bash + APIROOT= npm run build +``` + +### Dashboard Deployment + +Deploy your dashboard to S3 bucket from the `cumulus-dashboard` directory: + +Using AWS CLI: + +```bash + aws s3 sync dist s3://-dashboard +``` + +From the S3 Console: + +- Open the `-dashboard` bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'. + +You should be able to visit the Dashboard website at `http://-dashboard.s3-website-.amazonaws.com` or find the url +`-dashboard` -> "Properties" -> "Static website hosting" -> "Endpoint" and log in with a user that you had previously configured for access. + +:::caution + +In production environments, the dashboard must be behind a CloudFront distributions using an HTTPS connection to ensure Data In Transit compliance. + +::: + +--- + +## Cumulus Instance Sizing + +The Cumulus deployment default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups are small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size. 
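+
+As a rough sketch only, the EC2/Autoscaling sizing discussed in the sections below comes down to a handful of variables supplied to your `cumulus-tf` configuration. The values here are illustrative placeholders, not recommendations:
+
+```hcl
+# Illustrative placeholders only -- tune these for your expected load and archive size
+ecs_cluster_instance_type               = "t3.medium"
+ecs_cluster_instance_docker_volume_size = 100
+ecs_cluster_min_size                    = 1
+ecs_cluster_desired_size                = 1
+ecs_cluster_max_size                    = 2
+```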
+ +:::tip aws cost calculator + +Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a [pricing calculator](https://calculator.aws/#/) for estimating costs. + +::: + +### Elasticsearch + +The [mappings file](https://github.com/nasa/cumulus/blob/master/packages/es-client/config/mappings.json) contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored. + +AWS provides [documentation](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html) on calculating and configuring for sizing. + +In addition to size you'll want to consider the [number of nodes](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains-dedicatedmasternodes.html) which determine how the system reacts in the event of a failure. + +Configuration can be done in the [data persistence module](https://github.com/nasa/cumulus/blob/master/tf-modules/data-persistence/variables.tf#L16) in `elasticsearch_config` and the [cumulus module](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf#L541) in `es_index_shards`. + +:::caution reindex after changes + +If you make changes to your Elasticsearch configuration you will need to [reindex](../troubleshooting/reindex-elasticsearch) for those changes to take effect. + +::: + +### EC2 Instances and Autoscaling Groups + +EC2 instances are used for long-running operations (i.e. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via [Cumulus deployment variables](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf). + +When configuring your ECS cluster consider: + +- The [EC2 instance type](https://aws.amazon.com/ec2/instance-types/) and [EBS volume size](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_constraints.html) needed to accommodate your workloads. Configured as `ecs_cluster_instance_type` and `ecs_cluster_instance_docker_volume_size`. +- The minimum and desired number of instances on hand to accommodate your workloads. Configured as `ecs_cluster_min_size` and `ecs_cluster_desired_size`. +- The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as `ecs_cluster_max_size`. +- Your autoscaling parameters: `ecs_cluster_scale_in_adjustment_percent`, `ecs_cluster_scale_out_adjustment_percent`, `ecs_cluster_scale_in_threshold_percent`, and `ecs_cluster_scale_out_threshold_percent`. + +--- + +## Footnotes + +[^1]: To add another redirect URIs to your application. On Earthdata home page, select "My Applications". Scroll down to "Application Administration" and use the edit icon for your application. Then Manage -> Redirect URIs. +[^2]: The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step. 
But you can also find it from the `AWS console -> Amazon API Gateway -> APIs -> -archive -> Dashboard`, and reading the URL at the top after "Invoke this API at" +[^3]: Run `terraform init` if: + - This is the first time deploying the module + - You have added any additional child modules, including [Cumulus components](./components.md#available-cumulus-components) + - You have updated the `source` for any of the child modules diff --git a/website/versioned_docs/version-v18.3.5/deployment/api_gateway_logging.md b/website/versioned_docs/version-v18.3.5/deployment/api_gateway_logging.md new file mode 100644 index 00000000000..a6a4900afbb --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/api_gateway_logging.md @@ -0,0 +1,83 @@ +--- +id: api-gateway-logging +title: API Gateway Logging +hide_title: false +--- + +## Enabling API Gateway Logging + +In order to enable distribution API Access and execution logging, configure the TEA deployment by setting `log_api_gateway_to_cloudwatch` on the `thin_egress_app` module: + +```hcl +log_api_gateway_to_cloudwatch = true +``` + +This enables the distribution API to send its logs to the default CloudWatch location: `API-Gateway-Execution-Logs_/` + +## Configure Permissions for API Gateway Logging to CloudWatch + +### Instructions: Enabling Account Level Logging from API Gateway to CloudWatch + +This is a one time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch. + +1. ### Create a policy document + + The `AmazonAPIGatewayPushToCloudWatchLogs` managed policy, with an ARN of `arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs`, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with `apigateway.amazonaws.com` as its trusted entity. + + Save this snippet as `apigateway-policy.json`. + + ```json + { + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "", + "Effect": "Allow", + "Principal": { + "Service": "apigateway.amazonaws.com" + }, + "Action": "sts:AssumeRole" + } + ] + } + ``` + +2. ### Create an account role to act as ApiGateway and write to CloudWatchLogs + + :::info in NGAP + + **NASA users in NGAP**: Be sure to use your account's permission boundary. + + ::: + + ```sh + aws iam create-role \ + --role-name ApiGatewayToCloudWatchLogs \ + [--permissions-boundary ] \ + --assume-role-policy-document file://apigateway-policy.json + ``` + + Note the ARN of the returned role for the last step. + +3. ### Attach correct permissions to role + + Next attach the `AmazonAPIGatewayPushToCloudWatchLogs` policy to the IAM role. + + ```sh + aws iam attach-role-policy \ + --role-name ApiGatewayToCloudWatchLogs \ + --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs" + ``` + +4. ### Update Account API Gateway settings with correct permissions + + Finally, set the IAM role ARN on the `cloudWatchRoleArn` property on your API Gateway Account settings. + + ```sh + aws apigateway update-account \ + --patch-operations op='replace',path='/cloudwatchRoleArn',value='' + ``` + +## Configure API Gateway CloudWatch Logs Delivery + +For details about configuring the API Gateway CloudWatch Logs delivery, see [Configure Cloudwatch Logs Delivery](configure_cloudwatch_logs_delivery.md). 
diff --git a/website/versioned_docs/version-v18.3.5/deployment/apis-introduction.mdx b/website/versioned_docs/version-v18.3.5/deployment/apis-introduction.mdx new file mode 100644 index 00000000000..94a2c9e2c73 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/apis-introduction.mdx @@ -0,0 +1,23 @@ +--- +id: apis-introduction +title: APIs +hide_title: false +--- + +import DocCardList from '@theme/DocCardList'; + +### Common Distribution APIs + +When deploying from the Cumulus Deployment Template or a configuration based on that repo, the Thin Egress App (TEA) distribution app will be used by default. However, you have the choice to use the Cumulus Distribution API as well. + +### Cumulus API Customization Use Cases + +Our Cumulus API offers you the flexibility to customize for your DAAC/organization. Below is a list of use cases that may help you with options: + +- [Cumulus API w/Launchpad Authentication](https://wiki.earthdata.nasa.gov/display/CUMULUS/Cumulus+API+with+Launchpad+Authentication) +- [Using Cumulus with Private APIs](https://wiki.earthdata.nasa.gov/display/CUMULUS/Using+Cumulus+with+Private+APIs) +- [Connecting to Cumulus Private APIs via socks5 proxy](https://wiki.earthdata.nasa.gov/display/CUMULUS/Connecting+to+Cumulus+Private+APIs+via+socks5+proxy) + +### Types of APIs + + diff --git a/website/versioned_docs/version-v18.3.5/deployment/choosing_configuring_rds.md b/website/versioned_docs/version-v18.3.5/deployment/choosing_configuring_rds.md new file mode 100644 index 00000000000..ffcb6a8ac45 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/choosing_configuring_rds.md @@ -0,0 +1,124 @@ +--- +id: choosing_configuring_rds +title: "RDS: Choosing and Configuring Your Database Type" +hide_title: false +--- + +## Background + +Cumulus uses a [PostgreSQL](https://www.postgresql.org/) database as its primary data store +for operational and archive records (e.g. collections, granules, etc). The Cumulus +core deployment code expects this database to be provided by the [AWS RDS](https://docs.aws.amazon.com/rds/index.html) service; however, it is agnostic about the type of the RDS database. + +RDS databases are broadly divided into two types: + +- **Provisioned**: Databases with a fixed capacity in terms of CPU and memory capacity. You can find +a list of the available database instance sizes in [this AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html). +- **Serverless**: Databases that can scale their CPU and memory capacity up and down in response to database load. [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html) is the service which provides serverless RDS databases. + +## Provisioned vs. Serverless + +Generally speaking, the advantage of provisioned databases is that they **don't have to scale**. +As soon as they are deployed, they have the full capacity of your chosen instance size and are +ready for ingest operations. Of course, this advantage is also a downside: if you ever have a +sudden spike in database traffic, your database **can't scale** to accommodate that increased +load. + +On the other hand, serverless databases are designed precisely to **scale in response to load**. +While the ability of serverless databases to scale is quite useful, there can be complexity in +setting the scaling configuration to achieve the best results. 
Recommendations for Aurora serverless database scaling configuration are provided in the section [below](#recommended-scaling-configuration-for-aurora-serverless).
+
+To decide whether a provisioned or a serverless database is appropriate for your ingest operations, you should consider the pattern of your data ingests.
+
+If you have a fairly steady, continuous rate of data ingest, then a provisioned database may be appropriate because your database capacity needs should be consistent and the lack of scaling shouldn't be an issue.
+
+If you have occasional, bursty ingests of data where you go from ingesting very little data to suddenly ingesting quite a lot, then a serverless database may be a better choice because it will be able to handle the spikes in your database load.
+
+## General Configuration Guidelines
+
+### Cumulus Core Login Configuration
+
+Cumulus Core uses an `admin_db_login_secret_arn` and (optionally) a `user_credentials_secret_arn` as inputs that allow various Cumulus components to act as a database administrator and/or read/write user. Those secrets should conform to the following format:
+
+```json
+{
+  "database": "postgres",
+  "dbClusterIdentifier": "clusterName",
+  "engine": "postgres",
+  "host": "xxx",
+  "password": "defaultPassword",
+  "port": 5432,
+  "username": "xxx",
+  "disableSSL": false,
+  "rejectUnauthorized": false
+}
+```
+
+- `database` -- the PostgreSQL database used by the configured user
+- `dbClusterIdentifier` -- the value set by the `cluster_identifier` variable in the terraform module
+- `engine` -- the Aurora/RDS database engine
+- `host` -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
+- `password` -- the database password
+- `username` -- the account username
+- `port` -- the database connection port, should always be 5432
+- `rejectUnauthorized` -- If `disableSSL` is not set, set this to `false` to allow self-signed certificates or non-supported CAs. Defaults to `false`.
+- `disableSSL` -- If set to `true`, disables use of SSL for Core database connections. Defaults to `false`.
+
+#### SSL encryption
+
+Current security policy/best practices require use of an SSL-enabled configuration. Cumulus by default expects a datastore that requires an SSL connection with a recognized certificate authority (RDS-managed databases configured to use SSL will work automatically, as AWS provides its CA bundles in the Lambda runtime environment). If deployed in an environment not making use of SSL, set `disableSSL` to `true` to disable this behavior.
+
+#### Self-Signed Certs
+
+Cumulus can accommodate a self-signed/unrecognized cert by setting `rejectUnauthorized` to `false` in the connection secret. This will result in Core allowing the use of certs without a valid CA.
+
+## Recommended Scaling Configuration for Aurora Serverless
+
+If you are going to use an Aurora Serverless RDS database, we recommend the following scaling configuration:
+
+- Set the autoscaling timeout to 1 minute (currently the lowest allowed value)
+- Set the database to force capacity change if the autoscaling timeout is reached
+
+The reason for these recommendations requires an understanding of Aurora Serverless scaling.
+Aurora Serverless scaling works as described in [the Amazon Aurora documentation](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.how-it-works.html): + +> When it does need to perform a scaling operation, Aurora Serverless v1 first tries to identify a scaling point, a moment when no queries are being processed. + +However, during periods of heavy ingest, Cumulus will be continuously writing granules and other +records to the database, so a "scaling point" will never be reached. This is where the +"autoscaling timeout" setting becomes important. The "autoscaling timeout" is the amount of time +that Aurora will wait to find a "scaling point" before giving up. + +So with the above recommended settings, we are telling Aurora to only wait for a "scaling point" +for 1 minute and that if a "scaling point" cannot be found in that time, then we should +**force the database to scale anyway**. These settings effectively make the Aurora Serverless database scale as quickly as possible in response to increased database load. + +With forced scaling on databases, there is a consequence that some running queries or transactions +may be dropped. However, Cumulus write operations are written with automatic retry logic, so any +write operations that failed due to database scaling should be retried successfully. + +### Cumulus Serverless RDS Cluster Module + +Cumulus provides a Terraform module that will deploy an Aurora Serverless RDS cluster. If you are +using this module to create your RDS cluster, you can configure the autoscaling timeout action, +the cluster minimum and maximum capacity, and more as seen in the [supported variables for the module](https://github.com/nasa/cumulus/blob/6f104a89457be453809825ac2b4ac46985239365/tf-modules/cumulus-rds-tf/variables.tf). + +Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so +that value will have to be manually configured in the AWS console or CLI. + +## Optional: Manage RDS Database with pgAdmin + +### Setup SSM Port Forwarding + +:::note + +In order to perform this action you will need to deploy it within a VPC and have the credentials to access via NGAP protocols. + +::: + +For a walkthrough guide on how to utilize AWS's Session Manager for port forwarding to access the Cumulus RDS database go to the [Accessing Cumulus RDS database via SSM Port Forwarding](https://wiki.earthdata.nasa.gov/display/CUMULUS/Accessing+Cumulus+RDS+database+via+SSM+Port+Forwarding) article. diff --git a/website/versioned_docs/version-v18.3.5/deployment/components.md b/website/versioned_docs/version-v18.3.5/deployment/components.md new file mode 100644 index 00000000000..33bfd2330d8 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/components.md @@ -0,0 +1,77 @@ +--- +id: components +title: Component-based Cumulus Deployment +hide_title: false +--- + +Cumulus is now released in a modular architecture, which will allow users to +pick and choose the individual components that they want to deploy. These +components will be made available as [Terraform modules](https://www.terraform.io/docs/modules/index.html). + +Cumulus users will be able to add those individual components to their +deployment and link them together using Terraform. In addition, users will be +able to make use of the large number of publicly available modules on the [Terraform Module Registry](https://registry.terraform.io/). 
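+
+For illustration, adding one of the components listed below to a deployment is a matter of adding a `module` block to your Terraform configuration. The following is only a sketch: the release version, module path, and inputs are illustrative, so check the release assets and the module's `variables.tf` for the actual requirements.
+
+```hcl
+module "data_persistence" {
+  # Module path inside the released terraform-aws-cumulus.zip artifact (version shown is illustrative)
+  source = "https://github.com/nasa/cumulus/releases/download/v18.3.5/terraform-aws-cumulus.zip//tf-modules/data-persistence"
+
+  prefix = var.prefix
+  # ...remaining inputs as required by the module's variables.tf
+}
+```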
+
+## Available Cumulus Components
+
+* [Cumulus](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus)
+* [Data Persistence](https://github.com/nasa/cumulus/tree/master/tf-modules/data-persistence)
+* [ECS Service](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus_ecs_service)
+* [Distribution](https://github.com/nasa/cumulus/tree/master/tf-modules/distribution)
+* [Thin Egress App](./thin_egress_app)
+* [Cumulus Distribution App](./cumulus_distribution)
+* [Workflow](https://github.com/nasa/cumulus/tree/master/tf-modules/workflow)
+
+## Adding components to your Terraform deployment
+
+Although Terraform components can be configured using a single file, it is recommended to add the following files to your deployment:
+
+* **variables.tf** - [input variables](https://www.terraform.io/docs/configuration/variables.html) used in your Terraform configuration
+* **main.tf** - the contents of your deployment, mostly made up of [module](https://www.terraform.io/docs/configuration/modules.html#calling-a-child-module) statements and a [provider configuration block](https://www.terraform.io/docs/configuration/providers.html#provider-configuration)
+* **outputs.tf** - any [output values](https://www.terraform.io/docs/configuration/outputs.html) to be returned by your deployment
+* **terraform.tf** - contains [remote state](#remote-state) configuration, and any other configuration of Terraform itself
+* **terraform.tfvars** - [variable definitions](https://www.terraform.io/docs/configuration/variables.html#variable-definitions-tfvars-files)
+
+**variables.tf**, **main.tf**, and **outputs.tf** should be stored in version control, as they will be constant no matter what environment you are deploying to.
+
+**terraform.tfvars** will contain environment-specific (and possibly sensitive) values, so it should be added to **.gitignore**.
+
+**terraform.tf** is home to your [Terraform-specific settings](https://www.terraform.io/docs/configuration/terraform.html). This file will contain environment-specific values, so it should be added to **.gitignore**. Unfortunately, `terraform` blocks [can only contain constant values](https://www.terraform.io/docs/configuration/terraform.html#terraform-block-syntax); they cannot reference variables defined in **terraform.tfvars**.
+
+An example of using Terraform to deploy components can be found in the [`example` directory](https://github.com/nasa/cumulus/tree/master/example) of the Cumulus repo.
+
+## Remote State
+
+We suggest following the recommendations in Terraform's [Remote State](https://www.terraform.io/docs/state/remote.html) documentation:
+
+> By default, Terraform stores state locally in a file named `terraform.tfstate`.
+> When working with Terraform in a team, use of a local file makes Terraform
+> usage complicated because each user must make sure they always have the latest
+> state data before running Terraform and make sure that nobody else runs
+> Terraform at the same time.
+>
+> With remote state, Terraform writes the state data to a remote data store,
+> which can then be shared between all members of a team.
+
+The recommended approach for handling remote state with Cumulus is to use the [S3 backend](https://www.terraform.io/docs/backends/types/s3.html). This backend stores state in S3 and uses a DynamoDB table for locking.
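+
+As a sketch, a `terraform.tf` using the S3 backend might look like the following, where the bucket, key, and table names are placeholders for the resources you create for your remote state:
+
+```hcl
+terraform {
+  backend "s3" {
+    region         = "us-east-1"
+    bucket         = "PREFIX-tf-state"
+    key            = "cumulus/terraform.tfstate"
+    dynamodb_table = "PREFIX-tf-locks"
+  }
+}
+```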
+ +See the deployment documentation for a [walk-through of creating resources for your remote state using an S3 backend](README.md#create-resources-for-terraform-state). diff --git a/website/versioned_docs/version-v18.3.5/deployment/configure_cloudwatch_logs_delivery.md b/website/versioned_docs/version-v18.3.5/deployment/configure_cloudwatch_logs_delivery.md new file mode 100644 index 00000000000..c5d6f4b7232 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/configure_cloudwatch_logs_delivery.md @@ -0,0 +1,32 @@ +--- +id: cloudwatch-logs-delivery +title: Configure Cloudwatch Logs Delivery +hide_title: false +--- + +As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An [operator](https://nasa.github.io/cumulus/docs/glossary#operator) does this by configuring the `cumulus` module for [your deployment](../deployment/README.md#configure-and-deploy-the-cumulus-tf-root-module) as shown below. The value of the `log_destination_arn` variable is the ARN of a writeable log destination. + +The value can be either an [AWS::Logs::Destination](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-logs-destination.html) or a [Kinesis Stream](https://aws.amazon.com/kinesis/data-streams/) ARN to which your account can write. + +```hcl +log_destination_arn = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName] +``` + +## Logs Sent + +By default, the following logs will be sent to the destination when one is given. + +* Ingest logs +* Async Operation logs +* Thin Egress App API Gateway logs ([if configured](./api_gateway_logging.md)) + +## Additional Logs + +If additional logs are needed, you can configure `additional_log_groups_to_elk` with the Cloudwatch log groups you want to send to the destination. `additional_log_groups_to_elk` is a map with the key as a descriptor and the value with the Cloudwatch log group name. + +```hcl +additional_log_groups_to_elk = { + "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld" + "MyCustomTask" = "my-custom-task-log-group" +} +``` diff --git a/website/versioned_docs/version-v18.3.5/deployment/create_bucket.md b/website/versioned_docs/version-v18.3.5/deployment/create_bucket.md new file mode 100644 index 00000000000..667ad09b383 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/create_bucket.md @@ -0,0 +1,40 @@ +--- +id: create_bucket +title: Creating an S3 Bucket +hide_title: false +--- + +Buckets can be created on the command line with [AWS CLI][cli] or via the web interface on the [AWS console][web]. + +When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See [S3 Server Access Logging](../configuration/server_access_logging.md) for more details. + +## Command Line + +Using the [AWS Command Line Tool][cli] [create-bucket](https://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html) ``s3api`` subcommand: + +```bash +$ aws s3api create-bucket \ + --bucket foobar-internal \ + --region us-west-2 \ + --create-bucket-configuration LocationConstraint=us-west-2 +{ + "Location": "/foobar-internal" +} +``` + +:::info + +The `region` and `create-bucket-configuration` arguments are only necessary if you are creating a bucket outside of the `us-east-1` region. 
+ +::: + +Please note security settings and other bucket options can be set via the options listed in the ``s3api`` documentation. + +Repeat the above step for each bucket to be created. + +## Web Interface + +If you prefer to use the AWS web interface instead of the command line, see [AWS "Creating a Bucket" documentation][web]. + +[cli]: https://aws.amazon.com/cli/ "Amazon Command Line Interface" +[web]: http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html "Amazon web console interface" diff --git a/website/versioned_docs/version-v18.3.5/deployment/cumulus-distribution.md b/website/versioned_docs/version-v18.3.5/deployment/cumulus-distribution.md new file mode 100644 index 00000000000..d2f37931d8a --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/cumulus-distribution.md @@ -0,0 +1,243 @@ +--- +id: cumulus_distribution +title: Using the Cumulus Distribution API +hide_title: false +--- + +The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3. + +:::tip + +If you need to access our quick reference materials while setting up or continuing to manage your API access go to the [Cumulus Distribution API Docs](https://nasa.github.io/cumulus-distribution-api/). + +::: + +## Configuring a Cumulus Distribution Deployment + +The Cumulus Distribution API is included in the main [Cumulus](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus_distribution) repo. It is available as part of the `terraform-aws-cumulus.zip` archive in the [latest release](https://github.com/nasa/cumulus/releases). + +These steps assume you're using the [Cumulus Deployment Template](https://github.com/nasa/cumulus-template-deploy/blob/master/cumulus-tf/main.tf) but they can also be used for custom deployments. + +To configure a deployment to use Cumulus Distribution: + + 1. Remove or comment the "Thin Egress App Settings" in the [Cumulus Template Deploy](https://github.com/nasa/cumulus-template-deploy/blob/master/cumulus-tf/main.tf) and enable the "Cumulus Distribution Settings". + 2. Delete or comment the contents of [thin_egress_app.tf](https://github.com/nasa/cumulus-template-deploy/blob/master/cumulus-tf/thin_egress_app.tf) and the corresponding Thin Egress App outputs in [outputs.tf](https://github.com/nasa/cumulus-template-deploy/blob/master/cumulus-tf/outputs.tf). These are not necessary for a Cumulus Distribution deployment. + 3. Uncomment the Cumulus Distribution outputs in [outputs.tf](https://github.com/nasa/cumulus-template-deploy/blob/master/cumulus-tf/outputs.tf). + 4. Rename `cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example` to `cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf`. + +## Cognito Application and User Credentials + +The major prerequisite for using the Cumulus Distribution API is to set up [Cognito](https://aws.amazon.com/cognito/). If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation. + +Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. 
To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team. + +Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your `cumulus-tf/terraform.tfvars` file: + +* `csdap_host_url`: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event +* `csdap_client_id`: The client ID for the Cumulus application registered within the Cognito service +* `csdap_client_password`: The client password for the Cumulus application registered within the Cognito service + +Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials. + +## Cumulus Distribution URL + +Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form `//` (or `/path/to/file`, if using a custom bucket map, as explained further below). + +By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named `-distribution`, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (only in development) or (b) put an HTTPS CloudFront URL in front of your API Gateway (required in production, UAT, and SIT). + +In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your `cumulus-tf` module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be `cumulus_distribution_api_uri`, which is the URL for the private API Gateway. + +You may override this default URL by adding a `cumulus_distribution_url` variable to your `cumulus-tf/terraform.tfvars` file and setting it to one of the following values (both are explained below): + +1. The default URL, but with a port added to it, in order to allow you to configure tunneling (only in development) +2. An HTTPS CloudFront URL placed in front of your Cumulus Distribution API Gateway (required in production environments) + +The following subsections explain these approaches in turn. + +### Using Your Cumulus Distribution API Gateway URL as Your Distribution URL + +Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development). Here is an outline of the required steps with details provided further below: + +1. Create/import a key pair into your AWS EC2 service (if you haven't already done so), or setup auth using a PKCS11Provider +2. Add a reference to the name of the key pair to your Terraform variables (we'll set the `key_name` Terraform variable) +3. Choose an open local port on your machine (we'll use 9000 in the following example) +4. 
Add a reference to the value of your `cumulus_distribution_api_uri` (mentioned earlier), including your chosen port (we'll set the `cumulus_distribution_url` Terraform variable) +5. Redeploy Cumulus +6. Add an entry to your `/etc/hosts` file +7. Add a redirect URI to Cognito via the Cognito API +8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI) +9. Add a sample file to S3 to test downloading via Cognito + +### Setting up SSH Keypair + +:::note +Setting up a keypair is optional if your organization is making use of alternative authentication mechanisms built into the AMI +::: + +To create or import an existing key pair, you can use the AWS CLI (see AWS [ec2 import-key-pair](https://docs.aws.amazon.com/cli/latest/reference/ec2/import-key-pair.html)), or the AWS Console (see [Amazon EC2 key pairs and Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)). + +Once your key pair is added to AWS, add the following to your `cumulus-tf/terraform.tfvars` file: + +```plain +key_name = "" +cumulus_distribution_url = "https://.execute-api..amazonaws.com:/dev/" +``` + +where: + +* `` is the name of the key pair you just added to AWS +* `` and `` are the corresponding parts from your `cumulus_distribution_api_uri` output variable +* `` is your open local port of choice (9000 is typically a good choice) + +Once you save your variable changes, redeploy your `cumulus-tf` module. + +While your deployment runs, add the following entry to your `/etc/hosts` file, replacing `` with the host name of the `cumulus_distribution_url` Terraform variable you just added above: + +```plain +localhost +``` + +Next, you'll need to use the Cognito API to add the value of your `cumulus_distribution_url` Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details: + +* method: POST +* base URL: the value of your `csdap_host_url` Terraform variable +* path: /authclient/updateRedirectUri +* username: the value of your `csdap_client_id` Terraform variable +* password: the value of your `csdap_client_password` Terraform variable +* headers: Content-Type='application/x-www-form-urlencoded' +* body: redirect_uri=/login + +where `` is the value of your `cumulus_distribution_url` Terraform variable. Note the `/login` path at the end of the `redirect_uri` value. + +For reference, see the [Cognito Authentication Service API](https://wiki.earthdata.nasa.gov/display/ACAS/Cognito+Authentication+Service+API). + +Next, [install the Session Manager Plugin for the AWS CLI](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html). If running on macOS, and you use Homebrew, you can install it simply as follows: + +```bash +brew install --cask session-manager-plugin --no-quarantine +``` + +As your final setup step, add a sample file to one of the protected buckets listed in your `buckets` Terraform variable in your `cumulus-tf/terraform.tfvars` file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets. + +At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows: + +1. 
Determine your EC2 instance ID
+2. Connect to the NASA VPN
+3. Start an AWS SSM session
+4. Open an SSH tunnel
+5. Use a browser to navigate to your file
+
+To determine your EC2 instance ID for your Cumulus deployment, run the following command, where `` is the name of the appropriate AWS profile to use and `` is the value of your `prefix` Terraform variable:
+
+```bash
+aws --profile ec2 describe-instances --filters Name=tag:Deployment,Values= Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text
+```
+
+:::caution Connect to NASA VPN
+
+Before proceeding with the remaining steps, make sure you are connected to the NASA VPN.
+
+:::
+
+Use the value output from the command above in place of `` in the following command, which will start an SSM session:
+
+```bash
+aws ssm start-session --target --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000
+```
+
+If successful, you should see output similar to the following:
+
+```plain
+Starting session with SessionId: NGAPShApplicationDeveloper-***
+Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
+Waiting for connections...
+```
+
+In another terminal window, open a tunnel with port forwarding using your chosen port from above (e.g., 9000):
+
+#### Using SSH keypair authentication
+
+```bash
+ssh -4 -p 6000 -N -L ::443 ec2-user@127.0.0.1
+```
+
+#### Using PKCS11Provider
+
+```bash
+ssh -4 -I /usr/lib/ssh-keychain.dylib -p 6000 -N -L ::443 @127.0.0.1
+```
+
+where:
+
+* `` is the open local port you chose earlier (e.g., 9000)
+* `` is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your `cumulus_distribution_url` Terraform variable above)
+
+Finally, use your chosen browser to navigate to `//`, where `` and `` reference the sample file you added to S3 above.
+
+If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, and then enter a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.
+
+Once you're finished testing, clean up as follows:
+
+1. Stop your SSH tunnel (enter `Ctrl-C`)
+2. Stop your AWS SSM session (enter `Ctrl-C`)
+3. If you like, disconnect from the NASA VPN
+
+While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.
+
+### Using a CloudFront URL as Your Distribution URL
+
+In Production (OPS), and in other environments such as UAT and SIT, you'll need to provide a publicly accessible URL that users can use to download (distribute) granule files.
+
+This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway.
In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details: + +* The private, backing URL, which is the value of your `cumulus_distribution_api_uri` Terraform output value +* A request to add the AWS account's VPC to the whitelist + +Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your `cumulus-tf/terraform.tfvars` file: + +```plain +cumulus_distribution_url = +``` + +In addition, add a Cognito redirect URI, as detailed in the [previous section](#using-your-cumulus-distribution-api-gateway-url-as-your-distribution-url). Note that in this case, the value you'll use for `redirect_uri` is `/login` since the value of your `cumulus_distribution_url` is now your CloudFront URL. + +At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (`csdap_host_url`, `csdap_client_id`, and `csdap_client_password`). + +Redeploy Cumulus with your new/updated Terraform variables. + +As your final setup step, add a sample file to one of the protected buckets listed in your `buckets` Terraform variable in your `cumulus-tf/terraform.tfvars` file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets. + +Finally, use your chosen browser to navigate to `//`, where `` and `` reference the sample file you added to S3. + +If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin. + +## S3 Bucket Mapping + +An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link. + +The Cumulus Distribution API uses a `bucket_map.yaml` or `bucket_map.yaml.tmpl` file to determine which buckets to +serve. [See the examples](https://github.com/nasa/cumulus/tree/master/example/cumulus-tf/cumulus_distribution). + +The default Cumulus module generates a file at `s3://${system_bucket}/distribution_bucket_map.json`. + +The configuration file is a simple JSON mapping of the form: + +```json +{ + "daac-public-data-bucket": "/path/to/this/kind/of/data" +} +``` + +:::note cumulus bucket mapping + +Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map **must include mappings for all of the `protected` and `public` buckets specified in the `buckets` variable in `cumulus-tf/terraform.tfvars`**, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors. + +::: + +## Switching from the Thin Egress App to Cumulus Distribution + +If you have previously deployed the [Thin Egress App (TEA)](../deployment/thin-egress-app.md) as your distribution app, you can switch to Cumulus Distribution by following the steps above. 
+ +Note, however, that the `cumulus_distribution` module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA. + +There will also be downtime while your API Gateway is updated. diff --git a/website/versioned_docs/version-v18.3.5/deployment/databases-introduction.mdx b/website/versioned_docs/version-v18.3.5/deployment/databases-introduction.mdx new file mode 100644 index 00000000000..36d98f949d5 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/databases-introduction.mdx @@ -0,0 +1,15 @@ +--- +id: databases-introduction +title: Databases +hide_title: false +--- + +import DocCardList from '@theme/DocCardList'; + +### Cumulus Core Database + +Cumulus uses a PostgreSQL database as its primary data store for operational and archive records (e.g. collections, granules, etc). We expect a PostgreSQL database to be provided by the AWS RDS service; however, there are two types of the RDS database which we will explore in the upcoming pages. + +### Types of Databases + + diff --git a/website/versioned_docs/version-v18.3.5/deployment/postgres-database-deployment.md b/website/versioned_docs/version-v18.3.5/deployment/postgres-database-deployment.md new file mode 100644 index 00000000000..75bfb3ffb61 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/postgres-database-deployment.md @@ -0,0 +1,481 @@ +--- +id: postgres_database_deployment +title: PostgreSQL Database Deployment +hide_title: false +--- + +## Overview + +Cumulus deployments require an Aurora [PostgreSQL 13](https://www.postgresql.org/) compatible database to be provided as the primary data store for Cumulus with Elasticsearch for non-authoritative querying/state data for the API and other applications that require more complex queries. Note that Cumulus is tested with an Aurora Postgres database. + +Users are *strongly* encouraged to plan for and implement a database solution that scales to their use requirements, meets their security posture and maintenance needs and/or allows for multi-tenant cluster usage. + +For some scenarios (such as single tenant, test deployments, infrequent ingest and the like) a properly +configured [Aurora Serverless](https://aws.amazon.com/rds/aurora/serverless/) cluster +*may* suffice. + +To that end, Cumulus provides a terraform module +[`cumulus-rds-tf`](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus-rds-tf) +that will deploy an AWS RDS Aurora Serverless PostgreSQL 13 compatible [database cluster](https://aws.amazon.com/rds/aurora/postgresql-features/), and optionally provision a single deployment database with credentialed secrets for use with Cumulus. + +We have provided an example terraform deployment using this module in the [Cumulus template-deploy repository](https://github.com/nasa/cumulus-template-deploy/tree/master/rds-cluster-tf) on GitHub. + +Use of this example involves: + +- Creating/configuring a [Terraform](https://www.terraform.io) module directory +- Using [Terraform](https://www.terraform.io) to deploy resources to AWS + +--- + +## Requirements + +Configuration/installation of this module requires the following: + +- [Terraform](https://www.terraform.io) +- git +- A VPC configured for use with Cumulus Core. This should match the subnets you provide when [Deploying Cumulus](./) to allow Core's lambdas to properly access the database. +- At least two subnets across multiple AZs. 
These should match the subnets you provide as configuration when [Deploying Cumulus](./), and should be within the same VPC. + +### Needed Git Repositories + +- [Cumulus Deployment Template](https://github.com/nasa/cumulus-template-deploy) + +### Assumptions + +#### OS/Environment + +The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported. + +#### Terraform + +This document assumes knowledge of Terraform. If you are not comfortable +working with Terraform, the following links should bring you up to speed: + +- [Introduction to Terraform](https://www.terraform.io/intro/index.html) +- [Getting Started with Terraform and AWS](https://learn.hashicorp.com/terraform/?track=getting-started#getting-started) +- [Terraform Configuration Language](https://www.terraform.io/docs/configuration/index.html) + +For Cumulus specific instructions on installation of Terraform, refer to the main [Cumulus Installation Documentation](../deployment/README.md#install-terraform). + +#### Aurora/RDS + +This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar consider perusing the [AWS docs](https://aws.amazon.com/rds/aurora/) and the [Aurora Serverless V1 docs](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html). + +## Prepare Deployment Repository + +:::tip + + If you already are working with an existing repository that has a configured `rds-cluster-tf` deployment for the version of Cumulus you intend to deploy or update, *or* you need to only configure this module for your repository, skip to [Prepare AWS Configuration](postgres_database_deployment#prepare-aws-configuration). + +::: + +Clone the [`cumulus-template-deploy`](https://github.com/nasa/cumulus-template-deploy) repo and name appropriately for your organization: + +```bash + git clone https://github.com/nasa/cumulus-template-deploy +``` + +We will return to [configuring this repo and using it for deployment below](#configure-and-deploy-the-module). + +
+ Optional: Create a New Repository + + [Create a new repository](https://help.github.com/articles/creating-a-new-repository/) on GitHub so that you can add your workflows and other modules to source control: + +```bash + git remote set-url origin https://github.com// + git push origin master +``` + +You can then [add/commit](https://help.github.com/articles/adding-a-file-to-a-repository-using-the-command-line/) changes as needed. + +:::caution Update Your Gitignore File + +If you are pushing your deployment code to a git repo, make sure to add `terraform.tf` and `terraform.tfvars` to `.gitignore`, **as these files will contain sensitive data related to your AWS account**. + +::: + +
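+
+For example, the relevant `.gitignore` entries might simply be:
+
+```plain
+terraform.tf
+terraform.tfvars
+```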
+ +--- + +## Prepare AWS Configuration + +To deploy this module, you need to make sure that you have the following steps from the [Cumulus deployment instructions](https://nasa.github.io/cumulus/docs/deployment/) in similar fashion *for this module*: + +- [Set access keys](https://nasa.github.io/cumulus/docs/deployment/#set-access-keys) +- [Create the state bucket](https://nasa.github.io/cumulus/docs/deployment/#create-the-state-bucket) +- [Create the locks table](https://nasa.github.io/cumulus/docs/deployment/#create-the-locks-table) + +--- + +### Configure and Deploy the Module + +When configuring this module, please keep in mind that unlike Cumulus deployment, **this module should be deployed once** to create the database cluster and only thereafter to make changes to that configuration/upgrade/etc. + +:::tip + +This module does not need to be re-deployed for each Core update. + +::: + +These steps should be executed in the `rds-cluster-tf` directory of the template deploy repo that you previously cloned. Run the following to copy the example files: + +```shell +cd rds-cluster-tf/ +cp terraform.tf.example terraform.tf +cp terraform.tfvars.example terraform.tfvars +``` + +In `terraform.tf`, configure the remote state settings by substituting the appropriate values for: + +- `bucket` +- `dynamodb_table` +- `PREFIX` (whatever prefix you've chosen for your deployment) + +Fill in the appropriate values in `terraform.tfvars`. See the [rds-cluster-tf module variable definitions](https://github.com/nasa/cumulus/tree/master/tf-modules/cumulus-rds-tf) for more detail on all of the configuration options. A few notable configuration options are documented in the next section. + +#### Configuration Options + +- `deletion_protection` -- defaults to `true`. Set it to `false` if you want to be able to delete your *cluster* with a terraform destroy without manually updating the cluster. +- `db_admin_username` -- cluster database administration username. Defaults to `postgres`. +- `db_admin_password` -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a [`random_string`](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/string) resource as input. +- `region` -- defaults to `us-east-1`. +- `subnets` -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your `lambda_subnet_ids`. +- `max_capacity` -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value. +- `min_capacity` -- the minimum ACUs the cluster will scale to +- `provision_user_database` -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the [next section](#provision-user-and-user-database). + +#### Provision User and User Database + +If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, *in addition to* managing the cluster itself, the following configuration keys are required: + +- `provision_user_database` -- must be set to `true`. This configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy. +- `permissions_boundary_arn` -- the permissions boundary to use in creating the roles for access the provisioning lambda will need. This should in most use cases be the same one used for Cumulus Core deployment. 
+- `rds_user_password` -- the value to set the user password to. +- `prefix` -- this value will be used to set a unique identifier for the `ProvisionDatabase` lambda, as well as name the provisioned user/database. + +Once configured, the module will deploy the lambda and run it on each provision thus creating the configured database (if it does not exist), updating the user password (if that value has been changed), and updating the output user database secret. + +Setting `provision_user_database` to false after provisioning will **not** result in removal of the configured database, as the lambda is non-destructive as configured in this module. + +:::note + +This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required. + +::: + +#### Initialize Terraform + +Run `terraform init` + +You should see a similar output: + +```shell +* provider.aws: version = "~> 2.32" + +Terraform has been successfully initialized! +``` + +#### Deploy + +Run `terraform apply` to deploy the resources. + +:::caution + +If re-applying this module, variables (e.g. `engine_version`, `snapshot_identifier` ) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset **carefully** for resources that will be re-created/destroyed before applying. + +::: + +Review the changeset, and assuming it looks correct, type `yes` when prompted to confirm that you want to create all of the resources. + +Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user's database, lambdas, and security groups): + +
+ Output Example + +```shell +terraform apply + +An execution plan has been generated and is shown below. +Resource actions are indicated with the following symbols: + + create + +Terraform will perform the following actions: + + # module.rds_cluster.aws_db_subnet_group.default will be created + + resource "aws_db_subnet_group" "default" { + + arn = (known after apply) + + description = "Managed by Terraform" + + id = (known after apply) + + name = (known after apply) + + name_prefix = "xxxxxxxxx" + + subnet_ids = [ + + "subnet-xxxxxxxxx", + + "subnet-xxxxxxxxx", + ] + + tags = { + + "Deployment" = "xxxxxxxxx" + } + } + + # module.rds_cluster.aws_rds_cluster.cumulus will be created + + resource "aws_rds_cluster" "cumulus" { + + apply_immediately = true + + arn = (known after apply) + + availability_zones = (known after apply) + + backup_retention_period = 1 + + cluster_identifier = "xxxxxxxxx" + + cluster_identifier_prefix = (known after apply) + + cluster_members = (known after apply) + + cluster_resource_id = (known after apply) + + copy_tags_to_snapshot = false + + database_name = "xxxxxxxxx" + + db_cluster_parameter_group_name = (known after apply) + + db_subnet_group_name = (known after apply) + + deletion_protection = true + + enable_http_endpoint = true + + endpoint = (known after apply) + + engine = "aurora-postgresql" + + engine_mode = "serverless" + + engine_version = "10.12" + + final_snapshot_identifier = "xxxxxxxxx" + + hosted_zone_id = (known after apply) + + id = (known after apply) + + kms_key_id = (known after apply) + + master_password = (sensitive value) + + master_username = "xxxxxxxxx" + + port = (known after apply) + + preferred_backup_window = "07:00-09:00" + + preferred_maintenance_window = (known after apply) + + reader_endpoint = (known after apply) + + skip_final_snapshot = false + + storage_encrypted = (known after apply) + + tags = { + + "Deployment" = "xxxxxxxxx" + } + + vpc_security_group_ids = (known after apply) + + + scaling_configuration { + + auto_pause = true + + max_capacity = 4 + + min_capacity = 2 + + seconds_until_auto_pause = 300 + + timeout_action = "RollbackCapacityChange" + } + } + + # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created + + resource "aws_secretsmanager_secret" "rds_login" { + + arn = (known after apply) + + id = (known after apply) + + name = (known after apply) + + name_prefix = "xxxxxxxxx" + + policy = (known after apply) + + recovery_window_in_days = 30 + + rotation_enabled = (known after apply) + + rotation_lambda_arn = (known after apply) + + tags = { + + "Deployment" = "xxxxxxxxx" + } + + + rotation_rules { + + automatically_after_days = (known after apply) + } + } + + # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created + + resource "aws_secretsmanager_secret_version" "rds_login" { + + arn = (known after apply) + + id = (known after apply) + + secret_id = (known after apply) + + secret_string = (sensitive value) + + version_id = (known after apply) + + version_stages = (known after apply) + } + + # module.rds_cluster.aws_security_group.rds_cluster_access will be created + + resource "aws_security_group" "rds_cluster_access" { + + arn = (known after apply) + + description = "Managed by Terraform" + + egress = (known after apply) + + id = (known after apply) + + ingress = (known after apply) + + name = (known after apply) + + name_prefix = "cumulus_rds_cluster_access_ingress" + + owner_id = (known after apply) + + revoke_rules_on_delete = false + + tags = { + + "Deployment" = 
"xxxxxxxxx" + } + + vpc_id = "vpc-xxxxxxxxx" + } + + # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created + + resource "aws_security_group_rule" "rds_security_group_allow_postgres" { + + from_port = 5432 + + id = (known after apply) + + protocol = "tcp" + + security_group_id = (known after apply) + + self = true + + source_security_group_id = (known after apply) + + to_port = 5432 + + type = "ingress" + } + +Plan: 6 to add, 0 to change, 0 to destroy. + +Do you want to perform these actions? + Terraform will perform the actions described above. + Only 'yes' will be accepted to approve. + + Enter a value: yes + +module.rds_cluster.aws_db_subnet_group.default: Creating... +module.rds_cluster.aws_security_group.rds_cluster_access: Creating... +module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating... +``` + +Then, after the resources are created: + +```shell +Apply complete! Resources: X added, 0 changed, 0 destroyed. +Releasing state lock. This may take a few moments... + +Outputs: + +admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR +admin_db_login_secret_version = xxxxxxxxx +rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com +security_group_id = xxxxxxxxx +user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA +``` + +Note the output values for `admin_db_login_secret_arn` (and optionally `user_credentials_secret_arn`) as these provide the AWS Secrets Manager secrets required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well. + +The content of each of these secrets are in the form: + +```json +{ + "database": "postgres", + "dbClusterIdentifier": "clusterName", + "engine": "postgres", + "host": "xxx", + "password": "defaultPassword", + "port": 5432, + "username": "xxx" +} +``` + +- `database` -- the PostgreSQL database used by the configured user +- `dbClusterIdentifier` -- the value set by the `cluster_identifier` variable in the terraform module +- `engine` -- the Aurora/RDS database engine +- `host` -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com +- `password` -- the database password +- `username` -- the account username +- `port` -- The database connection port, should always be 5432 + +
+ +--- + +### Connect to PostgreSQL DB via pgAdmin + +If you would like to manage your PostgreSQL database in an GUI tool, you can via pgAdmin. + +#### Requirements + +- Install AWS CLI ([installation steps](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)) +- Install SSM AWS CLI plugin ([installation steps](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html)) + +#### SSH Setup in AWS Secrets Manager + +You will need to navigate to AWS Secrets Manager and retrieve the secret values for your database. The secret name will contain the string `_db_login` and your prefix. Click the "Retrieve secret value" button (![Retrieve secret value](../assets/pgadmin_retrieve_btn.png))to see the secret values. + +The value for your secret name can also be retrieved from the `data-persistence-tf` directory with the command `terraform output`. + +![pgAdmin values to retrieve](../assets/pgadmin_retrieve_values.png) + +#### Setup ~/.ssh/config + +Replace HOST value and PORT value with the values retrieved from Secrets Manager. + +The LocalForward number 9202 can be any unused LocalForward number in your SSH config: + +##### Using SSH keypair authentication + +```shell +Host ssm-proxy + Hostname 127.0.0.1 + User ec2-user + LocalForward 9202 [HOST value]:[PORT value] + IdentityFile ~/.ssh/id_rsa + Port 6868 +``` + +##### Using PKCS11Provider + +```shell +Host ssm-proxy + Hostname 127.0.0.1 + User ec2-user + LocalForward 9202 [HOST value]:[PORT value] + Port 6868 + PKCS11Provider /usr/lib/ssh-keychain.dylib +``` + +#### Create a Local Port Forward + +- Create a local port forward to the SSM box port 22, this creates a tunnel from `` to the SSH port on the SSM host. + +:::caution + +`` should not be `8000`. + +::: + +- Replace the following command values for `` with your instance ID: + +```shell +aws ssm start-session --target --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6868 +``` + +- Then, in another terminal tab, enter: + +```shell +ssh ssm-proxy +``` + +#### Create PgAdmin Server + +- Open pgAdmin and begin creating a new server (in newer versions it may be registering a new server). + +![Creating a pgAdmin server](../assets/pgadmin_create_server.png) + +- In the "Connection" tab, enter the values retrieved from Secrets Manager. Host name/address and Port should be the Hostname and LocalForward number from the ~/.ssh/config file. + +![pgAdmin server connection value entries](../assets/pgadmin_server_connection.png) + +:::note + +Maintenance database corresponds to "database". + +::: + +You can select "Save Password?" to save your password. Click "Save" when you are finished. You should see your new server in pgAdmin. + +#### Query Your Database + +- In the "Browser" area find your database, navigate to the name, and click on it. + +- Select the "Query Editor" to begin writing queries to your database. + +![Using the query editor in pgAdmin](../assets/pgadmin_query_tool.png) + +You are all set to manage your queries in pgAdmin! + +--- + +### Next Steps + +Your database cluster has been created/updated! From here you can continue to add additional user accounts, databases, and other database configurations. 
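+
+If you prefer a command-line client to pgAdmin, the same SSM session and `ssh ssm-proxy` port forward described above can be reused with `psql`. This is only a sketch: the username and database values come from your database secret, and `9202` is the `LocalForward` port from the SSH config above:
+
+```shell
+psql -h 127.0.0.1 -p 9202 -U "$DB_USERNAME" -d "$DB_DATABASE"
+```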
diff --git a/website/versioned_docs/version-v18.3.5/deployment/share-s3-access-logs.md b/website/versioned_docs/version-v18.3.5/deployment/share-s3-access-logs.md new file mode 100644 index 00000000000..0ac9d738e39 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/share-s3-access-logs.md @@ -0,0 +1,47 @@ +--- +id: share-s3-access-logs +title: Share S3 Access Logs +hide_title: false +--- + +It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package. + +## S3 Replicator + +The S3 Replicator is a Node.js package that contains a simple Lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another. + +First, ensure that you have enabled [S3 Server Access Logging](../configuration/server_access_logging). + +Next, configure your `terraform.tfvars` as described in the [`s3-replicator/README.md`](https://github.com/nasa/cumulus/blob/master/tf-modules/s3-replicator/README.md) to correspond to your deployment. The `source_bucket` and `source_prefix` are determined by how you enabled the [S3 Server Access Logging](../configuration/server_access_logging). + +In order to deploy the `s3-replicator` with Cumulus you will need to add the module to your terraform `main.tf` definition as the example below: + +```hcl +module "s3-replicator" { + source = "" + prefix = var.prefix + vpc_id = var.vpc_id + subnet_ids = var.subnet_ids + permissions_boundary = var.permissions_boundary_arn + source_bucket = var.s3_replicator_config.source_bucket + source_prefix = var.s3_replicator_config.source_prefix + target_bucket = var.s3_replicator_config.target_bucket + target_prefix = var.s3_replicator_config.target_prefix +} +``` + +The Terraform source package can be found on the [Cumulus GitHub Release page](https://github.com/nasa/cumulus/releases) under the asset tab `terraform-aws-cumulus-s3-replicator.zip`. + +## ESDIS Metrics + +In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates. + +Configure the S3 Replicator as described above using the `target_bucket` and `target_prefix` provided by the Metrics team. + +The Metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance. + +:::info + +For a more in-depth overview regarding ESDIS Metrics view the [Cumulus Distribution Metrics](../features/distribution-metrics.md) section. + +::: diff --git a/website/versioned_docs/version-v18.3.5/deployment/terraform-best-practices.md b/website/versioned_docs/version-v18.3.5/deployment/terraform-best-practices.md new file mode 100644 index 00000000000..9cad8ca5ac5 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/terraform-best-practices.md @@ -0,0 +1,283 @@ +--- +id: terraform-best-practices +title: Terraform Best Practices +hide_title: false +--- + +## How to Manage the Terraform State Bucket + +### Enable Bucket Versioning + +Since the Terraform state file for your Cumulus deployment is stored in S3, in +order to guard against its corruption or loss, it is **strongly recommended** +that versioning is enabled on the S3 bucket used for persisting your +deployment's Terraform state file. 
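+
+For reference, bucket versioning can also be enabled with a single AWS CLI call (the same operation referenced in the deployment documentation linked below); `BUCKET_NAME` is a placeholder for your state bucket:
+
+```bash
+aws s3api put-bucket-versioning \
+  --bucket BUCKET_NAME \
+  --versioning-configuration Status=Enabled
+```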
+ +To enable bucket versioning, either use the AWS CLI command given in +[Configuring the Cumulus deployment](../deployment/README.md#create-resources-for-terraform-state), or the AWS Management Console, as follows: + +1. Go to the S3 service +2. Go to the bucket used for storing Terraform state files +3. Click the **Properties** tab +4. If the **Versioning** property is disabled, click **Disabled** to enable it, + which should then show the property as **Enabled**, with a check mark next + to it. + +### How to Recover from a Corrupted State File + +If your state file appears to be corrupted, or in some invalid state, and the +containing bucket has bucket versioning enabled, you may be able to recover by +[restoring a previous version][restore] of the state file. There are two primary +approaches, but the AWS documentation does not provide specific instructions +for either one: + +- **Option 1:** Copy a previous version of the state file into the same bucket +- **Option 2:** Permanently delete the current version of the file (i.e., the + corrupted one) + +For either approach, when using the **AWS Management Console**, the first steps +are: + +1. Go to the S3 service +2. Go to the appropriate bucket +3. On the **Overview** tab for the bucket, click the **Show** button to show + object versions +4. Locate your state file + +Next, you can proceed to either option: + +**Option 1**: To copy a previous version of your state file into the same bucket: + +1. Select the desired (good) version of the state file that you wish to make + the latest version +2. Click the **Download** button +3. Choose the location where you wish to save the file +4. **IMPORTANT:** Ensure the file name is identical to the name of the state + file in the bucket +5. Click **Save** +6. Now click the **Upload** button +7. Click the **Add files** button +8. Choose the file you just downloaded and click **Open** +9. Click the **Next** button (multiple times), then click the **Upload** button + +Once the upload completes, the newly uploaded file (identical to the good +version you just downloaded) becomes the **latest version** of the state file. + +**Option 2**: Alternatively, if you simply wish to delete the latest (corrupted) version +of the state file: + +1. Click the latest version of the file (listed at the top) +2. Click the **Actions** button and select **Delete** +3. On the dialog window, click the **Delete** button + +At this point, the previous version is now the latest version. + +:::caution + +When attempting to delete the latest (corrupt) version of the file, +you must _explicitly_ choose the **latest version**. Otherwise, if you simply +choose the file when versions are hidden, deleting it will insert a +_delete marker_ as the latest version of the file. This means that all prior +versions still exist, but the file _appears_ to be deleted. When you **Show** +the versions, you will see all of the previous versions (including the corrupt +one), as well as a _delete marker_ as the current version. + +::: + +### How to Recover from a Deleted State File + +If your state file appears to be deleted, but the containing bucket has bucket +versioning enabled, you _might_ be able to recover the file. This can occur +when your state file is not _permanently_ deleted, but rather a _delete marker_ +is the latest version of your file, and thus the file _appears_ to be deleted. 
+ +#### Via AWS Management Console + +To recover your deleted state file via the AWS Management Console, **you may +follow one of the options detailed in the previous section** because the +_delete marker_ is simply considered the latest version of your file, and thus +can be treated in the same manner as any other version of your file. + +#### Via AWS CLI + +To handle this via the **AWS CLI** instead, first obtain the version ID of the +delete marker by replacing `BUCKET` and `KEY` as appropriate for the state file +in question, in the following command: + +```bash +aws s3api list-object-versions \ + --bucket BUCKET \ + --prefix KEY \ + --query "DeleteMarkers[?IsLatest].VersionId | [0]" +``` + +If the output from this command is `null`, then there is no delete marker, and +you may want to double-check your bucket and key values. If the bucket and key +values are correct, then your state file is either _not_ marked as deleted or +does not exist at all. + +Otherwise, you may remove the delete marker so that the state file no longer +appears deleted. This will restore the previous version of the file and make it +the latest version. Run the following command, using the same values for +`BUCKET` and `KEY` as used in the previous command, and replacing `VERSION_ID` +with the value output from the previous command: + +```bash +aws s3api delete-object \ + --bucket BUCKET \ + --key KEY \ + --version-id VERSION_ID +``` + +### Deny DeleteBucket Action + +As an additional measure to protect your Terraform state files from accidental +loss, it is also recommended that you deny all users the ability to delete the +bucket itself. At a later time, you may remove this protection when you are +sure you want to delete the bucket. + +To perform this action via the **AWS Management Console**: + +1. Go to the S3 service +2. Go to the bucket used for storing state files +3. Click the **Permissions** tab +4. Click **Bucket Policy** +5. Add the following policy statement to _deny_ the `s3:DeleteBucket` action for + all (`"*"`) principals, replacing `BUCKET_NAME` with the name of the bucket: + + ```json + { + "Statement": [ + { + "Sid": "DenyDeleteBucket", + "Effect": "Deny", + "Principal": "*", + "Action": "s3:DeleteBucket", + "Resource": "arn:aws:s3:::BUCKET_NAME" + } + ] + } + ``` + +6. Click **Save** + +To perform this action via the **AWS CLI** instead, save the JSON shown above +to a file named `policy.json` and run the following command from the directory +in which you saved `policy.json`, replacing `BUCKET_NAME` with the name of the +bucket: + +```bash +aws s3api put-bucket-policy --policy file://policy.json --bucket BUCKET_NAME +``` + +Afterwards, remove the `policy.json` file. + +## Change Resources Only via Terraform + +**All resource changes must be made via Terraform**, otherwise you risk that +your Terraform state file does not correctly represent the state of your +deployment resources. Specifically, this means: + +:::danger DO NOT's + +- **DO NOT** change deployment resources via the AWS Management Console +- **DO NOT** change deployment resources via the AWS CLI +- **DO NOT** change deployment resources via any of the AWS SDKs + +::: + +:::tip DO's + +Instead, **DO** change deployment resources **only** via changes to your +Terraform files (along with subsequent Terraform commands), except where +specifically instructed otherwise (such as in the instructions for destroying +a deployment). 
+ +::: + +### Avoid Changing Connectivity Resources + +Keep in mind that changing connectivity resources can affect your ingest +functionality and API availability. + +Only update connectivity resources such as your VPC, subnets, and security +groups through Terraform deployments with S3 bucket versioning enabled. Test +connectivity immediately following deployment. + +### How to Reconcile Differences + +If your state file should get out of synch with the true state of your +resources, there are a number of things you can attempt to reconcile the +differences. However, given that each Cumulus deployment is unique, we can +provide only general guidance: + +- Consider [restoring a previous version][restore] of your state file, as described + in the earlier section about recovering from a corrupted state file +- If resources exist, but are not listed in your state file, consider using + `terraform import` (see ) +- If resources are missing, but are listed in your state file, run + `terraform plan` or `terraform apply`, both of which automatically run + `terraform refresh` to reconcile state. You may also run `terraform refresh` + directly. + +## How to Destroy Everything + +If you want to completely remove a deployment, note that there is some +protection in place to prevent accidental destruction of your data. Therefore, +there is an additional step required when you truly want to remove your entire +deployment. Further, destruction is performed in reverse order of creation. + +Starting from the root of your deployment repository workspace, perform the +following commands to first **destroy the resources for your `cumulus` module** +deployment. + +:::note + +If you are using Terraform workspaces, be sure to select the relevant +workspace first. + +::: + +```bash +tfenv use 1.5.3 +cd cumulus-tf +terraform init -reconfigure +terraform destroy +``` + +However, this does not prevent manual destruction in case you truly do wish to +remove them. You may do so via either the **AWS Management Console** or the +**AWS CLI**. As an additional precaution, you may want to create a backup for +each table in your deployment _before_ you delete them. + +Then, **destroy the resources for your `data-persistence` module**: + +```bash +cd ../data-persistence-tf +terraform init -reconfigure +terraform destroy +``` + +Destroying your data persistence layer does not destroy any of your RDS resources. Next, **destroy your database resources**. + +To teardown the entire cluster, if it was deployed by Terraform, use the `terraform destroy` command to delete your cluster. + +If using a shared cluster and you just want to destroy the database created by Cumulus for your deployment you must manually delete that individual database. The database is named `_db`. + +Delete any manual backups you have made that are no longer needed. + +Finally, since we tag the resources in your deployment, you should see if there +are any dangling resources left behind for any reason, by running the following +AWS CLI command, replacing `PREFIX` with your deployment prefix name: + +```bash +aws resourcegroupstaggingapi get-resources \ + --query "ResourceTagMappingList[].ResourceARN" \ + --tag-filters Key=Deployment,Values=PREFIX +``` + +Ideally, the output should be an empty list, but if it is not, then you may +need to manually delete the listed resources. 
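Returning to the shared-cluster case above: if you only need to remove the individual database that Cumulus created for your deployment, one approach is a direct `psql` call. This is only a sketch — the host, user, and `my-prefix` values are placeholders, and it assumes you have network access to the cluster and administrative credentials:

```bash
# Drop only this deployment's database on a shared RDS cluster.
# Host, user, and prefix below are placeholders for your own values.
psql \
  --host my-rds-cluster.cluster-abc123.us-east-1.rds.amazonaws.com \
  --username postgres \
  --dbname postgres \
  --command 'DROP DATABASE "my-prefix_db";'
```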
+ +[configuring]: ../deployment/README.md#create-resources-for-terraform-state "Configuring the Cumulus deployment" +[restore]: https://docs.aws.amazon.com/AmazonS3/latest/dev/RestoringPreviousVersions.html "Restoring a previous version" diff --git a/website/versioned_docs/version-v18.3.5/deployment/thin-egress-app.md b/website/versioned_docs/version-v18.3.5/deployment/thin-egress-app.md new file mode 100644 index 00000000000..39ec2329514 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/thin-egress-app.md @@ -0,0 +1,70 @@ +--- +id: thin_egress_app +title: Using the Thin Egress App (TEA) for Cumulus Distribution +hide_title: false +--- + +The [Thin Egress App (TEA)](https://github.com/asfadmin/thin-egress-app) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration. + +## Configuring a TEA Deployment + +TEA is deployed using [Terraform](https://terraform.io) modules. Refer to [these instructions](./components) for guidance on how to integrate new components with your deployment. + +The `cumulus-template-deploy` repository `cumulus-tf/main.tf` contains a `thin_egress_app` for distribution. + +The TEA module provides [these instructions](https://github.com/asfadmin/thin-egress-app/blob/devel/NGAP-DEPLOY-README.MD) +showing how to add it to your deployment and the following are instructions to configure the `thin_egress_app` module in your Cumulus deployment. + +### Create a Secret for Signing Thin Egress App JWTs + +The Thin Egress App uses JSON Web Tokens (JWTs) internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs. + +See the [Thin Egress App documentation](https://github.com/asfadmin/thin-egress-app#jwt-cookie-secret) on how to create this secret with the correct values. It will be used later to set the `thin_egress_jwt_secret_name` variable when deploying the Cumulus module. + +### Bucket_map.yaml + +The Thin Egress App uses a `bucket_map.yaml` file to determine which buckets to +serve. Documentation of the file format is available [here](https://github.com/asfadmin/thin-egress-app#bucket-map). + +The default Cumulus module generates a file at `s3://${system_bucket}/distribution_bucket_map.json`. + +The configuration file is a simple JSON mapping of the form: + +```json +{ + "daac-public-data-bucket": "/path/to/this/kind/of/data" +} +``` + +:::info + +Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets. + +::: + +#### Optionally Configure a Custom Bucket Map + +A simple configuration would look something like this: + +##### bucket_map.yaml + +```yaml +MAP: + my-protected: my-protected + my-public: my-public + +PUBLIC_BUCKETS: + - my-public +``` + +:::caution + +Your custom bucket map **must include mappings for all of the `protected` and `public` buckets specified in the `buckets` variable in `cumulus-tf/terraform.tfvars`**, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors. + +::: + +### Optionally Configure Shared Variables + +The `cumulus` module deploys certain components that interact with TEA. As a result, the `cumulus` module requires that if you are specifying a value for the `stage_name` variable to the TEA module, you **must use the same value for the `tea_api_gateway_stage` variable to the `cumulus` module**. 
+ +One way to keep these variable values in sync across the modules is to use [Terraform local values](https://www.terraform.io/docs/configuration/locals.html) to define values to use for the variables for both modules. This approach is shown in the [Cumulus Core example deployment code](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/main.tf). diff --git a/website/versioned_docs/version-v18.3.5/deployment/upgrade.md b/website/versioned_docs/version-v18.3.5/deployment/upgrade.md new file mode 100644 index 00000000000..a7a67c701c9 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/deployment/upgrade.md @@ -0,0 +1,90 @@ +--- +id: upgrade-readme +title: Upgrading Cumulus +hide_title: false +--- + +After the initial deployment, any future updates to the Cumulus deployment from configuration files, Terraform files (`*.tf`), or modules from a new version of Cumulus can be deployed and will update the appropriate portions of the stack as needed. + +## Cumulus Versioning + +Cumulus uses a global versioning approach, meaning version numbers are consistent across all Terraform modules, and semantic versioning to track major, minor, and patch version (e.g., 1.0.0). + +:::danger important + +By convention, Cumulus minor version releases introduce breaking changes (e.g., 1.13.x -> 1.14.0), so it is critical that you consult the release notes for migration steps. Carefully read each `BREAKING CHANGES` and `MIGRATION STEPS` sections within the `CHANGELOG.md` file, following all steps, starting with the oldest release after your currently installed release, and progressing through them chronologically. + +::: + +To view the released module artifacts for each Cumulus Core version, see the [Cumulus Releases] page. + +## Migrating to a New Version + +When breaking changes have been introduced, the Cumulus Core team will publish instructions on migrating from one version to another. Detailed release notes with migration instructions (if any) for each release can be found on the [Cumulus Releases] page. + +1. **Use consistent Cumulus versions:** All Terraform modules must be updated to the same Cumulus version number (see below). In addition, your workflow Lambdas that utilize published Cumulus Core npm modules should always match your deployed Cumulus version to ensure compatibility. **Check the CHANGELOG for deprecation/breaking change notices.** +2. **Follow all intervening steps:** When skipping over versions, you **must perform all intervening migration steps**. For example, if going from version 1.1.0 to 1.3.0, upgrade from 1.1.0 to 1.2.0 and then to 1.3.0. This is critical because each release that contains migration steps provide instructions _only_ for migrating from the _immediately_ previous release, but you must follow _all_ migration steps between your currently installed release and _every release_ through the release that you wish to migrate to. +3. **Migrate lower environments first:** Migrate your "lowest" environment first and test it to ensure correctness before performing migration steps in each successively higher environment. For example, update Sandbox, then UAT, then SIT, and finally Prod. +4. **Conduct smoke tests:** In each environment, perform smoke tests that give you confidence that the upgrade was successful, prior to moving on to the next environment. 
Since deployments can vary widely, it is up to you to determine tests that might be specific to your deployment, but here are some general tests you might wish to perform: + * Confirm the Cumulus API is running and reachable by hitting the `/version` endpoint + * Run a workflow and confirm its operation (taking care in Production) + * Confirm distribution works +5. **Migrate during appropriate times:** Choose a time to migrate when support is more likely to be available in case you encounter problems, such as when you are most likely to be able to obtain support relatively promptly. Prefer earlier in the week over later in the week (particularly avoiding Fridays, if possible). + +## Updating Cumulus Version + +To update your Cumulus version: + +1. Find the desired release on the [Cumulus Releases] page +2. Update the `source` in your Terraform deployment files **for each of [your Cumulus components](./components.md#available-cumulus-components)** by replacing `vx.x.x` with the desired version of Cumulus. For example, here's the +entry from the `data-persistence` module: + + `source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform.zip//tf-modules/data-persistence"` + +3. Run `terraform init` to get the latest copies of your updated modules + +## Update Data Persistence Resources + +:::note Reminder + +Remember to [initialize Terraform](./README.md#initialize-terraform), if necessary. + +::: + +From the directory of your `data-persistence` deployment module (e.g., `data-persistence-tf`): + +```bash +$ AWS_REGION= \ # e.g. us-east-1 + AWS_PROFILE= \ + terraform apply +``` + +## Update Cumulus Resources + +:::note Reminder + +Remember to [initialize Terraform](./README.md#initialize-terraform), if necessary. + +::: + +From the directory of your `cumulus` deployment module (e.g., `cumulus-tf`): + +```bash +$ AWS_REGION= \ # e.g. us-east-1 + AWS_PROFILE= \ + terraform apply +``` + +Once you have successfully updated all of your resources, verify that your +deployment functions correctly. Please refer to some recommended smoke tests +given [above](#migrating-to-a-new-version), and consider additional tests appropriate for your particular +deployment and environment. + +## Update Cumulus Dashboard + +If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your [Cumulus Dashboard] deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating. + +[Cumulus Releases]: + https://github.com/nasa/cumulus/releases +[Cumulus Dashboard]: + https://github.com/nasa/cumulus-dashboard diff --git a/website/versioned_docs/version-v18.3.5/development/forked-pr.md b/website/versioned_docs/version-v18.3.5/development/forked-pr.md new file mode 100644 index 00000000000..348c153d7cf --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/development/forked-pr.md @@ -0,0 +1,44 @@ +--- +id: forked-pr +--- + +# Issuing PR From Forked Repos + +## Fork the Repo + +* Fork the Cumulus repo +* Create a new branch from the branch you'd like to contribute to +* If an issue doesn't already exist, submit one (see above) + +## Create a Pull Request + +* [Create a pull request](https://help.github.com/articles/creating-a-pull-request/) from your fork into the target branch of the nasa/cumulus repo +* Also read Github's documentation on how to work with forks [here](https://help.github.com/articles/working-with-forks/). 
+* Be sure to [mention the corresponding issue number](https://help.github.com/articles/closing-issues-using-keywords/) in the PR description, i.e. "Fixes Issue #10" + +## Reviewing PRs from Forked Repos + +Upon submission of a pull request, the Cumulus development team will review the code. + +Once the code passes an initial review, the team will run the CI tests against the proposed update. + +The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request. + +PRs from forked repos cannot directly merged to master. Cumulus reviews must follow the following steps before completing the review process: + +1. Create a new branch: + + ```bash + git checkout -b from- master + ``` + +2. Push the new branch to GitHub +3. Change the destination of the forked PR to the new branch that was just pushed + + ![Screenshot of Github interface showing how to change the base branch of a pull request](https://user-images.githubusercontent.com/1933118/46869547-80d31480-ce2c-11e8-9d2f-b8e1ea01fdb6.png) + +4. After code review and approval, merge the forked PR to the new branch. + +5. Create a PR for the new branch to master. + +6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author/ or resolve failures as appropriate. diff --git a/website/versioned_docs/version-v18.3.5/development/integration-tests.md b/website/versioned_docs/version-v18.3.5/development/integration-tests.md new file mode 100644 index 00000000000..683dbf7a14c --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/development/integration-tests.md @@ -0,0 +1,31 @@ +--- +id: integration-tests +--- +# Integration Tests + +Cumulus has a comprehensive set of integration tests that tests the framework on +an active AWS account. As long as you have an AWS account with valid credentials +you can run the tests as described below: + +## Running integration tests on AWS + +1. Run: + + ```bash + npm install + npm run bootstrap + ``` + +2. Deploy your instance integrations on AWS and run tests by following the steps + [here](https://github.com/nasa/cumulus/tree/master/example/README.md) + +## Running integration tests on Bamboo + +Integration tests are run by default on Bamboo builds for the master branch, +a tagged release, and branches with an open PR. If you want to skip the +integration tests for a given commit for a PR branch, include `[skip-integration-tests]` +in the commit message. + +If you create a new stack and want to be able to run integration tests against +it in CI, you will need to add it to +[bamboo/select-stack.js](https://github.com/nasa/cumulus/tree/master/bamboo/select-stack.js). diff --git a/website/versioned_docs/version-v18.3.5/development/quality-and-coverage.md b/website/versioned_docs/version-v18.3.5/development/quality-and-coverage.md new file mode 100644 index 00000000000..3607ac2429b --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/development/quality-and-coverage.md @@ -0,0 +1,42 @@ +--- +id: quality-and-coverage +--- +# Code Coverage and Quality + +## Code Coverage + +Code coverage is checked using [nyc](https://github.com/istanbuljs/nyc). The +Bamboo build tests coverage. A summary can be viewed in the unit test build's output. + +The `npm test` command will output code coverage data for the entire Cumulus +repository. To create an html report, run `nyc report --reporter html` and open +the `index.html` file in the coverage folder. 
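For example, generating and viewing the HTML report from the repository root could look like the following (the `open` command is macOS-specific; on other platforms, open `coverage/index.html` in any browser):

```bash
# Run the test suite to collect coverage data, then render the HTML report
npm test
nyc report --reporter html

# The report is written to the coverage/ directory
open coverage/index.html
```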
+ +To run code coverage on an individual package during development, run +`npm run test`. This will output the coverage in the terminal. + +## Code quality checking + +This project uses [eslint](https://eslint.org/) to check code style and quality. +The configured eslint rules can be found in the project's +[.eslintrc.js](https://github.com/nasa/cumulus/blob/master/.eslintrc.js) +file. + +To check the configured linting, run `npm run lint`. + +## Documentation quality checking + +This project uses [markdownlint-cli](https://www.npmjs.com/package/markdownlint-cli) +as a frontend to [markdownlint](https://www.npmjs.com/package/markdownlint) to check +all of our markdown for style and formatting. The configured rules can be found +[here](https://github.com/nasa/cumulus/blob/master/.markdownlint.json). + +To run linting on the markdown files, run `npm run lint-md`. + +## Audit + +This project uses `audit-ci` to run a security audit on the package dependency +tree. This must pass prior to merge. The configured rules for `audit-ci` can be +found [here](https://github.com/nasa/cumulus/blob/master/audit-ci.json). + +To execute an audit, run `npm run audit`. diff --git a/website/versioned_docs/version-v18.3.5/development/release.md b/website/versioned_docs/version-v18.3.5/development/release.md new file mode 100644 index 00000000000..9c27c8a75b3 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/development/release.md @@ -0,0 +1,415 @@ +--- +id: release +--- +# Versioning and Releases + +## Versioning + +We use a global versioning approach, meaning version numbers in cumulus are consistent across all packages and tasks, and semantic versioning to track major, minor, and patch version (i.e. 1.0.0). We use Lerna to manage our versioning. Any change will force lerna to increment the version of all packages. + +Read more about the semantic versioning [here](https://docs.npmjs.com/getting-started/semantic-versioning). + +## Pre-release testing + +:::note + +This is only necessary when preparing a release for a new major version of Cumulus (e.g. preparing to go from `6.x.x` to `7.0.0`). + +::: + +Before releasing a new major version of Cumulus, we should test the deployment upgrade path from the latest release of Cumulus to the upcoming release. + +It is preferable to use the [cumulus-template-deploy](https://github.com/nasa/cumulus-template-deploy) repo for testing the deployment, since that repo is the officially recommended deployment configuration for end users. + +You should create an entirely new deployment for this testing to replicate the end user upgrade path. Using an existing test or CI deployment would not be useful because that deployment may already have been deployed with the latest changes and not match the upgrade path for end users. + +Pre-release testing steps: + +1. Checkout the [cumulus-template-deploy](https://github.com/nasa/cumulus-template-deploy) repo +2. Update the deployment code to use the latest release artifacts if it wasn't done already. For example, assuming that the latest release was `5.0.1`, update the deployment files as follows: + + ```text + # in data-persistence-tf/main.tf + source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/data-persistence" + + # in cumulus-tf/main.tf + source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/cumulus" + ``` + +3. For both the `data-persistence-tf` and `cumulus-tf` modules: + 1. 
Add the necessary backend configuration (`terraform.tf`) and variables (`terraform.tfvars`) + - You should use an entirely new deployment for this testing, so make sure to use values for `key` in `terraform.tf` and `prefix` in `terraform.tfvars` that don't collide with existing deployments + 2. Run `terraform init` + 3. Run `terraform apply` +4. Checkout the `master` branch of the `cumulus` repo +5. Run a full bootstrap of the code: `npm run bootstrap` +6. Build the pre-release artifacts: `./bamboo/create-release-artifacts.sh` +7. For both the `data-persistence-tf` and `cumulus-tf` modules: + 1. Update the deployment to use the built release artifacts: + + ```text + # in data-persistence-tf/main.tf + source = "[path]/cumulus/terraform-aws-cumulus.zip//tf-modules/data-persistence" + + # in cumulus-tf/main.tf + source = "/Users/mboyd/development/cumulus/terraform-aws-cumulus.zip//tf-modules/cumulus" + ``` + + 2. Review the `CHANGELOG.md` for any pre-deployment migration steps. If there are, go through the steps and confirm that they are successful + 3. Run `terraform init` + 4. Run `terraform apply` +8. Review the `CHANGELOG.md` for any post-deployment migration steps and confirm that they are successful +9. Delete your test deployment by running `terraform destroy` in `cumulus-tf` and `data-persistence-tf` + +## Updating Cumulus version and publishing to NPM + +### Deployment Steps + +1. [Create a branch for the new release](#1-create-a-branch-for-the-new-release) +2. [Update the Cumulus version number](#2-update-the-cumulus-version-number) +3. [Check Cumulus Dashboard PRs for Version Bump](#3-check-cumulus-dashboard-prs-for-version-bump) +4. [Update CHANGELOG.md](#4-update-changelogmd) +5. [Update DATA\_MODEL\_CHANGELOG.md](#5-update-data_model_changelogmd) +6. [Update CONTRIBUTORS.md](#6-update-contributorsmd) +7. [Update Cumulus package API documentation](#7-update-cumulus-package-api-documentation) +8. [Cut new version of Cumulus Documentation](#8-cut-new-version-of-cumulus-documentation) +9. [Create a pull request against the minor version branch](#9-create-a-pull-request-against-the-minor-version-branch) +10. [Create a git tag for the release](#10-create-a-git-tag-for-the-release) +11. [Publishing the release](#11-publishing-the-release) +12. [Create a new Cumulus release on github](#12-create-a-new-cumulus-release-on-github) +13. [Update Cumulus API document](#13-update-cumulus-api-document) +14. [Update Cumulus Template Deploy](#14-update-cumulus-template-deploy) +15. [Merge base branch back to master](#15-merge-base-branch-back-to-master) + +### 1. Create a branch for the new release + +#### From Master + +Create a branch titled `release-MAJOR.MINOR.x` for the release (use a literal x for the patch version). + +```shell + git checkout -b release-MAJOR.MINOR.x + +e.g.: + git checkout -b release-9.1.x +``` + +If creating a new major version release from master, say `5.0.0`, then the branch would be named `release-5.0.x`. If creating a new minor version release from master, say `1.14.0` then the branch would be named `release-1.14.x`. + +Having a release branch for each major/minor version allows us to easily backport patches to that version. + +Push the `release-MAJOR.MINOR.x` branch to GitHub if it was created locally. (Commits should be even with master at this point.) + +If creating a patch release, you can check out the existing base branch. + +Then create the release branch (e.g. `release-1.14.0`) from the minor version base branch. 
For example, from the `release-1.14.x` branch: + +```bash +git checkout -b release-1.14.0 +``` + +#### Backporting + +When creating a backport, a minor version base branch should already exist on GitHub. Check out the existing minor version base branch then create a release branch from it. For example: + +```bash +# check out existing minor version base branch +git checkout release-1.14.x +# pull to ensure you have the latest changes +git pull origin release-1.14.x +# create new release branch for backport +git checkout -b release-1.14.1 +# cherry pick the commits (or single squashed commit of changes) relevant to the backport +git cherry-pick [replace-with-commit-SHA] +# push up the changes to the release branch +git push +``` + +### 2. Update the Cumulus version number + +When changes are ready to be released, the Cumulus version number must be updated. + +Lerna handles the process of deciding which version number should be used as long as the developer specifies whether the change is a major, minor, or patch change. + +To update Cumulus's version number run: + +```bash +npm run update +``` + +![Screenshot of terminal showing interactive prompt from Lerna for selecting the new release version](https://static.notion-static.com/13acbe0a-c59d-4c42-90eb-23d4ec65c9db/Screen_Shot_2018-03-15_at_12.21.16_PM.png) + +Lerna will handle updating the packages and all of the dependent package version numbers. If a dependency has not been changed with the update, however, lerna will not update the version of the dependency. + +#### 2B. Verify Lerna + +:::note + +Lerna can struggle to correctly update the versions on any non-standard/alpha versions (e.g. `1.17.0-alpha0`). Additionally some packages may have been left at the previous version. +Please be sure to check any packages that are new or have been manually published since the previous release and any packages that list it as a dependency to ensure the listed versions are correct. +It's useful to use the search feature of your code editor or `grep` to see if there any references to the **_old_** package versions. +In bash shell you can run + +::: + +```bash + find . -name package.json -exec grep -nH "@cumulus/.*[0-9]*\.[0-9]\.[0-9].*" {} \; | grep -v "@cumulus/.*MAJOR\.MINOR\.PATCH.*" + +e.g.: + find . -name package.json -exec grep -nH "@cumulus/.*[0-9]*\.[0-9]\.[0-9].*" {} \; | grep -v "@cumulus/.*13\.1\.0.*" +``` + +Verify that no results are returned where MAJOR, MINOR, or PATCH differ from the intended version, and no outdated `-alpha` or `-beta` versions are specified. + +### 3. Check Cumulus Dashboard PRs for Version Bump + +There may be unreleased changes in the Cumulus Dashboard [project](https://github.com/nasa/cumulus-dashboard) that rely on this unreleased Cumulus Core version. + +If there is exists a PR in the cumulus-dashboard repo with a name containing: "Version Bump for Next Cumulus API Release": + +- There will be a placeholder `change-me` value that should be replaced with the Cumulus Core to-be-released-version. +- Mark that PR as ready to be reviewed. + +### 4. Update CHANGELOG.md + +Update the `CHANGELOG.md`. Put a header under the `Unreleased` section with the new version number and the date. + +Add a link reference for the github "compare" view at the bottom of the `CHANGELOG.md`, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release. + +### 5. 
Update DATA\_MODEL\_CHANGELOG.md + +Similar to #4, make sure the DATA\_MODEL\_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate. + +### 6. Update CONTRIBUTORS.md + +```bash +./bin/update-contributors.sh +git add CONTRIBUTORS.md +``` + +Commit and push these changes, if any. + +### 7. Update Cumulus package API documentation + +Update auto-generated API documentation for any Cumulus packages that have it: + +```bash +npm run docs-build-packages +``` + +Commit and push these changes, if any. + +### 8. Cut new version of Cumulus Documentation + +Docusaurus v2 uses snapshot approach for [documentation versioning](https://docusaurus.io/docs/versioning). Every versioned docs +does not depends on other version. +If this is a patch version, or a minor version with no significant functionality changes requiring document update, do not create +a new version of the documentation, update the existing versioned_docs document instead. + +Create a new version: + +```bash +cd website +npm run docusaurus docs:version ${release_version} +# please update version in package.json +git add . +``` + +Instructions to rename an existing version: + +```bash +cd website +git mv versioned_docs/version- versioned_docs/version-${release_version} +git mv versioned_sidebars/version--sidebars.json versioned_sidebars/version-${release_version}-sidebars.json +# please update versions.json with new version +# please update documents under versioned_docs/version-${release_version} +git add . +``` + +Where `${release_version}` corresponds to the version tag `v1.2.3`, for example. + +Commit and push these changes. + +### 9. Create a pull request against the minor version branch + +1. Push the release branch (e.g. `release-1.2.3`) to GitHub. +2. Create a PR against the minor version base branch (e.g. `release-1.2.x`). +3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (`release-1.2.3`) and setting only these variables: + + - `GIT_PR`: `true` + - `SKIP_AUDIT`: `true` + + :::warning important + + Do NOT set the `PUBLISH_FLAG` variable to `true` for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan. + + ::: + + ![Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"](../assets/configure-release-branch-test.png) + +4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (`release-1.2.x`). + - It **is safe** to do a squash merge in this instance, but not required +5. You may delete your release branch (`release-1.2.3`) after merging to the base branch. + +### 10. Create a git tag for the release + +Check out the minor version base branch (`release-1.2.x`) now that your changes are merged in and do a `git pull`. + +Ensure you are on the latest commit. + +Create and push a new git tag: + +```bash + git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH" + git push origin vMAJOR.MINOR.PATCH + +e.g.: + git tag -a v9.1.0 -m "Release 9.1.0" + git push origin v9.1.0 +``` + +### 11. Publishing the release + +Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered. + +The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. 
`release-1.2.x`) are: + +- The Bamboo build for the minor version release branch is triggered **automatically** on any commits to that branch, whereas we want to manually control when the release is published. +- We want to verify that integration tests have passed on the Bamboo build for the minor version release branch **before** we manually trigger the release, so that we can be sure that our code is safe to release. + +If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below: + +#### Creating a Bamboo branch plan for the release + +- In the Cumulus Core project (), click `Actions -> Configure Plan` in the top right. + +- Next to `Plan branch` click the rightmost button that displays `Create Plan Branch` upon hover. + +- Click `Create plan branch manually`. + +- Add the values in that list. Choose a display name that makes it _very_ clear this is a deployment branch plan. `Release (minor version branch name)` seems to work well (e.g. `Release (1.2.x)`)). + - **Make sure** you enter the correct branch name (e.g. `release-1.2.x`). + +- ::::note Manage the branch + + :::warning Deselect Enable Branch + + Deselect Enable Branch - if you do not do this, it will immediately fire off a build. + + ::: + + :::tip Do Immediately + + On the `Branch Details` page, enable `Change trigger`. Set the `Trigger type` to manual, this will prevent commits to the branch from triggering the build plan. + You should have been redirected to the `Branch Details` tab after creating the plan. If not, navigate to the branch from the list where you clicked `Create Plan Branch` in the previous step. + + ::: + + :::: + +- Go to the `Variables` tab. Ensure that you are on your branch plan and not the `master` plan: You should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be `Branch Variables`. Then set the branch variables as follow: + + - `DEPLOYMENT`: `cumulus-from-npm-tf` (**except in special cases such as incompatible backport branches**) + - If this variable is not set, it will default to the deployment name for the last committer on the branch + - `USE_CACHED_BOOTSTRAP`: `false` + - `USE_TERRAFORM_ZIPS`: `true` (**IMPORTANT**: MUST be set in order to run integration tests against the `.zip` files published during the build so that we are actually testing our released files) + - `GIT_PR`: `true` + - `SKIP_AUDIT`: `true` + - `PUBLISH_FLAG`: `true` + +- Enable the branch from the `Branch Details` page. + +- Run the branch using the `Run` button in the top right. + +Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages. + +### 12. Create a new Cumulus release on github + +The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include: + +- A multi-module Terraform `.zip` artifact containing filtered copies of the `tf-modules`, `packages`, and `tasks` directories for use as Terraform module sources. +- A S3 replicator module +- A workflow module +- A distribution API module +- An ECS service module + +Make sure to verify the appropriate .zip files are present on Github after the release process is complete. 
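One way to spot-check the uploaded artifacts without clicking through the UI is to query the GitHub releases API. This sketch assumes `curl` and `jq` are installed and uses `v16.1.1` as an example tag:

```bash
# List the names of the assets attached to a release tag
curl -s https://api.github.com/repos/nasa/cumulus/releases/tags/v16.1.1 \
  | jq -r '.assets[].name'
```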
+ +:::caution important + + Copy the release notes for the new version from the changelog to the description of the new release on the [GitHub Releases page](https://github.com/nasa/cumulus/releases). + +::: + +:::info Optional + +The "Publish" step in Bamboo will push the release artifcats to GitHub (and NPM). If you need more time to validate the release _after_ the packages are published, you can mark the release as a "Pre-Release" on GitHub. This will clearly indicate the that release is not ready for the public. To do this: + +- Find the release on [GitHub Releases page](https://github.com/nasa/cumulus/releases) +- Click the "Edit release" button (pencil icon) +- Check the "This is a pre-release" checkbox +- Click "Update release" + +::: + +### 13. Update Cumulus API document + +There may be unreleased changes in the [Cumulus API document](https://github.com/nasa/cumulus-api) that are waiting on the Cumulus Core release. +If there are unrelease changes in the cumulus-api repo, follow the release instruction to create the release, the release version should match +the Cumulus Core release. + +### 14. Update Cumulus Template Deploy + +Users are encouraged to use our [Cumulus Template Deploy Project](https://github.com/nasa/cumulus-template-deploy) for deployments. The Cumulus Core version should be updated in this repo when a new Cumulus Core version is released. + +This will mean updating the `source` property of Cumulus modules with the correct version: + +```hcl +module "cumulus" { + source = "https://github.com/nasa/cumulus/releases/download/{most_current_version}/terraform-aws-cumulus.zip//tf-modules/cumulus" + ... +} +``` + +e.g. + +```hcl +module "cumulus" { + source = "https://github.com/nasa/cumulus/releases/download/v16.1.1/terraform-aws-cumulus.zip//tf-modules/cumulus" + ... +} +``` + +### 15. Merge base branch back to master + +Finally, you need to reproduce the version update changes back to master. + +If this is the latest version, you can simply create a PR to merge the minor version base branch back to master. + +Do not merge `master` back into the release branch since we want the release branch to _just_ have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master. + +:::note + +If this is a backport, you will need to create a PR that merges **ONLY** the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example: + +>**Please note** changes in 13.3.2 may not yet be released in future versions, as +>this is a backport and patch release on the 13.3.x series of releases. Updates that +>are included in the future will have a corresponding CHANGELOG entry in future +>releases.. 
+ +::: + +## Troubleshooting + +### Delete and regenerate the tag + +To delete a published tag to re-tag, follow these steps: + +```bash + git tag -d vMAJOR.MINOR.PATCH + git push -d origin vMAJOR.MINOR.PATCH + +e.g.: + git tag -d v9.1.0 + git push -d origin v9.1.0 +``` diff --git a/website/versioned_docs/version-v18.3.5/docs-how-to.md b/website/versioned_docs/version-v18.3.5/docs-how-to.md new file mode 100644 index 00000000000..551af4167e0 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/docs-how-to.md @@ -0,0 +1,88 @@ +--- +id: docs-how-to +title: "Cumulus Documentation: How To's" +hide_title: false +--- + +## Cumulus Docs Installation + +### Run a Local Server + +Environment variables `DOCSEARCH_APP_ID`, `DOCSEARCH_API_KEY` and `DOCSEARCH_INDEX_NAME` must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search). + +```sh +git clone git@github.com:nasa/cumulus +cd cumulus +npm run docs-install +npm run docs-serve +``` + +:::note + +`docs-build` will build the documents into `website/build`. +`docs-clear` will clear the documents. + +::: + +:::caution + +Fix any broken links reported by Docusaurus if you see the following messages during build. + +[INFO] Docusaurus found broken links! + +Exhaustive list of all broken links found: + +::: + +### Cumulus Documentation + +Our project documentation is hosted on [GitHub Pages](https://pages.github.com/). The resources published to this website are housed in `docs/` directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images. + +We use the open-source static website generator [Docusaurus](https://docusaurus.io/docs) to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.). + +#### Add a New Page and Sidebars + +Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the `docs/` folder and adding some configuration values wrapped by `---` at the top of the file. There are many files that already have this header which can be used as reference. + +```markdown +--- +id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/ +title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar. +hide_title: false +--- +``` + +:::note + +To have the new page show up in a sidebar the designated `id` must be added to a sidebar in the `website/sidebars.js` file. Docusaurus has an in depth explanation of sidebars [here](https://docusaurus.io/docs/en/navigation). + +::: + +#### Versioning Docs + +We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found [here](https://docusaurus.io/docs/versioning). Docusaurus v2 uses snapshot approach for documentation versioning. Every versioned docs does not depends on other version. +It is worth noting that we would like the Documentation versions to match up directly with release versions. However, a new versioned docs can take up a lot of repo space and require maintenance, we suggest to update existing versioned docs for minor releases when there are no significant functionality changes. Cumulus versioning is explained in the [Versioning Docs](https://github.com/nasa/cumulus/tree/master/docs/development/release.md). 
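For reference, cutting a new documentation snapshot is the same single Docusaurus command used in the release instructions, run from the `website` directory (`18.3.5` below is just an example version):

```bash
cd website
npm run docusaurus docs:version 18.3.5
```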
+ +#### Search + +Search on our documentation site is taken care of by [DocSearch](https://docsearch.algolia.com/). We have been provided with an `apiId`, `apiKey` and an `indexName` by DocSearch that we include in our `website/docusaurus.config.js` file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for these values to exist - `DOCSEARCH_APP_ID`, `DOCSEARCH_API_KEY` and `DOCSEARCH_NAME_INDEX`. + +#### Add a new task + +The tasks list in docs/tasks.md is generated from the list of task package in the task folder. Do not edit the docs/tasks.md file directly. + +[Read more about adding a new task.](adding-a-task.md) + +#### Editing the tasks.md header or template + +Look at the `bin/build-tasks-doc.js` and `bin/tasks-header.md` files to edit the output of the tasks build script. + +#### Editing diagrams + +For some diagrams included in the documentation, the raw source is included in the `docs/assets/raw` directory to allow for easy updating in the future: + +- `assets/interfaces.svg` -> `assets/raw/interfaces.drawio` (generated using [draw.io](https://www.draw.io/)) + +### Deployment + +The `master` branch is automatically built and deployed to `gh-pages` branch. The `gh-pages` branch is served by Github Pages. Do not make edits to the `gh-pages` branch. diff --git a/website/versioned_docs/version-v18.3.5/external-contributions/external-contributions.md b/website/versioned_docs/version-v18.3.5/external-contributions/external-contributions.md new file mode 100644 index 00000000000..a22c2664b9e --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/external-contributions/external-contributions.md @@ -0,0 +1,29 @@ +--- +id: external-contributions +title: External Contributions +hide_title: false +--- + +Contributions to Cumulus may be made in the form of [PRs to the repositories directly](https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md) or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components. + +This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories. + +## Distribution + +[The ASF Thin Egress App](https://github.com/asfadmin/thin-egress-app#welcome-to-tea---the-thin-egress-app) is used by Cumulus for distribution. TEA can be deployed [with Cumulus](../deployment/thin_egress_app) or as part of other applications to distribute data. + +## Operational Cloud Recovery Archive (ORCA) + +[ORCA](https://nasa.github.io/cumulus-orca/) can be [deployed with Cumulus](https://nasa.github.io/cumulus-orca/docs/developer/deployment-guide/deployment) to provide a customizable baseline for creating and managing operational backups. + +## Workflow Tasks + +### CNM + +PO.DAAC provides two workflow tasks to be used with the [Cloud Notification Mechanism (CNM) Schema](https://github.com/podaac/cloud-notification-message-schema#cumulus-sns-schema): [CNM to Granule](https://github.com/podaac/cumulus-cnm-to-granule#cnm-to-granule-task) and [CNM Response](https://github.com/podaac/cumulus-cnm-response-task#cnm-response-task). + +See the [CNM workflow data cookbook](../data-cookbooks/cnm-workflow) for an example of how these can be used in a Cumulus ingest workflow. 
+ +### DMR++ Generation + +GHRC has provided a [DMR++ Generation](https://github.com/ghrcdaac/dmrpp-generator#overview) workflow task. This task is meant to be used in conjunction with Cumulus' [Hyrax Metadata Updates workflow task](https://github.com/nasa/cumulus/tree/master/tasks/hyrax-metadata-updates#cumulushyrax-metadata-updates). diff --git a/website/versioned_docs/version-v18.3.5/faqs.md b/website/versioned_docs/version-v18.3.5/faqs.md new file mode 100644 index 00000000000..374d962e94f --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/faqs.md @@ -0,0 +1,190 @@ +--- +id: faqs +title: Frequently Asked Questions +hide_title: false +--- + +Below are answers to some commonly asked questions that can assist you when working with Cumulus. + +[General](#general) | [Workflows](#workflows) | [Integrators & Developers](#integrators--developers) | [Operators](#operators) + +--- + +### General + +
+ What prerequisites are needed to set up Cumulus? + + Answer: Here is a list of the tools and access that you will need in order to get started. For the up-to-date versions that we are using, please visit our [Cumulus main README](https://github.com/nasa/cumulus) for details. + +- [NVM](https://github.com/creationix/nvm) for node versioning +- [AWS CLI](http://docs.aws.amazon.com/cli/latest/userguide/installing.html) +- Bash +- Docker (only required for testing) +- docker-compose (only required for testing; install with `pip install docker-compose`) +- Python +- [pipenv](https://pypi.org/project/pipenv/) + +:::info login credentials + +Keep in mind you will need access to the AWS console and an [Earthdata account](https://urs.earthdata.nasa.gov/) before you can deploy Cumulus. + +::: + +
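A quick way to confirm that the tools above are available locally is to check their versions; output will differ by machine, and the Cumulus main README lists the versions currently targeted:

```bash
# Verify the core prerequisites are installed and on your PATH
node --version
npm --version
aws --version
docker --version
docker-compose --version
python --version
pipenv --version
```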
+ +
+ What is the preferred web browser for the Cumulus environment? + + Answer: Our preferred web browser is the latest version of [Google Chrome](https://www.google.com/chrome/). +
+ +
+ How do I deploy a new instance in Cumulus? + + Answer: For steps on the Cumulus deployment process go to [How to Deploy Cumulus](deployment). +
+ +
+ Where can I find Cumulus release notes? + + Answer: To get the latest information about updates to Cumulus go to [Cumulus Versions](https://nasa.github.io/cumulus/versions). +
+ +
+ How do I quickly troubleshoot an issue in Cumulus? + + Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in [Troubleshooting Cumulus](troubleshooting). +
+ +
+ Where can I get support help? + + Answer: The following options are available for assistance: + +- Cumulus: Outside NASA users should file a [GitHub issue](https://github.com/nasa/cumulus/issues) and inside NASA users should file a Cumulus JIRA ticket. +- AWS: You can create a case in the [AWS Support Center](https://console.aws.amazon.com/support/home), accessible via your AWS Console. + +:::info + +For more information on how to submit an issue or contribute to Cumulus follow our guidelines at [Contributing](https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md). + +::: + +
+ +--- + +### Workflows + +
+ What is a Cumulus workflow? + + Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using [AWS Step Functions](https://docs.aws.amazon.com/step-functions/index.html). For more details, we suggest visiting the [Workflows](workflows) section. +
+ +
+ How do I set up a Cumulus workflow? + + Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps [here](workflows/developing-a-cumulus-workflow). +
+ +
+ Where can I find a list of workflow tasks? + + Answer: You can access a list of reusable tasks for Cumulus development at [Cumulus Tasks](tasks). +
+ +
+ Are there any third-party workflows or applications that I can use with Cumulus? + + Answer: The Cumulus team works with various partners to help build a robust framework. You can visit our [External Contributions](external-contributions/external-contributions.md) section to see what other options are available to help you customize Cumulus for your needs. +
+ +--- + +### Integrators & Developers + +
+ What is a Cumulus integrator? + + Answer: Integrators are those who work within Cumulus and AWS to deploy the system and manage workflows. They may perform the following functions: + +- Configure and deploy Cumulus to the AWS environment +- Configure Cumulus workflows +- Write custom workflow tasks + +
+ +
+ What are the steps if I run into an issue during deployment? + + Answer: If you encounter an issue with your deployment go to the [Troubleshooting Deployment](troubleshooting/troubleshoot_deployment.md) guide. +
+ +
+ Is Cumulus customizable and flexible? + + Answer: Yes. Cumulus has a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules. +
+ +
+ What are Terraform modules? + + Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus-maintained modules or steps on how to create a module, go to [Terraform modules](https://github.com/nasa/cumulus/tree/master/tf-modules). +
+ +
+ Where do I find Terraform module variables? + + Answer: Go [here](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf) for a list of Cumulus-maintained variables. +
+ +
+ What are the common use cases that a Cumulus integrator encounters? + + Answer: The following are some examples of possible use cases you may see: + +- [Creating Cumulus Data Management Types](configuration/data-management-types) +- [Workflow: Add New Lambda](integrator-guide/workflow-add-new-lambda) +- [Workflow: Troubleshoot Failed Step(s)](integrator-guide/workflow-ts-failed-step) + +
+ +--- + +### Operators + +
+ What is a Cumulus operator? + + Answer: Operators are those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following: + +- Ingesting datasets +- Maintaining historical data ingest +- Starting and stopping data handlers +- Managing collections +- Managing provider definitions +- Creating, enabling, and disabling rules +- Investigating errors for granules and deleting or re-ingesting granules +- Investigating errors in executions and isolating failed workflow step(s) + +
+ +
+ What are the common use cases that a Cumulus operator encounters? + + Answer: The following are some examples of possible use cases you may see: + +- [Kinesis Stream For Ingest](operator-docs/kinesis-stream-for-ingest) +- [Create Rule In Cumulus](operator-docs/create-rule-in-cumulus) +- [Granule Workflows](operator-docs/granule-workflows) + +Explore more Cumulus operator best practices and how-tos in the dedicated [Operator Docs](operator-docs/about-operator-docs). +
+ +
+ Can you re-run a workflow execution in AWS? + + Answer: Yes. For steps on how to re-run a workflow execution go to [Re-running workflow executions](troubleshooting/rerunning-workflow-executions) in the [Cumulus Operator Docs](operator-docs/about-operator-docs). +
diff --git a/website/versioned_docs/version-v18.3.5/features/ancillary_metadata.md b/website/versioned_docs/version-v18.3.5/features/ancillary_metadata.md new file mode 100644 index 00000000000..4a2ed5221e5 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/features/ancillary_metadata.md @@ -0,0 +1,29 @@ +--- +id: ancillary_metadata +title: Ancillary Metadata Export +hide_title: false +--- + +This feature utilizes the `type` key on a files object in a Cumulus [granule](https://github.com/nasa/cumulus/blob/master/packages/api/lib/schemas.js). It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR. + +## Tasks setting type + +### [Discover Granules](../workflow_tasks/discover_granules) + + Uses the Collection `type` key to set the value for files on discovered granules in it's output. + +### [Parse PDR](../workflow_tasks/parse_pdr) + + Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set `type` on granules from the PDR. + +### CNMToCMALambdaFunction + + Natively supports types that are included in incoming messages to a [CNM Workflow](../data-cookbooks/cnm-workflow). + +## Tasks using type + +### [Move Granules](../workflow_tasks/move_granules) + + Uses the granule file `type` key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the `type`. + See the [file tracking data cookbook](../data-cookbooks/tracking-files#publish-to-cmr) for a detailed mapping. + If a non-CNM `type` is specified, the task assumes it is a 'data' file. diff --git a/website/versioned_docs/version-v18.3.5/features/backup_and_restore.md b/website/versioned_docs/version-v18.3.5/features/backup_and_restore.md new file mode 100644 index 00000000000..5e2fa217a65 --- /dev/null +++ b/website/versioned_docs/version-v18.3.5/features/backup_and_restore.md @@ -0,0 +1,179 @@ +--- +id: backup_and_restore +title: Cumulus Backup and Restore +hide_title: false +--- + +## Deployment Backup and Restore + +Most of your Cumulus deployment can be recovered by redeploying via Terraform. +The Cumulus metadata stored in your RDS database, including providers, collections, granules, rules, +and executions, can only be +restored if backup was properly configured or enabled. If a deployment is lost, +logs and Step Function executions in the AWS console will be irrecoverable. + +## Postgres Database + +:::note + +Cumulus supports a "bring your own" Postgres instance approach; however, our reference implementation utilizes a serverless Aurora RDS database - as such this reference provides AWS RDS Aurora Serverless backup options. + +::: + +### Backup and Restore + +#### Backup and Restore with AWS RDS + +##### Configuring Database Backups + +For AWS RDS Aurora database deployments, AWS provides a host of database +backup/integrity options, including [PITR (Point In Time +Recovery)](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html) +based on automated database backups and replay of transaction logs. 
+ +For further information on RDS backup procedures, see the [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_CommonTasks.BackupRestore.html) + +##### Disaster Recovery + +To recover a Cumulus Postgres database in a disaster or data-loss scenario, you should perform the following steps: + +* If the Postgres database cluster exists/is still online, halt workflow + activity, then take the cluster offline/remove access. +* Redeploy a new database cluster from your backup. See [AWS's PIT recovery + instructions](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html) + and [DB Snapshot recovery + instructions](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RestoreFromSnapshot.html), + or the examples below for more information. +* Configure your Cumulus deployment to utilize the new database cluster and re-deploy. + +##### cumulus-rds-tf examples + +The following sections provide a walk through of a few recovery scenarios for the provided `cumulus-rds-tf` +serverless module. + +***Point In Time Recovery*** + +If you need recovery that exceeds the 1-day granularity of AWS's snapshots, you +either must create and manually manage snapshots, or use Point In Time +Recovery (PITR) if you still have the original cluster available. + +Unfortunately as terraform does not yet support RDS PITR (see: +[github terraform-provider issue #5286](https://github.com/terraform-providers/terraform-provider-aws/issues/5286)), +this requires a manual procedure. + +If you are using the `cumulus-rds-tf` module to deploy an RDS Aurora Serverless +Postgres cluster, the following procedure can be used to successfully spin up a duplicate +cluster from backup in recovery scenarios where the database cluster is still viable: + +#### **1. Halt all ingest and remove access to the database to prevent Core processes from writing to the old cluster.** + +##### Halt Ingest + +Deactivate all Cumulus Rules, halt all clients that access the archive API and +stop any other database accessor processes. Ensure all active executions have +completed before proceeding. + +##### Remove Database Cluster Access + +Depending on your database cluster configuration, there are several ways to limit access to the +database. One example: + +Log in as the administrative user to your database cluster and run: + +```sql +alter database my_database connection limit 0; +select pg_terminate_backend(pg_stat_activity.pid) from pg_stat_activity where pg_stat_activity.datname = 'database'; +``` + +This should block new connections to the Core database from the database user +and cause database writes to fail. + +Note that it is possible in the above scenario to remove access to your datastore for your *administrative user*. Use care. + +#### **2. Using the AWS CLI (see [AWS PITR documentation](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PIT.html) for console instructions), making *certain* to use the same subnet groups and vpc-security-group IDs from your Core deployment, run the following command:** + + ```bash + aws rds restore-db-cluster-to-point-in-time --source-db-cluster-identifier "" --restore-to-time "