Skip to content

Conversation

@terado
Copy link
Contributor

@terado terado commented Sep 20, 2025

Description

AWS Backup Restore Testing Validation & Integrity Design

  • Added documentation for AWS Backup Restore Testing Validation & Integrity Design.
  • Introduced a new S3 validator example with TypeScript for validating restored S3 objects.
  • Created a manual validation Terraform module to orchestrate manual restore jobs and invoke customer validation logic.
  • Developed Lambda orchestrator for manual restore validation, including recovery point selection and validation result publishing.
  • Established IAM roles and policies for orchestrator and validation Lambda functions.
  • Added TypeScript configuration and package management for the manual validation module.
  • Included operational notes and future enhancements in the manual validation module documentation.

Type of changes

  • Refactoring (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would change existing functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I am familiar with the contributing guidelines
  • I have followed the code style of the project
  • I have added tests to cover my changes
  • I have updated the documentation accordingly
  • This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your and others privacy, we kindly ask you to NOT including PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) and the codebase changes. We will remove any PR that do contain any sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

@terado terado requested a review from a team as a code owner September 20, 2025 00:17
Comment on lines +33 to +37
{
"recoveryPointArn": "arn:aws:backup:...:recovery-point:...", // optional override
"expectedKeys": ["path/example1.txt", "path/example2.txt"], // validator-specific
"expectedMinObjects": 10 // optional fallback
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{
"recoveryPointArn": "arn:aws:backup:...:recovery-point:...", // optional override
"expectedKeys": ["path/example1.txt", "path/example2.txt"], // validator-specific
"expectedMinObjects": 10 // optional fallback
}
{
"recoveryPointArn": "arn:aws:backup:...:recovery-point:...", // optional override
"validationArgs": {
"expectedKeys": ["path/example1.txt", "path/example2.txt"], // validator-specific
"expectedMinObjects": 10 // optional fallback
}
}

We should be as blind as possible to what we're passing the validation function, and not let the parameters we hand on to the validator pollute the namespace of the arguments to the orchestrator itself.

Comment on lines +48 to +53
"resourceType": "S3",
"createdResourceArn": "arn:aws:s3:::restored-bucket",
"targetBucket": "restored-bucket",
"s3": { "bucket": "restored-bucket" },
"expectedKeys": ["..."],
"expectedMinObjects": 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following on from here, we want to make the function signature at this level agnostic to the resources that have been restored. So we could do something like this:

Suggested change
"resourceType": "S3",
"createdResourceArn": "arn:aws:s3:::restored-bucket",
"targetBucket": "restored-bucket",
"s3": { "bucket": "restored-bucket" },
"expectedKeys": ["..."],
"expectedMinObjects": 10
"args": { // validationArgs from above, verbatim
"expectedKeys": ["..."],
"expectedMinObjects": 10
},
"resources": [
{
"resourceType": "S3",
"createdResourceArn": "arn:aws:s3:::restored-bucket",
"targetBucket": "restored-bucket",
"s3": { "bucket": "restored-bucket" },
}
]

Although that might be more than we need - can we get away with this?

Suggested change
"resourceType": "S3",
"createdResourceArn": "arn:aws:s3:::restored-bucket",
"targetBucket": "restored-bucket",
"s3": { "bucket": "restored-bucket" },
"expectedKeys": ["..."],
"expectedMinObjects": 10
"args": { // validationArgs from above, verbatim
"expectedKeys": ["..."],
"expectedMinObjects": 10
},
"resources": [
"arn:aws:s3:::restored-bucket"
]

That is, is it feasible for us to only pass in the ARN of the retored resource?

Input delivered to customer Lambda (superset of invocation + restore context):

```json
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{
{
"version": "1.0.0",

Just to give ourselves some wiggle room here, let's include a version number so we can evolve the contract without too much hair-pulling.


- Orchestrator policy limited to listing recovery points, starting & describing restore jobs, publishing validation, invoking a single validator ARN.
- Validator policy scoped to specific target bucket ARNs (S3 example).
- Sensitive data avoidance: orchestrator does not log object contents, only metadata.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might, possibly, include sensitive data. People could be putting anything in filenames, especially on S3 buckets where we're receiving files from third parties. I think we do need to call that out.


## 8. Extensibility

- Add multi-resource batch mode via Step Functions if needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, see #76 (comment). We need to be able to support validation across more than one resource type within the same validator. We do still want batching to support splitting the validation up on semantic or performance lines, but that's not for multi-resource support.

Comment on lines +37 to +40
"resourceType": "S3",
"createdResourceArn": "arn:aws:s3:::restored-bucket",
"targetBucket": "restored-bucket",
"s3": { "bucket": "restored-bucket" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Comment on lines +61 to +63
resource_type = "S3"
validation_lambda_arn = aws_lambda_function.customer_validator.arn
target_bucket_name = var.target_restore_bucket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above


## Operational Notes

- Timeouts: Orchestrator Lambda default timeout is 15 minutes; long restores will exceed this—use small test datasets or adapt to Step Functions if needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That adaptation isn't something users need to worry about. We'll pick that up as and when it's needed in the blueprint itself.


- Timeouts: Orchestrator Lambda default timeout is 15 minutes; long restores will exceed this—use small test datasets or adapt to Step Functions if needed.
- Costs: Avoid listing millions of S3 keys in the validator; prefer sampling.
- IAM Hardening: Current policy uses broad `backup:*` subset and `s3:Get*`; tighten to specific ARNs in production.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think we need to do some tightening ourselves here. The user is just importing the module, not modifying its contents.

Comment on lines +33 to +37
variable "target_bucket_name" {
type = string
description = "For S3 restores: name of the destination S3 bucket that the restore will produce or populate. Used only in the example orchestrator logic."
default = null
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. Plus we don't want specifics for the example in the module interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants