RFS can now take the source snapshot
Signed-off-by: Chris Helma <chelma+github@amazon.com>
chelma committed Apr 4, 2024
1 parent 89e3f65 commit 2f1dbf3
Showing 11 changed files with 507 additions and 50 deletions.
26 changes: 25 additions & 1 deletion .gitignore
@@ -7,7 +7,31 @@ __pycache__
*.egg-info*
.python-version
logs
.vscode/

# Build files
plugins/opensearch/loggable-transport-netty4/.gradle/

RFS/.gradle/
RFS/bin/
RFS/build/

TrafficCapture/captureKafkaOffloader/bin/
TrafficCapture/captureOffloader/bin/
TrafficCapture/captureProtobufs/bin/
TrafficCapture/coreUtilities/bin/
TrafficCapture/nettyWireLogging/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJMESPathMessageTransformer/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJMESPathMessageTransformerProvider/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJoltMessageTransformer/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJoltMessageTransformerProvider/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonMessageTransformerInterface/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/openSearch23PlusTargetTransformerProvider/bin/
TrafficCapture/testUtilities/bin/
TrafficCapture/trafficCaptureProxyServer/bin/
TrafficCapture/trafficCaptureProxyServerTest/bin/
TrafficCapture/trafficReplayer/bin/

# CDK files from end-to-end testing
opensearch-cluster-cdk/
test/opensearch-cluster-cdk/
204 changes: 190 additions & 14 deletions RFS/README.md
@@ -2,7 +2,65 @@

## How to run

RFS can be run in several different ways. We'll look at the most common options below.

### Using a local snapshot directory

In this scenario, you have a local directory on disk containing the snapshot you want to migrate. You'll supply the `--snapshot-dir` arg, but not the args related to a source cluster (`--source-host`, `--source-username`, `--source-password`) or S3 (`--s3-local-dir`, `--s3-repo-uri`, `--s3-region`).

```
TARGET_HOST=<target cluster URL>
TARGET_USERNAME=<master user name>
TARGET_PASSWORD=<master password>
gradle build
gradle run --args="-n global_state_snapshot --snapshot-dir ./test-resources/snapshots/ES_6_8_Single -l /tmp/lucene_files --target-host $TARGET_HOST --target-username $TARGET_USERNAME --target-password $TARGET_PASSWORD -s es_6_8 -t os_2_11 --movement-type everything"
```

### Using an existing S3 snapshot

In this scenario, you have an existing snapshot in S3 you want to migrate. You'll supply the S3-related args (`--s3-local-dir`, `--s3-repo-uri`, `--s3-region`), but not the `--snapshot-dir` one or the ones related to a source cluster (`--source-host`, `--source-username`, `--source-password`).

```
S3_REPO_URI=<something like "s3://my-test-bucket/ES_6_8_Single/">
S3_REGION=us-east-1
TARGET_HOST=<target cluster URL>
TARGET_USERNAME=<master user name>
TARGET_PASSWORD=<master password>
gradle build
gradle run --args="-n global_state_snapshot --s3-local-dir /tmp/s3_files --s3-repo-uri $S3_REPO_URI --s3-region $S3_REGION -l /tmp/lucene_files --target-host $TARGET_HOST --target-username $TARGET_USERNAME --target-password $TARGET_PASSWORD -s es_6_8 -t os_2_11 --movement-type everything"
```

### Using a source cluster

In this scenario, you have a source cluster and don't yet have a snapshot. RFS will first take a snapshot of your source cluster, upload it to S3, and then begin reindexing. You'll supply the source-cluster args (`--source-host`, `--source-username`, `--source-password`) and the S3-related args (`--s3-local-dir`, `--s3-repo-uri`, `--s3-region`), but not `--snapshot-dir`.

```
SOURCE_HOST=<source cluster URL>
SOURCE_USERNAME=<master user name>
SOURCE_PASSWORD=<master password>
S3_REPO_URI=<something like "s3://my-test-bucket/ES_6_8_Single/">
S3_REGION=us-east-1
TARGET_HOST=<target cluster URL>
TARGET_USERNAME=<master user name>
TARGET_PASSWORD=<master password>
gradle build
gradle run --args="-n global_state_snapshot --source-host $SOURCE_HOST --source-username $SOURCE_USERNAME --source-password $SOURCE_PASSWORD --s3-local-dir /tmp/s3_files --s3-repo-uri $S3_REPO_URI --s3-region $S3_REGION -l /tmp/lucene_files --target-host $TARGET_HOST --target-username $TARGET_USERNAME --target-password $TARGET_PASSWORD -s es_6_8 -t os_2_11 --movement-type everything"
```

### Handling auth

RFS currently supports basic auth (username/password) and no auth for both the source and target clusters. To use the no-auth approach, simply omit the username/password arguments.
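
For example, a no-auth run against the existing-S3-snapshot scenario above might look like the following sketch (it assumes the same `S3_REPO_URI`, `S3_REGION`, and `TARGET_HOST` variables and that neither cluster requires credentials):

```
gradle run --args="-n global_state_snapshot --s3-local-dir /tmp/s3_files --s3-repo-uri $S3_REPO_URI --s3-region $S3_REGION -l /tmp/lucene_files --target-host $TARGET_HOST -s es_6_8 -t os_2_11 --movement-type everything"
```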

## How to set up an ES 6.8 Source Cluster w/ an attached debugger

```
git clone git@github.com:elastic/elasticsearch.git
@@ -93,24 +151,142 @@ curl -X PUT "localhost:9200/_snapshot/fs_repository/global_state_snapshot?wait_f
"ignore_unavailable": true,
"include_global_state": true
}'
```

## How to set up an ES 7.10 Source Cluster running in Docker

The `./docker` directory contains a Dockerfile for an ES 7.10 cluster, which you can use for testing. You can run it like so:

```
(cd ./docker/TestSource_ES_7_10; docker build . -t es-w-s3)
docker run -d \
-p 9200:9200 \
-e discovery.type=single-node \
--name elastic-source \
es-w-s3 \
/bin/sh -c '/usr/local/bin/docker-entrypoint.sh eswrapper & wait -n'
curl http://localhost:9200
```
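
Once the container is up, you can optionally confirm the `repository-s3` plugin made it into the image; `_cat/plugins` is a standard Elasticsearch API, so a quick check might look like:

```
curl "localhost:9200/_cat/plugins?v"
```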
### Providing AWS permissions for S3 snapshot creation

While the container has the `repository-s3` plugin installed out-of-the-box, you'll need to provide AWS credentials to use it. The plugin accepts credentials either [from the Elasticsearch Keystore](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3-client.html) or via the standard ENV variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`). The issue with either approach in local testing is renewal of the time-boxed creds. One possible solution is to use an IAM User, but that is generally frowned upon. The approach we'll take here is to accept that the test cluster is temporary, so the creds can be as well. We can therefore make an AWS IAM Role in our AWS Account with the permissions it needs, assume it locally to generate the credential triple, and pipe that into the container using ENV variables.

Start by making an AWS IAM Role (e.g. `arn:aws:iam::XXXXXXXXXXXX:role/testing-es-source-cluster-s3-access`) with S3 Full Access permissions in your AWS Account. You can then get credentials with that identity good for up to one hour:

```
unset access_key && unset secret_key && unset session_token
output=$(aws sts assume-role --role-arn "arn:aws:iam::XXXXXXXXXXXX:role/testing-es-source-cluster-s3-access" --role-session-name "ES-Source-Cluster")
access_key=$(echo $output | jq -r .Credentials.AccessKeyId)
secret_key=$(echo $output | jq -r .Credentials.SecretAccessKey)
session_token=$(echo $output | jq -r .Credentials.SessionToken)
```
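
Before handing these to the container, you can sanity-check the assumed credentials by calling STS with them inline; `get-caller-identity` should report the assumed-role ARN. This is just an optional check:

```
AWS_ACCESS_KEY_ID=$access_key AWS_SECRET_ACCESS_KEY=$secret_key AWS_SESSION_TOKEN=$session_token \
    aws sts get-caller-identity
```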

The one-hour limit is annoying but workable, given that the only thing the creds are needed for is creating the snapshot at the very start of the RFS process. The limit is primarily driven by the fact that IAM caps session duration at one hour when a role is assumed via another role (i.e. role chaining). If your original creds in the AWS keyring come from an IAM User, etc., this restriction might not apply to you and you can get up to 12 hours with the assumed creds. Ideas on how to improve this would be greatly appreciated.
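
For instance, when role chaining isn't involved, you could try requesting a longer session; this is only a sketch, and it also depends on the role's own `MaxSessionDuration` setting allowing it:

```
output=$(aws sts assume-role \
    --role-arn "arn:aws:iam::XXXXXXXXXXXX:role/testing-es-source-cluster-s3-access" \
    --role-session-name "ES-Source-Cluster" \
    --duration-seconds 43200)  # 12 hours; STS rejects this if role chaining or MaxSessionDuration caps it lower
```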

Anyways, you can then launch the container with those temporary credentials like so:

```
docker run -d \
-p 9200:9200 \
-e discovery.type=single-node \
-e AWS_ACCESS_KEY_ID=$access_key \
-e AWS_SECRET_ACCESS_KEY=$secret_key \
-e AWS_SESSION_TOKEN=$session_token \
-v ~/.aws:/root/.aws:ro \
--name elastic-source \
es-w-s3
```

If you need to renew the creds, just kill the existing source container, re-run the `assume-role` commands above, and spin up a new container:

```
docker stop elastic-source; docker rm elastic-source
```

### Setting up the Cluster w/ some sample docs

You can set up the cluster w/ some sample docs like so:

```
curl -X PUT "localhost:9200/_component_template/posts_template" -H "Content-Type: application/json" -d'
{
"template": {
"mappings": {
"properties": {
"contents": { "type": "text" },
"author": { "type": "keyword" },
"tags": { "type": "keyword" }
}
}
}
}'
curl -X PUT "localhost:9200/_index_template/posts_index_template" -H "Content-Type: application/json" -d'
{
"index_patterns": ["posts_*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"aliases": {
"current_posts": {}
}
},
"composed_of": ["posts_template"]
}'
curl -X PUT "localhost:9200/posts_2023_02_25"
curl -X POST "localhost:9200/current_posts/_doc" -H "Content-Type: application/json" -d'
{
"contents": "This is a sample blog post content.",
"author": "Author Name",
"tags": ["Elasticsearch", "Tutorial"]
}'
curl -X PUT "localhost:9200/posts_2024_01_01" -H "Content-Type: application/json" -d'
{
"aliases": {
"another_alias": {
"routing": "user123",
"filter": {
"term": {
"author": "Tobias Funke"
}
}
}
}
}'
curl -X POST "localhost:9200/another_alias/_doc" -H "Content-Type: application/json" -d'
{
"contents": "How Elasticsearch helped my patients",
"author": "Tobias Funke",
"tags": ["Elasticsearch", "Tutorial"]
}'
curl -X POST "localhost:9200/another_alias/_doc" -H "Content-Type: application/json" -d'
{
"contents": "My Time in the Blue Man Group",
"author": "Tobias Funke",
"tags": ["Lifestyle"]
}'
curl -X POST "localhost:9200/another_alias/_doc" -H "Content-Type: application/json" -d'
{
"contents": "On the Importance of Word Choice",
"author": "Tobias Funke",
"tags": ["Lifestyle"]
}'
```
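
To sanity-check the sample data before snapshotting, you can query the indices and aliases (standard Elasticsearch APIs; the hit counts should match the docs created above):

```
curl "localhost:9200/_cat/indices?v"
curl "localhost:9200/current_posts/_search?pretty"
curl "localhost:9200/another_alias/_search?pretty"
```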

## How to set up an OS 2.11 Target Cluster

I've only tested the scripts going from ES 6.8 to OS 2.11. For my test target, I just spun up an Amazon OpenSearch Service 2.11 cluster with a master user/password combo and otherwise default settings.
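
Once the domain is available, a quick connectivity check against it might look like this (assuming the same `TARGET_HOST`, `TARGET_USERNAME`, and `TARGET_PASSWORD` values used in the run commands above):

```
curl -u "$TARGET_USERNAME:$TARGET_PASSWORD" "$TARGET_HOST/_cluster/health?pretty"
```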

10 changes: 5 additions & 5 deletions RFS/build.gradle
@@ -12,17 +12,17 @@ repositories {

dependencies {
implementation 'com.beust:jcommander:1.81'
implementation 'org.apache.logging.log4j:log4j-api:2.23.1'
implementation 'org.apache.logging.log4j:log4j-core:2.23.1'
implementation 'org.apache.lucene:lucene-core:8.11.3'
implementation 'org.apache.lucene:lucene-analyzers-common:8.11.3'
implementation 'org.apache.lucene:lucene-backward-codecs:8.11.3'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.16.2'
implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-smile:2.16.2'
implementation 'com.fasterxml.jackson.core:jackson-annotations:2.16.2'
implementation 'com.fasterxml.jackson.core:jackson-core:2.16.2'
implementation 'software.amazon.awssdk:s3:2.25.16'
implementation 'io.netty:netty-codec-http:4.1.108.Final'
}

application {
20 changes: 20 additions & 0 deletions RFS/docker/TestSource_ES_7_10/Dockerfile
@@ -0,0 +1,20 @@
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2

# Install the S3 Repo Plugin
RUN echo y | /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3

# Install the AWS CLI for testing purposes
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install

# Install our custom entrypoint script
COPY ./container-start.sh .

# Configure Elastic
ENV ELASTIC_SEARCH_CONFIG_FILE=/usr/share/elasticsearch/config/elasticsearch.yml
# Prevents ES from complaining about the node count
RUN echo "discovery.type: single-node" >> $ELASTIC_SEARCH_CONFIG_FILE
ENV PATH=${PATH}:/usr/share/elasticsearch/jdk/bin/

CMD /usr/share/elasticsearch/container-start.sh
13 changes: 13 additions & 0 deletions RFS/docker/TestSource_ES_7_10/container-start.sh
@@ -0,0 +1,13 @@
#!/bin/bash

echo "Setting AWS Creds from ENV Variables"
bin/elasticsearch-keystore create
echo $AWS_ACCESS_KEY_ID | bin/elasticsearch-keystore add s3.client.default.access_key --stdin
echo $AWS_SECRET_ACCESS_KEY | bin/elasticsearch-keystore add s3.client.default.secret_key --stdin

if [ -n "$AWS_SESSION_TOKEN" ]; then
echo $AWS_SESSION_TOKEN | bin/elasticsearch-keystore add s3.client.default.session_token --stdin
fi

echo "Starting Elasticsearch"
/usr/local/bin/docker-entrypoint.sh eswrapper