RFS can now take the source snapshot
Signed-off-by: Chris Helma <chelma+github@amazon.com>
chelma committed Apr 4, 2024
1 parent 89e3f65 commit 2f1dbf3
Showing 11 changed files with 507 additions and 50 deletions.
26 changes: 25 additions & 1 deletion .gitignore
@@ -7,7 +7,31 @@ __pycache__
*.egg-info*
.python-version
logs
.vscode/

# Build files
plugins/opensearch/loggable-transport-netty4/.gradle/

RFS/.gradle/
RFS/bin/
RFS/build/

TrafficCapture/captureKafkaOffloader/bin/
TrafficCapture/captureOffloader/bin/
TrafficCapture/captureProtobufs/bin/
TrafficCapture/coreUtilities/bin/
TrafficCapture/nettyWireLogging/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJMESPathMessageTransformer/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJMESPathMessageTransformerProvider/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJoltMessageTransformer/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonJoltMessageTransformerProvider/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/jsonMessageTransformerInterface/bin/
TrafficCapture/replayerPlugins/jsonMessageTransformers/openSearch23PlusTargetTransformerProvider/bin/
TrafficCapture/testUtilities/bin/
TrafficCapture/trafficCaptureProxyServer/bin/
TrafficCapture/trafficCaptureProxyServerTest/bin/
TrafficCapture/trafficReplayer/bin/

# CDK files from end-to-end testing
opensearch-cluster-cdk/
test/opensearch-cluster-cdk/
204 changes: 190 additions & 14 deletions RFS/README.md
@@ -2,7 +2,65 @@

## How to run

RFS can be run in several different ways. We'll look at the most common options below.

### Using a local snapshot directory

In this scenario, you have a local directory on disk containing the snapshot you want to migrate. You'll supply the `--snapshot-dir` arg, but not the args related to a source cluster (`--source-host`, `--source-username`, `--source-password`) or S3 (`--s3-local-dir`, `--s3-repo-uri`, `--s3-region`).

```
TARGET_HOST=<target cluster URL>
TARGET_USERNAME=<master user name>
TARGET_PASSWORD=<master password>
gradle build
gradle run --args="-n global_state_snapshot --snapshot-dir ./test-resources/snapshots/ES_6_8_Single -l /tmp/lucene_files --target-host $TARGET_HOST --target-username $TARGET_USERNAME --target-password $TARGET_PASSWORD -s es_6_8 -t os_2_11 --movement-type everything"
```

### Using an existing S3 snapshot

In this scenario, you have an existing snapshot in S3 you want to migrate. You'll supply the S3-related args (`--s3-local-dir`, `--s3-repo-uri`, `--s3-region`), but not the `--snapshot-dir` one or the ones related to a source cluster (`--source-host`, `--source-username`, `--source-password`).

```
S3_REPO_URI=<something like "s3://my-test-bucket/ES_6_8_Single/">
S3_REGION=us-east-1
TARGET_HOST=<target cluster URL>
TARGET_USERNAME=<master user name>
TARGET_PASSWORD=<master password>
gradle build
gradle run --args="-n global_state_snapshot --s3-local-dir /tmp/s3_files --s3-repo-uri $S3_REPO_URI --s3-region $S3_REGION -l /tmp/lucene_files --target-host $TARGET_HOST --target-username $TARGET_USERNAME --target-password $TARGET_PASSWORD -s es_6_8 -t os_2_11 --movement-type everything"
```

### Using a source cluster

In this scenario, you have a source cluster and don't yet have a snapshot. RFS will first take a snapshot of your source cluster, upload it to S3, and then begin reindexing. You'll supply the source-cluster args (`--source-host`, `--source-username`, `--source-password`) and the S3-related args (`--s3-local-dir`, `--s3-repo-uri`, `--s3-region`), but not `--snapshot-dir`.

```
SOURCE_HOST=<source cluster URL>
SOURCE_USERNAME=<master user name>
SOURCE_PASSWORD=<master password>
S3_REPO_URI=<something like "s3://my-test-bucket/ES_6_8_Single/">
S3_REGION=us-east-1
TARGET_HOST=<target cluster URL>
TARGET_USERNAME=<master user name>
TARGET_PASSWORD=<master password>
gradle build
gradle run --args="-n global_state_snapshot --source-host $SOURCE_HOST --source-username $SOURCE_USERNAME --source-password $SOURCE_PASSWORD --s3-local-dir /tmp/s3_files --s3-repo-uri $S3_REPO_URI --s3-region $S3_REGION -l /tmp/lucene_files --target-host $TARGET_HOST --target-username $TARGET_USERNAME --target-password $TARGET_PASSWORD -s es_6_8 -t os_2_11 --movement-type everything"
```

### Handling auth

RFS currently supports basic auth (username/password) and no auth for both the source and target clusters. To use the no-auth approach, simply omit the username/password arguments.
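
For example, a no-auth run against the existing-S3-snapshot scenario above might look like the following sketch (it assumes the same `S3_REPO_URI`, `S3_REGION`, and `TARGET_HOST` variables and that neither cluster requires credentials):

```
gradle run --args="-n global_state_snapshot --s3-local-dir /tmp/s3_files --s3-repo-uri $S3_REPO_URI --s3-region $S3_REGION -l /tmp/lucene_files --target-host $TARGET_HOST -s es_6_8 -t os_2_11 --movement-type everything"
```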

## How to set up an ES 6.8 Source Cluster w/ an attached debugger

```
git clone git@github.com:elastic/elasticsearch.git
@@ -93,24 +151,142 @@ curl -X PUT "localhost:9200/_snapshot/fs_repository/global_state_snapshot?wait_f
"ignore_unavailable": true,
"include_global_state": true
}'
```

## How to set up an ES 7.10 Source Cluster running in Docker

The `./docker` directory contains a Dockerfile for an ES 7.10 cluster, which you can use for testing. You can run it like so:

```
(cd ./docker/TestSource_ES_7_10; docker build . -t es-w-s3)
docker run -d \
-p 9200:9200 \
-e discovery.type=single-node \
--name elastic-source \
es-w-s3 \
/bin/sh -c '/usr/local/bin/docker-entrypoint.sh eswrapper & wait -n'
curl http://localhost:9200
```
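
Once the container is up, you can optionally confirm the `repository-s3` plugin made it into the image; `_cat/plugins` is a standard Elasticsearch API, so a quick check might look like:

```
curl "localhost:9200/_cat/plugins?v"
```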
### Providing AWS permissions for S3 snapshot creation

While the container has the `repository-s3` plugin installed out-of-the-box, you'll need to provide AWS credentials to use it. The plugin accepts credentials either [from the Elasticsearch Keystore](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3-client.html) or via the standard ENV variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`). The issue with either approach in local testing is renewal of the time-boxed creds. One possible solution is to use an IAM User, but that is generally frowned upon. The approach we'll take here is to accept that the test cluster is temporary, so the creds can be as well. We can therefore make an AWS IAM Role in our AWS Account with the permissions it needs, assume it locally to generate the credential triple, and pipe that into the container using ENV variables.

Start by making an AWS IAM Role (e.g. `arn:aws:iam::XXXXXXXXXXXX:role/testing-es-source-cluster-s3-access`) with S3 Full Access permissions in your AWS Account. You can then get credentials with that identity good for up to one hour:

```
unset access_key && unset secret_key && unset session_token
output=$(aws sts assume-role --role-arn "arn:aws:iam::XXXXXXXXXXXX:role/testing-es-source-cluster-s3-access" --role-session-name "ES-Source-Cluster")
access_key=$(echo $output | jq -r .Credentials.AccessKeyId)
secret_key=$(echo $output | jq -r .Credentials.SecretAccessKey)
session_token=$(echo $output | jq -r .Credentials.SessionToken)
```
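
Before handing these to the container, you can sanity-check the assumed credentials by calling STS with them inline; `get-caller-identity` should report the assumed-role ARN. This is just an optional check:

```
AWS_ACCESS_KEY_ID=$access_key AWS_SECRET_ACCESS_KEY=$secret_key AWS_SESSION_TOKEN=$session_token \
    aws sts get-caller-identity
```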

The one-hour limit is annoying but workable, given that the only thing the creds are needed for is creating the snapshot at the very start of the RFS process. The limit is primarily driven by the fact that IAM caps session duration at one hour when a role is assumed via another role (i.e. role chaining). If your original creds in the AWS keyring come from an IAM User, etc., this restriction might not apply to you and you can get up to 12 hours with the assumed creds. Ideas on how to improve this would be greatly appreciated.
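
For instance, when role chaining isn't involved, you could try requesting a longer session; this is only a sketch, and it also depends on the role's own `MaxSessionDuration` setting allowing it:

```
output=$(aws sts assume-role \
    --role-arn "arn:aws:iam::XXXXXXXXXXXX:role/testing-es-source-cluster-s3-access" \
    --role-session-name "ES-Source-Cluster" \
    --duration-seconds 43200)  # 12 hours; STS rejects this if role chaining or MaxSessionDuration caps it lower
```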

Anyways, you can then launch the container with those temporary credentials like so:

```
docker run -d \
-p 9200:9200 \
-e discovery.type=single-node \
-e AWS_ACCESS_KEY_ID=$access_key \
-e AWS_SECRET_ACCESS_KEY=$secret_key \
-e AWS_SESSION_TOKEN=$session_token \
-v ~/.aws:/root/.aws:ro \
--name elastic-source \
es-w-s3
```

If you need to renew the creds, just kill the existing source container, re-run the `assume-role` commands above, and spin up a new container:

```
docker stop elastic-source; docker rm elastic-source
```

### Setting up the Cluster w/ some sample docs

You can set up the cluster w/ some sample docs like so:

```
curl -X PUT "localhost:9200/_component_template/posts_template" -H "Content-Type: application/json" -d'
{
"template": {
"mappings": {
"properties": {
"contents": { "type": "text" },
"author": { "type": "keyword" },
"tags": { "type": "keyword" }
}
}
}
}'
curl -X PUT "localhost:9200/_index_template/posts_index_template" -H "Content-Type: application/json" -d'
{
"index_patterns": ["posts_*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"aliases": {
"current_posts": {}
}
},
"composed_of": ["posts_template"]
}'
curl -X PUT "localhost:9200/posts_2023_02_25"
curl -X POST "localhost:9200/current_posts/_doc" -H "Content-Type: application/json" -d'
{
"contents": "This is a sample blog post content.",
"author": "Author Name",
"tags": ["Elasticsearch", "Tutorial"]
}'
curl -X PUT "localhost:9200/posts_2024_01_01" -H "Content-Type: application/json" -d'
{
"aliases": {
"another_alias": {
"routing": "user123",
"filter": {
"term": {
"author": "Tobias Funke"
}
}
}
}
}'
curl -X POST "localhost:9200/another_alias/_doc" -H "Content-Type: application/json" -d'
{
"contents": "How Elasticsearch helped my patients",
"author": "Tobias Funke",
"tags": ["Elasticsearch", "Tutorial"]
}'
curl -X POST "localhost:9200/another_alias/_doc" -H "Content-Type: application/json" -d'
{
"contents": "My Time in the Blue Man Group",
"author": "Tobias Funke",
"tags": ["Lifestyle"]
}'
curl -X POST "localhost:9200/another_alias/_doc" -H "Content-Type: application/json" -d'
{
"contents": "On the Importance of Word Choice",
"author": "Tobias Funke",
"tags": ["Lifestyle"]
}'
```
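
To sanity-check the sample data before snapshotting, you can query the indices and aliases (standard Elasticsearch APIs; the hit counts should match the docs created above):

```
curl "localhost:9200/_cat/indices?v"
curl "localhost:9200/current_posts/_search?pretty"
curl "localhost:9200/another_alias/_search?pretty"
```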

## How to set up an OS 2.11 Target Cluster

I've only tested the scripts going from ES 6.8 to OS 2.11. For my test target, I just spun up an Amazon OpenSearch Service 2.11 cluster with a master user/password combo and otherwise default settings.
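
Once the domain is available, a quick connectivity check against it might look like this (assuming the same `TARGET_HOST`, `TARGET_USERNAME`, and `TARGET_PASSWORD` values used in the run commands above):

```
curl -u "$TARGET_USERNAME:$TARGET_PASSWORD" "$TARGET_HOST/_cluster/health?pretty"
```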

10 changes: 5 additions & 5 deletions RFS/build.gradle
@@ -12,17 +12,17 @@ repositories {

dependencies {
implementation 'com.beust:jcommander:1.81'
implementation 'org.apache.logging.log4j:log4j-api:2.23.1'
implementation 'org.apache.logging.log4j:log4j-core:2.23.1'
implementation 'org.apache.lucene:lucene-core:8.11.3'
implementation 'org.apache.lucene:lucene-analyzers-common:8.11.3'
implementation 'org.apache.lucene:lucene-backward-codecs:8.11.3'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.16.2'
implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-smile:2.16.2'
implementation 'com.fasterxml.jackson.core:jackson-annotations:2.16.2'
implementation 'com.fasterxml.jackson.core:jackson-core:2.16.2'
implementation 'software.amazon.awssdk:s3:2.25.16'
implementation 'io.netty:netty-codec-http:4.1.108.Final'
}

application {
20 changes: 20 additions & 0 deletions RFS/docker/TestSource_ES_7_10/Dockerfile
@@ -0,0 +1,20 @@
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2

# Install the S3 Repo Plugin
RUN echo y | /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3

# Install the AWS CLI for testing purposes
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install

# Install our custom entrypoint script
COPY ./container-start.sh .

# Configure Elastic
ENV ELASTIC_SEARCH_CONFIG_FILE=/usr/share/elasticsearch/config/elasticsearch.yml
# Prevents ES from complaining about the node count
RUN echo "discovery.type: single-node" >> $ELASTIC_SEARCH_CONFIG_FILE
ENV PATH=${PATH}:/usr/share/elasticsearch/jdk/bin/

CMD /usr/share/elasticsearch/container-start.sh
13 changes: 13 additions & 0 deletions RFS/docker/TestSource_ES_7_10/container-start.sh
@@ -0,0 +1,13 @@
#!/bin/bash

echo "Setting AWS Creds from ENV Variables"
bin/elasticsearch-keystore create
echo $AWS_ACCESS_KEY_ID | bin/elasticsearch-keystore add s3.client.default.access_key --stdin
echo $AWS_SECRET_ACCESS_KEY | bin/elasticsearch-keystore add s3.client.default.secret_key --stdin

if [ -n "$AWS_SESSION_TOKEN" ]; then
echo $AWS_SESSION_TOKEN | bin/elasticsearch-keystore add s3.client.default.session_token --stdin
fi

echo "Starting Elasticsearch"
/usr/local/bin/docker-entrypoint.sh eswrapper