From f214c7ce8a5e596a626c587bd61281b3a7b92270 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 26 Aug 2025 11:30:20 -0700 Subject: [PATCH 1/3] resolve cdm and dsbulk tags --- .../ROOT/pages/cassandra-data-migrator.adoc | 347 +--------- modules/ROOT/pages/cdm-overview.adoc | 2 +- .../ROOT/pages/dsbulk-migrator-overview.adoc | 2 +- modules/ROOT/pages/dsbulk-migrator.adoc | 645 +----------------- .../cassandra-data-migrator-body.adoc | 344 ++++++++++ .../ROOT/partials/dsbulk-migrator-body.adoc | 642 +++++++++++++++++ 6 files changed, 990 insertions(+), 992 deletions(-) create mode 100644 modules/ROOT/partials/cassandra-data-migrator-body.adoc create mode 100644 modules/ROOT/partials/dsbulk-migrator-body.adoc diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index a467d4e8..576cc572 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -5,349 +5,4 @@ //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav. -// tag::body[] -{description} -It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: - -* Logging and run tracking -* Automatic reconciliation -* Performance tuning -* Record filtering -* Column renaming -* Support for advanced data types, including sets, lists, maps, and UDTs -* Support for SSL, including custom cipher algorithms -* Use `writetime` timestamps to maintain chronological write history -* Use Time To Live (TTL) values to maintain data lifecycles - -For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository]. - -== {cass-migrator} requirements - -To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas. - -== {cass-migrator-short} with {product-proxy} - -You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. - -When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. - -Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write. - -For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`. - -== Install {cass-migrator} - -{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes. - -[tabs] -====== -Install as a container:: -+ --- -Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. - -The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`. --- - -Install as a JAR file:: -+ --- -. Install Java 11 or later, which includes Spark binaries. - -. Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x with Scala 2.13 and Hadoop 3.3 and later. 
-+ -[tabs] -==== -Single VM:: -+ -For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job. -+ -. Get the Spark tarball from the Apache Spark archive. -+ -[source,bash,subs="+quotes"] ----- -wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz ----- -+ -Replace `**PATCH**` with your Spark patch version. -+ -. Change to the directory where you want install Spark, and then extract the tarball: -+ -[source,bash,subs="+quotes"] ----- -tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz ----- -+ -Replace `**PATCH**` with your Spark patch version. - -Spark cluster:: -+ -For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform. -+ -If you deploy CDM on a Spark cluster, you must modify your `spark-submit` commands as follows: -+ -* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`. -* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`. -==== - -. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}. - -. Add the `cassandra-data-migrator` dependency to `pom.xml`: -+ -[source,xml,subs="+quotes"] ----- - - datastax.cdm - cassandra-data-migrator - **VERSION** - ----- -+ -Replace `**VERSION**` with your {cass-migrator-short} version. - -. Run `mvn install`. - -If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. --- -====== - -== Configure {cass-migrator-short} - -. Create a `cdm.properties` file. -+ -If you use a different name, make sure you specify the correct filename in your `spark-submit` commands. - -. Configure the properties for your environment. -+ -In the {cass-migrator-short} repository, you can find a {cass-migrator-repo}/blob/main/src/resources/cdm.properties[sample properties file with default values], as well as a {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. -+ -{cass-migrator-short} jobs process all uncommented parameters. -Any parameters that are commented out are ignored or use default values. -+ -If you want to reuse a properties file created for a previous {cass-migrator-short} version, make sure it is compatible with the version you are currently using. -Check the {cass-migrator-repo}/releases[{cass-migrator-short} release notes] for possible breaking changes in interim releases. -For example, the 4.x series of {cass-migrator-short} isn't backwards compatible with earlier properties files. - -. Store your properties file where it can be accessed while running {cass-migrator-short} jobs using `spark-submit`. - -[#migrate] -== Run a {cass-migrator-short} data migration job - -A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. - -To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. - -The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. 
-The migration job is specified in the `--class` argument. - -[tabs] -====== -Local installation:: -+ --- -[source,bash,subs="+quotes,+attributes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ ---class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -Spark cluster:: -+ --- -[source,bash,subs="+quotes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "spark://**MASTER_HOST**:**PORT**" \ ---class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--master`: Provide the URL of your Spark cluster. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== - -This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. - -For additional modifications to this command, see <>. - -[#cdm-validation-steps] -== Run a {cass-migrator-short} data validation job - -After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. - -Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. - -. Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. -The data validation job is specified in the `--class` argument. 
-+ -[tabs] -====== -Local installation:: -+ --- -[source,bash,subs="+quotes,+attributes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ ---class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -Spark cluster:: -+ --- -[source,bash,subs="+quotes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "spark://**MASTER_HOST**:**PORT**" \ ---class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--master`: Provide the URL of your Spark cluster. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== - -. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. -+ -The {cass-migrator-short} validation job records differences as `ERROR` entries in the log file, listed by primary key values. -For example: -+ -[source,plaintext] ----- -23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) -23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3] -23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2] -23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2] ----- -+ -When validating large datasets or multiple tables, you might want to extract the complete list of missing or mismatched records. -There are many ways to do this. -For example, you can grep for all `ERROR` entries in your {cass-migrator-short} log files or use the `log4j2` example provided in the {cass-migrator-repo}?tab=readme-ov-file#steps-for-data-validation[{cass-migrator-short} repository]. 
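The following minimal sketch shows the grep approach, assuming the log file naming pattern used in the commands above:

[source,bash]
----
# Gather all mismatched and missing rows reported by the DiffData job into one file for review
grep 'ERROR DiffJobSession' logfile_name_*.txt > validation_discrepancies.txt
----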
- -=== Run a validation job in AutoCorrect mode - -Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** mode, which offers the following functions: - -* `autocorrect.missing`: Add any missing records in the target with the value from the origin. - -* `autocorrect.mismatch`: Reconcile any mismatched records between the origin and target by replacing the target value with the origin value. -+ -[IMPORTANT] -==== -Timestamps have an effect on this function. - -If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. - -This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. -==== - -* `autocorrect.missing.counter`: By default, counter tables are not copied when missing, unless explicitly set. - -In your `cdm.properties` file, use the following properties to enable (`true`) or disable (`false`) autocorrect functions: - -[source,properties] ----- -spark.cdm.autocorrect.missing false|true -spark.cdm.autocorrect.mismatch false|true -spark.cdm.autocorrect.missing.counter false|true ----- - -The {cass-migrator-short} validation job never deletes records from either the origin or target. -Data validation only inserts or updates data on the target. - -For an initial data validation, consider disabling AutoCorrect so that you can generate a list of data discrepancies, investigate those discrepancies, and then decide whether you want to rerun the validation with AutoCorrect enabled. - -[#advanced] -== Additional {cass-migrator-short} options - -You can modify your properties file or append additional `--conf` arguments to your `spark-submit` commands to customize your {cass-migrator-short} jobs. -For example, you can do the following: - -* Check for large field guardrail violations before migrating. -* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges. -* Use the `track-run` feature to monitor progress and rerun a failed migration or validation job from point of failure. - -For all options, see the {cass-migrator-repo}[{cass-migrator-short} repository]. -Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. - -== Troubleshoot {cass-migrator-short} - -.Java NoSuchMethodError -[%collapsible] -==== -If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: - -[source,console] ----- -Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' ----- - -Make sure that your Spark binary is compatible with your {cass-migrator-short} version. -If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary. -==== - -.Rerun a failed or partially completed job -[%collapsible] -==== -You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. - -For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. 
-==== -// end::body[] \ No newline at end of file +include::ROOT:partial$cassandra-data-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/cdm-overview.adoc b/modules/ROOT/pages/cdm-overview.adoc index 79644ff4..de50f252 100644 --- a/modules/ROOT/pages/cdm-overview.adoc +++ b/modules/ROOT/pages/cdm-overview.adoc @@ -1,4 +1,4 @@ = {cass-migrator} ({cass-migrator-short}) overview :description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. -include::ROOT:cassandra-data-migrator.adoc[tags=body] \ No newline at end of file +include::ROOT:partial$cassandra-data-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator-overview.adoc b/modules/ROOT/pages/dsbulk-migrator-overview.adoc index 84769d92..5ec1a6c5 100644 --- a/modules/ROOT/pages/dsbulk-migrator-overview.adoc +++ b/modules/ROOT/pages/dsbulk-migrator-overview.adoc @@ -1,4 +1,4 @@ = {dsbulk-migrator} overview :description: {dsbulk-migrator} extends {dsbulk-loader} with migration commands. -include::ROOT:dsbulk-migrator.adoc[tags=body] \ No newline at end of file +include::ROOT:partial$dsbulk-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator.adoc b/modules/ROOT/pages/dsbulk-migrator.adoc index 29a9c736..79e1822e 100644 --- a/modules/ROOT/pages/dsbulk-migrator.adoc +++ b/modules/ROOT/pages/dsbulk-migrator.adoc @@ -4,647 +4,4 @@ //TODO: Reorganize this page and consider breaking it up into smaller pages. -// tag::body[] -{dsbulk-migrator} is an extension of {dsbulk-loader}. -It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. -You can also consider this tool for migrations where you can shard data from large tables into more manageable quantities. - -{dsbulk-migrator} extends {dsbulk-loader} with the following commands: - -* `migrate-live`: Start a live data migration using the embedded version of {dsbulk-loader} or your own {dsbulk-loader} installation. -A live migration means that the data migration starts immediately and is performed by the migrator tool through the specified {dsbulk-loader} installation. - -* `generate-script`: Generate a migration script that you can execute to perform a data migration with a your own {dsbulk-loader} installation. -This command _doesn't_ trigger the migration; it only generates the migration script that you must then execute. - -* `generate-ddl`: Read the schema from origin, and then generate CQL files to recreate it in your target {astra-db} database. - -[[prereqs-dsbulk-migrator]] -== {dsbulk-migrator} prerequisites - -* Java 11 - -* https://maven.apache.org/download.cgi[Maven] 3.9.x - -* Optional: If you don't want to use the embedded {dsbulk-loader} that is bundled with {dsbulk-migrator}, xref:dsbulk:installing:install.adoc[install {dsbulk-loader}] before installing {dsbulk-migrator}. - -== Build {dsbulk-migrator} - -. Clone the {dsbulk-migrator-repo}[{dsbulk-migrator} repository]: -+ -[source,bash] ----- -cd ~/github -git clone git@github.com:datastax/dsbulk-migrator.git -cd dsbulk-migrator ----- - -. Use Maven to build {dsbulk-migrator}: -+ -[source,bash] ----- -mvn clean package ----- - -The build produces two distributable fat jars: - -* `dsbulk-migrator-**VERSION**-embedded-driver.jar` contains an embedded Java driver. -Suitable for script generation or live migrations using an external {dsbulk-loader}. 
-+ -This jar isn't suitable for live migrations that use the embedded {dsbulk-loader} because no {dsbulk-loader} classes are present. - -* `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` contains an embedded {dsbulk-loader} and an embedded Java driver. -Suitable for all operations. -Much larger than the other JAR due to the presence of {dsbulk-loader} classes. - -== Test {dsbulk-migrator} - -The {dsbulk-migrator} project contains some integration tests that require https://github.com/datastax/simulacron[Simulacron]. - -. Clone and build Simulacron, as explained in the https://github.com/datastax/simulacron[Simulacron GitHub repository]. -Note the prerequisites for Simulacron, particularly for macOS. - -. Run the tests: - -[source,bash] ----- -mvn clean verify ----- - -== Run {dsbulk-migrator} - -Launch {dsbulk-migrator} with the command and options you want to run: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS] ----- - -The role and availability of the options depends on the command you run: - -* During a live migration, the options configure {dsbulk-migrator} and establish connections to -the clusters. - -* When generating a migration script, most options become default values in the generated scripts. -However, even when generating scripts, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the tables to migrate. - -* When generating a DDL file, import options and {dsbulk-loader}-related options are ignored. -However, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements. - -For more information about the commands and their options, see the following references: - -* <> -* <> -* <> - -For help and examples, see <> and <>. - -[[dsbulk-live]] -== Live migration command-line options - -The following options are available for the `migrate-live` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - -[cols="2,8,14"] -|=== - -| `-c` -| `--dsbulk-cmd=CMD` -| The external {dsbulk-loader} command to use. -Ignored if the embedded {dsbulk-loader} is being used. -The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. -Tables will be exported and imported in subdirectories of the data directory specified here. -There will be one subdirectory per keyspace in the data directory, then one subdirectory per table in each keyspace directory. - -| `-e` -| `--dsbulk-use-embedded` -| Use the embedded {dsbulk-loader} version instead of an external one. -The default is to use an external {dsbulk-loader} command. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. -The default is `LOCAL_QUORUM`. - -| -| `--export-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when exporting. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. 
-{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. -The default is `-1` (export the entire table). - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--export-splits=NUM\|NC` -| The maximum number of token range queries to generate. -Use the `NC` syntax to specify a multiple of the number of available cores. -For example, `8C` = 8 times the number of available cores. -The default is `8C`. -This is an advanced setting; you should rarely need to modify the default value. - -| -| `--export-username=STRING` -| The username to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. - -| `-h` -| `--help` -| Displays this help text. - -| -| `--import-bundle=PATH` -| The path to a {scb} to connect to a target {astra-db} cluster. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-consistency=CONSISTENCY` -| The consistency level to use when importing data. -The default is `LOCAL_QUORUM`. - -| -| `--import-default-timestamp=` -| The default timestamp to use when importing data. -Must be a valid instant in ISO-8601 syntax. -The default is `1970-01-01T00:00:00Z`. - -| -| `--import-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when importing. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--import-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node on the target cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to read from. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-errors=NUM` -| The maximum number of failed records to tolerate when importing data. -The default is `1000`. -Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. 
- -| -| `--import-password` -| The password to use to authenticate against the target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--import-username=STRING` -| The username to use to authenticate against the target cluster. Options `--import-username` and `--import-password` must be provided together, or not at all. - -| `-k` -| `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. -Case-sensitive keyspace names must be entered in their exact case. - -| `-l` -| `--dsbulk-log-dir=PATH` -| The directory where the {dsbulk-loader} should store its logs. -The default is a `logs` subdirectory in the current working directory. -This subdirectory will be created if it does not exist. -Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. - -| -| `--max-concurrent-ops=NUM` -| The maximum number of concurrent operations (exports and imports) to carry. -The default is `1`. -Set this to higher values to allow exports and imports to occur concurrently. -For example, with a value of `2`, each table will be imported as soon as it is exported, while the next table is being exported. - -| -| `--skip-truncate-confirmation` -| Skip truncate confirmation before actually truncating tables. -Only applicable when migrating counter tables, ignored otherwise. - -| `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. -Case-sensitive table names must be entered in their exact case. - -| -| `--table-types=regular\|counter\|all` -| The table types to migrate. -The default is `all`. - -| -| `--truncate-before-export` -| Truncate tables before the export instead of after. -The default is to truncate after the export. -Only applicable when migrating counter tables, ignored otherwise. - -| `-w` -| `--dsbulk-working-dir=PATH` -| The directory where `dsbulk` should be executed. -Ignored if the embedded {dsbulk-loader} is being used. -If unspecified, it defaults to the current working directory. - -|=== - -[[dsbulk-script]] -== Script generation command-line options - -The following options are available for the `generate-script` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - - -[cols="2,8,14"] -|=== - -| `-c` -| `--dsbulk-cmd=CMD` -| The {dsbulk-loader} command to use. -The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. -The default is `LOCAL_QUORUM`. - -| -| `--export-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when exporting. 
-Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. -The default is `-1` (export the entire table). - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--export-splits=NUM\|NC` -| The maximum number of token range queries to generate. -Use the `NC` syntax to specify a multiple of the number of available cores. -For example, `8C` = 8 times the number of available cores. -The default is `8C`. -This is an advanced setting. -You should rarely need to modify the default value. - -| -| `--export-username=STRING` -| The username to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. - -| `-h` -| `--help` -| Displays this help text. - -| -| `--import-bundle=PATH` -| The path to a Secure Connect Bundle to connect to a target {astra-db} cluster. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-consistency=CONSISTENCY` -| The consistency level to use when importing data. -The default is `LOCAL_QUORUM`. - -| -| `--import-default-timestamp=` -| The default timestamp to use when importing data. -Must be a valid instant in ISO-8601 syntax. -The default is `1970-01-01T00:00:00Z`. - -| -| `--import-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when importing. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--import-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node on the target cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to read from. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-errors=NUM` -| The maximum number of failed records to tolerate when importing data. 
-The default is `1000`. -Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. - -| -| `--import-password` -| The password to use to authenticate against the target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--import-username=STRING` -| The username to use to authenticate against the target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. - -| `-k` -| `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. -Case-sensitive keyspace names must be entered in their exact case. - -| `-l` -| `--dsbulk-log-dir=PATH` -| The directory where {dsbulk-loader} should store its logs. -The default is a `logs` subdirectory in the current working directory. -This subdirectory will be created if it does not exist. -Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. - -| `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. -Case-sensitive table names must be entered in their exact case. - -| -| `--table-types=regular\|counter\|all` -| The table types to migrate. The default is `all`. - -|=== - - -[[dsbulk-ddl]] -== DDL generation command-line options - -The following options are available for the `generate-ddl` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - -[cols="2,8,14"] -|=== - -| `-a` -| `--optimize-for-astra` -| Produce CQL scripts optimized for {company} {astra-db}. -{astra-db} does not allow some options in DDL statements. -Using this {dsbulk-migrator} command option, forbidden {astra-db} options will be omitted from the generated CQL files. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--export-username=STRING` -| The username to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. - -| `-h` -| `--help` -| Displays this help text. - -| `-k` -| `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. 
-The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. -Case-sensitive keyspace names must be entered in their exact case. - -| `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. -Case-sensitive table names must be entered in their exact case. - -| -| `--table-types=regular\|counter\|all` -| The table types to migrate. -The default is `all`. - -|=== - -[[dsbulk-examples]] -== {dsbulk-migrator} examples - -These examples show sample `username` and `password` values that are for demonstration purposes only. -Don't use these values in your environment. - -=== Generate a migration script - -Generate a migration script to migrate from an existing origin cluster to a target {astra-db} cluster: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \ - --data-dir=/path/to/data/dir \ - --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \ - --dsbulk-log-dir=/path/to/log/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password=s3cr3t \ - --import-bundle=/path/to/bundle \ - --import-username=user1 \ - --import-password=s3cr3t ----- - -=== Live migration with an external {dsbulk-loader} installation - -Perform a live migration from an existing origin cluster to a target {astra-db} cluster using an external {dsbulk-loader} installation: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \ - --data-dir=/path/to/data/dir \ - --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \ - --dsbulk-log-dir=/path/to/log/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password # password will be prompted \ - --import-bundle=/path/to/bundle \ - --import-username=user1 \ - --import-password # password will be prompted ----- - -Passwords are prompted interactively. - -=== Live migration with the embedded {dsbulk-loader} - -Perform a live migration from an existing origin cluster to a target {astra-db} cluster using the embedded {dsbulk-loader} installation: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-dsbulk.jar migrate-live \ - --data-dir=/path/to/data/dir \ - --dsbulk-use-embedded \ - --dsbulk-log-dir=/path/to/log/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password # password will be prompted \ - --export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ - --export-dsbulk-option "--executor.maxPerSecond=1000" \ - --import-bundle=/path/to/bundle \ - --import-username=user1 \ - --import-password # password will be prompted \ - --import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ - --import-dsbulk-option "--executor.maxPerSecond=1000" ----- - -Passwords are prompted interactively. - -The preceding example passes additional {dsbulk-loader} options. - -The preceding example requires the `dsbulk-migrator--embedded-dsbulk.jar` fat jar. -Otherwise, an error is raised because no embedded {dsbulk-loader} can be found. 
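After a live migration completes, one way to run a post-migration row count check is with the `dsbulk count` command of a standalone {dsbulk-loader} installation.
The following sketch assumes {dsbulk-loader} is installed at `${DSBULK_ROOT}` and uses placeholder keyspace and table names with the demonstration credentials from the preceding examples:

[source,bash]
----
# Count rows in the origin table (my_keyspace and my_table are placeholders)
${DSBULK_ROOT}/bin/dsbulk count -k my_keyspace -t my_table \
  -h my-origin-cluster.com -u user1 -p s3cr3t

# Count rows in the same table on the target cluster through its secure connect bundle
${DSBULK_ROOT}/bin/dsbulk count -k my_keyspace -t my_table \
  -b /path/to/bundle -u user1 -p s3cr3t
----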
- -=== Generate DDL files to recreate the origin schema on the target cluster - -Generate DDL files to recreate the origin schema on a target {astra-db} cluster: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-driver.jar generate-ddl \ - --data-dir=/path/to/data/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password=s3cr3t \ - --optimize-for-astra ----- - -[[getting-help-with-dsbulk-migrator]] -== Get help with {dsbulk-migrator} - -Use the following command to display the available {dsbulk-migrator} commands: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help ----- - -For individual command help and each one's options: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help ----- - -== See also - -* xref:dsbulk:overview:dsbulk-about.adoc[{dsbulk-loader}] -* xref:dsbulk:reference:dsbulk-cmd.adoc#escaping-and-quoting-command-line-arguments[Escaping and quoting {dsbulk-loader} command line arguments] -// end::body[] \ No newline at end of file +include::ROOT:partial$dsbulk-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/partials/cassandra-data-migrator-body.adoc b/modules/ROOT/partials/cassandra-data-migrator-body.adoc new file mode 100644 index 00000000..f1ba3f01 --- /dev/null +++ b/modules/ROOT/partials/cassandra-data-migrator-body.adoc @@ -0,0 +1,344 @@ +{description} +It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: + +* Logging and run tracking +* Automatic reconciliation +* Performance tuning +* Record filtering +* Column renaming +* Support for advanced data types, including sets, lists, maps, and UDTs +* Support for SSL, including custom cipher algorithms +* Use `writetime` timestamps to maintain chronological write history +* Use Time To Live (TTL) values to maintain data lifecycles + +For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository]. + +== {cass-migrator} requirements + +To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas. + +== {cass-migrator-short} with {product-proxy} + +You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. + +When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. + +Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write. + +For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`. + +== Install {cass-migrator} + +{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes. + +[tabs] +====== +Install as a container:: ++ +-- +Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. 
+ +The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`. +-- + +Install as a JAR file:: ++ +-- +. Install Java 11 or later, which includes Spark binaries. + +. Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x with Scala 2.13 and Hadoop 3.3 and later. ++ +[tabs] +==== +Single VM:: ++ +For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job. ++ +. Get the Spark tarball from the Apache Spark archive. ++ +[source,bash,subs="+quotes"] +---- +wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz +---- ++ +Replace `**PATCH**` with your Spark patch version. ++ +. Change to the directory where you want install Spark, and then extract the tarball: ++ +[source,bash,subs="+quotes"] +---- +tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz +---- ++ +Replace `**PATCH**` with your Spark patch version. + +Spark cluster:: ++ +For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform. ++ +If you deploy CDM on a Spark cluster, you must modify your `spark-submit` commands as follows: ++ +* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`. +* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`. +==== + +. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}. + +. Add the `cassandra-data-migrator` dependency to `pom.xml`: ++ +[source,xml,subs="+quotes"] +---- + + datastax.cdm + cassandra-data-migrator + **VERSION** + +---- ++ +Replace `**VERSION**` with your {cass-migrator-short} version. + +. Run `mvn install`. + +If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. +-- +====== + +== Configure {cass-migrator-short} + +. Create a `cdm.properties` file. ++ +If you use a different name, make sure you specify the correct filename in your `spark-submit` commands. + +. Configure the properties for your environment. ++ +In the {cass-migrator-short} repository, you can find a {cass-migrator-repo}/blob/main/src/resources/cdm.properties[sample properties file with default values], as well as a {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. ++ +{cass-migrator-short} jobs process all uncommented parameters. +Any parameters that are commented out are ignored or use default values. ++ +If you want to reuse a properties file created for a previous {cass-migrator-short} version, make sure it is compatible with the version you are currently using. +Check the {cass-migrator-repo}/releases[{cass-migrator-short} release notes] for possible breaking changes in interim releases. +For example, the 4.x series of {cass-migrator-short} isn't backwards compatible with earlier properties files. + +. Store your properties file where it can be accessed while running {cass-migrator-short} jobs using `spark-submit`. 
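For illustration, the following minimal sketch shows the kind of entries a `cdm.properties` file can contain.
It uses only parameters discussed on this page, with placeholder keyspace and table names; connection settings and all other parameters follow the sample and fully annotated properties files linked above.

[source,properties]
----
# Origin table to process, as KEYSPACE_NAME.TABLE_NAME
# (can also be passed to spark-submit with --conf)
spark.cdm.schema.origin.keyspaceTable   my_keyspace.my_table

# AutoCorrect functions, disabled here for an initial validation run
spark.cdm.autocorrect.missing           false
spark.cdm.autocorrect.mismatch          false
spark.cdm.autocorrect.missing.counter   false
----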
+ +[#migrate] +== Run a {cass-migrator-short} data migration job + +A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. + +To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. + +The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. +The migration job is specified in the `--class` argument. + +[tabs] +====== +Local installation:: ++ +-- +[source,bash,subs="+quotes,+attributes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ +--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- + +Spark cluster:: ++ +-- +[source,bash,subs="+quotes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "spark://**MASTER_HOST**:**PORT**" \ +--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--master`: Provide the URL of your Spark cluster. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- +====== + +This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. + +For additional modifications to this command, see <>. + +[#cdm-validation-steps] +== Run a {cass-migrator-short} data validation job + +After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. + +Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. + +. 
Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. +The data validation job is specified in the `--class` argument. ++ +[tabs] +====== +Local installation:: ++ +-- +[source,bash,subs="+quotes,+attributes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ +--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- + +Spark cluster:: ++ +-- +[source,bash,subs="+quotes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "spark://**MASTER_HOST**:**PORT**" \ +--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--master`: Provide the URL of your Spark cluster. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- +====== + +. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. ++ +The {cass-migrator-short} validation job records differences as `ERROR` entries in the log file, listed by primary key values. +For example: ++ +[source,plaintext] +---- +23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) +23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3] +23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2] +23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2] +---- ++ +When validating large datasets or multiple tables, you might want to extract the complete list of missing or mismatched records. +There are many ways to do this. 
+For example, you can grep for all `ERROR` entries in your {cass-migrator-short} log files or use the `log4j2` example provided in the {cass-migrator-repo}?tab=readme-ov-file#steps-for-data-validation[{cass-migrator-short} repository]. + +=== Run a validation job in AutoCorrect mode + +Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** mode, which offers the following functions: + +* `autocorrect.missing`: Add any missing records in the target with the value from the origin. + +* `autocorrect.mismatch`: Reconcile any mismatched records between the origin and target by replacing the target value with the origin value. ++ +[IMPORTANT] +==== +Timestamps have an effect on this function. + +If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. + +This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. +==== + +* `autocorrect.missing.counter`: By default, counter tables are not copied when missing, unless explicitly set. + +In your `cdm.properties` file, use the following properties to enable (`true`) or disable (`false`) autocorrect functions: + +[source,properties] +---- +spark.cdm.autocorrect.missing false|true +spark.cdm.autocorrect.mismatch false|true +spark.cdm.autocorrect.missing.counter false|true +---- + +The {cass-migrator-short} validation job never deletes records from either the origin or target. +Data validation only inserts or updates data on the target. + +For an initial data validation, consider disabling AutoCorrect so that you can generate a list of data discrepancies, investigate those discrepancies, and then decide whether you want to rerun the validation with AutoCorrect enabled. + +[#advanced] +== Additional {cass-migrator-short} options + +You can modify your properties file or append additional `--conf` arguments to your `spark-submit` commands to customize your {cass-migrator-short} jobs. +For example, you can do the following: + +* Check for large field guardrail violations before migrating. +* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges. +* Use the `track-run` feature to monitor progress and rerun a failed migration or validation job from point of failure. + +For all options, see the {cass-migrator-repo}[{cass-migrator-short} repository]. +Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. + +== Troubleshoot {cass-migrator-short} + +.Java NoSuchMethodError +[%collapsible] +==== +If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: + +[source,console] +---- +Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' +---- + +Make sure that your Spark binary is compatible with your {cass-migrator-short} version. +If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary. +==== + +.Rerun a failed or partially completed job +[%collapsible] +==== +You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. 
+ +For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. +==== \ No newline at end of file diff --git a/modules/ROOT/partials/dsbulk-migrator-body.adoc b/modules/ROOT/partials/dsbulk-migrator-body.adoc new file mode 100644 index 00000000..4a6cf0f9 --- /dev/null +++ b/modules/ROOT/partials/dsbulk-migrator-body.adoc @@ -0,0 +1,642 @@ +{dsbulk-migrator} is an extension of {dsbulk-loader}. +It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. +You can also consider this tool for migrations where you can shard data from large tables into more manageable quantities. + +{dsbulk-migrator} extends {dsbulk-loader} with the following commands: + +* `migrate-live`: Start a live data migration using the embedded version of {dsbulk-loader} or your own {dsbulk-loader} installation. +A live migration means that the data migration starts immediately and is performed by the migrator tool through the specified {dsbulk-loader} installation. + +* `generate-script`: Generate a migration script that you can execute to perform a data migration with a your own {dsbulk-loader} installation. +This command _doesn't_ trigger the migration; it only generates the migration script that you must then execute. + +* `generate-ddl`: Read the schema from origin, and then generate CQL files to recreate it in your target {astra-db} database. + +[[prereqs-dsbulk-migrator]] +== {dsbulk-migrator} prerequisites + +* Java 11 + +* https://maven.apache.org/download.cgi[Maven] 3.9.x + +* Optional: If you don't want to use the embedded {dsbulk-loader} that is bundled with {dsbulk-migrator}, xref:dsbulk:installing:install.adoc[install {dsbulk-loader}] before installing {dsbulk-migrator}. + +== Build {dsbulk-migrator} + +. Clone the {dsbulk-migrator-repo}[{dsbulk-migrator} repository]: ++ +[source,bash] +---- +cd ~/github +git clone git@github.com:datastax/dsbulk-migrator.git +cd dsbulk-migrator +---- + +. Use Maven to build {dsbulk-migrator}: ++ +[source,bash] +---- +mvn clean package +---- + +The build produces two distributable fat jars: + +* `dsbulk-migrator-**VERSION**-embedded-driver.jar` contains an embedded Java driver. +Suitable for script generation or live migrations using an external {dsbulk-loader}. ++ +This jar isn't suitable for live migrations that use the embedded {dsbulk-loader} because no {dsbulk-loader} classes are present. + +* `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` contains an embedded {dsbulk-loader} and an embedded Java driver. +Suitable for all operations. +Much larger than the other JAR due to the presence of {dsbulk-loader} classes. + +== Test {dsbulk-migrator} + +The {dsbulk-migrator} project contains some integration tests that require https://github.com/datastax/simulacron[Simulacron]. + +. Clone and build Simulacron, as explained in the https://github.com/datastax/simulacron[Simulacron GitHub repository]. +Note the prerequisites for Simulacron, particularly for macOS. + +. 
Run the tests: + +[source,bash] +---- +mvn clean verify +---- + +== Run {dsbulk-migrator} + +Launch {dsbulk-migrator} with the command and options you want to run: + +[source,bash] +---- +java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS] +---- + +The role and availability of the options depends on the command you run: + +* During a live migration, the options configure {dsbulk-migrator} and establish connections to +the clusters. + +* When generating a migration script, most options become default values in the generated scripts. +However, even when generating scripts, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the tables to migrate. + +* When generating a DDL file, import options and {dsbulk-loader}-related options are ignored. +However, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements. + +For more information about the commands and their options, see the following references: + +* <> +* <> +* <> + +For help and examples, see <> and <>. + +[[dsbulk-live]] +== Live migration command-line options + +The following options are available for the `migrate-live` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. + +[cols="2,8,14"] +|=== + +| `-c` +| `--dsbulk-cmd=CMD` +| The external {dsbulk-loader} command to use. +Ignored if the embedded {dsbulk-loader} is being used. +The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. + +| `-d` +| `--data-dir=PATH` +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. +Tables will be exported and imported in subdirectories of the data directory specified here. +There will be one subdirectory per keyspace in the data directory, then one subdirectory per table in each keyspace directory. + +| `-e` +| `--dsbulk-use-embedded` +| Use the embedded {dsbulk-loader} version instead of an external one. +The default is to use an external {dsbulk-loader} command. + +| +| `--export-bundle=PATH` +| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-consistency=CONSISTENCY` +| The consistency level to use when exporting data. +The default is `LOCAL_QUORUM`. + +| +| `--export-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when exporting. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--export-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node from the origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to write to. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. 
+Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-records=NUM` +| The maximum number of records to export for each table. +Must be a positive number or `-1`. +The default is `-1` (export the entire table). + +| +| `--export-password` +| The password to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--export-splits=NUM\|NC` +| The maximum number of token range queries to generate. +Use the `NC` syntax to specify a multiple of the number of available cores. +For example, `8C` = 8 times the number of available cores. +The default is `8C`. +This is an advanced setting; you should rarely need to modify the default value. + +| +| `--export-username=STRING` +| The username to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. + +| `-h` +| `--help` +| Displays this help text. + +| +| `--import-bundle=PATH` +| The path to a {scb} to connect to a target {astra-db} cluster. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-consistency=CONSISTENCY` +| The consistency level to use when importing data. +The default is `LOCAL_QUORUM`. + +| +| `--import-default-timestamp=` +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 syntax. +The default is `1970-01-01T00:00:00Z`. + +| +| `--import-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when importing. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--import-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node on the target cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to read from. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-errors=NUM` +| The maximum number of failed records to tolerate when importing data. +The default is `1000`. +Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. + +| +| `--import-password` +| The password to use to authenticate against the target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--import-username=STRING` +| The username to use to authenticate against the target cluster. Options `--import-username` and `--import-password` must be provided together, or not at all. + +| `-k` +| `--keyspaces=REGEX` +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. +Case-sensitive keyspace names must be entered in their exact case. 
+ +| `-l` +| `--dsbulk-log-dir=PATH` +| The directory where the {dsbulk-loader} should store its logs. +The default is a `logs` subdirectory in the current working directory. +This subdirectory will be created if it does not exist. +Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. + +| +| `--max-concurrent-ops=NUM` +| The maximum number of concurrent operations (exports and imports) to carry. +The default is `1`. +Set this to higher values to allow exports and imports to occur concurrently. +For example, with a value of `2`, each table will be imported as soon as it is exported, while the next table is being exported. + +| +| `--skip-truncate-confirmation` +| Skip truncate confirmation before actually truncating tables. +Only applicable when migrating counter tables, ignored otherwise. + +| `-t` +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +Case-sensitive table names must be entered in their exact case. + +| +| `--table-types=regular\|counter\|all` +| The table types to migrate. +The default is `all`. + +| +| `--truncate-before-export` +| Truncate tables before the export instead of after. +The default is to truncate after the export. +Only applicable when migrating counter tables, ignored otherwise. + +| `-w` +| `--dsbulk-working-dir=PATH` +| The directory where `dsbulk` should be executed. +Ignored if the embedded {dsbulk-loader} is being used. +If unspecified, it defaults to the current working directory. + +|=== + +[[dsbulk-script]] +== Script generation command-line options + +The following options are available for the `generate-script` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. + + +[cols="2,8,14"] +|=== + +| `-c` +| `--dsbulk-cmd=CMD` +| The {dsbulk-loader} command to use. +The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. + +| `-d` +| `--data-dir=PATH` +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. + +| +| `--export-bundle=PATH` +| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-consistency=CONSISTENCY` +| The consistency level to use when exporting data. +The default is `LOCAL_QUORUM`. + +| +| `--export-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when exporting. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--export-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node from the origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to write to. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. 
+ +| +| `--export-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-records=NUM` +| The maximum number of records to export for each table. +Must be a positive number or `-1`. +The default is `-1` (export the entire table). + +| +| `--export-password` +| The password to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--export-splits=NUM\|NC` +| The maximum number of token range queries to generate. +Use the `NC` syntax to specify a multiple of the number of available cores. +For example, `8C` = 8 times the number of available cores. +The default is `8C`. +This is an advanced setting. +You should rarely need to modify the default value. + +| +| `--export-username=STRING` +| The username to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. + +| `-h` +| `--help` +| Displays this help text. + +| +| `--import-bundle=PATH` +| The path to a Secure Connect Bundle to connect to a target {astra-db} cluster. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-consistency=CONSISTENCY` +| The consistency level to use when importing data. +The default is `LOCAL_QUORUM`. + +| +| `--import-default-timestamp=` +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 syntax. +The default is `1970-01-01T00:00:00Z`. + +| +| `--import-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when importing. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--import-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node on the target cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to read from. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-errors=NUM` +| The maximum number of failed records to tolerate when importing data. +The default is `1000`. +Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. + +| +| `--import-password` +| The password to use to authenticate against the target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--import-username=STRING` +| The username to use to authenticate against the target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. + +| `-k` +| `--keyspaces=REGEX` +| A regular expression to select keyspaces to migrate. 
+The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. +Case-sensitive keyspace names must be entered in their exact case. + +| `-l` +| `--dsbulk-log-dir=PATH` +| The directory where {dsbulk-loader} should store its logs. +The default is a `logs` subdirectory in the current working directory. +This subdirectory will be created if it does not exist. +Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. + +| `-t` +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +Case-sensitive table names must be entered in their exact case. + +| +| `--table-types=regular\|counter\|all` +| The table types to migrate. The default is `all`. + +|=== + + +[[dsbulk-ddl]] +== DDL generation command-line options + +The following options are available for the `generate-ddl` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. + +[cols="2,8,14"] +|=== + +| `-a` +| `--optimize-for-astra` +| Produce CQL scripts optimized for {company} {astra-db}. +{astra-db} does not allow some options in DDL statements. +Using this {dsbulk-migrator} command option, forbidden {astra-db} options will be omitted from the generated CQL files. + +| `-d` +| `--data-dir=PATH` +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. + +| +| `--export-bundle=PATH` +| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node from the origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-password` +| The password to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--export-username=STRING` +| The username to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. + +| `-h` +| `--help` +| Displays this help text. + +| `-k` +| `--keyspaces=REGEX` +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. +Case-sensitive keyspace names must be entered in their exact case. + +| `-t` +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +Case-sensitive table names must be entered in their exact case. + +| +| `--table-types=regular\|counter\|all` +| The table types to migrate. +The default is `all`. + +|=== + +[[dsbulk-examples]] +== {dsbulk-migrator} examples + +These examples show sample `username` and `password` values that are for demonstration purposes only. +Don't use these values in your environment. 
+
+=== Generate a migration script
+
+Generate a migration script to migrate from an existing origin cluster to a target {astra-db} cluster:
+
+[source,bash]
+----
+ java -jar target/dsbulk-migrator--embedded-driver.jar generate-script \
+ --data-dir=/path/to/data/dir \
+ --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
+ --dsbulk-log-dir=/path/to/log/dir \
+ --export-host=my-origin-cluster.com \
+ --export-username=user1 \
+ --export-password=s3cr3t \
+ --import-bundle=/path/to/bundle \
+ --import-username=user1 \
+ --import-password=s3cr3t
+----
+
+=== Live migration with an external {dsbulk-loader} installation
+
+Perform a live migration from an existing origin cluster to a target {astra-db} cluster using an external {dsbulk-loader} installation:
+
+[source,bash]
+----
+ java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \
+ --data-dir=/path/to/data/dir \
+ --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
+ --dsbulk-log-dir=/path/to/log/dir \
+ --export-host=my-origin-cluster.com \
+ --export-username=user1 \
+ --export-password \
+ --import-bundle=/path/to/bundle \
+ --import-username=user1 \
+ --import-password
+----
+
+Because `--export-password` and `--import-password` are passed without values, the passwords are prompted interactively.
+
+=== Live migration with the embedded {dsbulk-loader}
+
+Perform a live migration from an existing origin cluster to a target {astra-db} cluster using the embedded {dsbulk-loader} installation:
+
+[source,bash]
+----
+ java -jar target/dsbulk-migrator--embedded-dsbulk.jar migrate-live \
+ --data-dir=/path/to/data/dir \
+ --dsbulk-use-embedded \
+ --dsbulk-log-dir=/path/to/log/dir \
+ --export-host=my-origin-cluster.com \
+ --export-username=user1 \
+ --export-password \
+ --export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
+ --export-dsbulk-option "--executor.maxPerSecond=1000" \
+ --import-bundle=/path/to/bundle \
+ --import-username=user1 \
+ --import-password \
+ --import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
+ --import-dsbulk-option "--executor.maxPerSecond=1000"
+----
+
+Because `--export-password` and `--import-password` are passed without values, the passwords are prompted interactively.
+
+The preceding example passes additional {dsbulk-loader} options to both the export and the import.
+
+The preceding example also requires the `dsbulk-migrator--embedded-dsbulk.jar` fat jar.
+Otherwise, an error is raised because no embedded {dsbulk-loader} can be found.
+ +=== Generate DDL files to recreate the origin schema on the target cluster + +Generate DDL files to recreate the origin schema on a target {astra-db} cluster: + +[source,bash] +---- + java -jar target/dsbulk-migrator--embedded-driver.jar generate-ddl \ + --data-dir=/path/to/data/dir \ + --export-host=my-origin-cluster.com \ + --export-username=user1 \ + --export-password=s3cr3t \ + --optimize-for-astra +---- + +[[getting-help-with-dsbulk-migrator]] +== Get help with {dsbulk-migrator} + +Use the following command to display the available {dsbulk-migrator} commands: + +[source,bash] +---- +java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help +---- + +For individual command help and each one's options: + +[source,bash] +---- +java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help +---- + +== See also + +* xref:dsbulk:overview:dsbulk-about.adoc[{dsbulk-loader}] +* xref:dsbulk:reference:dsbulk-cmd.adoc#escaping-and-quoting-command-line-arguments[Escaping and quoting {dsbulk-loader} command line arguments] \ No newline at end of file From ac45bf075d89d21b8c1126c0ca4bcff5ffd2061d Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 26 Aug 2025 11:36:43 -0700 Subject: [PATCH 2/3] convert sideloader tags to dedicated partials --- .../sideloader/pages/cleanup-sideloader.adoc | 2 +- .../sideloader/pages/migrate-sideloader.adoc | 16 +-- .../sideloader/pages/sideloader-overview.adoc | 8 +- modules/sideloader/pages/sideloader-zdm.adoc | 2 +- .../pages/stop-restart-sideloader.adoc | 4 +- .../pages/troubleshoot-sideloader.adoc | 4 +- modules/sideloader/partials/check-status.adoc | 11 ++ .../partials/command-placeholders-common.adoc | 7 ++ modules/sideloader/partials/import.adoc | 35 ++++++ modules/sideloader/partials/initialize.adoc | 24 ++++ modules/sideloader/partials/no-return.adoc | 2 + .../partials/sideloader-partials.adoc | 107 ------------------ .../sideloader/partials/sideloader-zdm.adoc | 4 + modules/sideloader/partials/validate.adoc | 4 + 14 files changed, 105 insertions(+), 125 deletions(-) create mode 100644 modules/sideloader/partials/check-status.adoc create mode 100644 modules/sideloader/partials/command-placeholders-common.adoc create mode 100644 modules/sideloader/partials/import.adoc create mode 100644 modules/sideloader/partials/initialize.adoc create mode 100644 modules/sideloader/partials/no-return.adoc delete mode 100644 modules/sideloader/partials/sideloader-partials.adoc create mode 100644 modules/sideloader/partials/sideloader-zdm.adoc create mode 100644 modules/sideloader/partials/validate.adoc diff --git a/modules/sideloader/pages/cleanup-sideloader.adoc b/modules/sideloader/pages/cleanup-sideloader.adoc index 42dae738..4ed5ed40 100644 --- a/modules/sideloader/pages/cleanup-sideloader.adoc +++ b/modules/sideloader/pages/cleanup-sideloader.adoc @@ -47,7 +47,7 @@ If the request fails due to `ImportInProgress`, you must either wait for the imp . Wait a few minutes, and then check the migration status: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] + While the cleanup is running, the migration status is `CleaningUpFiles`. When complete, the migration status is `Closed`. 
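For long-running cleanups, a small polling loop can save repeated manual status checks.
This is only a sketch: it assumes the same `${token}`, `${dbID}`, and `${migrationID}` shell variables used by the status-check command, and it assumes that the `status` field is exposed at the top level of the `MigrationStatus` response.

[source,bash]
----
# Poll the migration status every five minutes until it reports Closed.
# The .status path is an assumption; run `jq .` on the response to inspect
# the full MigrationStatus object if the field is nested differently.
while true; do
  status=$(curl -s -H "Authorization: Bearer ${token}" \
    "https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID}" \
    | jq -r '.status')
  echo "$(date) ${status}"
  if [ "${status}" = "Closed" ]; then
    break
  fi
  sleep 300
done
----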
diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index 4deec95b..326bdcb5 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -272,7 +272,7 @@ Use the {devops-api} to initialize the migration and get your migration director .What happens during initialization? [%collapsible] ==== -include::sideloader:partial$sideloader-partials.adoc[tags=initialize] +include::sideloader:partial$initialize.adoc[] ==== The initialization process can take several minutes to complete, especially if the migration bucket doesn't already exist. @@ -322,7 +322,7 @@ Replace *`MIGRATION_ID`* with the `migrationID` returned by the `initialize` end . Check the migration status: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . Check the `status` field in the response: + @@ -520,7 +520,7 @@ aws s3 sync --only-show-errors --exclude '{asterisk}' --include '{asterisk}/snap + Replace the following: + -include::sideloader:partial$sideloader-partials.adoc[tags=command-placeholders-common] +include::sideloader:partial$command-placeholders-common.adoc[] + .Example: Upload a snapshot with AWS CLI @@ -604,7 +604,7 @@ gsutil -m rsync -r -d **CASSANDRA_DATA_DIR**/**KEYSPACE_NAME**/{asterisk}{asteri + Replace the following: + -include::sideloader:partial$sideloader-partials.adoc[tags=command-placeholders-common] +include::sideloader:partial$command-placeholders-common.adoc[] + .Example: Upload a snapshot with gcloud and gsutil @@ -743,7 +743,7 @@ If one step fails, then the entire import operation stops and the migration fail .What happens during data import? [%collapsible] ====== -include::sideloader:partial$sideloader-partials.adoc[tags=import] +include::sideloader:partial$import.adoc[] ====== [WARNING] @@ -752,7 +752,7 @@ include::sideloader:partial$sideloader-partials.adoc[tags=import] For commands to monitor upload progress and compare uploaded data against the original snapshots, see xref:sideloader:migrate-sideloader.adoc#upload-snapshots-to-migration-directory[Upload snapshots to the migration directory]. * If necessary, you can xref:sideloader:stop-restart-sideloader.adoc[pause or abort the migration] during the import process. -include::sideloader:partial$sideloader-partials.adoc[tags=no-return] +include::sideloader:partial$no-return.adoc[] ==== . Use the {devops-api} to launch the data import: @@ -769,7 +769,7 @@ Although this call returns immediately, the import process takes time. . Check the migration status periodically: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . Check the `status` field in the response: + @@ -784,7 +784,7 @@ Wait a few minutes before you check the status again. 
[#validate-the-migrated-data] == Validate the migrated data -include::sideloader:partial$sideloader-partials.adoc[tags=validate] +include::sideloader:partial$validate.adoc[] == See also diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc index dc3cbce9..1b6cd07b 100644 --- a/modules/sideloader/pages/sideloader-overview.adoc +++ b/modules/sideloader/pages/sideloader-overview.adoc @@ -61,7 +61,7 @@ For specific requirements and more information, see xref:sideloader:migrate-side === Initialize a migration -include::sideloader:partial$sideloader-partials.adoc[tags=initialize] +include::sideloader:partial$initialize.adoc[] For instructions and more information, see xref:sideloader:migrate-sideloader.adoc#initialize-migration[Migrate data with {sstable-sideloader}: Initialize the migration]. @@ -105,17 +105,17 @@ In this case, consider creating your target database in a co-located datacenter, === Import data -include::sideloader:partial$sideloader-partials.adoc[tags=import] +include::sideloader:partial$import.adoc[] For instructions and more information, see xref:sideloader:migrate-sideloader.adoc#import-data[Migrate data with {sstable-sideloader}: Import data] === Validate imported data -include::sideloader:partial$sideloader-partials.adoc[tags=validate] +include::sideloader:partial$validate.adoc[] == Use {sstable-sideloader} with {product-proxy} -include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm] +include::sideloader:partial$sideloader-zdm.adoc[] == Next steps diff --git a/modules/sideloader/pages/sideloader-zdm.adoc b/modules/sideloader/pages/sideloader-zdm.adoc index 8a9d340e..1111f833 100644 --- a/modules/sideloader/pages/sideloader-zdm.adoc +++ b/modules/sideloader/pages/sideloader-zdm.adoc @@ -22,4 +22,4 @@ For more information and instructions, see xref:sideloader:sideloader-overview.a You can use {sstable-sideloader} alone or with {product-proxy}. -include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm] \ No newline at end of file +include::sideloader:partial$sideloader-zdm.adoc[] \ No newline at end of file diff --git a/modules/sideloader/pages/stop-restart-sideloader.adoc b/modules/sideloader/pages/stop-restart-sideloader.adoc index a35faad3..e4625b00 100644 --- a/modules/sideloader/pages/stop-restart-sideloader.adoc +++ b/modules/sideloader/pages/stop-restart-sideloader.adoc @@ -49,12 +49,12 @@ curl -X POST \ | jq . ---- + -include::sideloader:partial$sideloader-partials.adoc[tags=no-return] +include::sideloader:partial$no-return.adoc[] For more information about what happens during each phase of a migration and the point of no return, see xref:sideloader:sideloader-overview.adoc[]. . Wait a few minutes, and then check the migration status to confirm that the migration stopped: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] == Retry a failed migration diff --git a/modules/sideloader/pages/troubleshoot-sideloader.adoc b/modules/sideloader/pages/troubleshoot-sideloader.adoc index 2e96a1ec..cb74602a 100644 --- a/modules/sideloader/pages/troubleshoot-sideloader.adoc +++ b/modules/sideloader/pages/troubleshoot-sideloader.adoc @@ -18,7 +18,7 @@ If your credentials expire, do the following: . Use the `MigrationStatus` endpoint to generate new credentials: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . 
Continue the migration with the fresh credentials. + @@ -48,7 +48,7 @@ For more information, see <>. . Check the migration status for an error message related to the failure: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . If possible, resolve the issue described in the error message. + diff --git a/modules/sideloader/partials/check-status.adoc b/modules/sideloader/partials/check-status.adoc new file mode 100644 index 00000000..74428f06 --- /dev/null +++ b/modules/sideloader/partials/check-status.adoc @@ -0,0 +1,11 @@ +[source,bash] +---- +curl -X GET \ + -H "Authorization: Bearer ${token}" \ + https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \ + | jq . +---- ++ +A successful response contains a `MigrationStatus` object. +It can take a few minutes for the {devops-api} to reflect status changes during a migration. +Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status. \ No newline at end of file diff --git a/modules/sideloader/partials/command-placeholders-common.adoc b/modules/sideloader/partials/command-placeholders-common.adoc new file mode 100644 index 00000000..69ccfdd2 --- /dev/null +++ b/modules/sideloader/partials/command-placeholders-common.adoc @@ -0,0 +1,7 @@ +* *`CASSANDRA_DATA_DIR`*: The absolute file system path to where {cass-short} data is stored on the node. +For example, `/var/lib/cassandra/data`. +* *`KEYSPACE_NAME`*: The name of the keyspace that contains the tables you want to migrate. +* *`SNAPSHOT_NAME`*: The name of the xref:sideloader:migrate-sideloader.adoc#create-snapshots[snapshot backup] that you created with `nodetool snapshot`. +* *`MIGRATION_DIR`*: The entire `uploadBucketDir` value that was generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration], including the trailing slash. +* *`NODE_NAME`*: The host name of the node that your snapshots are from. +It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket. \ No newline at end of file diff --git a/modules/sideloader/partials/import.adoc b/modules/sideloader/partials/import.adoc new file mode 100644 index 00000000..b164d764 --- /dev/null +++ b/modules/sideloader/partials/import.adoc @@ -0,0 +1,35 @@ +After uploading the snapshots to the migration directory, use the {devops-api} to start the data import process. + +During the import process, {sstable-sideloader} does the following: + +. Revokes access to the migration directory. ++ +You cannot read or write to the migration directory after starting the data import process. + +. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets. + +. Runs validation checks on each subset. + +. Converts all SSTables of each subset. + +. Disables new compactions on the target database. ++ +[WARNING] +==== +This is the last point at which you can xref:sideloader:stop-restart-sideloader.adoc#abort-migration[abort the migration]. + +Once {sstable-sideloader} begins to import SSTable metadata (the next step), you cannot stop the migration. +==== + +. Imports metadata from each SSTable. ++ +If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. +Since compaction is disabled, there is no risk of permanent inconsistencies. 
+However, in the context of xref:ROOT:introduction.adoc[{product}], it's important that the {product-short} proxy continues to read from the origin cluster. + +. Re-enables compactions on the {astra-db} Serverless database. + +Each step must finish successfully. +If one step fails, the import operation stops and no data is imported into your target database. + +If all steps finish successfully, the migration is complete and you can access the imported data in your target database. \ No newline at end of file diff --git a/modules/sideloader/partials/initialize.adoc b/modules/sideloader/partials/initialize.adoc new file mode 100644 index 00000000..3e288f43 --- /dev/null +++ b/modules/sideloader/partials/initialize.adoc @@ -0,0 +1,24 @@ +After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the {astra} {devops-api} to initialize the migration. + +.{sstable-sideloader} moves data from the migration bucket to {astra-db}. +svg::sideloader:data-importer-workflow.svg[] + +When you initialize a migration, {sstable-sideloader} does the following: + +. Creates a secure migration bucket. ++ +The migration bucket is only created during the first initialization. +All subsequent migrations use different directories in the same migration bucket. ++ +{company} owns the migration bucket, and it is located within the {astra} perimeter. + +. Generates a migration ID that is unique to the new migration. + +. Creates a migration directory within the migration bucket that is unique to the new migration. ++ +The migration directory is also referred to as the `uploadBucketDir`. +In the next phase of the migration process, you will upload your snapshots to this migration directory. + +. Generates upload credentials that grant read/write access to the migration directory. ++ +The credentials are formatted according to the cloud provider where your target database is deployed. \ No newline at end of file diff --git a/modules/sideloader/partials/no-return.adoc b/modules/sideloader/partials/no-return.adoc new file mode 100644 index 00000000..8561543b --- /dev/null +++ b/modules/sideloader/partials/no-return.adoc @@ -0,0 +1,2 @@ +You can abort a migration up until the point at which {sstable-sideloader} starts importing SSTable metadata. +After this point, you must wait for the migration to finish, and then you can use `cqlsh` to drop the keyspace/table in your target database before repeating the entire migration procedure. \ No newline at end of file diff --git a/modules/sideloader/partials/sideloader-partials.adoc b/modules/sideloader/partials/sideloader-partials.adoc deleted file mode 100644 index f465a929..00000000 --- a/modules/sideloader/partials/sideloader-partials.adoc +++ /dev/null @@ -1,107 +0,0 @@ -// tag::check-status[] -[source,bash] ----- -curl -X GET \ - -H "Authorization: Bearer ${token}" \ - https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \ - | jq . ----- -+ -A successful response contains a `MigrationStatus` object. -It can take a few minutes for the {devops-api} to reflect status changes during a migration. -Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status. -// end::check-status[] - -// tag::command-placeholders-common[] -* *`CASSANDRA_DATA_DIR`*: The absolute file system path to where {cass-short} data is stored on the node. -For example, `/var/lib/cassandra/data`. 
-* *`KEYSPACE_NAME`*: The name of the keyspace that contains the tables you want to migrate. -* *`SNAPSHOT_NAME`*: The name of the xref:sideloader:migrate-sideloader.adoc#create-snapshots[snapshot backup] that you created with `nodetool snapshot`. -* *`MIGRATION_DIR`*: The entire `uploadBucketDir` value that was generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration], including the trailing slash. -* *`NODE_NAME`*: The host name of the node that your snapshots are from. -It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket. -// end::command-placeholders-common[] - -// tag::validate[] -After the migration is complete, you can query the migrated data using the xref:astra-db-serverless:cql:develop-with-cql.adoc#connect-to-the-cql-shell[cqlsh] or xref:astra-db-serverless:api-reference:row-methods/find-many.adoc[{data-api}]. - -You can xref:ROOT:cassandra-data-migrator.adoc#cdm-validation-steps[run {cass-migrator} ({cass-migrator-short}) in validation mode] for more thorough validation. -{cass-migrator-short} also offers an AutoCorrect mode to reconcile any differences that it detects. -// end::validate[] - -// tag::initialize[] -After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the {astra} {devops-api} to initialize the migration. - -.{sstable-sideloader} moves data from the migration bucket to {astra-db}. -svg::sideloader:data-importer-workflow.svg[] - -When you initialize a migration, {sstable-sideloader} does the following: - -. Creates a secure migration bucket. -+ -The migration bucket is only created during the first initialization. -All subsequent migrations use different directories in the same migration bucket. -+ -{company} owns the migration bucket, and it is located within the {astra} perimeter. - -. Generates a migration ID that is unique to the new migration. - -. Creates a migration directory within the migration bucket that is unique to the new migration. -+ -The migration directory is also referred to as the `uploadBucketDir`. -In the next phase of the migration process, you will upload your snapshots to this migration directory. - -. Generates upload credentials that grant read/write access to the migration directory. -+ -The credentials are formatted according to the cloud provider where your target database is deployed. -// end::initialize[] - -// tag::import[] -After uploading the snapshots to the migration directory, use the {devops-api} to start the data import process. - -During the import process, {sstable-sideloader} does the following: - -. Revokes access to the migration directory. -+ -You cannot read or write to the migration directory after starting the data import process. - -. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets. - -. Runs validation checks on each subset. - -. Converts all SSTables of each subset. - -. Disables new compactions on the target database. -+ -[WARNING] -==== -This is the last point at which you can xref:sideloader:stop-restart-sideloader.adoc#abort-migration[abort the migration]. - -Once {sstable-sideloader} begins to import SSTable metadata (the next step), you cannot stop the migration. -==== - -. Imports metadata from each SSTable. -+ -If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. 
-Since compaction is disabled, there is no risk of permanent inconsistencies. -However, in the context of xref:ROOT:introduction.adoc[{product}], it's important that the {product-short} proxy continues to read from the origin cluster. - -. Re-enables compactions on the {astra-db} Serverless database. - -Each step must finish successfully. -If one step fails, the import operation stops and no data is imported into your target database. - -If all steps finish successfully, the migration is complete and you can access the imported data in your target database. -// end::import[] - -// tag::no-return[] -You can abort a migration up until the point at which {sstable-sideloader} starts importing SSTable metadata. -After this point, you must wait for the migration to finish, and then you can use `cqlsh` to drop the keyspace/table in your target database before repeating the entire migration procedure. -// end::no-return[] - -// tag::sideloader-zdm[] -If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during of xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}]. - -.Use {sstable-sideloader} with {product-proxy} -svg::sideloader:astra-migration-toolkit.svg[] -// end::sideloader-zdm[] \ No newline at end of file diff --git a/modules/sideloader/partials/sideloader-zdm.adoc b/modules/sideloader/partials/sideloader-zdm.adoc new file mode 100644 index 00000000..bf4fd583 --- /dev/null +++ b/modules/sideloader/partials/sideloader-zdm.adoc @@ -0,0 +1,4 @@ +If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during of xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}]. + +.Use {sstable-sideloader} with {product-proxy} +svg::sideloader:astra-migration-toolkit.svg[] \ No newline at end of file diff --git a/modules/sideloader/partials/validate.adoc b/modules/sideloader/partials/validate.adoc new file mode 100644 index 00000000..ac94e778 --- /dev/null +++ b/modules/sideloader/partials/validate.adoc @@ -0,0 +1,4 @@ +After the migration is complete, you can query the migrated data using the xref:astra-db-serverless:cql:develop-with-cql.adoc#connect-to-the-cql-shell[cqlsh] or xref:astra-db-serverless:api-reference:row-methods/find-many.adoc[{data-api}]. + +You can xref:ROOT:cassandra-data-migrator.adoc#cdm-validation-steps[run {cass-migrator} ({cass-migrator-short}) in validation mode] for more thorough validation. +{cass-migrator-short} also offers an AutoCorrect mode to reconcile any differences that it detects. 
\ No newline at end of file From 58bf3830412fa3b37d2b24e934e4bb11037c8772 Mon Sep 17 00:00:00 2001 From: April M Date: Tue, 26 Aug 2025 11:45:32 -0700 Subject: [PATCH 3/3] delete unused images --- .../ROOT/images/migration-introduction9.png | Bin 129088 -> 0 bytes modules/ROOT/images/migration-phase1.png | Bin 720783 -> 0 bytes modules/ROOT/images/migration-phase1ra9.png | Bin 131206 -> 0 bytes modules/ROOT/images/migration-phase2.png | Bin 789144 -> 0 bytes modules/ROOT/images/migration-phase2ra9.png | Bin 146318 -> 0 bytes modules/ROOT/images/migration-phase2ra9a.png | Bin 235624 -> 0 bytes modules/ROOT/images/migration-phase3.png | Bin 739725 -> 0 bytes modules/ROOT/images/migration-phase4.png | Bin 713133 -> 0 bytes modules/ROOT/images/migration-phase4ra.png | Bin 97834 -> 0 bytes modules/ROOT/images/migration-phase5.png | Bin 676195 -> 0 bytes modules/ROOT/images/migration-phase5ra9.png | Bin 114542 -> 0 bytes modules/ROOT/images/pre-migration0.png | Bin 463547 -> 0 bytes modules/ROOT/images/pre-migration0ra9.png | Bin 97942 -> 0 bytes .../ROOT/images/zdm-ansible-container-ls.png | Bin 467014 -> 0 bytes .../ROOT/images/zdm-go-utility-results1.png | Bin 166276 -> 0 bytes .../ROOT/images/zdm-go-utility-results2.png | Bin 97031 -> 0 bytes modules/ROOT/images/zdm-go-utility-success.png | Bin 643978 -> 0 bytes .../images/zdm-migration-before-starting.png | Bin 336007 -> 0 bytes modules/ROOT/images/zdm-migration-phase1.png | Bin 513383 -> 0 bytes modules/ROOT/images/zdm-migration-phase2.png | Bin 601046 -> 0 bytes modules/ROOT/images/zdm-migration-phase3.png | Bin 573686 -> 0 bytes modules/ROOT/images/zdm-migration-phase4.png | Bin 581612 -> 0 bytes modules/ROOT/images/zdm-migration-phase5.png | Bin 602427 -> 0 bytes .../zdm-provision-infrastructure-terraform.png | Bin 506820 -> 0 bytes modules/ROOT/images/zdm-token-management1.png | Bin 205086 -> 0 bytes modules/ROOT/images/zdm-tokens-generated.png | Bin 164856 -> 0 bytes modules/ROOT/images/zdm-workflow3.png | Bin 413052 -> 0 bytes modules/ROOT/pages/introduction.adoc | 8 ++++---- 28 files changed, 4 insertions(+), 4 deletions(-) delete mode 100644 modules/ROOT/images/migration-introduction9.png delete mode 100644 modules/ROOT/images/migration-phase1.png delete mode 100644 modules/ROOT/images/migration-phase1ra9.png delete mode 100644 modules/ROOT/images/migration-phase2.png delete mode 100644 modules/ROOT/images/migration-phase2ra9.png delete mode 100644 modules/ROOT/images/migration-phase2ra9a.png delete mode 100644 modules/ROOT/images/migration-phase3.png delete mode 100644 modules/ROOT/images/migration-phase4.png delete mode 100644 modules/ROOT/images/migration-phase4ra.png delete mode 100644 modules/ROOT/images/migration-phase5.png delete mode 100644 modules/ROOT/images/migration-phase5ra9.png delete mode 100644 modules/ROOT/images/pre-migration0.png delete mode 100644 modules/ROOT/images/pre-migration0ra9.png delete mode 100644 modules/ROOT/images/zdm-ansible-container-ls.png delete mode 100644 modules/ROOT/images/zdm-go-utility-results1.png delete mode 100644 modules/ROOT/images/zdm-go-utility-results2.png delete mode 100644 modules/ROOT/images/zdm-go-utility-success.png delete mode 100644 modules/ROOT/images/zdm-migration-before-starting.png delete mode 100644 modules/ROOT/images/zdm-migration-phase1.png delete mode 100644 modules/ROOT/images/zdm-migration-phase2.png delete mode 100644 modules/ROOT/images/zdm-migration-phase3.png delete mode 100644 modules/ROOT/images/zdm-migration-phase4.png delete mode 100644 
modules/ROOT/images/zdm-migration-phase5.png delete mode 100644 modules/ROOT/images/zdm-provision-infrastructure-terraform.png delete mode 100644 modules/ROOT/images/zdm-token-management1.png delete mode 100644 modules/ROOT/images/zdm-tokens-generated.png delete mode 100644 modules/ROOT/images/zdm-workflow3.png diff --git a/modules/ROOT/images/migration-introduction9.png b/modules/ROOT/images/migration-introduction9.png deleted file mode 100644 index 93bbb9d71192493ecc19b6422d50b104e9359ccc..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 129088 zcmeFZXIPZWwk`^YA|jwL1(eVt2qH-&rv^nOgNl-qAd+L#rx(mh^iHQ9ziAV{L zfIm0DAMjsOh>6aDzljNRDI|aWl*}jP++X8!MT7@!GMX)jh-8TrA3b>CM!cRVJ;v5D zS|RL3+I=rpu<6pTd?IaM^ZVv+G>zY!7vXL!C6P*Hi@8RW%69YJMKF^}D`S*ASI1&y zm75VYz`Nh0)5cmpU;rOBFozqwG&nn)DWs5mN6LPam|Q`Ygyyr)e>3DsY@B3N=J&-Z z|Gz!?=TeRQq*BD|N&jC~RxrK|o~;f0RL1b%i||hw!Mf1@tm{L-_dhrQo&66E|FFpa z;P4+D{*}Q0qYwYlhyUoq{~6f)$1eV37yoJ(Q-6Hlxaf5>>t9n;f3oUA6;ld+>if1a z+Q=J)oq0$7NqFrf2VIy4{=jfjbbmB0_*Oqs)T}d7iqw{dafRPVBJukxSCptep7MqoSRbhIU6K)qf(MZdDWot zobmO?K;Q@>PGm9^QH@2%+jDd+MYRr2==F)(lPjdKr=X*RPI*BdVXO*K25>4K4(kMl61aInA%E2yGz`O3RsTP5qm=U1TvG)Kn@Prd~5-Ti7 zT?BS=VSlS%U1oy?>a}dpD40!Q@VRk_++lDeoKyY4qfC>|3ZrX1$-2+O{L_}u76SU-@S--X|(UcZ1P2gNw`owu6w%@f(KM4b=IAwY5~AbD=y@DD{< z5^YP7zB9%xL_A$&b}+*4NDQ8Q;kNr_eUD(kLrT z!2#cHqz@_By`5?^o{;Wkg59gNX?7CEf155o-<23Lb(hzZY%3u*QnJsDertEB#}#?f zpNw9Js|UMdWNS=YZouQPlhiV&#bd6-c__y59Nt3GH!1*V$HnD10JH=EdNp@|ga9-L z0J_S_Srq`yOaOZIuI(HN0J_;nT{u~?4~jqC9ZYf-j||ub3N?OZcin zGy0Gix%Hhaj|cTnAhp?netj9t><5rO1dvh`h)n_mrXwi%>++Q{u!E;0z;ZrlcBxJr zP|d^}%Or7alHk@$do(@!7R8870T8g8jW130+(c1vlKyq75Fn!PFoHnxjr$j~QNyOT zY6Nr#^bh~c^M~1ruXWY2^tPZEm>XdQ5=p~2KtYMF#69JHd&(r^gvDy(Pg?@%Yve;& z3MzhEVXX3o-YObD0~Yf5Wpp>tHwsQ*A+;oMBLL+eUgLK}FSHz&HY6 zT3*Id0qURtI>uME<7@zcVFHpnzRYX_z`P|;2fe29jqUk1Y>kh* zOU*xxdrRTOyeFvu$7f7nq!a8bMEXr>oG~v^RDG(?l(^$2Zga(Mo}YMlAYY{8+cJ0M^+Gv!16w zcDwjGAd5G-V{d`{f;RzKq|i8n9gbWE2Pyc{_5o1A>sNr#-9)<+`c+sYa_z@NZC~;* z`6=*Z;|ToOVGYFl=vCXKEk_kz=y`$od9O4VZkc{SjX zlOA?f5s>EtBxu;$Ge8*seI9J4F6g&}24KOrtNU5W-eb~UN55-)=wFyu-}No0naM5T zR8Lypwyky7uS^5`-lf9CjLy!R;3z-RZX;1qpK#lV4^7T)=NwJ5uL$>{RZNj;0!?4+4%r zMI8`eQi(X&B+W%)VNrH{fEZ0ol24qZv?+{WWmgCq88NuXPg;A zGd!I3FK}L(hC*Zv_ch1|CiFH8(BORoz>wlT;t9N}0@OP(q#%kg{@XNxVwvk&bO0G< zyd1Un-i@E^k-^s@@#r-9&?t004v4~i3&<_?#kh;0>D*(lct4TS5h6FY^Q-m1T<(Bl z{+g?74$kufh}`H6eX9wP+b2TU@a)?ClF~oj7}x)gZtN=$IKZ#DYxso45TU8_r>Doe zlgYTftlM-Cl7SSFS7x5=O(XGV7=MVj_C}4*;6&tlKAjv1H_*bu6dK;QKxzgcwMnj$ zJb3;ZuvD$50dzn)wJ#GqnaN#K6M0}pCXE`gJww1peIA&6)_t80`85QiZIgI4bRd>TM`Sx4isFrEHu=T4iNJvB{wY6Rm!yXG8ug2q;zQ zrCw%W=7T`mms&+1!hQZNmeT75^K-L=py?HKce-PDELqyKVwZ2`csPxzfvse3cz~ZxD@ef)l7_h-WK#F|1)y|p#UNM4K_>s4A z>#*+J(JilCh10Fsrb%FkAS~X^rmFkg_=bETw}ijp)QJ#-7eoQt9?LB^{x`n}JbRJs zB%p0N5IY#WfUo3(6LlZ~f`8EJM`5Px9PO0)yZiw$$s8|)AB&}P0<<{D9bDJjAjn$J zlf9MP&LVVSwFhqkDV-cxLgnR3HdulLD8;3_y|;iR1_DCi%gl9t0i-4-NPQ`HHUE|y z$q|puuF~FjGRQjf$y^{mdNF1SJRQ%r72R-ZcDPYFjR9#--e8o$U!a5lP^#|szXG6q z2cVqSN-89PLZFqx4rcC^K@wBbEq)mcnhD0J)$uxU~ayUfnse233}%0X)yH_9c9aP()pNp;;b7u1b(x!;yu?I$tJ6d&aUH=aOUrm4fwOo$B{XbmY&1H z32Di8x9W66hBy(27DJCC5j^HKk`0^J-#?Dp5>}WQBP9f^HRYA^!mgDk63`*L-&3Xt zhVcON&y5k_+_+W@-(X{&yskxy9o}2QeEl|6V#g4r3_rZ5UTFwYk-DJih@8QRmF2H_ zy*gt2wo{f8_jQxvfVm0J?r!9vmP+$9sF;C^yXnhCkTz>URBjx<%iK75(S_Id@VQgl zv+2YXWq!_3)|rr+34`*zo#h77A(fflG(a<6zY!@hG=z zV##DZ_j`S{KV#_i{dI}%Iu>hKSYrkdS($taS{LW(!F(@?_no@V9m^w6C7|FD&PAqr zUDDmC&Gwi!jT>?#^EAKN;E6iZ{gyzyTk7=!LB-OOc4)AG-wHNPnUz*!E{2n z;E9tsG3LoU;#~rJMhPG9J@=**R^RDbqp(AwC7*ZUKiWE;w6(XJOi}*wd|v{yFu7Av z%8pda{Q*-@@K#JN^WOql{i2bx_8_xnK+u?<4 
zZbt1zYC-l}YNt*C_8CR+F^uf0@fd76@7?6lgVD=`JYp)m<^b_oq!5kEo$pO5x8{>j zPHxJQ9Q|!$l0iq?bJI5mWL+Ax&`EFcR_6q4r(2>o3Eff%OoL){x_@&)?B=+ElCc%Z zC|OqbQq%$S60S-nDxmu{>cG(s)stGlC5q5Ri^h<>%J1Y{>u|v6Qh z>u=@EPdv9C9Oyh$<<(aMp^K+@#$)ZHX}ha-s|)R<_piu^&r;G{zJAX+6jdd&JTsHF zetC_yF{6$t`!#vArdWhuS;E9ID7P3Yf2te7=$X`9Q)-~jkcR)3HOw$V99McdB?1m= zl&vV*Q2jE^YxBH5Bcf!Wrat^CASBOchmS(A(6=tefb{@}4W~#+Q41dqhu81B(Tgkv z!4#X?qen~=rlhIt-7*+7NZ3H6F8^NqEtjyXepf*#dG(>&)bR86I7O3pq=gRvWn5UC zb&rsknfQ@+;<+kT=~+E-HWF3BcQP)0$phKT4pUuS-#Y7ynZHV~PkHN#C}5xCxv1X) z5`h|$@f4*}lE~VOf(KCJo>I`?>P{#?1eKaCaVH7Dtb{|WJi>ER& z^QT_dWCgn7fa~P#@;zne>6^aRAX+MV;O@o}0z;z4UGp9J#&qvLHoBfBcwiR)fMgh? zY=-eiv!_ak8HSo`tESsK3?GCRk9co1ofLHO4tVJ8^62za0d2YfGRGSc{!7nRWecvB zd`0^;&zgz*>pSsc*@q;jq<{4m{6$Pr1Y+v6y57#k@$f`q=F2yR7M0PG^3wupsgFh% z`XZQ35M;9M{zTilt~*f&9n%*LENj*+juUOm)khkssc4QJ9#&m8i`5IIG7ICW=n;2B zlh1@zG%U&bP~08iA7&buFtTzDjMKvmtK9m=KS1VShJV(2Sou$Pv@aFE7}Mj3;lIp( zeF^J#{A;UpE@y|0;vDy8G4eWI#E=jTFY;(Rswx0KzEd5d$$!(mfb=rwZ(!f|A8?o|&)f(5F+0sv>T(NLH*!cAmq!oo2Nl#%SRa^f*sSn1Pe69J z@ZC38}j&iCC&iWoV?Sk z0QbVp8?CsyyV`MkTQgg#k(;pqOE;z>wIEv$HS>&lXJmRJJ6cVfQlwuXo_QuwSeuxc8TWV0x6h z>hxi2lD*a84OmBB)A1dR5Z_phSB#e6-2Y+op-ZV~!$MRW7yd@tz@&#Ac32x1)_6oP zgn5nl;K!Rp7!GnxW}p(tatKwcN}aM~ zMYr)vhSV%l%UyKI)oO=04F7s0gyvhK(_&%UG#+<&c%a=g`Kpa%`!bET1o0S!7N6ZL zsI64iRTe73P`QVo{&MJg_6_MuPr7mFSk!Dg++ba#4EULElNBY9#N9Rzs4i9Hb%NfM zcyQ}Cz5O2t!#5imXLU{1KX?o1Ha*owmTuMG*F`u~UQE7)=Nq`l3kg%-oIp^h;4t zykD#yPWcKyYd5NK&N8e%{>YM}$0f?3%>&|}_oCDGkns~_(PfUBvX+z6lFRgAtT|^& z_ypw&b7Sg;lN}*bJM-Sq^3Ovx7?Rg$cAbHg1tZK-a46jX(_lRF1%VaztkDtEz+Xsu zFdvko15`-!PI5=TtIBVsB?E*!fIOmqrI&zC)isUI7JV0(%lYbY5rfVhSs zi_0_z6>ZJ8-(SbKVd#oQw^-X^7^&5DrpKm4{>+=KkCFDy@5GxfTy!|JJZ6ce{KNg} zRTN{v<^9MPsKAt<@ac(`BTUUEMuxXy;=|{sV`kD?-B+}yl>&`M&=^x_d2-3xa}s5? z#`>OXAQ1MpG!tJaiW|?=u-3^5PS*}YyS$smK6D|guSYH=`N^Ok0rS(a)t`J|oW8SJ-?Zp9s4 zO4s!P)8tRR$qEy7SW^^{D)@GJF#(2}nx={Q)Tk-sde^(M42$L!6y03Ax?Fg3OvWX$ z9WEJQgc{~al5+osd?6VA#JTEJer7fN5A7pDG}-CH#_Z@iI0S{RfwqrF@-uPE7A30B z-xAxnUM0V~y>kLEaB)5=2@X{=nM~VjyL53Y8_wrotWz0$?GJk+QsGL2aO*++y*R|l zQ`xo}PFbc;b0Xv_z#J1v*KP#?U3PEw=4cX2dH`Fob}zrQmMnwBb+{##R6HQ51c}wu z+Y_lB{Xq16PTzVK3jN>l?lc{S(Rgi-N@e5AfaMteh_*}{O1Ry#^bKNe7c(~V{+>j_ z@i)whX^}pUx>wm40Xh(g7{;%>+|ir0?V`)^Ckkc3Nc>DHsMMQO{RBY_w6Jbgt8GIO zM#t5y&X6o`o@a;NvshcR+EYGAmc9&0+$l|w{V2Snt8FqWG-pkjf{|s{(drJO)$aib zjI9wmHf*?rUPIvkGcIFvg)3`1XR~q9V}*3d<6$bIfIVCZzs$oA?aPVuOr4H^cJLP&CS_1i>BG)}DNd_SHr z3P?@c*GBC(NOGN8-$E+RTp*aBXrqK#6qKZzw!iB==5=R%zP|YDqe!CAyK`U4;4@aa zn8Zt1Ns3k5Lea-*mccDB?rc5#DBVFFl*3YtaTg&D@2+98DmVYL*$+Q?v z3>z_rll$xWII?0ZJA9JXYG&McPdhJ2?f`Oh?Pg;-@#r_}Dk!(hLT48t??}-Ey;D}5 zXbmm5#Ms`6%OmvzcI98HYx=9)l5e-~m%5<(uUa8f19Xgk_p_ps5Z?aUi@=pFJwCl< zkC7twy4qmIfdh?7*GW?mms;gcP?;ob$ITz2WqUuy!20R`xB%>NP}Fdl^J_MfJIfnhKx zii>K_(t&PC6&E2)Q6&zIKAvQpAMNXVe$gR|IaFajJ@C&`JukVG$DT#9^#5H=y>=B9 z>f;^R_6E1jN6ez!QR7ob;U+j}_u%$DWs-zb1c-z%@%Htx8AwyH40OnYXN;hT#Q