diff --git a/modules/ROOT/images/migration-introduction9.png b/modules/ROOT/images/migration-introduction9.png deleted file mode 100644 index 93bbb9d7..00000000 Binary files a/modules/ROOT/images/migration-introduction9.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase1.png b/modules/ROOT/images/migration-phase1.png deleted file mode 100644 index 206f4924..00000000 Binary files a/modules/ROOT/images/migration-phase1.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase1ra9.png b/modules/ROOT/images/migration-phase1ra9.png deleted file mode 100644 index 0fa032a8..00000000 Binary files a/modules/ROOT/images/migration-phase1ra9.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase2.png b/modules/ROOT/images/migration-phase2.png deleted file mode 100644 index 7df91760..00000000 Binary files a/modules/ROOT/images/migration-phase2.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase2ra9.png b/modules/ROOT/images/migration-phase2ra9.png deleted file mode 100644 index 83794547..00000000 Binary files a/modules/ROOT/images/migration-phase2ra9.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase2ra9a.png b/modules/ROOT/images/migration-phase2ra9a.png deleted file mode 100644 index 3bdb1f3e..00000000 Binary files a/modules/ROOT/images/migration-phase2ra9a.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase3.png b/modules/ROOT/images/migration-phase3.png deleted file mode 100644 index 8936d815..00000000 Binary files a/modules/ROOT/images/migration-phase3.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase4.png b/modules/ROOT/images/migration-phase4.png deleted file mode 100644 index b75ac272..00000000 Binary files a/modules/ROOT/images/migration-phase4.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase4ra.png b/modules/ROOT/images/migration-phase4ra.png deleted file mode 100644 index 94f2ddaf..00000000 Binary files a/modules/ROOT/images/migration-phase4ra.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase5.png b/modules/ROOT/images/migration-phase5.png deleted file mode 100644 index a72768db..00000000 Binary files a/modules/ROOT/images/migration-phase5.png and /dev/null differ diff --git a/modules/ROOT/images/migration-phase5ra9.png b/modules/ROOT/images/migration-phase5ra9.png deleted file mode 100644 index 0a67617c..00000000 Binary files a/modules/ROOT/images/migration-phase5ra9.png and /dev/null differ diff --git a/modules/ROOT/images/pre-migration0.png b/modules/ROOT/images/pre-migration0.png deleted file mode 100644 index 8e823e02..00000000 Binary files a/modules/ROOT/images/pre-migration0.png and /dev/null differ diff --git a/modules/ROOT/images/pre-migration0ra9.png b/modules/ROOT/images/pre-migration0ra9.png deleted file mode 100644 index 318da488..00000000 Binary files a/modules/ROOT/images/pre-migration0ra9.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-ansible-container-ls.png b/modules/ROOT/images/zdm-ansible-container-ls.png deleted file mode 100644 index 9e073f7e..00000000 Binary files a/modules/ROOT/images/zdm-ansible-container-ls.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-go-utility-results1.png b/modules/ROOT/images/zdm-go-utility-results1.png deleted file mode 100644 index 0a753087..00000000 Binary files a/modules/ROOT/images/zdm-go-utility-results1.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-go-utility-results2.png 
b/modules/ROOT/images/zdm-go-utility-results2.png deleted file mode 100644 index 0ba87052..00000000 Binary files a/modules/ROOT/images/zdm-go-utility-results2.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-go-utility-success.png b/modules/ROOT/images/zdm-go-utility-success.png deleted file mode 100644 index f3df85da..00000000 Binary files a/modules/ROOT/images/zdm-go-utility-success.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-migration-before-starting.png b/modules/ROOT/images/zdm-migration-before-starting.png deleted file mode 100644 index 9455c8b1..00000000 Binary files a/modules/ROOT/images/zdm-migration-before-starting.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-migration-phase1.png b/modules/ROOT/images/zdm-migration-phase1.png deleted file mode 100644 index c1f7387d..00000000 Binary files a/modules/ROOT/images/zdm-migration-phase1.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-migration-phase2.png b/modules/ROOT/images/zdm-migration-phase2.png deleted file mode 100644 index cc22d173..00000000 Binary files a/modules/ROOT/images/zdm-migration-phase2.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-migration-phase3.png b/modules/ROOT/images/zdm-migration-phase3.png deleted file mode 100644 index 3d3a3df9..00000000 Binary files a/modules/ROOT/images/zdm-migration-phase3.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-migration-phase4.png b/modules/ROOT/images/zdm-migration-phase4.png deleted file mode 100644 index 7ec2d655..00000000 Binary files a/modules/ROOT/images/zdm-migration-phase4.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-migration-phase5.png b/modules/ROOT/images/zdm-migration-phase5.png deleted file mode 100644 index 6dd29692..00000000 Binary files a/modules/ROOT/images/zdm-migration-phase5.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-provision-infrastructure-terraform.png b/modules/ROOT/images/zdm-provision-infrastructure-terraform.png deleted file mode 100644 index 47902c43..00000000 Binary files a/modules/ROOT/images/zdm-provision-infrastructure-terraform.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-token-management1.png b/modules/ROOT/images/zdm-token-management1.png deleted file mode 100644 index 40a432cd..00000000 Binary files a/modules/ROOT/images/zdm-token-management1.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-tokens-generated.png b/modules/ROOT/images/zdm-tokens-generated.png deleted file mode 100644 index cc63aeb9..00000000 Binary files a/modules/ROOT/images/zdm-tokens-generated.png and /dev/null differ diff --git a/modules/ROOT/images/zdm-workflow3.png b/modules/ROOT/images/zdm-workflow3.png deleted file mode 100644 index 2dcd5edd..00000000 Binary files a/modules/ROOT/images/zdm-workflow3.png and /dev/null differ diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index a467d4e8..576cc572 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -5,349 +5,4 @@ //This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav. 
-// tag::body[] -{description} -It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: - -* Logging and run tracking -* Automatic reconciliation -* Performance tuning -* Record filtering -* Column renaming -* Support for advanced data types, including sets, lists, maps, and UDTs -* Support for SSL, including custom cipher algorithms -* Use `writetime` timestamps to maintain chronological write history -* Use Time To Live (TTL) values to maintain data lifecycles - -For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository]. - -== {cass-migrator} requirements - -To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas. - -== {cass-migrator-short} with {product-proxy} - -You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. - -When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. - -Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write. - -For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`. - -== Install {cass-migrator} - -{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes. - -[tabs] -====== -Install as a container:: -+ --- -Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. - -The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`. --- - -Install as a JAR file:: -+ --- -. Install Java 11 or later, which includes Spark binaries. - -. Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x with Scala 2.13 and Hadoop 3.3 and later. -+ -[tabs] -==== -Single VM:: -+ -For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job. -+ -. Get the Spark tarball from the Apache Spark archive. -+ -[source,bash,subs="+quotes"] ----- -wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz ----- -+ -Replace `**PATCH**` with your Spark patch version. -+ -. Change to the directory where you want install Spark, and then extract the tarball: -+ -[source,bash,subs="+quotes"] ----- -tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz ----- -+ -Replace `**PATCH**` with your Spark patch version. - -Spark cluster:: -+ -For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform. -+ -If you deploy CDM on a Spark cluster, you must modify your `spark-submit` commands as follows: -+ -* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`. 
-* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`. -==== - -. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}. - -. Add the `cassandra-data-migrator` dependency to `pom.xml`: -+ -[source,xml,subs="+quotes"] ----- - - datastax.cdm - cassandra-data-migrator - **VERSION** - ----- -+ -Replace `**VERSION**` with your {cass-migrator-short} version. - -. Run `mvn install`. - -If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. --- -====== - -== Configure {cass-migrator-short} - -. Create a `cdm.properties` file. -+ -If you use a different name, make sure you specify the correct filename in your `spark-submit` commands. - -. Configure the properties for your environment. -+ -In the {cass-migrator-short} repository, you can find a {cass-migrator-repo}/blob/main/src/resources/cdm.properties[sample properties file with default values], as well as a {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. -+ -{cass-migrator-short} jobs process all uncommented parameters. -Any parameters that are commented out are ignored or use default values. -+ -If you want to reuse a properties file created for a previous {cass-migrator-short} version, make sure it is compatible with the version you are currently using. -Check the {cass-migrator-repo}/releases[{cass-migrator-short} release notes] for possible breaking changes in interim releases. -For example, the 4.x series of {cass-migrator-short} isn't backwards compatible with earlier properties files. - -. Store your properties file where it can be accessed while running {cass-migrator-short} jobs using `spark-submit`. - -[#migrate] -== Run a {cass-migrator-short} data migration job - -A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. - -To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. - -The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. -The migration job is specified in the `--class` argument. - -[tabs] -====== -Local installation:: -+ --- -[source,bash,subs="+quotes,+attributes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ ---class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. 
- -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -Spark cluster:: -+ --- -[source,bash,subs="+quotes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "spark://**MASTER_HOST**:**PORT**" \ ---class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--master`: Provide the URL of your Spark cluster. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== - -This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. - -For additional modifications to this command, see <>. - -[#cdm-validation-steps] -== Run a {cass-migrator-short} data validation job - -After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. - -Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. - -. Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. -The data validation job is specified in the `--class` argument. -+ -[tabs] -====== -Local installation:: -+ --- -[source,bash,subs="+quotes,+attributes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ ---class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. 
--- - -Spark cluster:: -+ --- -[source,bash,subs="+quotes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "spark://**MASTER_HOST**:**PORT**" \ ---class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--master`: Provide the URL of your Spark cluster. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== - -. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. -+ -The {cass-migrator-short} validation job records differences as `ERROR` entries in the log file, listed by primary key values. -For example: -+ -[source,plaintext] ----- -23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) -23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3] -23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2] -23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2] ----- -+ -When validating large datasets or multiple tables, you might want to extract the complete list of missing or mismatched records. -There are many ways to do this. -For example, you can grep for all `ERROR` entries in your {cass-migrator-short} log files or use the `log4j2` example provided in the {cass-migrator-repo}?tab=readme-ov-file#steps-for-data-validation[{cass-migrator-short} repository]. - -=== Run a validation job in AutoCorrect mode - -Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** mode, which offers the following functions: - -* `autocorrect.missing`: Add any missing records in the target with the value from the origin. - -* `autocorrect.mismatch`: Reconcile any mismatched records between the origin and target by replacing the target value with the origin value. -+ -[IMPORTANT] -==== -Timestamps have an effect on this function. - -If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. - -This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. -==== - -* `autocorrect.missing.counter`: By default, counter tables are not copied when missing, unless explicitly set. - -In your `cdm.properties` file, use the following properties to enable (`true`) or disable (`false`) autocorrect functions: - -[source,properties] ----- -spark.cdm.autocorrect.missing false|true -spark.cdm.autocorrect.mismatch false|true -spark.cdm.autocorrect.missing.counter false|true ----- - -The {cass-migrator-short} validation job never deletes records from either the origin or target. 
-Data validation only inserts or updates data on the target. - -For an initial data validation, consider disabling AutoCorrect so that you can generate a list of data discrepancies, investigate those discrepancies, and then decide whether you want to rerun the validation with AutoCorrect enabled. - -[#advanced] -== Additional {cass-migrator-short} options - -You can modify your properties file or append additional `--conf` arguments to your `spark-submit` commands to customize your {cass-migrator-short} jobs. -For example, you can do the following: - -* Check for large field guardrail violations before migrating. -* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges. -* Use the `track-run` feature to monitor progress and rerun a failed migration or validation job from point of failure. - -For all options, see the {cass-migrator-repo}[{cass-migrator-short} repository]. -Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. - -== Troubleshoot {cass-migrator-short} - -.Java NoSuchMethodError -[%collapsible] -==== -If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: - -[source,console] ----- -Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' ----- - -Make sure that your Spark binary is compatible with your {cass-migrator-short} version. -If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary. -==== - -.Rerun a failed or partially completed job -[%collapsible] -==== -You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. - -For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. -==== -// end::body[] \ No newline at end of file +include::ROOT:partial$cassandra-data-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/cdm-overview.adoc b/modules/ROOT/pages/cdm-overview.adoc index 79644ff4..de50f252 100644 --- a/modules/ROOT/pages/cdm-overview.adoc +++ b/modules/ROOT/pages/cdm-overview.adoc @@ -1,4 +1,4 @@ = {cass-migrator} ({cass-migrator-short}) overview :description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. -include::ROOT:cassandra-data-migrator.adoc[tags=body] \ No newline at end of file +include::ROOT:partial$cassandra-data-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator-overview.adoc b/modules/ROOT/pages/dsbulk-migrator-overview.adoc index 84769d92..5ec1a6c5 100644 --- a/modules/ROOT/pages/dsbulk-migrator-overview.adoc +++ b/modules/ROOT/pages/dsbulk-migrator-overview.adoc @@ -1,4 +1,4 @@ = {dsbulk-migrator} overview :description: {dsbulk-migrator} extends {dsbulk-loader} with migration commands. 
-include::ROOT:dsbulk-migrator.adoc[tags=body] \ No newline at end of file +include::ROOT:partial$dsbulk-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator.adoc b/modules/ROOT/pages/dsbulk-migrator.adoc index 29a9c736..79e1822e 100644 --- a/modules/ROOT/pages/dsbulk-migrator.adoc +++ b/modules/ROOT/pages/dsbulk-migrator.adoc @@ -4,647 +4,4 @@ //TODO: Reorganize this page and consider breaking it up into smaller pages. -// tag::body[] -{dsbulk-migrator} is an extension of {dsbulk-loader}. -It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. -You can also consider this tool for migrations where you can shard data from large tables into more manageable quantities. - -{dsbulk-migrator} extends {dsbulk-loader} with the following commands: - -* `migrate-live`: Start a live data migration using the embedded version of {dsbulk-loader} or your own {dsbulk-loader} installation. -A live migration means that the data migration starts immediately and is performed by the migrator tool through the specified {dsbulk-loader} installation. - -* `generate-script`: Generate a migration script that you can execute to perform a data migration with a your own {dsbulk-loader} installation. -This command _doesn't_ trigger the migration; it only generates the migration script that you must then execute. - -* `generate-ddl`: Read the schema from origin, and then generate CQL files to recreate it in your target {astra-db} database. - -[[prereqs-dsbulk-migrator]] -== {dsbulk-migrator} prerequisites - -* Java 11 - -* https://maven.apache.org/download.cgi[Maven] 3.9.x - -* Optional: If you don't want to use the embedded {dsbulk-loader} that is bundled with {dsbulk-migrator}, xref:dsbulk:installing:install.adoc[install {dsbulk-loader}] before installing {dsbulk-migrator}. - -== Build {dsbulk-migrator} - -. Clone the {dsbulk-migrator-repo}[{dsbulk-migrator} repository]: -+ -[source,bash] ----- -cd ~/github -git clone git@github.com:datastax/dsbulk-migrator.git -cd dsbulk-migrator ----- - -. Use Maven to build {dsbulk-migrator}: -+ -[source,bash] ----- -mvn clean package ----- - -The build produces two distributable fat jars: - -* `dsbulk-migrator-**VERSION**-embedded-driver.jar` contains an embedded Java driver. -Suitable for script generation or live migrations using an external {dsbulk-loader}. -+ -This jar isn't suitable for live migrations that use the embedded {dsbulk-loader} because no {dsbulk-loader} classes are present. - -* `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` contains an embedded {dsbulk-loader} and an embedded Java driver. -Suitable for all operations. -Much larger than the other JAR due to the presence of {dsbulk-loader} classes. - -== Test {dsbulk-migrator} - -The {dsbulk-migrator} project contains some integration tests that require https://github.com/datastax/simulacron[Simulacron]. - -. Clone and build Simulacron, as explained in the https://github.com/datastax/simulacron[Simulacron GitHub repository]. -Note the prerequisites for Simulacron, particularly for macOS. - -. 
Run the tests: - -[source,bash] ----- -mvn clean verify ----- - -== Run {dsbulk-migrator} - -Launch {dsbulk-migrator} with the command and options you want to run: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS] ----- - -The role and availability of the options depends on the command you run: - -* During a live migration, the options configure {dsbulk-migrator} and establish connections to -the clusters. - -* When generating a migration script, most options become default values in the generated scripts. -However, even when generating scripts, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the tables to migrate. - -* When generating a DDL file, import options and {dsbulk-loader}-related options are ignored. -However, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements. - -For more information about the commands and their options, see the following references: - -* <> -* <> -* <> - -For help and examples, see <> and <>. - -[[dsbulk-live]] -== Live migration command-line options - -The following options are available for the `migrate-live` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - -[cols="2,8,14"] -|=== - -| `-c` -| `--dsbulk-cmd=CMD` -| The external {dsbulk-loader} command to use. -Ignored if the embedded {dsbulk-loader} is being used. -The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. -Tables will be exported and imported in subdirectories of the data directory specified here. -There will be one subdirectory per keyspace in the data directory, then one subdirectory per table in each keyspace directory. - -| `-e` -| `--dsbulk-use-embedded` -| Use the embedded {dsbulk-loader} version instead of an external one. -The default is to use an external {dsbulk-loader} command. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. -The default is `LOCAL_QUORUM`. - -| -| `--export-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when exporting. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. 
-Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. -The default is `-1` (export the entire table). - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--export-splits=NUM\|NC` -| The maximum number of token range queries to generate. -Use the `NC` syntax to specify a multiple of the number of available cores. -For example, `8C` = 8 times the number of available cores. -The default is `8C`. -This is an advanced setting; you should rarely need to modify the default value. - -| -| `--export-username=STRING` -| The username to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. - -| `-h` -| `--help` -| Displays this help text. - -| -| `--import-bundle=PATH` -| The path to a {scb} to connect to a target {astra-db} cluster. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-consistency=CONSISTENCY` -| The consistency level to use when importing data. -The default is `LOCAL_QUORUM`. - -| -| `--import-default-timestamp=` -| The default timestamp to use when importing data. -Must be a valid instant in ISO-8601 syntax. -The default is `1970-01-01T00:00:00Z`. - -| -| `--import-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when importing. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--import-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node on the target cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to read from. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-errors=NUM` -| The maximum number of failed records to tolerate when importing data. -The default is `1000`. -Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. - -| -| `--import-password` -| The password to use to authenticate against the target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--import-username=STRING` -| The username to use to authenticate against the target cluster. Options `--import-username` and `--import-password` must be provided together, or not at all. - -| `-k` -| `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. -Case-sensitive keyspace names must be entered in their exact case. 
- -| `-l` -| `--dsbulk-log-dir=PATH` -| The directory where the {dsbulk-loader} should store its logs. -The default is a `logs` subdirectory in the current working directory. -This subdirectory will be created if it does not exist. -Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. - -| -| `--max-concurrent-ops=NUM` -| The maximum number of concurrent operations (exports and imports) to carry. -The default is `1`. -Set this to higher values to allow exports and imports to occur concurrently. -For example, with a value of `2`, each table will be imported as soon as it is exported, while the next table is being exported. - -| -| `--skip-truncate-confirmation` -| Skip truncate confirmation before actually truncating tables. -Only applicable when migrating counter tables, ignored otherwise. - -| `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. -Case-sensitive table names must be entered in their exact case. - -| -| `--table-types=regular\|counter\|all` -| The table types to migrate. -The default is `all`. - -| -| `--truncate-before-export` -| Truncate tables before the export instead of after. -The default is to truncate after the export. -Only applicable when migrating counter tables, ignored otherwise. - -| `-w` -| `--dsbulk-working-dir=PATH` -| The directory where `dsbulk` should be executed. -Ignored if the embedded {dsbulk-loader} is being used. -If unspecified, it defaults to the current working directory. - -|=== - -[[dsbulk-script]] -== Script generation command-line options - -The following options are available for the `generate-script` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - - -[cols="2,8,14"] -|=== - -| `-c` -| `--dsbulk-cmd=CMD` -| The {dsbulk-loader} command to use. -The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. -The default is `LOCAL_QUORUM`. - -| -| `--export-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when exporting. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. 
- -| -| `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. -The default is `-1` (export the entire table). - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--export-splits=NUM\|NC` -| The maximum number of token range queries to generate. -Use the `NC` syntax to specify a multiple of the number of available cores. -For example, `8C` = 8 times the number of available cores. -The default is `8C`. -This is an advanced setting. -You should rarely need to modify the default value. - -| -| `--export-username=STRING` -| The username to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. - -| `-h` -| `--help` -| Displays this help text. - -| -| `--import-bundle=PATH` -| The path to a Secure Connect Bundle to connect to a target {astra-db} cluster. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-consistency=CONSISTENCY` -| The consistency level to use when importing data. -The default is `LOCAL_QUORUM`. - -| -| `--import-default-timestamp=` -| The default timestamp to use when importing data. -Must be a valid instant in ISO-8601 syntax. -The default is `1970-01-01T00:00:00Z`. - -| -| `--import-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when importing. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--import-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node on the target cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--import-host` and `--import-bundle` are mutually exclusive. - -| -| `--import-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to read from. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--import-max-errors=NUM` -| The maximum number of failed records to tolerate when importing data. -The default is `1000`. -Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. - -| -| `--import-password` -| The password to use to authenticate against the target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--import-username=STRING` -| The username to use to authenticate against the target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. - -| `-k` -| `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. 
-The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. -Case-sensitive keyspace names must be entered in their exact case. - -| `-l` -| `--dsbulk-log-dir=PATH` -| The directory where {dsbulk-loader} should store its logs. -The default is a `logs` subdirectory in the current working directory. -This subdirectory will be created if it does not exist. -Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. - -| `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. -Case-sensitive table names must be entered in their exact case. - -| -| `--table-types=regular\|counter\|all` -| The table types to migrate. The default is `all`. - -|=== - - -[[dsbulk-ddl]] -== DDL generation command-line options - -The following options are available for the `generate-ddl` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - -[cols="2,8,14"] -|=== - -| `-a` -| `--optimize-for-astra` -| Produce CQL scripts optimized for {company} {astra-db}. -{astra-db} does not allow some options in DDL statements. -Using this {dsbulk-migrator} command option, forbidden {astra-db} options will be omitted from the generated CQL files. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. - -| -| `--export-username=STRING` -| The username to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. - -| `-h` -| `--help` -| Displays this help text. - -| `-k` -| `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. -Case-sensitive keyspace names must be entered in their exact case. - -| `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. -Case-sensitive table names must be entered in their exact case. - -| -| `--table-types=regular\|counter\|all` -| The table types to migrate. -The default is `all`. - -|=== - -[[dsbulk-examples]] -== {dsbulk-migrator} examples - -These examples show sample `username` and `password` values that are for demonstration purposes only. -Don't use these values in your environment. 
- -=== Generate a migration script - -Generate a migration script to migrate from an existing origin cluster to a target {astra-db} cluster: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \ - --data-dir=/path/to/data/dir \ - --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \ - --dsbulk-log-dir=/path/to/log/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password=s3cr3t \ - --import-bundle=/path/to/bundle \ - --import-username=user1 \ - --import-password=s3cr3t ----- - -=== Live migration with an external {dsbulk-loader} installation - -Perform a live migration from an existing origin cluster to a target {astra-db} cluster using an external {dsbulk-loader} installation: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \ - --data-dir=/path/to/data/dir \ - --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \ - --dsbulk-log-dir=/path/to/log/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password # password will be prompted \ - --import-bundle=/path/to/bundle \ - --import-username=user1 \ - --import-password # password will be prompted ----- - -Passwords are prompted interactively. - -=== Live migration with the embedded {dsbulk-loader} - -Perform a live migration from an existing origin cluster to a target {astra-db} cluster using the embedded {dsbulk-loader} installation: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-dsbulk.jar migrate-live \ - --data-dir=/path/to/data/dir \ - --dsbulk-use-embedded \ - --dsbulk-log-dir=/path/to/log/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password # password will be prompted \ - --export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ - --export-dsbulk-option "--executor.maxPerSecond=1000" \ - --import-bundle=/path/to/bundle \ - --import-username=user1 \ - --import-password # password will be prompted \ - --import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ - --import-dsbulk-option "--executor.maxPerSecond=1000" ----- - -Passwords are prompted interactively. - -The preceding example passes additional {dsbulk-loader} options. - -The preceding example requires the `dsbulk-migrator--embedded-dsbulk.jar` fat jar. -Otherwise, an error is raised because no embedded {dsbulk-loader} can be found. 
- -=== Generate DDL files to recreate the origin schema on the target cluster - -Generate DDL files to recreate the origin schema on a target {astra-db} cluster: - -[source,bash] ----- - java -jar target/dsbulk-migrator--embedded-driver.jar generate-ddl \ - --data-dir=/path/to/data/dir \ - --export-host=my-origin-cluster.com \ - --export-username=user1 \ - --export-password=s3cr3t \ - --optimize-for-astra ----- - -[[getting-help-with-dsbulk-migrator]] -== Get help with {dsbulk-migrator} - -Use the following command to display the available {dsbulk-migrator} commands: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help ----- - -For individual command help and each one's options: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help ----- - -== See also - -* xref:dsbulk:overview:dsbulk-about.adoc[{dsbulk-loader}] -* xref:dsbulk:reference:dsbulk-cmd.adoc#escaping-and-quoting-command-line-arguments[Escaping and quoting {dsbulk-loader} command line arguments] -// end::body[] \ No newline at end of file +include::ROOT:partial$dsbulk-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/introduction.adoc b/modules/ROOT/pages/introduction.adoc index edbee1ea..2e405053 100644 --- a/modules/ROOT/pages/introduction.adoc +++ b/modules/ROOT/pages/introduction.adoc @@ -49,7 +49,7 @@ The _target_ is your new {cass-short}-based environment where you want to migrat Before you begin a migration, your client applications perform read/write operations with your existing CQL-compatible database, such as {cass}, {dse-short}, {hcd-short}, or {astra-db}. -image:pre-migration0ra9.png["Pre-migration environment."] +image:pre-migration0ra.png["Pre-migration environment."] While your application is stable with the current data model and database platform, you might need to make some adjustments before enabling {product-proxy}. @@ -77,7 +77,7 @@ Writes are sent to both the origin and target databases, while reads are execute For more information and instructions, see xref:ROOT:phase1.adoc[]. -image:migration-phase1ra9.png["Migration Phase 1."] +image:migration-phase1ra.png["Migration Phase 1."] === Phase 2: Migrate data @@ -87,7 +87,7 @@ Then, you thoroughly validate the migrated data, resolving missing and mismatche For more information and instructions, see xref:ROOT:migrate-and-validate-data.adoc[]. -image:migration-phase2ra9a.png["Migration Phase 2."] +image:migration-phase2ra.png["Migration Phase 2."] === Phase 3: Enable asynchronous dual reads @@ -123,7 +123,7 @@ However, be aware that the origin database is no longer synchronized with the ta For more information, see xref:ROOT:connect-clients-to-target.adoc[]. 
-image:migration-phase5ra9.png["Migration Phase 5."] +image:migration-phase5ra.png["Migration Phase 5."] [#lab] == {product} interactive lab diff --git a/modules/ROOT/partials/cassandra-data-migrator-body.adoc b/modules/ROOT/partials/cassandra-data-migrator-body.adoc new file mode 100644 index 00000000..f1ba3f01 --- /dev/null +++ b/modules/ROOT/partials/cassandra-data-migrator-body.adoc @@ -0,0 +1,344 @@ +{description} +It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: + +* Logging and run tracking +* Automatic reconciliation +* Performance tuning +* Record filtering +* Column renaming +* Support for advanced data types, including sets, lists, maps, and UDTs +* Support for SSL, including custom cipher algorithms +* Use `writetime` timestamps to maintain chronological write history +* Use Time To Live (TTL) values to maintain data lifecycles + +For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository]. + +== {cass-migrator} requirements + +To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas. + +== {cass-migrator-short} with {product-proxy} + +You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. + +When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. + +Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write. + +For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`. + +== Install {cass-migrator} + +{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes. + +[tabs] +====== +Install as a container:: ++ +-- +Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. + +The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`. +-- + +Install as a JAR file:: ++ +-- +. Install Java 11 or later, which includes Spark binaries. + +. Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x with Scala 2.13 and Hadoop 3.3 and later. ++ +[tabs] +==== +Single VM:: ++ +For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job. ++ +. Get the Spark tarball from the Apache Spark archive. ++ +[source,bash,subs="+quotes"] +---- +wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz +---- ++ +Replace `**PATCH**` with your Spark patch version. ++ +. Change to the directory where you want to install Spark, and then extract the tarball: ++ +[source,bash,subs="+quotes"] +---- +tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz +---- ++ +Replace `**PATCH**` with your Spark patch version.
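+. Optional: Verify that the extracted Spark binaries run on the VM before you submit {cass-migrator-short} jobs. The following check is only an example, and it assumes the default directory name created by extracting the tarball in the previous step: ++ +[source,bash,subs="+quotes"] +---- +# Assumes the default directory name from the extracted tarball +./spark-3.5.**PATCH**-bin-hadoop3-scala2.13/bin/spark-submit --version +---- ++ +Replace `**PATCH**` with your Spark patch version.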
+ +Spark cluster:: ++ +For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform. ++ +If you deploy CDM on a Spark cluster, you must modify your `spark-submit` commands as follows: ++ +* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`. +* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`. +==== + +. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}. + +. Add the `cassandra-data-migrator` dependency to `pom.xml`: ++ +[source,xml,subs="+quotes"] +---- + + datastax.cdm + cassandra-data-migrator + **VERSION** + +---- ++ +Replace `**VERSION**` with your {cass-migrator-short} version. + +. Run `mvn install`. + +If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. +-- +====== + +== Configure {cass-migrator-short} + +. Create a `cdm.properties` file. ++ +If you use a different name, make sure you specify the correct filename in your `spark-submit` commands. + +. Configure the properties for your environment. ++ +In the {cass-migrator-short} repository, you can find a {cass-migrator-repo}/blob/main/src/resources/cdm.properties[sample properties file with default values], as well as a {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. ++ +{cass-migrator-short} jobs process all uncommented parameters. +Any parameters that are commented out are ignored or use default values. ++ +If you want to reuse a properties file created for a previous {cass-migrator-short} version, make sure it is compatible with the version you are currently using. +Check the {cass-migrator-repo}/releases[{cass-migrator-short} release notes] for possible breaking changes in interim releases. +For example, the 4.x series of {cass-migrator-short} isn't backwards compatible with earlier properties files. + +. Store your properties file where it can be accessed while running {cass-migrator-short} jobs using `spark-submit`. + +[#migrate] +== Run a {cass-migrator-short} data migration job + +A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. + +To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. + +The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. +The migration job is specified in the `--class` argument. + +[tabs] +====== +Local installation:: ++ +-- +[source,bash,subs="+quotes,+attributes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ +--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. 
++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- + +Spark cluster:: ++ +-- +[source,bash,subs="+quotes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "spark://**MASTER_HOST**:**PORT**" \ +--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--master`: Provide the URL of your Spark cluster. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- +====== + +This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. + +For additional modifications to this command, see <>. + +[#cdm-validation-steps] +== Run a {cass-migrator-short} data validation job + +After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. + +Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. + +. Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. +The data validation job is specified in the `--class` argument. ++ +[tabs] +====== +Local installation:: ++ +-- +[source,bash,subs="+quotes,+attributes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ +--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. 
+ +* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- + +Spark cluster:: ++ +-- +[source,bash,subs="+quotes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "spark://**MASTER_HOST**:**PORT**" \ +--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--master`: Provide the URL of your Spark cluster. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- +====== + +. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. ++ +The {cass-migrator-short} validation job records differences as `ERROR` entries in the log file, listed by primary key values. +For example: ++ +[source,plaintext] +---- +23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) +23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3] +23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2] +23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2] +---- ++ +When validating large datasets or multiple tables, you might want to extract the complete list of missing or mismatched records. +There are many ways to do this. +For example, you can grep for all `ERROR` entries in your {cass-migrator-short} log files or use the `log4j2` example provided in the {cass-migrator-repo}?tab=readme-ov-file#steps-for-data-validation[{cass-migrator-short} repository]. + +=== Run a validation job in AutoCorrect mode + +Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** mode, which offers the following functions: + +* `autocorrect.missing`: Add any missing records in the target with the value from the origin. + +* `autocorrect.mismatch`: Reconcile any mismatched records between the origin and target by replacing the target value with the origin value. ++ +[IMPORTANT] +==== +Timestamps have an effect on this function. + +If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. + +This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. +==== + +* `autocorrect.missing.counter`: By default, counter tables are not copied when missing, unless explicitly set. 
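+
+If you want to enable AutoCorrect for a single run without editing your properties file, you can append the corresponding `--conf` arguments to the validation command, as described in <<advanced,Additional {cass-migrator-short} options>>.
+The following sketch assumes the same local installation, placeholders, and memory settings as the earlier validation example:
+
+[source,bash,subs="+quotes,+attributes"]
+----
+# Example only: a validation run with AutoCorrect enabled through --conf overrides.
+./spark-submit --properties-file cdm.properties \
+--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \
+--conf spark.cdm.autocorrect.missing=true \
+--conf spark.cdm.autocorrect.mismatch=true \
+--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \
+--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+----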
+ +In your `cdm.properties` file, use the following properties to enable (`true`) or disable (`false`) autocorrect functions: + +[source,properties] +---- +spark.cdm.autocorrect.missing false|true +spark.cdm.autocorrect.mismatch false|true +spark.cdm.autocorrect.missing.counter false|true +---- + +The {cass-migrator-short} validation job never deletes records from either the origin or target. +Data validation only inserts or updates data on the target. + +For an initial data validation, consider disabling AutoCorrect so that you can generate a list of data discrepancies, investigate those discrepancies, and then decide whether you want to rerun the validation with AutoCorrect enabled. + +[#advanced] +== Additional {cass-migrator-short} options + +You can modify your properties file or append additional `--conf` arguments to your `spark-submit` commands to customize your {cass-migrator-short} jobs. +For example, you can do the following: + +* Check for large field guardrail violations before migrating. +* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges. +* Use the `track-run` feature to monitor progress and rerun a failed migration or validation job from point of failure. + +For all options, see the {cass-migrator-repo}[{cass-migrator-short} repository]. +Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. + +== Troubleshoot {cass-migrator-short} + +.Java NoSuchMethodError +[%collapsible] +==== +If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: + +[source,console] +---- +Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' +---- + +Make sure that your Spark binary is compatible with your {cass-migrator-short} version. +If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary. +==== + +.Rerun a failed or partially completed job +[%collapsible] +==== +You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. + +For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. +==== \ No newline at end of file diff --git a/modules/ROOT/partials/dsbulk-migrator-body.adoc b/modules/ROOT/partials/dsbulk-migrator-body.adoc new file mode 100644 index 00000000..4a6cf0f9 --- /dev/null +++ b/modules/ROOT/partials/dsbulk-migrator-body.adoc @@ -0,0 +1,642 @@ +{dsbulk-migrator} is an extension of {dsbulk-loader}. +It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. +You can also consider this tool for migrations where you can shard data from large tables into more manageable quantities. + +{dsbulk-migrator} extends {dsbulk-loader} with the following commands: + +* `migrate-live`: Start a live data migration using the embedded version of {dsbulk-loader} or your own {dsbulk-loader} installation. +A live migration means that the data migration starts immediately and is performed by the migrator tool through the specified {dsbulk-loader} installation. 
+ +* `generate-script`: Generate a migration script that you can execute to perform a data migration with a your own {dsbulk-loader} installation. +This command _doesn't_ trigger the migration; it only generates the migration script that you must then execute. + +* `generate-ddl`: Read the schema from origin, and then generate CQL files to recreate it in your target {astra-db} database. + +[[prereqs-dsbulk-migrator]] +== {dsbulk-migrator} prerequisites + +* Java 11 + +* https://maven.apache.org/download.cgi[Maven] 3.9.x + +* Optional: If you don't want to use the embedded {dsbulk-loader} that is bundled with {dsbulk-migrator}, xref:dsbulk:installing:install.adoc[install {dsbulk-loader}] before installing {dsbulk-migrator}. + +== Build {dsbulk-migrator} + +. Clone the {dsbulk-migrator-repo}[{dsbulk-migrator} repository]: ++ +[source,bash] +---- +cd ~/github +git clone git@github.com:datastax/dsbulk-migrator.git +cd dsbulk-migrator +---- + +. Use Maven to build {dsbulk-migrator}: ++ +[source,bash] +---- +mvn clean package +---- + +The build produces two distributable fat jars: + +* `dsbulk-migrator-**VERSION**-embedded-driver.jar` contains an embedded Java driver. +Suitable for script generation or live migrations using an external {dsbulk-loader}. ++ +This jar isn't suitable for live migrations that use the embedded {dsbulk-loader} because no {dsbulk-loader} classes are present. + +* `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` contains an embedded {dsbulk-loader} and an embedded Java driver. +Suitable for all operations. +Much larger than the other JAR due to the presence of {dsbulk-loader} classes. + +== Test {dsbulk-migrator} + +The {dsbulk-migrator} project contains some integration tests that require https://github.com/datastax/simulacron[Simulacron]. + +. Clone and build Simulacron, as explained in the https://github.com/datastax/simulacron[Simulacron GitHub repository]. +Note the prerequisites for Simulacron, particularly for macOS. + +. Run the tests: + +[source,bash] +---- +mvn clean verify +---- + +== Run {dsbulk-migrator} + +Launch {dsbulk-migrator} with the command and options you want to run: + +[source,bash] +---- +java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS] +---- + +The role and availability of the options depends on the command you run: + +* During a live migration, the options configure {dsbulk-migrator} and establish connections to +the clusters. + +* When generating a migration script, most options become default values in the generated scripts. +However, even when generating scripts, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the tables to migrate. + +* When generating a DDL file, import options and {dsbulk-loader}-related options are ignored. +However, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements. + +For more information about the commands and their options, see the following references: + +* <> +* <> +* <> + +For help and examples, see <> and <>. + +[[dsbulk-live]] +== Live migration command-line options + +The following options are available for the `migrate-live` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. + +[cols="2,8,14"] +|=== + +| `-c` +| `--dsbulk-cmd=CMD` +| The external {dsbulk-loader} command to use. +Ignored if the embedded {dsbulk-loader} is being used. 
+The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. + +| `-d` +| `--data-dir=PATH` +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. +Tables will be exported and imported in subdirectories of the data directory specified here. +There will be one subdirectory per keyspace in the data directory, then one subdirectory per table in each keyspace directory. + +| `-e` +| `--dsbulk-use-embedded` +| Use the embedded {dsbulk-loader} version instead of an external one. +The default is to use an external {dsbulk-loader} command. + +| +| `--export-bundle=PATH` +| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-consistency=CONSISTENCY` +| The consistency level to use when exporting data. +The default is `LOCAL_QUORUM`. + +| +| `--export-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when exporting. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--export-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node from the origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to write to. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-records=NUM` +| The maximum number of records to export for each table. +Must be a positive number or `-1`. +The default is `-1` (export the entire table). + +| +| `--export-password` +| The password to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--export-splits=NUM\|NC` +| The maximum number of token range queries to generate. +Use the `NC` syntax to specify a multiple of the number of available cores. +For example, `8C` = 8 times the number of available cores. +The default is `8C`. +This is an advanced setting; you should rarely need to modify the default value. + +| +| `--export-username=STRING` +| The username to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. + +| `-h` +| `--help` +| Displays this help text. + +| +| `--import-bundle=PATH` +| The path to a {scb} to connect to a target {astra-db} cluster. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-consistency=CONSISTENCY` +| The consistency level to use when importing data. +The default is `LOCAL_QUORUM`. + +| +| `--import-default-timestamp=` +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 syntax. 
+The default is `1970-01-01T00:00:00Z`. + +| +| `--import-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when importing. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--import-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node on the target cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to read from. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-errors=NUM` +| The maximum number of failed records to tolerate when importing data. +The default is `1000`. +Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. + +| +| `--import-password` +| The password to use to authenticate against the target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--import-username=STRING` +| The username to use to authenticate against the target cluster. Options `--import-username` and `--import-password` must be provided together, or not at all. + +| `-k` +| `--keyspaces=REGEX` +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. +Case-sensitive keyspace names must be entered in their exact case. + +| `-l` +| `--dsbulk-log-dir=PATH` +| The directory where the {dsbulk-loader} should store its logs. +The default is a `logs` subdirectory in the current working directory. +This subdirectory will be created if it does not exist. +Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. + +| +| `--max-concurrent-ops=NUM` +| The maximum number of concurrent operations (exports and imports) to carry. +The default is `1`. +Set this to higher values to allow exports and imports to occur concurrently. +For example, with a value of `2`, each table will be imported as soon as it is exported, while the next table is being exported. + +| +| `--skip-truncate-confirmation` +| Skip truncate confirmation before actually truncating tables. +Only applicable when migrating counter tables, ignored otherwise. + +| `-t` +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +Case-sensitive table names must be entered in their exact case. + +| +| `--table-types=regular\|counter\|all` +| The table types to migrate. +The default is `all`. + +| +| `--truncate-before-export` +| Truncate tables before the export instead of after. +The default is to truncate after the export. +Only applicable when migrating counter tables, ignored otherwise. + +| `-w` +| `--dsbulk-working-dir=PATH` +| The directory where `dsbulk` should be executed. 
+Ignored if the embedded {dsbulk-loader} is being used. +If unspecified, it defaults to the current working directory. + +|=== + +[[dsbulk-script]] +== Script generation command-line options + +The following options are available for the `generate-script` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. + + +[cols="2,8,14"] +|=== + +| `-c` +| `--dsbulk-cmd=CMD` +| The {dsbulk-loader} command to use. +The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. + +| `-d` +| `--data-dir=PATH` +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. + +| +| `--export-bundle=PATH` +| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-consistency=CONSISTENCY` +| The consistency level to use when exporting data. +The default is `LOCAL_QUORUM`. + +| +| `--export-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when exporting. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--export-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node from the origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to write to. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--export-max-records=NUM` +| The maximum number of records to export for each table. +Must be a positive number or `-1`. +The default is `-1` (export the entire table). + +| +| `--export-password` +| The password to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--export-splits=NUM\|NC` +| The maximum number of token range queries to generate. +Use the `NC` syntax to specify a multiple of the number of available cores. +For example, `8C` = 8 times the number of available cores. +The default is `8C`. +This is an advanced setting. +You should rarely need to modify the default value. + +| +| `--export-username=STRING` +| The username to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. + +| `-h` +| `--help` +| Displays this help text. + +| +| `--import-bundle=PATH` +| The path to a Secure Connect Bundle to connect to a target {astra-db} cluster. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-consistency=CONSISTENCY` +| The consistency level to use when importing data. +The default is `LOCAL_QUORUM`. 
+ +| +| `--import-default-timestamp=` +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 syntax. +The default is `1970-01-01T00:00:00Z`. + +| +| `--import-dsbulk-option=OPT=VALUE` +| An extra {dsbulk-loader} option to use when importing. +Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. +{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. +Short options are not supported. + +| +| `--import-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node on the target cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--import-host` and `--import-bundle` are mutually exclusive. + +| +| `--import-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to read from. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-concurrent-queries=NUM\|AUTO` +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. +The default is `AUTO`. + +| +| `--import-max-errors=NUM` +| The maximum number of failed records to tolerate when importing data. +The default is `1000`. +Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory. + +| +| `--import-password` +| The password to use to authenticate against the target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--import-username=STRING` +| The username to use to authenticate against the target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. + +| `-k` +| `--keyspaces=REGEX` +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. +Case-sensitive keyspace names must be entered in their exact case. + +| `-l` +| `--dsbulk-log-dir=PATH` +| The directory where {dsbulk-loader} should store its logs. +The default is a `logs` subdirectory in the current working directory. +This subdirectory will be created if it does not exist. +Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here. + +| `-t` +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +Case-sensitive table names must be entered in their exact case. + +| +| `--table-types=regular\|counter\|all` +| The table types to migrate. The default is `all`. + +|=== + + +[[dsbulk-ddl]] +== DDL generation command-line options + +The following options are available for the `generate-ddl` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. + +[cols="2,8,14"] +|=== + +| `-a` +| `--optimize-for-astra` +| Produce CQL scripts optimized for {company} {astra-db}. +{astra-db} does not allow some options in DDL statements. +Using this {dsbulk-migrator} command option, forbidden {astra-db} options will be omitted from the generated CQL files. + +| `-d` +| `--data-dir=PATH` +| The directory where data will be exported to and imported from. 
+The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. + +| +| `--export-bundle=PATH` +| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-host=HOST[:PORT]` +| The host name or IP and, optionally, the port of a node from the origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--export-host` and `--export-bundle` are mutually exclusive. + +| +| `--export-password` +| The password to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. +Omit the parameter value to be prompted for the password interactively. + +| +| `--export-username=STRING` +| The username to use to authenticate against the origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. + +| `-h` +| `--help` +| Displays this help text. + +| `-k` +| `--keyspaces=REGEX` +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace. +Case-sensitive keyspace names must be entered in their exact case. + +| `-t` +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +Case-sensitive table names must be entered in their exact case. + +| +| `--table-types=regular\|counter\|all` +| The table types to migrate. +The default is `all`. + +|=== + +[[dsbulk-examples]] +== {dsbulk-migrator} examples + +These examples show sample `username` and `password` values that are for demonstration purposes only. +Don't use these values in your environment. + +=== Generate a migration script + +Generate a migration script to migrate from an existing origin cluster to a target {astra-db} cluster: + +[source,bash] +---- + java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \ + --data-dir=/path/to/data/dir \ + --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \ + --dsbulk-log-dir=/path/to/log/dir \ + --export-host=my-origin-cluster.com \ + --export-username=user1 \ + --export-password=s3cr3t \ + --import-bundle=/path/to/bundle \ + --import-username=user1 \ + --import-password=s3cr3t +---- + +=== Live migration with an external {dsbulk-loader} installation + +Perform a live migration from an existing origin cluster to a target {astra-db} cluster using an external {dsbulk-loader} installation: + +[source,bash] +---- + java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \ + --data-dir=/path/to/data/dir \ + --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \ + --dsbulk-log-dir=/path/to/log/dir \ + --export-host=my-origin-cluster.com \ + --export-username=user1 \ + --export-password # password will be prompted \ + --import-bundle=/path/to/bundle \ + --import-username=user1 \ + --import-password # password will be prompted +---- + +Passwords are prompted interactively. 
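+
+To restrict a live migration to particular keyspaces or tables, you can add the `--keyspaces` and `--tables` regular-expression filters described in the options above.
+The following sketch repeats the previous command with a hypothetical `shop` keyspace and `user_*` table pattern; adjust the regular expressions, paths, and credentials for your environment:
+
+[source,bash]
+----
+ java -jar target/dsbulk-migrator--embedded-driver.jar migrate-live \
+ --data-dir=/path/to/data/dir \
+ --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
+ --dsbulk-log-dir=/path/to/log/dir \
+ --keyspaces='^shop$' \
+ --tables='^user_.*' \
+ --export-host=my-origin-cluster.com \
+ --export-username=user1 \
+ --export-password \
+ --import-bundle=/path/to/bundle \
+ --import-username=user1 \
+ --import-password
+----
+
+As in the previous example, the passwords are prompted interactively because their values are omitted.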
+ +=== Live migration with the embedded {dsbulk-loader} + +Perform a live migration from an existing origin cluster to a target {astra-db} cluster using the embedded {dsbulk-loader} installation: + +[source,bash] +---- + java -jar target/dsbulk-migrator--embedded-dsbulk.jar migrate-live \ + --data-dir=/path/to/data/dir \ + --dsbulk-use-embedded \ + --dsbulk-log-dir=/path/to/log/dir \ + --export-host=my-origin-cluster.com \ + --export-username=user1 \ + --export-password # password will be prompted \ + --export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ + --export-dsbulk-option "--executor.maxPerSecond=1000" \ + --import-bundle=/path/to/bundle \ + --import-username=user1 \ + --import-password # password will be prompted \ + --import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ + --import-dsbulk-option "--executor.maxPerSecond=1000" +---- + +Passwords are prompted interactively. + +The preceding example passes additional {dsbulk-loader} options. + +The preceding example requires the `dsbulk-migrator--embedded-dsbulk.jar` fat jar. +Otherwise, an error is raised because no embedded {dsbulk-loader} can be found. + +=== Generate DDL files to recreate the origin schema on the target cluster + +Generate DDL files to recreate the origin schema on a target {astra-db} cluster: + +[source,bash] +---- + java -jar target/dsbulk-migrator--embedded-driver.jar generate-ddl \ + --data-dir=/path/to/data/dir \ + --export-host=my-origin-cluster.com \ + --export-username=user1 \ + --export-password=s3cr3t \ + --optimize-for-astra +---- + +[[getting-help-with-dsbulk-migrator]] +== Get help with {dsbulk-migrator} + +Use the following command to display the available {dsbulk-migrator} commands: + +[source,bash] +---- +java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help +---- + +For individual command help and each one's options: + +[source,bash] +---- +java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help +---- + +== See also + +* xref:dsbulk:overview:dsbulk-about.adoc[{dsbulk-loader}] +* xref:dsbulk:reference:dsbulk-cmd.adoc#escaping-and-quoting-command-line-arguments[Escaping and quoting {dsbulk-loader} command line arguments] \ No newline at end of file diff --git a/modules/sideloader/pages/cleanup-sideloader.adoc b/modules/sideloader/pages/cleanup-sideloader.adoc index 42dae738..4ed5ed40 100644 --- a/modules/sideloader/pages/cleanup-sideloader.adoc +++ b/modules/sideloader/pages/cleanup-sideloader.adoc @@ -47,7 +47,7 @@ If the request fails due to `ImportInProgress`, you must either wait for the imp . Wait a few minutes, and then check the migration status: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] + While the cleanup is running, the migration status is `CleaningUpFiles`. When complete, the migration status is `Closed`. diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index 4deec95b..326bdcb5 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -272,7 +272,7 @@ Use the {devops-api} to initialize the migration and get your migration director .What happens during initialization? [%collapsible] ==== -include::sideloader:partial$sideloader-partials.adoc[tags=initialize] +include::sideloader:partial$initialize.adoc[] ==== The initialization process can take several minutes to complete, especially if the migration bucket doesn't already exist. 
@@ -322,7 +322,7 @@ Replace *`MIGRATION_ID`* with the `migrationID` returned by the `initialize` end . Check the migration status: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . Check the `status` field in the response: + @@ -520,7 +520,7 @@ aws s3 sync --only-show-errors --exclude '{asterisk}' --include '{asterisk}/snap + Replace the following: + -include::sideloader:partial$sideloader-partials.adoc[tags=command-placeholders-common] +include::sideloader:partial$command-placeholders-common.adoc[] + .Example: Upload a snapshot with AWS CLI @@ -604,7 +604,7 @@ gsutil -m rsync -r -d **CASSANDRA_DATA_DIR**/**KEYSPACE_NAME**/{asterisk}{asteri + Replace the following: + -include::sideloader:partial$sideloader-partials.adoc[tags=command-placeholders-common] +include::sideloader:partial$command-placeholders-common.adoc[] + .Example: Upload a snapshot with gcloud and gsutil @@ -743,7 +743,7 @@ If one step fails, then the entire import operation stops and the migration fail .What happens during data import? [%collapsible] ====== -include::sideloader:partial$sideloader-partials.adoc[tags=import] +include::sideloader:partial$import.adoc[] ====== [WARNING] @@ -752,7 +752,7 @@ include::sideloader:partial$sideloader-partials.adoc[tags=import] For commands to monitor upload progress and compare uploaded data against the original snapshots, see xref:sideloader:migrate-sideloader.adoc#upload-snapshots-to-migration-directory[Upload snapshots to the migration directory]. * If necessary, you can xref:sideloader:stop-restart-sideloader.adoc[pause or abort the migration] during the import process. -include::sideloader:partial$sideloader-partials.adoc[tags=no-return] +include::sideloader:partial$no-return.adoc[] ==== . Use the {devops-api} to launch the data import: @@ -769,7 +769,7 @@ Although this call returns immediately, the import process takes time. . Check the migration status periodically: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . Check the `status` field in the response: + @@ -784,7 +784,7 @@ Wait a few minutes before you check the status again. [#validate-the-migrated-data] == Validate the migrated data -include::sideloader:partial$sideloader-partials.adoc[tags=validate] +include::sideloader:partial$validate.adoc[] == See also diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc index dc3cbce9..1b6cd07b 100644 --- a/modules/sideloader/pages/sideloader-overview.adoc +++ b/modules/sideloader/pages/sideloader-overview.adoc @@ -61,7 +61,7 @@ For specific requirements and more information, see xref:sideloader:migrate-side === Initialize a migration -include::sideloader:partial$sideloader-partials.adoc[tags=initialize] +include::sideloader:partial$initialize.adoc[] For instructions and more information, see xref:sideloader:migrate-sideloader.adoc#initialize-migration[Migrate data with {sstable-sideloader}: Initialize the migration]. 
@@ -105,17 +105,17 @@ In this case, consider creating your target database in a co-located datacenter, === Import data -include::sideloader:partial$sideloader-partials.adoc[tags=import] +include::sideloader:partial$import.adoc[] For instructions and more information, see xref:sideloader:migrate-sideloader.adoc#import-data[Migrate data with {sstable-sideloader}: Import data] === Validate imported data -include::sideloader:partial$sideloader-partials.adoc[tags=validate] +include::sideloader:partial$validate.adoc[] == Use {sstable-sideloader} with {product-proxy} -include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm] +include::sideloader:partial$sideloader-zdm.adoc[] == Next steps diff --git a/modules/sideloader/pages/sideloader-zdm.adoc b/modules/sideloader/pages/sideloader-zdm.adoc index 8a9d340e..1111f833 100644 --- a/modules/sideloader/pages/sideloader-zdm.adoc +++ b/modules/sideloader/pages/sideloader-zdm.adoc @@ -22,4 +22,4 @@ For more information and instructions, see xref:sideloader:sideloader-overview.a You can use {sstable-sideloader} alone or with {product-proxy}. -include::sideloader:partial$sideloader-partials.adoc[tags=sideloader-zdm] \ No newline at end of file +include::sideloader:partial$sideloader-zdm.adoc[] \ No newline at end of file diff --git a/modules/sideloader/pages/stop-restart-sideloader.adoc b/modules/sideloader/pages/stop-restart-sideloader.adoc index a35faad3..e4625b00 100644 --- a/modules/sideloader/pages/stop-restart-sideloader.adoc +++ b/modules/sideloader/pages/stop-restart-sideloader.adoc @@ -49,12 +49,12 @@ curl -X POST \ | jq . ---- + -include::sideloader:partial$sideloader-partials.adoc[tags=no-return] +include::sideloader:partial$no-return.adoc[] For more information about what happens during each phase of a migration and the point of no return, see xref:sideloader:sideloader-overview.adoc[]. . Wait a few minutes, and then check the migration status to confirm that the migration stopped: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] == Retry a failed migration diff --git a/modules/sideloader/pages/troubleshoot-sideloader.adoc b/modules/sideloader/pages/troubleshoot-sideloader.adoc index 2e96a1ec..cb74602a 100644 --- a/modules/sideloader/pages/troubleshoot-sideloader.adoc +++ b/modules/sideloader/pages/troubleshoot-sideloader.adoc @@ -18,7 +18,7 @@ If your credentials expire, do the following: . Use the `MigrationStatus` endpoint to generate new credentials: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . Continue the migration with the fresh credentials. + @@ -48,7 +48,7 @@ For more information, see <>. . Check the migration status for an error message related to the failure: + -include::sideloader:partial$sideloader-partials.adoc[tags=check-status] +include::sideloader:partial$check-status.adoc[] . If possible, resolve the issue described in the error message. + diff --git a/modules/sideloader/partials/check-status.adoc b/modules/sideloader/partials/check-status.adoc new file mode 100644 index 00000000..74428f06 --- /dev/null +++ b/modules/sideloader/partials/check-status.adoc @@ -0,0 +1,11 @@ +[source,bash] +---- +curl -X GET \ + -H "Authorization: Bearer ${token}" \ + https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \ + | jq . +---- ++ +A successful response contains a `MigrationStatus` object. 
+It can take a few minutes for the {devops-api} to reflect status changes during a migration. +Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status. \ No newline at end of file diff --git a/modules/sideloader/partials/command-placeholders-common.adoc b/modules/sideloader/partials/command-placeholders-common.adoc new file mode 100644 index 00000000..69ccfdd2 --- /dev/null +++ b/modules/sideloader/partials/command-placeholders-common.adoc @@ -0,0 +1,7 @@ +* *`CASSANDRA_DATA_DIR`*: The absolute file system path to where {cass-short} data is stored on the node. +For example, `/var/lib/cassandra/data`. +* *`KEYSPACE_NAME`*: The name of the keyspace that contains the tables you want to migrate. +* *`SNAPSHOT_NAME`*: The name of the xref:sideloader:migrate-sideloader.adoc#create-snapshots[snapshot backup] that you created with `nodetool snapshot`. +* *`MIGRATION_DIR`*: The entire `uploadBucketDir` value that was generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration], including the trailing slash. +* *`NODE_NAME`*: The host name of the node that your snapshots are from. +It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket. \ No newline at end of file diff --git a/modules/sideloader/partials/import.adoc b/modules/sideloader/partials/import.adoc new file mode 100644 index 00000000..b164d764 --- /dev/null +++ b/modules/sideloader/partials/import.adoc @@ -0,0 +1,35 @@ +After uploading the snapshots to the migration directory, use the {devops-api} to start the data import process. + +During the import process, {sstable-sideloader} does the following: + +. Revokes access to the migration directory. ++ +You cannot read or write to the migration directory after starting the data import process. + +. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets. + +. Runs validation checks on each subset. + +. Converts all SSTables of each subset. + +. Disables new compactions on the target database. ++ +[WARNING] +==== +This is the last point at which you can xref:sideloader:stop-restart-sideloader.adoc#abort-migration[abort the migration]. + +Once {sstable-sideloader} begins to import SSTable metadata (the next step), you cannot stop the migration. +==== + +. Imports metadata from each SSTable. ++ +If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. +Since compaction is disabled, there is no risk of permanent inconsistencies. +However, in the context of xref:ROOT:introduction.adoc[{product}], it's important that the {product-short} proxy continues to read from the origin cluster. + +. Re-enables compactions on the {astra-db} Serverless database. + +Each step must finish successfully. +If one step fails, the import operation stops and no data is imported into your target database. + +If all steps finish successfully, the migration is complete and you can access the imported data in your target database. 
\ No newline at end of file diff --git a/modules/sideloader/partials/initialize.adoc b/modules/sideloader/partials/initialize.adoc new file mode 100644 index 00000000..3e288f43 --- /dev/null +++ b/modules/sideloader/partials/initialize.adoc @@ -0,0 +1,24 @@ +After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the {astra} {devops-api} to initialize the migration. + +.{sstable-sideloader} moves data from the migration bucket to {astra-db}. +svg::sideloader:data-importer-workflow.svg[] + +When you initialize a migration, {sstable-sideloader} does the following: + +. Creates a secure migration bucket. ++ +The migration bucket is only created during the first initialization. +All subsequent migrations use different directories in the same migration bucket. ++ +{company} owns the migration bucket, and it is located within the {astra} perimeter. + +. Generates a migration ID that is unique to the new migration. + +. Creates a migration directory within the migration bucket that is unique to the new migration. ++ +The migration directory is also referred to as the `uploadBucketDir`. +In the next phase of the migration process, you will upload your snapshots to this migration directory. + +. Generates upload credentials that grant read/write access to the migration directory. ++ +The credentials are formatted according to the cloud provider where your target database is deployed. \ No newline at end of file diff --git a/modules/sideloader/partials/no-return.adoc b/modules/sideloader/partials/no-return.adoc new file mode 100644 index 00000000..8561543b --- /dev/null +++ b/modules/sideloader/partials/no-return.adoc @@ -0,0 +1,2 @@ +You can abort a migration up until the point at which {sstable-sideloader} starts importing SSTable metadata. +After this point, you must wait for the migration to finish, and then you can use `cqlsh` to drop the keyspace/table in your target database before repeating the entire migration procedure. \ No newline at end of file diff --git a/modules/sideloader/partials/sideloader-partials.adoc b/modules/sideloader/partials/sideloader-partials.adoc deleted file mode 100644 index f465a929..00000000 --- a/modules/sideloader/partials/sideloader-partials.adoc +++ /dev/null @@ -1,107 +0,0 @@ -// tag::check-status[] -[source,bash] ----- -curl -X GET \ - -H "Authorization: Bearer ${token}" \ - https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \ - | jq . ----- -+ -A successful response contains a `MigrationStatus` object. -It can take a few minutes for the {devops-api} to reflect status changes during a migration. -Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status. -// end::check-status[] - -// tag::command-placeholders-common[] -* *`CASSANDRA_DATA_DIR`*: The absolute file system path to where {cass-short} data is stored on the node. -For example, `/var/lib/cassandra/data`. -* *`KEYSPACE_NAME`*: The name of the keyspace that contains the tables you want to migrate. -* *`SNAPSHOT_NAME`*: The name of the xref:sideloader:migrate-sideloader.adoc#create-snapshots[snapshot backup] that you created with `nodetool snapshot`. -* *`MIGRATION_DIR`*: The entire `uploadBucketDir` value that was generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration], including the trailing slash. -* *`NODE_NAME`*: The host name of the node that your snapshots are from. 
-It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket. -// end::command-placeholders-common[] - -// tag::validate[] -After the migration is complete, you can query the migrated data using the xref:astra-db-serverless:cql:develop-with-cql.adoc#connect-to-the-cql-shell[cqlsh] or xref:astra-db-serverless:api-reference:row-methods/find-many.adoc[{data-api}]. - -You can xref:ROOT:cassandra-data-migrator.adoc#cdm-validation-steps[run {cass-migrator} ({cass-migrator-short}) in validation mode] for more thorough validation. -{cass-migrator-short} also offers an AutoCorrect mode to reconcile any differences that it detects. -// end::validate[] - -// tag::initialize[] -After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the {astra} {devops-api} to initialize the migration. - -.{sstable-sideloader} moves data from the migration bucket to {astra-db}. -svg::sideloader:data-importer-workflow.svg[] - -When you initialize a migration, {sstable-sideloader} does the following: - -. Creates a secure migration bucket. -+ -The migration bucket is only created during the first initialization. -All subsequent migrations use different directories in the same migration bucket. -+ -{company} owns the migration bucket, and it is located within the {astra} perimeter. - -. Generates a migration ID that is unique to the new migration. - -. Creates a migration directory within the migration bucket that is unique to the new migration. -+ -The migration directory is also referred to as the `uploadBucketDir`. -In the next phase of the migration process, you will upload your snapshots to this migration directory. - -. Generates upload credentials that grant read/write access to the migration directory. -+ -The credentials are formatted according to the cloud provider where your target database is deployed. -// end::initialize[] - -// tag::import[] -After uploading the snapshots to the migration directory, use the {devops-api} to start the data import process. - -During the import process, {sstable-sideloader} does the following: - -. Revokes access to the migration directory. -+ -You cannot read or write to the migration directory after starting the data import process. - -. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets. - -. Runs validation checks on each subset. - -. Converts all SSTables of each subset. - -. Disables new compactions on the target database. -+ -[WARNING] -==== -This is the last point at which you can xref:sideloader:stop-restart-sideloader.adoc#abort-migration[abort the migration]. - -Once {sstable-sideloader} begins to import SSTable metadata (the next step), you cannot stop the migration. -==== - -. Imports metadata from each SSTable. -+ -If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. -Since compaction is disabled, there is no risk of permanent inconsistencies. -However, in the context of xref:ROOT:introduction.adoc[{product}], it's important that the {product-short} proxy continues to read from the origin cluster. - -. Re-enables compactions on the {astra-db} Serverless database. - -Each step must finish successfully. -If one step fails, the import operation stops and no data is imported into your target database. 
- -If all steps finish successfully, the migration is complete and you can access the imported data in your target database. -// end::import[] - -// tag::no-return[] -You can abort a migration up until the point at which {sstable-sideloader} starts importing SSTable metadata. -After this point, you must wait for the migration to finish, and then you can use `cqlsh` to drop the keyspace/table in your target database before repeating the entire migration procedure. -// end::no-return[] - -// tag::sideloader-zdm[] -If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during of xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}]. - -.Use {sstable-sideloader} with {product-proxy} -svg::sideloader:astra-migration-toolkit.svg[] -// end::sideloader-zdm[] \ No newline at end of file diff --git a/modules/sideloader/partials/sideloader-zdm.adoc b/modules/sideloader/partials/sideloader-zdm.adoc new file mode 100644 index 00000000..bf4fd583 --- /dev/null +++ b/modules/sideloader/partials/sideloader-zdm.adoc @@ -0,0 +1,4 @@ +If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}]. + +.Use {sstable-sideloader} with {product-proxy} +svg::sideloader:astra-migration-toolkit.svg[] \ No newline at end of file diff --git a/modules/sideloader/partials/validate.adoc b/modules/sideloader/partials/validate.adoc new file mode 100644 index 00000000..ac94e778 --- /dev/null +++ b/modules/sideloader/partials/validate.adoc @@ -0,0 +1,4 @@ +After the migration is complete, you can query the migrated data using xref:astra-db-serverless:cql:develop-with-cql.adoc#connect-to-the-cql-shell[cqlsh] or the xref:astra-db-serverless:api-reference:row-methods/find-many.adoc[{data-api}]. + +You can xref:ROOT:cassandra-data-migrator.adoc#cdm-validation-steps[run {cass-migrator} ({cass-migrator-short}) in validation mode] for more thorough validation. +{cass-migrator-short} also offers an AutoCorrect mode to reconcile any differences that it detects. \ No newline at end of file