3 changes: 1 addition & 2 deletions antora.yml
@@ -48,9 +48,8 @@ asciidoc:
product-automation-repo: 'https://github.com/datastax/zdm-proxy-automation'
product-automation-shield: 'image:https://img.shields.io/github/v/release/datastax/zdm-proxy-automation?label=latest[alt="Latest zdm-proxy-automation release on GitHub",link="{product-automation-repo}/releases"]'
product-demo: 'ZDM Demo Client'
dsbulk-migrator: 'DSBulk Migrator'
dsbulk-migrator-repo: 'https://github.com/datastax/dsbulk-migrator'
dsbulk-loader: 'DSBulk Loader'
dsbulk-loader-repo: 'https://github.com/datastax/dsbulk'
cass-migrator: 'Cassandra Data Migrator'
cass-migrator-short: 'CDM'
cass-migrator-repo: 'https://github.com/datastax/cassandra-data-migrator'
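
These attributes are what the nav and page changes later in this diff consume.
As an illustrative aside (not part of the diff itself), a page can render a tool name with a link to its repository like so:

[source,asciidoc]
----
See the {dsbulk-loader-repo}[{dsbulk-loader}] repository for releases and documentation.
----
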
4 changes: 2 additions & 2 deletions modules/ROOT/nav.adoc
@@ -43,5 +43,5 @@
* xref:ROOT:cassandra-data-migrator.adoc[]
* {cass-migrator-repo}/releases[{cass-migrator-short} release notes]

.{dsbulk-migrator}
* xref:ROOT:dsbulk-migrator.adoc[]
.{dsbulk-loader}
* xref:dsbulk:overview:dsbulk-about.adoc[]
2 changes: 1 addition & 1 deletion modules/ROOT/pages/astra-migration-paths.adoc
@@ -9,7 +9,7 @@ If you have questions about migrating from a specific source to {astra-db}, cont
.Migration tool compatibility
[cols="2,1,1,1,1"]
|===
|Origin |{sstable-sideloader} |{cass-migrator} |{product-proxy} |{dsbulk-migrator}/{dsbulk-loader}
|Origin |{sstable-sideloader} |{cass-migrator} |{product-proxy} |{dsbulk-loader}

|Aiven for {cass-short}
|icon:check[role="text-success",alt="Supported"]
16 changes: 11 additions & 5 deletions modules/ROOT/pages/components.adoc
@@ -152,15 +152,21 @@ You can use {cass-migrator-short} by itself, with {product-proxy}, or for data v

For more information, see xref:ROOT:cassandra-data-migrator.adoc[].
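
For readers unfamiliar with {cass-migrator-short}, a typical run is a Spark job.
The following is a hedged sketch rather than the canonical invocation: the properties file, keyspace and table, and jar version are placeholders, and the job class names should be confirmed against the {cass-migrator-repo}[{cass-migrator-short} repository].

[source,bash]
----
# Migrate one table (cdm.properties, my_keyspace.my_table, and x.y.z are placeholders).
spark-submit --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="my_keyspace.my_table" \
  --master "local[*]" \
  --class com.datastax.cdm.job.Migrate cassandra-data-migrator-x.y.z.jar

# Validate the migrated data with the companion DiffData job.
spark-submit --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="my_keyspace.my_table" \
  --master "local[*]" \
  --class com.datastax.cdm.job.DiffData cassandra-data-migrator-x.y.z.jar
----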

=== {dsbulk-migrator}
=== {dsbulk-loader}

{dsbulk-migrator} extends {dsbulk-loader} with migration-specific commands: `migrate-live`, `generate-script`, and `generate-ddl`.
{dsbulk-loader} is a high-performance data loading and unloading tool for {cass-short}-based databases.
You can use it to load, unload, and count records.
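
As a minimal sketch of those three operations, assuming a reachable cluster and placeholder keyspace and table names (`ks1`, `table1`); connection flags such as hosts and credentials are omitted:

[source,bash]
----
# Unload a table to CSV files in ./export.
dsbulk unload -k ks1 -t table1 -url ./export

# Load those CSV files into the same table on another cluster.
dsbulk load -k ks1 -t table1 -url ./export

# Count rows, for example as a post-migration spot check.
dsbulk count -k ks1 -t table1
----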

It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts.
Because {dsbulk-loader} doesn't have the same data validation capabilities as {cass-migrator-short}, it is best for migrations that don't require extensive data validation, aside from post-migration row counts.

You can use {dsbulk-migrator} alone or with {product-proxy}.
You can use {dsbulk-loader} alone or with {product-proxy}.

For more information, see xref:ROOT:dsbulk-migrator.adoc[].
For more information, see xref:dsbulk:overview:dsbulk-about.adoc[].

[TIP]
====
include::ROOT:partial$dsbulk-migrator-deprecation.adoc[]
====

=== Other data migration processes

3 changes: 0 additions & 3 deletions modules/ROOT/pages/create-target.adoc
@@ -58,8 +58,6 @@ scp -i some-key.pem /path/to/scb.zip user@client-ip-or-host:
[IMPORTANT]
====
On your new database, the keyspace names, table names, column names, data types, and primary keys must be identical to the schema on the origin cluster or the migration will fail.

To help you prepare the schema from the DDL in your origin cluster, consider using the `generate-ddl` functionality in the {dsbulk-migrator-repo}[{dsbulk-migrator}].
====
+
Note the following limitations and exceptions for tables in {astra-db}:
@@ -106,7 +104,6 @@ On your new cluster, the keyspace names, table names, column names, data types,
====
+
To copy the schema, you can run CQL `DESCRIBE` on the origin cluster to get the schema that is being migrated, and then run the output on your new cluster.
Alternatively, you can use the `generate-ddl` functionality in the {dsbulk-migrator-repo}[{dsbulk-migrator}].
+
If your origin cluster is running an earlier version, you might need to edit CQL clauses that are no longer supported in newer versions, such as `COMPACT STORAGE`.
For specific changes in each version, see the release notes for your database platform and {cass-short} driver.
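
As a sketch of the `DESCRIBE` approach above, assuming `cqlsh` access to both clusters; the hostnames and keyspace name are placeholders:

[source,bash]
----
# On the origin cluster: capture the keyspace schema as CQL.
cqlsh origin-host -e "DESCRIBE KEYSPACE my_keyspace" > schema.cql

# Edit schema.cql to remove clauses the target doesn't support,
# such as COMPACT STORAGE, then replay it on the new cluster.
cqlsh target-host -f schema.cql
----
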
18 changes: 9 additions & 9 deletions modules/ROOT/pages/deployment-infrastructure.adoc
@@ -49,7 +49,7 @@ In a sidecar deployment, each client application instance would be connecting to
[[_machines]]
== Hardware requirements

You need a number of machines to run your {product-proxy} instances, plus additional machines for the centralized jumphost, and for running {dsbulk-migrator} or {cass-migrator} (which are recommended data migration and validation tools).
You need a number of machines to run your {product-proxy} instances, plus additional machines for the centralized jumphost, and for running {dsbulk-loader} or {cass-migrator} (which are recommended data migration and validation tools).

This section uses the term _machine_ broadly to refer to a cloud instance (on any cloud provider), a VM, or a physical server.

@@ -83,19 +83,17 @@ The jumphost machine must meet the following specifications:
* 200 to 500 GB of storage depending on the amount of metrics history that you want to retain
* Equivalent to AWS **c5.2xlarge**, GCP **e2-standard-8**, or Azure **A8 v2**

Data migration tools ({dsbulk-migrator} or {cass-migrator})::
You need at least one machine to run {dsbulk-migrator} or {cass-migrator} for the data migration and validation (xref:ROOT:migrate-and-validate-data.adoc[Phase 2]).
Data migration tools ({dsbulk-loader} or {cass-migrator})::
You need at least one machine to run {dsbulk-loader} or {cass-migrator} for the data migration and validation (xref:ROOT:migrate-and-validate-data.adoc[Phase 2]).
Even if you plan to use another data migration tool, you might still need infrastructure for these tools in addition to your chosen tool.
For example, you can use {dsbulk-migrator} to generate DDL statements to help you recreate your origin cluster's schema on your target cluster, and {cass-migrator} is used for data validation after migrating data with {sstable-sideloader}.
For example, {cass-migrator} is used for data validation after migrating data with {sstable-sideloader}.
+
{company} recommends that you start with at least one VM that meets the following minimum specifications:
+
* Ubuntu Linux 20.04 or 22.04, Red Hat Family Linux 7 or newer
* 16 vCPUs
* 64 GB RAM
* 200 GB to 2 TB of storage
+
If you plan to use {dsbulk-migrator} to unload and load multiple terabytes of data from the origin cluster to the target cluster, consider allocating additional space for data that needs to be staged between unloading and loading.
* 200 GB to 2 TB of storage or more for larger migrations
* Equivalent to AWS **m5.4xlarge**, GCP **e2-standard-16**, or Azure **D16 v5**

+
@@ -104,15 +102,17 @@ Whether you need additional machines depends on the total amount of data you nee
All machines must meet the minimum specifications.

For example, if you have 20 TB of existing data to migrate, you could use 4 VMs to speed up the migration.
Then, you would run {dsbulk-migrator} or {cass-migrator} in parallel on each VM with each one responsible for migrating specific tables or a portion of the data, such as 25% or 5 TB.
Then, you would run {dsbulk-loader} or {cass-migrator} in parallel on each VM with each one responsible for migrating specific tables or a portion of the data, such as 25% or 5 TB.

If one table is especially large, such as 75% of the total data, you can use multiple VMs to migrate that single table.
For example, if you have 4 VMs, you can use three of them in parallel for the large table by splitting the table's full token range into three groups.
Then, each VM migrates one group of tokens, and you use the fourth VM to migrate the remaining smaller portion of the data.
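
To make the token-range split concrete, the following is a hedged sketch that assumes the default Murmur3 partitioner (tokens span -2^63 through 2^63 - 1) and uses {dsbulk-loader} custom queries; `ks1`, `big_table`, and the partition key `pk` are placeholders:

[source,bash]
----
# Thirds of the Murmur3 token range, one subrange per VM.
dsbulk unload -url ./part1 -query \
  "SELECT * FROM ks1.big_table WHERE token(pk) >= -9223372036854775808 AND token(pk) < -3074457345618258603"
dsbulk unload -url ./part2 -query \
  "SELECT * FROM ks1.big_table WHERE token(pk) >= -3074457345618258603 AND token(pk) < 3074457345618258602"
dsbulk unload -url ./part3 -query \
  "SELECT * FROM ks1.big_table WHERE token(pk) >= 3074457345618258602"
----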

[IMPORTANT]
====
Make sure that your origin and target clusters can handle high traffic from your chosen data migration tool in addition to the live traffic from your application.
Regardless of the number of data migration machines or the amount of data you need to migrate, make sure the machines have enough space to stage data between unloading and loading during the migration.

Additionally, make sure that your origin and target clusters can handle high traffic from your chosen data migration tool in addition to the live traffic from your application.

Test migrations in a lower environment before you proceed with production migrations.
