Merged
7 changes: 2 additions & 5 deletions modules/ROOT/nav.adoc
@@ -32,11 +32,8 @@
** xref:ROOT:change-read-routing.adoc[]
* Phase 5
** xref:ROOT:connect-clients-to-target.adoc[]
* Support
** xref:ROOT:troubleshooting-tips.adoc[]
** xref:ROOT:troubleshooting-scenarios.adoc[]
** xref:ROOT:faqs.adoc[]
** xref:ROOT:glossary.adoc[]
* xref:ROOT:troubleshooting-tips.adoc[]
* xref:ROOT:faqs.adoc[]
* Release notes
** {product-proxy-repo}/releases[{product-proxy} release notes]
** {product-automation-repo}/releases[{product-automation} release notes]
56 changes: 38 additions & 18 deletions modules/ROOT/pages/components.adoc
@@ -20,7 +20,7 @@ This tool is open-source software.
It doesn't perform data migrations and it doesn't have awareness of ongoing migrations.
Instead, you use a <<data-migration-tools,data migration tool>> to perform the data migration and validate migrated data.

{product-proxy} reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters.
{product-proxy} reduces risks to upgrades and migrations by decoupling the origin (source) cluster from the target (destination) cluster and maintaining consistency between both clusters.
You decide when you want to switch permanently to the target cluster.

After migrating your data, changes to your application code are usually minimal, depending on your client's compatibility with the origin and target clusters.
@@ -30,13 +30,16 @@ Typically, you only need to update the connection string.
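
For example, with the Python driver, pointing an existing client application at {product-proxy} is typically just a matter of changing the contact points. The following sketch uses hypothetical placeholder addresses; other drivers expose equivalent settings:

[source,python]
----
from cassandra.cluster import Cluster

# Before: the client connects directly to the origin cluster (placeholder addresses).
# cluster = Cluster(contact_points=["10.0.1.21", "10.0.1.22"], port=9042)

# After: the same client connects to the ZDM proxy instances instead.
cluster = Cluster(contact_points=["10.0.0.11", "10.0.0.12", "10.0.0.13"], port=9042)
session = cluster.connect()
----
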
=== How {product-proxy} handles reads and writes

{company} created {product-proxy} to orchestrate requests between a client application and both the origin and target clusters.
These clusters can be any CQL-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.
These clusters can be any xref:cql:ROOT:index.adoc[CQL]-compatible data store, such as {cass-reg}, {dse}, and {astra-db}.

During the migration process, you designate one cluster as the _primary cluster_, which serves as the source of truth for reads.
For the majority of the migration process, this is typically the origin cluster.
Towards the end of the migration process, when you are ready to read from your target cluster, you set the target cluster as the primary cluster.
Towards the end of the migration process, when you are ready to read exclusively from your target cluster, you set the target cluster as the primary cluster.

==== Writes
The other cluster is referred to as the _secondary cluster_.
While {product-proxy} is active, write requests are sent to both clusters to ensure data consistency, but only the primary cluster serves read requests.

==== Writes (dual-write logic)

{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the requested consistency level:

@@ -46,7 +49,7 @@ The client can then retry the request, if appropriate, based on the client's ret

This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application.
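
The following sketch is a conceptual illustration of this dual-write behavior, not the actual {product-proxy} implementation. The function and session names are illustrative placeholders; the point is that both clusters must acknowledge a write, and that synchronous reads are served by the primary cluster alone:

[source,python]
----
def handle_write(statement, origin_session, target_session):
    """Conceptual dual-write: the write succeeds only if both clusters accept it."""
    failures = []
    for name, session in (("origin", origin_session), ("target", target_session)):
        try:
            # The proxy forwards the write at the client's requested consistency level.
            session.execute(statement)
        except Exception as exc:
            failures.append((name, exc))
    if failures:
        # Any failure on either cluster is reported back to the client application,
        # which can retry according to its own retry policy.
        raise RuntimeError(f"Write not acknowledged by: {failures}")


def handle_read(statement, primary_session):
    # Synchronous reads are served only by the current primary cluster.
    return primary_session.execute(statement)
----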

For information about how {product-proxy} handles lightweight transactions (LWTs), see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the applied flag].
For information about how {product-proxy} handles Lightweight Transactions (LWTs), see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the applied flag].

==== Reads

@@ -80,22 +80,23 @@ For simplicity, you can use {product-utility} and {product-automation} to set up

== {product-utility} and {product-automation}

You can use {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage {product-proxy} and the associated monitoring stack.
You can use {product-automation-repo}[{product-utility} and {product-automation}] to set up and run Ansible playbooks that deploy and manage multiple {product-proxy} instances and the associated monitoring stack (Prometheus metrics and Grafana visualizations).

https://www.ansible.com/[Ansible] is a suite of software tools that enables infrastructure as code.
It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality.
The Ansible automation for {product-short} is organized into playbooks, each implementing a specific operation.
The machine from which the playbooks are run is known as the Ansible Control Host.
In {product-short}, the Ansible Control Host runs as a Docker container.
https://www.ansible.com/[Ansible] is a suite of software tools that enable infrastructure as code.
It is open source, and its capabilities include software provisioning, configuration management, and application deployment.

You use {product-utility} to set up Ansible in a Docker container, and then you use {product-automation} to run the Ansible playbooks from the Docker container created by {product-utility}.
Ansible playbooks streamline and automate the deployment and management of {product-proxy} instances and their monitoring components.
Playbooks are YAML files that define a series of tasks to be executed on one or more remote machines, including installing software, configuring settings, and managing services.
They are repeatable and reusable, and they simplify deployment and configuration management because each playbook focuses on a specific operation, such as rolling restarts.

{product-utility} creates the Docker container acting as the Ansible Control Host, from which {product-automation} allows you to deploy and manage the {product-proxy} instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data.
You run playbooks from a centralized machine known as the Ansible Control Host.
{product-utility}, which is included with {product-automation}, creates the Docker container that acts as the Ansible Control Host.

To use {product-utility} and {product-automation}, you must prepare the recommended infrastructure, as explained in xref:deployment-infrastructure.adoc[].
To use {product-utility} and {product-automation}, you must xref:deployment-infrastructure.adoc[prepare the recommended infrastructure].

For more information, see xref:setup-ansible-playbooks.adoc[] and xref:deploy-proxy-monitoring.adoc[].
For more information about the role of Ansible and Ansible playbooks in the {product-short} process, see xref:setup-ansible-playbooks.adoc[] and xref:deploy-proxy-monitoring.adoc[].

[#data-migration-tools]
== Data migration tools

You use data migration tools to move data between clusters and validate the migrated data.
@@ -130,7 +134,7 @@ For more information, see xref:ROOT:dsbulk-migrator.adoc[].

=== Other data migration processes

Depending on your source and target databases, there might be other data migration tools available for your migration.
Depending on your origin and target databases, there might be other data migration tools available for your migration.
For example, if you want to write your own custom data migration processes, you can use a tool like Apache Spark(TM).

To use a data migration tool with {product-proxy}, it must meet the following requirements:
@@ -142,5 +146,21 @@ To use a data migration tool with {product-proxy}, it must meet the following re
Because {product-proxy} requires that both databases can successfully process the same read/write statements, migrations that perform significant data transformations might not be compatible with {product-proxy}.
The impact of data transformations depends on your specific data model, database platforms, and the scale of your migration.

For data-only migrations that aren't concerned with live application traffic or minimizing downtime, your chosen tool depends on your source and target databases, the compatibility of the data models, and the scale of your migration.
Describing the full range of these tools is beyond the scope of this document, which focuses on full-scale platform migrations with the {product-short} tools and verified {product-short}-compatible data migration tools.
For data-only migrations that aren't concerned with live application traffic or minimizing downtime, your chosen tool depends on your origin and target databases, the compatibility of the data models, and the scale of your migration.
Describing the full range of these tools is beyond the scope of this document, which focuses on full-scale platform migrations with the {product-short} tools and verified {product-short}-compatible data migration tools.

== In-place migrations

[WARNING]
====
In-place migrations carry a higher risk of data loss or corruption, require progressive manual reconfiguration of the cluster, and are more cumbersome to roll back than the {product-short} process.

Whenever possible, {company} recommends using the {product-short} process to orchestrate live migrations between separate clusters, which eliminates the need for progressive configuration changes and allows you to seamlessly xref:ROOT:rollback.adoc[roll back to your origin cluster] if there is a problem during the migration.
====

For certain migration paths, it is possible to perform in-place database platform replacements on the same cluster where your data already exists.
Supported paths for in-place migrations include xref:6.9@dse:planning:migrate-cassandra-to-dse.adoc[{cass} to {dse-short}] and xref:1.2@hyper-converged-database:migrate:dse-68-to-hcd-12.adoc[{dse-short} to {hcd-short}].

== See also

* xref:ROOT:zdm-proxy-migration-paths.adoc#incompatible-clusters-and-migrations-with-some-downtime[Incompatible clusters and migrations with some downtime]
20 changes: 9 additions & 11 deletions modules/ROOT/pages/connect-clients-to-proxy.adoc
@@ -1,18 +1,16 @@
= Connect your client applications to {product-proxy}
:navtitle: Connect client applications to {product-proxy}

{product-proxy} is designed to be similar to a conventional {cass-reg} cluster.
You communicate with it using the CQL query language used in your existing client applications.
It understands the same messaging protocols used by {cass-short}, {dse}, and {astra-db}.
As a result, most of your client applications won't be able to distinguish between connecting to {product-proxy} and connecting directly to your {cass-short} cluster.
{product-proxy} is designed to mimic communication with a typical cluster based on {cass-reg}.
This means that your client applications connect to {product-proxy} in the same way that they already connect to your existing {cass-short}-based clusters.

On this page, we explain how to connect your client applications to a {cass-short} cluster.
We then move on to discuss how this process changes when connecting to a {product-proxy}.
We conclude by describing two sample client applications that serve as real-world examples of how to build a client application that works effectively with {product-proxy}.
You can communicate with {product-proxy} using the same xref:cql:ROOT:index.adoc[CQL] statements used in your existing client applications.
It understands the same messaging protocols used by {cass-short}, {dse-short}, {hcd-short}, and {astra-db}.

You can use the provided sample client applications, in addition to your own, as a quick way to validate that the deployed {product-proxy} is reading and writing data from the expected origin and target clusters.
As a result, most client applications won't be able to distinguish between connections to {product-proxy} and direct connections to a {cass-short}-based cluster.

This topic also explains how to connect `cqlsh` to {product-proxy}.
This page explains how to connect your client applications to a {cass-short}-based cluster, compares this process to connections to {product-proxy}, provides realistic examples of client applications that work effectively with {product-proxy}, and, finally, explains how to connect `cqlsh` to {product-proxy}.
You can use the provided sample client applications, in addition to your own, as a quick way to validate that the deployed {product-proxy} is reading and writing data from the expected origin and target clusters.
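
For example, a minimal smoke test with the Python driver might look like the following sketch. The contact points, credentials, keyspace, and table are hypothetical placeholders; the important point is that the client connects to the {product-proxy} instances exactly as it would connect to a {cass-short}-based cluster:

[source,python]
----
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Hypothetical ZDM proxy instance addresses; replace with your own.
proxy_contact_points = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

cluster = Cluster(
    contact_points=proxy_contact_points,
    port=9042,
    auth_provider=PlainTextAuthProvider("app_user", "app_password"),
)
session = cluster.connect("my_keyspace")

# The write is mirrored to both clusters; the read is served by the primary cluster.
session.execute(
    "INSERT INTO smoke_test (id, note) VALUES (%s, %s)",
    (1, "zdm connectivity check"),
)
row = session.execute("SELECT note FROM smoke_test WHERE id = %s", (1,)).one()
print(row.note)
----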

== {company}-compatible drivers

Expand Down Expand Up @@ -166,7 +164,7 @@ This is disabled by default in all drivers, but if it was enabled in your client
Token-aware routing isn't enforced when connecting through {product-proxy} because these instances don't hold actual token ranges in the same way as database nodes.
Instead, each {product-proxy} instance has a unique, non-overlapping set of synthetic tokens that simulate token ownership and enable balanced load distribution across the instances.

Upon receiving a request, a {product-proxy} instance routes the request to appropriate source and target database nodes, independent of token ownership.
Upon receiving a request, a {product-proxy} instance routes the request to appropriate origin and target database nodes, independent of token ownership.

If your clients have token-aware routing enabled, you don't need to disable this behavior while using {product-proxy}.
Clients can continue to operate with token-aware routing enabled without negative impacts to functionality or performance.
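
For example, with the Python driver, a client that already uses token-aware routing can keep its load balancing policy unchanged when its contact points are switched to {product-proxy} instances. This is a sketch with hypothetical addresses and datacenter name:

[source,python]
----
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Token-aware routing can stay enabled; the proxy's synthetic tokens keep
# requests balanced across the ZDM proxy instances.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc1"))
)

cluster = Cluster(
    contact_points=["10.0.0.11", "10.0.0.12", "10.0.0.13"],  # ZDM proxy instances
    port=9042,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect()
----
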
@@ -209,7 +207,7 @@ The configuration logic as well as the cluster and session management code have

== Connect cqlsh to {product-proxy}

`cqlsh` is a command-line tool that you can use to send {cass-short} Query Language (CQL) statements to your {cass-short}-based clusters, including {astra-db}, {dse-short}, {hcd-short}, and {cass} databases.
`cqlsh` is a command-line tool that you can use to send CQL statements and `cqlsh`-specific commands to your {cass-short}-based clusters, including {astra-db}, {dse-short}, {hcd-short}, and {cass} databases.

You can use your database's included version of `cqlsh`, or you can download and run a standalone `cqlsh`.

15 changes: 14 additions & 1 deletion modules/ROOT/pages/deployment-infrastructure.adoc
@@ -19,6 +19,19 @@ Here's a typical deployment showing connectivity between client applications, {p

image::zdm-during-migration3.png[Connectivity between client applications, proxy instances, and clusters.]

=== Don't deploy {product-proxy} as a sidecar

{product-proxy} is designed to mimic communication with a {cass-short}-based cluster, so don't deploy it as a sidecar alongside each client application instance.
Instead, {company} recommends deploying multiple {product-proxy} instances, each running on a dedicated machine, instance, or VM.

For best performance, deploy your {product-proxy} instances as close as possible to your client applications, ideally on the same local network, but don't co-deploy them on the same machines as the client applications.
This way, each client application instance can connect to all {product-proxy} instances, just as it would connect to all nodes in a {cass-short}-based cluster or datacenter.

This deployment model provides maximum resilience and failure tolerance guarantees, and it allows the client application driver to continue using the same load balancing and retry mechanisms that it would normally use.

Conversely, deploying a single {product-proxy} instance undermines this resilience mechanism and creates a single point of failure, which can affect client applications if one or more nodes of the underlying origin or target clusters go offline.
In a sidecar deployment, each client application instance would connect to a single {product-proxy} instance and would therefore be exposed to this risk.

== Infrastructure requirements

To deploy {product-proxy} and its companion monitoring stack, you must provision infrastructure that meets the following requirements.
@@ -87,7 +100,7 @@ The only direct access to these machines should be from the jumphost.
The {product-proxy} machines must be able to connect to the origin and target cluster nodes:

* For self-managed clusters ({cass} or {dse-short}), connectivity is needed to the {cass-short} native protocol port (typically 9042).
* For {astra-db}, you will need to ensure outbound connectivity to the {astra} endpoint indicated in the {scb}.
* For {astra-db}, you will need to ensure outbound connectivity to the {astra} endpoint indicated in the xref:astra-db-serverless:databases:secure-connect-bundle.adoc[{scb}].
Connectivity over Private Link is also supported.

The connectivity requirements for the jumphost / monitoring machine are:
2 changes: 1 addition & 1 deletion modules/ROOT/pages/dse-migration-paths.adoc
@@ -40,7 +40,7 @@ Migrate data from {dse-short}::
When migrating _from_ {dse-short} to another {cass-short}-based database, follow the migration guidance for your target database to determine cluster compatibility, migration options, and recommendations.
For example, for {astra-db}, see xref:ROOT:astra-migration-paths.adoc[], and for {hcd-short}, see xref:ROOT:hcd-migration-paths.adoc[].

For information about source and target clusters that are supported by the {product-short} tools, see xref:ROOT:zdm-proxy-migration-paths.adoc[].
For information about origin and target clusters that are supported by the {product-short} tools, see xref:ROOT:zdm-proxy-migration-paths.adoc[].

If your target database isn't directly compatible with a migration from {dse-short}, you might need to take interim steps to prepare your data for migration, such as upgrading your {dse-short} version, modifying the data in your existing database to be compatible with the target database, or running an extract, transform, load (ETL) pipeline.
--
4 changes: 2 additions & 2 deletions modules/ROOT/pages/enable-async-dual-reads.adoc
@@ -17,15 +17,15 @@ This allows you to assess the target cluster's performance and make any adjustme
== Response and error handling with asynchronous dual reads

With or without asynchronous dual reads, the client application only receives results from synchronous reads on the primary cluster.
The client never receives results from asynchronous reads on the secondary cluster because these results are used only for {product-proxy}'s asynchronous dual read metrics.
The client never receives results from asynchronous reads on the secondary cluster because these results are used only for {product-proxy}'s asynchronous dual read metrics and testing purposes.

By design, if an asynchronous read fails or times out, it has no impact on client operations and the client application doesn't receive an error.
However, the increased workload from read requests can cause write requests to fail or time out on the secondary cluster.
With or without asynchronous dual reads, a failed write on either cluster returns an error to the client application and potentially triggers a retry.

This functionality is intentional so you can simulate production-scale read traffic on the secondary cluster, in addition to the existing write traffic from {product-proxy}'s xref:components.adoc#how-zdm-proxy-handles-reads-and-writes[dual writes], with the least impact to your applications.
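
Conceptually, and as a simplified sketch rather than the actual implementation, asynchronous dual reads behave like the following: the read against the secondary cluster is fired and forgotten, so only the primary cluster's result, or error, ever reaches the client. The function and session names are illustrative placeholders:

[source,python]
----
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

def handle_read(statement, primary_session, secondary_session):
    # Fire-and-forget read on the secondary cluster. Its result or failure is
    # recorded only in the proxy's asynchronous dual read metrics.
    executor.submit(secondary_session.execute, statement)

    # The client application only ever receives the synchronous result (or error)
    # from the primary cluster.
    return primary_session.execute(statement)
----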

To avoid unnecessary failures due to unmigrated data, enable asynchronous dual reads only after you migrate, validate, and reconcile all data from the origin cluster to the target cluster.
To avoid unnecessary failures caused by data that hasn't been migrated yet, don't enable asynchronous dual reads until you migrate, validate, and reconcile _all_ data from the origin cluster to the target cluster.

[#configure-asynchronous-dual-reads]
== Configure asynchronous dual reads