From c50270b0fe5e1696e2e1f4705e6a51c3d9222024 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 19 Nov 2025 14:05:42 -0800 Subject: [PATCH 1/6] combine rolling restart instructions --- modules/ROOT/pages/change-read-routing.adoc | 39 +------ modules/ROOT/pages/components.adoc | 2 +- .../ROOT/pages/deploy-proxy-monitoring.adoc | 14 +-- .../ROOT/pages/enable-async-dual-reads.adoc | 30 +---- .../ROOT/pages/manage-proxy-instances.adoc | 105 +++++++----------- modules/ROOT/pages/troubleshooting-tips.adoc | 4 +- 6 files changed, 57 insertions(+), 137 deletions(-) diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index 3c6cb325..d6768222 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -26,46 +26,13 @@ This is harmless but unnecessary. [#change-the-read-routing-configuration] == Change the read routing configuration -Read routing is controlled by a mutable configuration variable. -For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. +Read routing is controlled by a xref:manage-proxy-instances.adoc#change-mutable-config-variable[mutable configuration variable]. -. Connect to your Ansible Control Host container. -+ -For example, `ssh` into the jumphost: -+ -[source,bash] ----- -ssh -F ~/.ssh/zdm_ssh_config jumphost ----- -+ -Then, connect to the Ansible Control Host container: -+ -[source,bash] ----- -docker exec -it zdm-ansible-container bash ----- -+ -.Result -[%collapsible] -==== -[source,bash] ----- -ubuntu@52772568517c:~$ ----- -==== - -. Edit the {product-proxy} core configuration file: `vars/zdm_proxy_core_config.yml`. +. Edit the {product-proxy} core configuration file `vars/zdm_proxy_core_config.yml`. . Change the `primary_cluster` variable to `TARGET`. -. Run the rolling restart playbook to apply the configuration change to your entire {product-proxy} deployment: -+ -[source,bash] ----- -ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory ----- - -. Wait while Ansible restarts the {product-proxy} instances, one by one. +. xref:ROOT:manage-proxy-instances.adoc#perform-a-rolling-restart-of-the-proxies[Perform a rolling restart] to apply the configuration change to your entire {product-proxy} deployment. Once the instances are restarted, all reads are routed to the target cluster instead of the origin cluster. diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 6fbb47fa..92c3e24c 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -77,7 +77,7 @@ Throughout the {product-short} documentation, the term _{product-proxy} deployme ==== You can scale {product-proxy} instances horizontally and vertically. -To avoid downtime when applying configuration changes, you can perform rolling restarts on your {product-proxy} instances. +To avoid downtime when applying configuration changes, you can xref:ROOT:manage-proxy-instances.adoc#perform-a-rolling-restart-of-the-proxies[perform a rolling restart] of your {product-proxy} instances. For simplicity, you can use {product-utility} and {product-automation} to set up and run Ansible playbooks that deploy and manage {product-proxy} and its monitoring stack. 
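+
+As an example of such a configuration change, the read routing switch described earlier amounts to a single line in `vars/zdm_proxy_core_config.yml`.
+A minimal sketch, with the value that routes all synchronous reads to the target cluster:
+
+[source,yaml]
+----
+# Cluster that serves all synchronous reads; valid values are ORIGIN and TARGET.
+primary_cluster: TARGET
+----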
diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index bdaac209..da966946 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -170,7 +170,8 @@ For instructions, see xref:ROOT:tls.adoc[]. There are additional configuration variables in `vars/zdm_proxy_advanced_config.yml` that you might want to change _at deployment time_ in specific cases. -All advanced configuration variables not listed here are considered mutable and can be changed later if needed (changes can be easily applied to existing deployments in a rolling fashion using the relevant Ansible playbook, as explained later, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]). +All advanced configuration variables that aren't listed here are considered mutable and can be changed later without recreating the entire deployment. +For more information, see xref:manage-proxy-instances.adoc[]. ==== Multi-datacenter clusters @@ -320,14 +321,11 @@ CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS If the {product-proxy} instances fail to start up due to mistakes in the configuration, you can simply rectify the incorrect configuration values and run the deployment playbook again. -[NOTE] +[IMPORTANT] ==== -With the exception of the origin credentials, target credentials, and the `primary_cluster` variable, which can all be changed for existing deployments in a rolling fashion, all cluster connection configuration variables are considered immutable and can only be changed by recreating the deployment. - -If you wish to change any of the cluster connection configuration variables (other than credentials and `primary_cluster`) on an existing deployment, you will need to re-run the `deploy_zdm_proxy.yml` playbook. -This playbook can be run as many times as necessary. - -Be aware that running the `deploy_zdm_proxy.yml` playbook results in a brief window of unavailability of the whole {product-proxy} deployment while all the {product-proxy} instances are torn down and recreated. +The origin credentials, target credentials, and the `primary_cluster` variable are mutable variables that you can change after deploying {product-proxy}. +All other cluster connection configuration variables are immutable; the only way to change these values is by completely recreating the {product-proxy} deployment. +For more information, see xref:ROOT:manage-proxy-instances.adoc[]. ==== [[_setting_up_the_monitoring_stack]] diff --git a/modules/ROOT/pages/enable-async-dual-reads.adoc b/modules/ROOT/pages/enable-async-dual-reads.adoc index 0a0e0616..b5763f40 100644 --- a/modules/ROOT/pages/enable-async-dual-reads.adoc +++ b/modules/ROOT/pages/enable-async-dual-reads.adoc @@ -34,9 +34,9 @@ To avoid unnecessary failures due to missing unmigrated data, don't enable async == Configure asynchronous dual reads Use the `read_mode` variable to enable or disable asynchronous dual reads. -Then, perform rolling restarts of your {product-proxy} instances to apply the configuration change. +Then, perform a rolling restart of your {product-proxy} instances to apply the configuration change. -. In `vars/zdm_proxy_core_config.yml`, edit the `read_mode` variable: +. Edit `vars/zdm_proxy_core_config.yml`, and then set the `read_mode` variable: + [tabs] ====== @@ -59,31 +59,7 @@ read_mode: PRIMARY_ONLY -- ====== -. Perform rolling restarts to apply the configuration change to your {product-proxy} instances. 
-+
-[tabs]
-======
-With {product-automation}::
-+
---
-If you use {product-automation} to manage your {product-proxy} deployment, run the following command:
-
-[source,bash]
-----
-ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
-----
---
-
-Without {product-automation}::
-+
---
-If you don't use {product-automation}, you must manually restart each instance.
-
-To avoid downtime, wait for each instance to fully restart and begin receiving traffic before restarting the next instance.
---
-======
-+
-For more information about rolling restarts and changing {product-proxy} configuration variables, see xref:manage-proxy-instances.adoc[].
+. xref:ROOT:manage-proxy-instances.adoc#perform-a-rolling-restart-of-the-proxies[Perform a rolling restart] to apply the configuration change to your {product-proxy} instances.
 
 == Monitor the target cluster's performance
 
diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc
index 12fbe247..c366c19e 100644
--- a/modules/ROOT/pages/manage-proxy-instances.adoc
+++ b/modules/ROOT/pages/manage-proxy-instances.adoc
@@ -4,10 +4,17 @@ After you deploy {product-proxy} instances, you might need to perform various ma
 
 If you are using {product-automation}, you can use Ansible playbooks for all of these operations.
 
+[#perform-a-rolling-restart-of-the-proxies]
 == Perform a rolling restart of the proxies
 
 Rolling restarts of the {product-proxy} instances are useful to apply configuration changes or to upgrade the {product-proxy} version without impacting the availability of the deployment.
 
+[IMPORTANT]
+====
+A rolling restart is a destructive action because it removes the previous containers, including their logs, and then starts new containers.
+xref:ROOT:troubleshooting-tips.adoc#proxy-logs[Collect the logs] before you perform a rolling restart if you want to keep them.
+====
+
 [tabs]
 ======
 With {product-automation}::
@@ -15,8 +22,16 @@ With {product-automation}::
 --
 If you use {product-automation} to manage your {product-proxy} deployment, you can use a dedicated playbook to perform rolling restarts of all {product-proxy} instances in a deployment:
 
-. Connect to the Ansible Control Host Docker container.
-You can do this from the jumphost machine by running the following command:
+. Connect to your Ansible Control Host container.
++
+For example, `ssh` into the jumphost:
++
+[source,bash]
+----
+ssh -F ~/.ssh/zdm_ssh_config jumphost
+----
++
+Then, connect to the Ansible Control Host container:
 +
 [source,bash]
 ----
@@ -38,14 +53,20 @@
 ----
 ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
 ----
+
+The rolling restart playbook recreates each {product-proxy} container, one by one.
+The {product-proxy} deployment remains available at all times, and you can safely use it throughout this operation.
+If you modified mutable configuration variables, the new containers use the updated configuration files.
+
+The playbook performs the following actions automatically:
+
+. {product-automation} stops one container gracefully, and then waits for it to shut down.
+. {product-automation} recreates the container, and then starts it.
+. {product-automation} calls the xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[readiness endpoint] to check the container's status:
 +
-While running, this playbook gracefully stops one container and waits for it to shut down before restarting the container.
-Then, it calls the xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[readiness endpoint] to check the container's status:
-+
-* If the check fails, the playbook repeats the check every five seconds for a maximum of six attempts.
-If all six attempts fail, the playbook interrupts the entire rolling restart process.
-* If the check succeeds, the playbook waits before proceeding to the next container.
-+
+* If the status check fails, {product-automation} repeats the check up to six times at 5-second intervals.
+If all six attempts fail, {product-automation} interrupts the entire rolling restart process.
+* If the check succeeds, {product-automation} waits a fixed amount of time, and then moves on to the next container.
 The default pause between containers is 10 seconds.
 You can change the pause duration in `zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml`.
 --
 
@@ -69,7 +90,9 @@ For information about configuring, retrieving, and interpreting {product-proxy}
 
 Some, but not all, configuration variables can be changed after you deploy a {product-proxy} instance.
 
-This section lists the _mutable_ configuration variables that you can change on an existing {product-proxy} deployment using the rolling restart playbook.
+This section lists the _mutable_ configuration variables that you can change on an existing {product-proxy} deployment.
+
+After you edit mutable variables in their corresponding configuration files (`vars/zdm_proxy_core_config.yml`, `vars/zdm_proxy_cluster_config.yml`, or `vars/zdm_proxy_advanced_config.yml`), you must <<perform-a-rolling-restart-of-the-proxies,perform a rolling restart>> to apply the configuration changes to your {product-proxy} instances.
 
 === Mutable variables in `vars/zdm_proxy_core_config.yml`
 
@@ -189,41 +212,15 @@ This deprecated variable is no longer functional.
 Instead, the expected credentials are based on the authentication requirements of the origin and target clusters.
 For more information, see xref:ROOT:connect-clients-to-proxy.adoc#_client_application_credentials[Client application credentials].
 
-=== Apply mutable configuration changes
-
-Edit mutable variables in their corresponding configuration files (`vars/zdm_proxy_core_config.yml`, `vars/zdm_proxy_cluster_config.yml`, or `vars/zdm_proxy_advanced_config.yml`), and then apply the configuration changes to your {product-proxy} instances using the rolling restart playbook.
+== Change immutable configuration variables
 
-[IMPORTANT]
-====
-A configuration change is a destructive action because the rolling restart playbook removes the previous containers and their logs, replacing them with new containers and the new configuration.
-xref:ROOT:troubleshooting-tips.adoc#proxy-logs[Collect the logs] before you run the playbook if you want to keep them.
-====
+All configuration variables not listed in <<change-mutable-config-variable,Change mutable configuration variables>> are _immutable_ and can only be changed by recreating the deployment with the xref:ROOT:deploy-proxy-monitoring.adoc[initial deployment playbook] (`deploy_zdm_proxy.yml`):
 
 [source,bash]
 ----
-ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
+ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory
 ----
 
-The rolling restart playbook recreates each {product-proxy} container, one by one, with the updated configuration files.
-The {product-proxy} deployment remains available at all times, and you can safely use it throughout this operation.
-
-The playbook performs the following actions automatically:
-
-. {product-automation} stops one container gracefully, and then waits for it to shut down.
-. {product-automation} recreates the container, and then starts it.
-. {product-automation} checks that the container started successfully by checking the readiness endpoint:
-+
-* If unsuccessful, {product-automation} repeats the check up to six times at 5-second intervals.
-If it still fails, {product-automation} interrupts the entire rolling restart process.
-* If successful, {product-automation} waits 10 seconds (default), and then moves on to the next container.
-+
-The pause between the restart of each {product-proxy} instance defaults to 10 seconds.
-To change this value, you can set the desired number of seconds in `zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml`.
-
-== Change immutable configuration variables
-
-All configuration variables not listed in <> are _immutable_ and can only be changed by recreating the deployment with the xref:ROOT:deploy-proxy-monitoring.adoc[initial deployment playbook] (`deploy_zdm_proxy.yml`).
-
 You can re-run the deployment playbook as many times as necessary.
 However, this playbook decommissions and recreates _all_ {product-proxy} instances simultaneously.
 This results in a brief period of time where the entire {product-proxy} deployment is offline because no instances are available.
@@ -259,28 +256,7 @@ For example:
 zdm_proxy_image: datastax/zdm-proxy:2.3.4
 ----
 
-. Run the `rolling_update_zdm_proxy.yml` playbook:
-+
-[source,bash]
-----
-ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
-----
-+
-The rolling restart playbook recreates each {product-proxy} container, one by one, with the new image.
-The {product-proxy} deployment remains available at all times, and you can safely use it throughout this operation.
-+
-The playbook performs the following actions automatically:
-+
-.. {product-automation} stops one container gracefully, and then waits for it to shut down.
-.. {product-automation} recreates the container, and then starts it.
-.. {product-automation} checks that the container started successfully by checking the readiness endpoint:
-+
-** If unsuccessful, {product-automation} repeats the check up to six times at 5-second intervals.
-If it still fails, {product-automation} interrupts the entire rolling restart process.
-** If successful, {product-automation} waits 10 seconds (default), and then moves on to the next container.
-+
-The pause between the restart of each {product-proxy} instance defaults to 10 seconds.
-To change this value, you can set the desired number of seconds in `zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml`.
+. <<perform-a-rolling-restart-of-the-proxies,Perform a rolling restart>> to update all {product-proxy} instances to the new version.
 
 == Scale {product-proxy} instances
 
@@ -318,7 +294,12 @@ Change the topology of your existing {product-proxy} deployment, and then restar
 +
 For example, if you want to add three nodes to a deployment with six nodes, then the amended inventory file must contain nine total IPs, including the six existing IPs and the three new IPs.
 +
-. Run the `deploy_zdm_proxy.yml` playbook to apply the change and start the new instances.
+. Run the `deploy_zdm_proxy.yml` playbook to apply the change and start the new instances:
++
+[source,bash]
+----
+ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory
+----
 +
 Rerunning the playbook stops the existing instances, destroys them, and then creates and starts a new deployment with new instances based on the amended inventory.
 This results in a brief interruption of service for your entire {product-proxy} deployment.
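+
+As a rough sketch, the amended inventory for the scale-up example above might look like the following.
+The `proxies` group name and the addresses are placeholders; keep the structure of your existing `zdm_ansible_inventory` file and only append the new IPs:
+
+[source,ini]
+----
+[proxies]
+# Six existing ZDM proxy instances.
+172.18.10.40
+172.18.10.41
+172.18.10.42
+172.18.10.43
+172.18.10.44
+172.18.10.45
+# Three new instances added by this scale-up.
+172.18.10.46
+172.18.10.47
+172.18.10.48
+----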
diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc index cafdd09a..54a0bf7a 100644 --- a/modules/ROOT/pages/troubleshooting-tips.adoc +++ b/modules/ROOT/pages/troubleshooting-tips.adoc @@ -259,9 +259,7 @@ The following sections provide troubleshooting advice for specific issues or err [#configuration-changes-arent-applied-by-zdm-automation] === Configuration changes aren't applied by {product-automation} -If you change some configuration variables, and then performing a rolling restart with the `rolling_update_zdm_proxy.yml` playbook, you might notice that some changes aren't applied to your {product-proxy} instances. - -Typically, this happens because you modified an immutable configuration variable. +If some configuration changes aren't applied to your {product-proxy} instances after a rolling restart, this typically means that you modified an immutable configuration variable. Not all {product-proxy} configuration variables can be changed after deployment, with or without a rolling restart. For a list of variables that you can change on a live deployment, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change mutable configuration variables]. From 8923cbbf3a373213cff71ea63b4440ac37d15259 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 19 Nov 2025 14:20:54 -0800 Subject: [PATCH 2/6] zdm-535 --- modules/ROOT/pages/components.adoc | 22 ++++++++++++++++++- .../ROOT/pages/feasibility-checklists.adoc | 7 ++++++ 2 files changed, 28 insertions(+), 1 deletion(-) diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 92c3e24c..2f60557d 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -41,7 +41,7 @@ While {product-proxy} is active, write requests are sent to both clusters to ens ==== Writes (dual-write logic) -{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the requested consistency level: +{product-proxy} sends every write operation (`INSERT`, `UPDATE`, `DELETE`) synchronously to both clusters at the client application's requested consistency level: * If the write is acknowledged in both clusters at the requested consistency level, then the operation returns a successful write acknowledgement to the client that issued the request. * If the write fails in either cluster, then {product-proxy} passes a write failure, originating from the primary cluster, back to the client. @@ -64,6 +64,26 @@ The results of asynchronous reads aren't returned to the client because asynchro For more information, see xref:ROOT:enable-async-dual-reads.adoc[]. +==== Consistency levels + +{product-proxy} doesn't directly manage or track consistency levels. +Instead, it passes the requested consistency level from the client application to each cluster (origin and target) when routing requests. + +For reads, the consistency level is always passed to the primary cluster, which always receives read requests. +The request is then executed within the primary cluster at the requested consistency level. + +If asynchronous dual reads are enabled, the consistency level is passed to both clusters, and each cluster executes the read request at the requested consistency level independently. +If the request fails to attain the required quorum on the primary cluster, the failure is returned to the client application as normal. 
+However, failures of asynchronous reads on the secondary cluster are logged but not returned to the client application.
+
+For writes, the consistency level is passed to both clusters, and each cluster executes the write request at the requested consistency level independently.
+If either request fails to attain the required quorum, the failure is returned to the client application as normal.
+
+If either cluster is an {astra-db} database, be aware that `CL.ONE` isn't supported by {astra}.
+Requests sent with `CL.ONE` to {astra-db} databases always fail.
+{product-proxy} doesn't mute these failures because you need to be aware of them.
+You must adapt your client application to use a consistency level that is supported by both clusters to ensure that the migration is seamless and error-free.
+
 === High availability and multiple {product-proxy} instances
 
 {product-proxy} is designed to be highly available and run a clustered fashion to avoid a single point of failure.
diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc
index cdfe3a36..a8695605 100644
--- a/modules/ROOT/pages/feasibility-checklists.adoc
+++ b/modules/ROOT/pages/feasibility-checklists.adoc
@@ -71,6 +71,13 @@ If you need to make changes to the application or data model to ensure that your
 
 It is also highly recommended to perform tests and benchmarks when connected directly to {astra-db} prior to the migration, so that you don't find unexpected issues during the migration process.
 
+=== {astra} doesn't support CL.ONE
+
+`CL.ONE` isn't supported by {astra}, and xref:ROOT:components.adoc#how-zdm-proxy-handles-reads-and-writes[read and write requests sent through {product-proxy}] with `CL.ONE` to {astra-db} databases always fail.
+
+{product-proxy} doesn't mute these failures because you need to be aware of them.
+You must adapt your client application to use a consistency level that is supported by both clusters to ensure that the migration is seamless and error-free.
+
 [[_read_only_applications]]
 === Read-only applications
 
From 1a14641abe93b277bdf515eaf4752288884d05ce Mon Sep 17 00:00:00 2001
From: April M <36110273+aimurphy@users.noreply.github.com>
Date: Wed, 19 Nov 2025 14:41:16 -0800
Subject: [PATCH 3/6] zdm-403

---
 modules/ROOT/pages/deploy-proxy-monitoring.adoc   |  2 +-
 modules/ROOT/pages/deployment-infrastructure.adoc | 13 ++++++++++---
 modules/ROOT/pages/feasibility-checklists.adoc    | 13 +++++++++++++
 modules/sideloader/pages/prepare-sideloader.adoc  |  1 +
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc
index da966946..df18f74a 100644
--- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc
+++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc
@@ -175,7 +175,7 @@
 ==== Multi-datacenter clusters
 
-For multi-datacenter origin clusters, specify the name of the datacenter that {product-proxy} should consider local.
+For xref:ROOT:feasibility-checklists.adoc[multi-datacenter origin clusters], specify the name of the datacenter that {product-proxy} should consider local.
 To do this, set the `origin_local_datacenter` property to the local datacenter name.
 Similarly, for multi-datacenter target clusters, set the `target_local_datacenter` property to the local datacenter name.
 These two variables are stored in `vars/zdm_proxy_advanced_config.yml`.
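+
+For example, a minimal sketch of these two variables in `vars/zdm_proxy_advanced_config.yml`, with placeholder datacenter names:
+
+[source,yaml]
+----
+# Use the datacenter names defined in your own origin and target clusters.
+origin_local_datacenter: dc1
+target_local_datacenter: dc1
+----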
diff --git a/modules/ROOT/pages/deployment-infrastructure.adoc b/modules/ROOT/pages/deployment-infrastructure.adoc
index cba607c1..02aac877 100644
--- a/modules/ROOT/pages/deployment-infrastructure.adoc
+++ b/modules/ROOT/pages/deployment-infrastructure.adoc
@@ -14,12 +14,19 @@ The {product-proxy} process is lightweight, requiring only a small amount of res
 
 {product-proxy} should be deployed close to your client application instances.
 This can be on any cloud provider as well as on-premise, depending on your existing infrastructure.
 
-If you have a multi-DC cluster with multiple set of client application instances deployed to geographically distributed data centers, you should plan for a separate {product-proxy} deployment for each data center.
-
-Here's a typical deployment showing connectivity between client applications, {product-proxy} instances, and clusters:
+The following diagram shows a typical deployment with connectivity between client applications, {product-proxy} instances, and clusters:
 
 image::zdm-during-migration3.png[Connectivity between client applications, proxy instances, and clusters.]
 
+=== Multiple datacenter clusters
+
+If you have a multi-datacenter cluster with multiple sets of client application instances deployed to geographically distributed datacenters, you must plan a separate {product-proxy} deployment for each datacenter.
+
+In the configuration for each {product-proxy} deployment, specify only the contact points that belong to that datacenter, and set the `xref:ROOT:deploy-proxy-monitoring.adoc#_advanced_configuration_optional[origin_local_datacenter]` and `xref:ROOT:deploy-proxy-monitoring.adoc#_advanced_configuration_optional[target_local_datacenter]` properties as needed.
+
+If your origin and target clusters are both multi-datacenter clusters, correctly orchestrating traffic routing through {product-proxy} is more complicated.
+{company} recommends contacting {support-url}[{company} Support] for assistance with complex multi-region and multi-datacenter migrations.
+
 === Don't deploy {product-proxy} as a sidecar
 
 Don't deploy {product-proxy} as a sidecar because it was designed to mimic communication with a {cass-short}-based cluster.
diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc
index a8695605..5f79ebfa 100644
--- a/modules/ROOT/pages/feasibility-checklists.adoc
+++ b/modules/ROOT/pages/feasibility-checklists.adoc
@@ -78,6 +78,19 @@ It is also highly recommended to perform tests and benchmarks when connected dir
 
 {product-proxy} doesn't mute these failures because you need to be aware of them.
 You must adapt your client application to use a consistency level that is supported by both clusters to ensure that the migration is seamless and error-free.
 
+=== {astra} doesn't support the Stargate APIs
+
+The xref:astra-db-serverless:api-reference:compare-dataapi-to-stargate.adoc[Stargate APIs] (Document, REST, GraphQL, gRPC) are deprecated for {astra}.
+
+If you are migrating to {astra} from an origin cluster that uses any of these APIs, your client applications won't work with {astra}.
+
+Before you migrate, you must change your applications to use other programmatic access, such as {cass-short} drivers or the {data-api}.
+
+=== Multi-region and multi-node {astra} migrations
+
+If your migration involves multi-region or multi-node clusters, plan your strategy for migrating and replicating data to the different regions.
+For more information, see the xref:sideloader:prepare-sideloader.adoc#additional-preparation-for-specific-migration-scenarios[{sstable-sideloader} preparations for specific migration scenarios] and the xref:ROOT:deployment-infrastructure.adoc[{product-proxy} infrastructure guidelines for multi-datacenter clusters]. + [[_read_only_applications]] === Read-only applications diff --git a/modules/sideloader/pages/prepare-sideloader.adoc b/modules/sideloader/pages/prepare-sideloader.adoc index 18b1009b..74d500a0 100644 --- a/modules/sideloader/pages/prepare-sideloader.adoc +++ b/modules/sideloader/pages/prepare-sideloader.adoc @@ -178,6 +178,7 @@ Your administration server must have SSH access to each node in your origin clus * https://jqlang.github.io/jq/[jq] to format JSON responses from the {astra} {devops-api}. The {devops-api} commands in this guide use this tool. +[#additional-preparation-for-specific-migration-scenarios] == Additional preparation for specific migration scenarios The following information can help you prepare for specific migration scenarios, including multi-region migrations and multiple migrations to the same database. From 4be6413cbc0b6a125b9f9ba01c8941b6681be76e Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 19 Nov 2025 15:34:37 -0800 Subject: [PATCH 4/6] zdm-545 --- modules/ROOT/pages/components.adoc | 10 ++++++++++ modules/ROOT/pages/feasibility-checklists.adoc | 1 + modules/ROOT/pages/manage-proxy-instances.adoc | 3 ++- modules/ROOT/pages/metrics.adoc | 3 ++- 4 files changed, 15 insertions(+), 2 deletions(-) diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 2f60557d..78c611f2 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -84,6 +84,16 @@ Requests sent with `CL.ONE` to {astra-db} databases always fail. {product-proxy} doesn't mute these failures because you need to be aware of them. You must adapt your client application to use a consistency level that is supported by both clusters to ensure that the migration is seamless and error-free. +==== Timeouts and connection failures + +When requests are routed through {product-proxy}, there is a proxy-side timeout and application-side timeout. + +If a response isn't received within the timeout period (`xref:ROOT:manage-proxy-instances.adoc#zdm_proxy_request_timeout_ms[zdm_proxy_request_timeout_ms]`), nothing is returned to the request handling thread, and, by extension, no response is sent to the client. +This inevitably results in a client-side timeout, which is an accurate representation of the fact that at least one cluster failed to respond to the request. +The clusters that are required to respond depend on the type of request and whether asynchronous dual reads are enabled. + +See also xref:ROOT:feasibility-checklists.adoc#driver-retry-policy-and-query-idempotence[Driver retry policy and query idempotence] and `xref:manage-proxy-instances.adoc#zdm_proxy_max_stream_ids[zdm_proxy_max_stream_ids]`. + === High availability and multiple {product-proxy} instances {product-proxy} is designed to be highly available and run a clustered fashion to avoid a single point of failure. 
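+
+In practice, this means that client applications should list every {product-proxy} instance as a contact point, so that losing a single instance doesn't interrupt connectivity.
+The following sketch uses the Python driver with placeholder proxy addresses:
+
+[source,python]
+----
+from cassandra.cluster import Cluster
+
+# Placeholder addresses of three ZDM proxy instances; replace with your own.
+cluster = Cluster(
+    contact_points=["172.18.10.40", "172.18.10.41", "172.18.10.42"],
+    port=9042,
+)
+session = cluster.connect()
+----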
diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index 5f79ebfa..42c4a168 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -192,6 +192,7 @@ If the performance impact is unacceptable for your application, or you are using Most drivers have utility methods that help you compute these values locally. For more information, see your driver's documentation and xref:datastax-drivers:developing:query-timestamps.adoc[Query timestamps in {cass-short} drivers]. +[#driver-retry-policy-and-query-idempotence] == Driver retry policy and query idempotence [IMPORTANT] diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index c366c19e..e132e1c8 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -153,9 +153,10 @@ If `now()` is used in any of your primary key columns, {company} recommends enab For more information, see xref:ROOT:feasibility-checklists.adoc#cql-function-replacement[Server-side non-deterministic functions in the primary key]. ==== -* `zdm_proxy_request_timeout_ms`: Global timeout in milliseconds of a request at proxy level. +* [[zdm_proxy_request_timeout_ms]]`zdm_proxy_request_timeout_ms`: Global timeout in milliseconds of a request at proxy level. Determines how long {product-proxy} waits for one cluster (for reads) or both clusters (for writes) to reply to a request. Upon reaching the timeout limit, {product-proxy} abandons the request and no longer considers it pending, which frees up internal resources to processes other requests. ++ When a request is abandoned due to a timeout, {product-proxy} doesn't return any result or error. A timeout warning or error is only returned when the client application's own timeout is reached and the request is expired on the driver side. + diff --git a/modules/ROOT/pages/metrics.adoc b/modules/ROOT/pages/metrics.adoc index 509fab44..e7b9784d 100644 --- a/modules/ROOT/pages/metrics.adoc +++ b/modules/ROOT/pages/metrics.adoc @@ -48,7 +48,8 @@ This metric is measured as the total latency across both clusters for a single x * Number of client connections * Prepared Statement cache: -** Cache Misses: meaning, a prepared statement was sent to {product-proxy}, but it wasn't on its cache, so the proxy returned an `UNPREPARED` response to make the driver send the `PREPARE` request again. +** Cache Misses: If a prepared statement is sent to {product-proxy} but the statement's `preparedID` isn't present in the node's cache, then {product-proxy} sends an `UNPREPARED` response to the client to reprepare the statement. +This metric tracks the number of times this happens. ** Number of cached prepared statements. * Request Failure Rates: the number of request failures per interval. 
From 80b7c01b7304c8a8ffa110b2ad41d13efbc2eb24 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 19 Nov 2025 15:48:32 -0800 Subject: [PATCH 5/6] connect dots --- modules/ROOT/pages/feasibility-checklists.adoc | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index 42c4a168..43599652 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -62,6 +62,10 @@ For example, if a table has 10 columns but your client application only uses 5 o You can also change the primary key in some cases. For example, if your compound primary key is `PRIMARY KEY (A, B)` and you always provide parameters for the `A` and `B` columns in your CQL statements then you could change the key to `PRIMARY KEY (B, A)` when creating the schema on the target because your CQL statements will still run successfully. +== Request and error handling expectations with {product-proxy} + +See xref:ROOT:components.adoc#how-zdm-proxy-handles-reads-and-writes[How {product-proxy} handles reads and writes]. + == Considerations for {astra-db} migrations {astra-db} implements guardrails and sets limits to ensure good practices, foster availability, and promote optimal configurations for your databases. @@ -73,7 +77,7 @@ It is also highly recommended to perform tests and benchmarks when connected dir === {astra} doesn't support CL.ONE -`CL.ONE` isn't supported by {astra}, and xref:ROOT:components.adoc#how-zdm-proxy-handles-reads-and-writes[read and write requests sent through {product-proxy}] with `CL.ONE` to {astra-db} databases always fail. +`CL.ONE` isn't supported by {astra}, and read and write requests sent through {product-proxy} with `CL.ONE` to {astra-db} databases always fail. {product-proxy} doesn't mute these failures because you need to be aware of them. You must adapt your client application to use a consistency level that is supported by both clusters to ensure that the migration is seamless and error-free. @@ -99,11 +103,8 @@ The default interval is 30,000 milliseconds, and it can be configured with the ` In {product-proxy} versions earlier than 2.1.0, read-only applications require special handling to avoid connection termination due to inactivity. -{company} recommends that you upgrade to version 2.1.0 or later to benefit from the heartbeat feature. -If you have an existing {product-proxy} deployment, you can xref:ROOT:troubleshooting-tips.adoc#check-version[check your {product-proxy} version]. -For upgrade instructions, see xref:ROOT:manage-proxy-instances.adoc#_upgrade_the_proxy_version[Upgrade the proxy version]. - -If you cannot upgrade to version 2.1.0 or later, see the alternatives described in xref:ROOT:troubleshooting-tips.adoc#client-application-closed-connection-errors-every-10-minutes-when-migrating-to-astra-db[Client application closed connection errors every 10 minutes when migrating to {astra-db}]. +{company} recommends that you use {product-proxy} version 2.1.0 or later to benefit from the heartbeat feature. +If you cannot use version 2.1.0 or later, see the alternatives described in xref:ROOT:troubleshooting-tips.adoc#client-application-closed-connection-errors-every-10-minutes-when-migrating-to-astra-db[Client application closed connection errors every 10 minutes when migrating to {astra-db}]. 
[[non-idempotent-operations]] == Lightweight Transactions and other non-idempotent operations From c2b8268ba8fc8aee422ae404d91fe5a6c2e1283d Mon Sep 17 00:00:00 2001 From: "April I. Murphy" <36110273+aimurphy@users.noreply.github.com> Date: Thu, 20 Nov 2025 03:26:35 -0800 Subject: [PATCH 6/6] Apply suggestions from code review Co-authored-by: Sarah Edwards --- modules/ROOT/pages/deploy-proxy-monitoring.adoc | 2 +- modules/ROOT/pages/feasibility-checklists.adoc | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index df18f74a..6dde98ce 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -170,7 +170,7 @@ For instructions, see xref:ROOT:tls.adoc[]. There are additional configuration variables in `vars/zdm_proxy_advanced_config.yml` that you might want to change _at deployment time_ in specific cases. -All advanced configuration variables that aren't listed here are considered mutable and can be changed later without recreating the entire deployment. +All advanced configuration variables that aren't listed here are mutable and can be changed later without recreating the entire deployment. For more information, see xref:manage-proxy-instances.adoc[]. ==== Multi-datacenter clusters diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index 43599652..29db369c 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -84,11 +84,11 @@ You must adapt your client application to use a consistency level that is suppor === {astra} doesn't support the Stargate APIs -The xref:astra-db-serverless:api-reference:compare-dataapi-to-stargate.adoc[Stargate APIs] (Document, REST, GraphQL, gRPC) are deprecated for {astra}. +The Stargate APIs (Document, REST, GraphQL, gRPC) are deprecated for {astra}. If you are migrating to {astra} from an origin cluster that uses any of these APIs, your client applications won't work with {astra}. - Before you migrate, you must change your applications to use other programmatic access, such as {cass-short} drivers or the {data-api}. +For more information, see xref:astra-db-serverless:api-reference:compare-dataapi-to-stargate.adoc[]. === Multi-region and multi-node {astra} migrations