In a database application environment, application connectivity can become the weakest link without robust connection management. Cockroach Labs technical support has assisted customers in several cases of complete application/database service outages in which the application lost all database connections to a healthy, operational CockroachDB cluster due to configuration errors or lax connection management.
For continuous service availability, database connectivity requires redundancy and agile connection management, including connection governance, load balancing, failover, and failback. This is commonly achieved with the combined use of load balancers and connection pooling.
Load balancing (LB) ensures:
- a balanced distribution of client connections across all peer CockroachDB cluster nodes
- routing of client connections only to healthy nodes of the cluster
- rebalancing of client connections upon a database cluster size or topology change, in orchestration with connection pool settings
Connection pooling (CP) serves these important functions:
- alleviates the overhead of opening and closing connections
- provides workload concurrency governance
- rebalances client connections upon a database cluster size or topology change, in orchestration with load balancer configuration
A combination of a connection pool and a load balancer is a commonly used, proven architectural design pattern. However, it is not the only valid approach; for example, load balancing and failover/failback can also be implemented in the database driver.
CockroachDB supports Layer 4 load balancing, which means the TCP/IP connection is established directly between the client (the application or connection pool) and a CockroachDB node; that is, the application connection terminates at a CockroachDB node.
Load balancers are typically configured with the "round-robin" algorithm, which is the simplest, most effective, and most thoroughly tested.
Upon a new cluster connection request, the load balancer selects a healthy node in the region (as determined by /health?ready=1) for the client to connect to. While the "round-robin" load balancing policy attempts to spread the connections evenly, there is no guarantee of connection balance, because the load balancer is not aware of a node's capacity to process the workload and because clients can close connections. These realities may lead to a client connection imbalance; see the connection balancing mitigation techniques section.
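The readiness endpoint polled by the load balancer can also be probed directly when verifying a node's eligibility for new connections. Below is a minimal sketch in Go, assuming a node's HTTP interface is reachable at the hypothetical address crdb-node:8080; it treats an HTTP 200 response to /health?ready=1 as "ready" and anything else as "out of rotation".

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkReady probes a CockroachDB node's readiness endpoint, mirroring what a
// load balancer health check does. It returns true only on HTTP 200.
func checkReady(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 2 * time.Second} // align with a 2s health check interval
	resp, err := client.Get(baseURL + "/health?ready=1")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	// Hypothetical node address; substitute a real node's HTTP host:port.
	ready, err := checkReady("http://crdb-node:8080")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println("node ready:", ready)
}
```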
This section includes a real-world configuration example of the load balancer tier in front of a CockroachDB cluster spanning 3 geographically dispersed data centers (DC1, DC2, DC3), with 3 CockroachDB nodes in each data center. That configuration is appropriate for that deployment's specific service continuity requirements and is included merely as a practical example of one possible configuration. It does not rise to a "recommended" or "reference" level because deployment platforms and service continuity requirements vary greatly.
In the diagram, the HAProxy instances are shown in blue:
Configuration highlights:
- The application instance in each DC connects to the local HAProxies (2 instances for redundancy).
- In the event one CRDB server in a DC fails, HAProxy load balances database traffic to the remaining CRDB servers in the same DC.
- In the event all CRDB servers in a DC fail, the HAProxy in that DC sends database traffic to ALL surviving CRDB nodes in ALL OTHER DCs (option allbackups).
- The connections between the application and the redundant HAProxies in the same DC are load balanced via round-robin DNS (A record).
- In the event an entire DC fails, or the application in a DC is not available, the incoming application traffic is routed to the application(s) in the surviving DCs.
The HAProxy configurations:
---------------------------------- DC1 ----------------------------------
```
global
    daemon
    maxconn 4096
    pidfile /var/run/haproxy.pid
    stats socket /var/lib/haproxy/stats mode 660 level admin
    nbproc 4

defaults
    mode tcp
    timeout connect 10s
    timeout client 30m
    timeout server 30m
    option clitcpka
    option allbackups

listen psql
    log global
    bind :26257
    mode tcp
    balance roundrobin
    option httpchk GET /health?ready=1
    server cockroach101 cockroach101.DC1 check port 8080
    server cockroach102 cockroach102.DC1 check port 8080
    server cockroach103 cockroach103.DC1 check port 8080
    server cockroach201 cockroach201.DC2 check port 8080 backup
    server cockroach202 cockroach202.DC2 check port 8080 backup
    server cockroach203 cockroach203.DC2 check port 8080 backup
    server cockroach301 cockroach301.DC3 check port 8080 backup
    server cockroach302 cockroach302.DC3 check port 8080 backup
    server cockroach303 cockroach303.DC3 check port 8080 backup

listen stats
    bind 127.0.0.1:9000
    mode http
    stats enable
    stats hide-version
    stats uri /haproxy_stat
```
---------------------------------- DC2 ----------------------------------
```
global
    daemon
    maxconn 4096
    pidfile /var/run/haproxy.pid
    stats socket /var/lib/haproxy/stats mode 660 level admin
    nbproc 4

defaults
    mode tcp
    timeout connect 10s
    timeout client 30m
    timeout server 30m
    option clitcpka
    option allbackups

listen psql
    log global
    bind :26257
    mode tcp
    balance roundrobin
    option httpchk GET /health?ready=1
    server cockroach201 cockroach201.DC2 check port 8080
    server cockroach202 cockroach202.DC2 check port 8080
    server cockroach203 cockroach203.DC2 check port 8080
    server cockroach101 cockroach101.DC1 check port 8080 backup
    server cockroach102 cockroach102.DC1 check port 8080 backup
    server cockroach103 cockroach103.DC1 check port 8080 backup
    server cockroach301 cockroach301.DC3 check port 8080 backup
    server cockroach302 cockroach302.DC3 check port 8080 backup
    server cockroach303 cockroach303.DC3 check port 8080 backup

listen stats
    bind 127.0.0.1:9000
    mode http
    stats enable
    stats hide-version
    stats uri /haproxy_stat
```
---------------------------------- DC3 ----------------------------------
```
global
    daemon
    maxconn 4096
    pidfile /var/run/haproxy.pid
    stats socket /var/lib/haproxy/stats mode 660 level admin
    nbproc 4

defaults
    mode tcp
    timeout connect 10s
    timeout client 30m
    timeout server 30m
    option clitcpka
    option allbackups

listen psql
    log global
    bind :26257
    mode tcp
    balance roundrobin
    option httpchk GET /health?ready=1
    server cockroach301 cockroach301.DC3 check port 8080
    server cockroach302 cockroach302.DC3 check port 8080
    server cockroach303 cockroach303.DC3 check port 8080
    server cockroach101 cockroach101.DC1 check port 8080 backup
    server cockroach102 cockroach102.DC1 check port 8080 backup
    server cockroach103 cockroach103.DC1 check port 8080 backup
    server cockroach201 cockroach201.DC2 check port 8080 backup
    server cockroach202 cockroach202.DC2 check port 8080 backup
    server cockroach203 cockroach203.DC2 check port 8080 backup

listen stats
    bind 127.0.0.1:9000
    mode http
    stats enable
    stats hide-version
    stats uri /haproxy_stat
```
In a Google Cloud deployment, a regional native load balancer is typically provisioned to route connections to the CockroachDB nodes in that region. This is an OSI Layer 4 load balancer implemented in the network fabric. It operates across AZs and survives the loss of an AZ.
Google Cloud load balancers are typically deployed in the following configuration: internal, regional, in pass-through mode, optionally with global access.
In an AWS cloud deployment, a regional network load balancer with cross-zone load balancing enabled (per the Cockroach Labs deployment reference) is typically provisioned to route connections to CockroachDB nodes in that region. This load balancer operates at Layer 4 of the OSI model.
CockroachDB operators in the AWS cloud should be aware of the following caveat:
Application connections to CockroachDB via an AWS NLB can be disrupted by connection reset errors or timeouts if both client IP preservation and cross-zone load balancing are enabled in the NLB configuration.
The issue is described in this article. AWS provides two troubleshooting paragraphs pertinent to it:
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#loopback-timeout
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#intermittent-connection-failure
The impact on a user workload is reduced SQL throughput due to reconnect and retry overhead. The AWS troubleshooting paragraphs above also suggest that application clients can experience connection timeouts.
The issue can be eliminated only by disabling either client IP preservation or cross-zone load balancing.
Since Cockroach Labs only has operating experience with cross-zone load balancing enabled (per the deployment reference), realistically the operator's decision is whether or not to disable client IP preservation. Disabling client IP preservation denies the database an opportunity to track client connections (all client SQL connections are reported as originating at the NLB's IP address), which may obstruct effective troubleshooting. Keeping client IP preservation enabled leads to intermittent performance "dips", which could be acceptable in some workloads since there should be no service continuity disruptions.
Cockroach Labs recommends configuring load balancers with a 2-second health check interval and taking a VM out of rotation upon 3 consecutive health check failures. These settings provide predictable VM failure detection in under 10 seconds.
The NLB defaults for VM health checks (see the table below) result in VM failure detection taking up to 30 seconds, which is considered too loose. We understand that 5 seconds is the lowest technically feasible value for the NLB health check interval and timeout.
To enable the fastest possible VM failure detection (~15 seconds):
- Change both Server Health Check Interval and Check Timeout to 5 seconds
- Keep Unhealthy Threshold at the default 2
The following table summarizes the NLB target Health Check setting values:
| NLB Setting | NLB Default Value | CRL Recommended Value | NLB Best (Lowest) Available Value |
| --- | --- | --- | --- |
| Server Health Check Interval | 10 secs | 2 secs | 5 secs |
| Unhealthy Threshold | 2 consecutive HC failures | 3 consecutive HC failures | 2 consecutive HC failures |
| Check Timeout | 10 secs | 2 secs | 5 secs |
👉 To ensure orderly connection failover during regular rolling maintenance, the load balancer settings must be coordinated with the CockroachDB cluster settings. Follow the guidance in the node shutdown article.
The best practices for CockroachDB cluster connection pooling are as follows:
- The size of the connection pool should not exceed 4 (four) times the total vCPUs in the cluster, presuming connections are evenly distributed across cluster nodes. Many workloads perform best when the maximum number of active connections is between 2 and 4 times the number of CPU cores in the cluster.
- Configure the pool with MIN connections = MAX connections = the connection pool size. Adjust when the cluster topology changes.
- Do not set tight idle timeouts of any kind, e.g. for idle connections in the pool; idle handling is best done in the application tier.
- Set the maximum connection lifetime to around 30 minutes (give or take), per the considerations outlined in the connection balancing section. A configuration sketch applying these settings follows this list.
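For illustration only, here is a minimal sketch of these settings using Go and the pgxpool library (an assumed choice; any pool with equivalent knobs works the same way). The DSN, the pool size of 256, and the 30-minute lifetime are hypothetical values to be derived from the actual cluster vCPU count and the connection balancing considerations below.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// Hypothetical DSN pointing at the load balancer, not at an individual node.
	cfg, err := pgxpool.ParseConfig("postgresql://app_user@lb.example.internal:26257/appdb?sslmode=verify-full")
	if err != nil {
		log.Fatal(err)
	}

	// Pool size: do not exceed 4x the total vCPUs in the cluster (256 is a placeholder).
	cfg.MaxConns = 256
	// MIN connections = MAX connections, per the best practices above.
	cfg.MinConns = cfg.MaxConns
	// Maximum connection lifetime around 30 minutes to allow periodic recycling.
	cfg.MaxConnLifetime = 30 * time.Minute
	// No tight idle timeouts; rely on lifetime-based recycling instead.
	cfg.MaxConnIdleTime = 30 * time.Minute

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// Sanity check: the pool is usable.
	if err := pool.Ping(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```

Setting the idle time equal to the lifetime effectively avoids tight idle-based eviction while still letting the lifetime-based recycling do its work.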
From a practical standpoint, connection pool products come in two form factors: in-process (a library linked into the application executable) or out-of-process (an external process).
In-process pooling is always preferred whenever architecturally possible. It minimizes the performance overhead and improves the overall reliability of the application environment. However, it can only pool connections for threads within the same process. In-process pooling is the preferred form factor for most traditional stateful application architectures (monolithic, multi-tier, client-server, etc.). In more modern application architectures, specifically those based on stateless microservices and lambdas, the use of in-process (library-style) connection pools is simply not possible.
CockroachDB docs offer code samples for using popular in-process connection pool products.
An external connection pool allows pooling across multiple processes, providing, among other benefits, effective workload concurrency governance, which is critical because CockroachDB does not offer SQL workload management. The goal of deploying an external connection pool is to give the application complete governance over the total number of connections to the CockroachDB cluster across many application-side processes.
The connection management benefits of external pools naturally come at the cost of response-time overhead, due to an extra network round trip for every SQL call, and extra operational cost to maintain an additional infrastructure component (monitoring the connection pool process, orchestrating failover and failback, etc.).
The PgBouncer connection pool is often deployed as a separate tier to aggregate connections across many instances of microservices or lambdas, primarily for workload concurrency governance.
The sizing method for PgBouncer is as follows:
- An instance of PgBouncer running on 1 core and 2 GB RAM handles up to 5,000 incoming (application-side) connections and up to 96 outgoing (CockroachDB database) connections.
- The total cluster connection sizing rule for PgBouncer is the same general rule: do not exceed 4 times the total vCPUs in the cluster. For example, for a hypothetical 9-node cluster with 8 vCPUs per node (72 vCPUs total), the cap is roughly 288 database-side connections; three PgBouncer instances with a pool size of 96 each would reach exactly that cap.
- PgBouncer pools should be deployed in a redundant configuration, i.e. at least 2 instances. However, using many PgBouncer pools defeats the goal of effective workload concurrency (i.e. connection) governance. Thus a typical number of PgBouncer pools per environment is between 2 and 4.
Below is a real-world pgbouncer.ini example:
```
[databases]
abcdb_production = host=internal-prod-abc.aws-eu-west-1.cockroachlabs.cloud port=26257 dbname=abcdb_production
abcdb-production = host=internal-prod-abc.aws-eu-west-1.cockroachlabs.cloud port=26257 dbname=abcdb_production

[pgbouncer]
listen_addr = *
listen_port = 26257
unix_socket_dir = /tmp
client_tls_sslmode = require
client_tls_key_file = /etc/pgbouncer/server.key
client_tls_cert_file = /etc/pgbouncer/server.crt
client_tls_ciphers = normal
auth_type = plain
auth_file = /etc/pgbouncer/users.pwd
pool_mode = transaction
server_check_query = SELECT 1
server_check_delay = 5
max_client_conn = 5000
default_pool_size = 96
server_tls_sslmode = require
server_round_robin = 1
server_idle_timeout = 900
idle_transaction_timeout = 60
stats_period = 30
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
```
Client connection balancing after a cluster expansion, a cluster topology change, or routine rolling maintenance needs the operator's consideration. For instance, an application that maintains a steady number of open connections will not fully benefit from a cluster expansion unless the added nodes get their equal share of client connections. Even after the load balancer's configuration is updated with references to the new nodes, those nodes will not necessarily receive client connections and will be underutilized.
Here is an example scenario resulting in a connection imbalance:
- The application has many connections, all active most of the time.
- The cluster is expanded in an orderly fashion; the load balancer configuration is updated to include the new nodes.
- The new nodes are ready to take their share of client connections.
- However, the new nodes will only get connections when the application (connection pool) asks for new connections, and that is not happening because the client side has enough connection slots in the pool and the pool keeps reusing the established connections. Client connections remain as they were, and the new nodes are underutilized.
- Even if the application adds connections, and the load balancer's policy is "round-robin", the imbalance of connections would persist (a sketch for observing the per-node connection distribution follows this list).
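One way to confirm such an imbalance is to open a batch of fresh connections through the load balancer and ask each one which node it landed on. Below is a minimal Go sketch under assumed names: the DSN is hypothetical, and it uses the pgx driver plus the crdb_internal.node_id() built-in to report the serving node.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	// Hypothetical DSN pointing at the load balancer, not at an individual node.
	dsn := "postgresql://app_user@lb.example.internal:26257/appdb?sslmode=verify-full"

	var conns []*pgx.Conn
	perNode := map[int]int{} // node ID -> number of connections that landed there
	for i := 0; i < 20; i++ {
		conn, err := pgx.Connect(ctx, dsn) // each iteration opens a brand-new connection
		if err != nil {
			log.Fatal(err)
		}
		conns = append(conns, conn)

		var nodeID int
		if err := conn.QueryRow(ctx, "SELECT crdb_internal.node_id()").Scan(&nodeID); err != nil {
			log.Fatal(err)
		}
		perNode[nodeID]++
	}
	fmt.Println("connections per node:", perNode)

	// Close the held connections once the distribution has been observed.
	for _, c := range conns {
		c.Close(ctx)
	}
}
```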
To handle this case efficiently and without disorderly connection termination, operators may utilize the following mitigation techniques:
- Configure a connection max lifetime in the connection pool. This forces periodic connection recycling, maintaining a reasonable connection balance. 30 minutes may be a good starting point when fine-tuning for specific application requirements. A good practice range for the max connection lifetime is between 10 and 60 minutes. A lower value allows more vigorous connection balancing at the expense of higher reconnect overhead: less than 10 minutes may add a measurable reconnect overhead (particularly with TLS) with adverse effects on workload performance, while over 60 minutes may make connection rebalancing sluggish. This value selection should be coordinated with the snapshot rebalancing rate setting (or rather with measurements of the elapsed time to complete node data rebalancing at the given rate setting), so that connections are rebalanced to new nodes reasonably promptly following a cluster expansion. A pool-configuration sketch illustrating lifetime-based recycling follows this list.
- Instead of the effective, yet simplistic, roundrobin load balancing algorithm, configure a method that is aware of the current number of server connections, such as HAProxy's leastconn, which routes new connections to the nodes with the lowest number of connections. This mitigation technique is not commonly deployed, and its potential side effects and failure scenarios are less studied and understood. This approach is viewed as riskier and requires extra due diligence by the operator during QA cycles.
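As a sketch of the first mitigation, the pool-configuration example shown earlier can bound the connection lifetime; adding a jitter (a pgxpool option; the 30-minute and 5-minute values are hypothetical starting points, not recommendations from this document) spreads the recycling out so that all pooled connections are not closed and reopened at the same instant.

```go
package main

import (
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// tuneForRebalancing applies lifetime-based recycling so that pooled connections
// are periodically closed and reopened through the load balancer, letting newly
// added nodes pick up their share of connections over time.
func tuneForRebalancing(cfg *pgxpool.Config) {
	cfg.MaxConnLifetime = 30 * time.Minute      // within the 10-60 minute guidance above
	cfg.MaxConnLifetimeJitter = 5 * time.Minute // hypothetical jitter to avoid reconnect bursts
}

func main() {
	// Hypothetical DSN pointing at the load balancer.
	cfg, err := pgxpool.ParseConfig("postgresql://app_user@lb.example.internal:26257/appdb?sslmode=verify-full")
	if err != nil {
		log.Fatal(err)
	}
	tuneForRebalancing(cfg)
	// Create the pool with pgxpool.NewWithConfig, as in the earlier sketch.
}
```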
To ensure orderly connection failover, follow the guidance in the node shutdown article. If it is followed, connection failover will be orderly and will cause no service interruptions. This requires coordination between the cluster settings, the load balancer settings, and the connection pooling configuration.
Connection failback is effectively the same topic as connection balancing and therefore does not require special operational provisions.