8 changes: 6 additions & 2 deletions src/pages/docs/platform/architecture/edge-network.mdx
@@ -33,7 +33,7 @@ Behind CloudFront, each Ably region employs AWS Network Load Balancers to distri

Ably uses DNS-based latency routing to direct clients to the nearest available datacenter. The primary endpoint for client connections and HTTP requests is `main.realtime.ably.net`.

When a client performs a DNS lookup for this endpoint, the DNS service resolves to the closest datacenter, among those that are currently enabled, to the client's location. This latency-based routing ensures that clients connect to the datacenter with the lowest network latency, maximising the responsiveness of the service.
When a client performs a DNS lookup for this endpoint, the DNS service resolves to the closest datacenter, among those that are currently enabled, to the client's location. This latency-based routing ensures that clients connect to the datacenter with the lowest network latency, maximizing the responsiveness of the service.

Ably's DNS configuration uses a TTL of 60 seconds, allowing for relatively quick rerouting of traffic if a datacenter becomes unhealthy. The health of each datacenter is continuously monitored, and if issues are detected, Ably can modify the DNS routing to direct traffic away from the affected datacenter within minutes.
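One way to observe this routing from a client is to resolve the endpoint directly. The sketch below assumes the standard `dig` utility; the addresses returned will vary with your location and the set of currently enabled datacenters:

```bash
# Resolve the primary endpoint; the answer section shows the addresses
# selected for your location and the remaining TTL (at most 60 seconds).
dig +noall +answer main.realtime.ably.net
```

Re-running the query after the TTL has expired shows whether traffic has been rerouted, for example following a datacenter health event.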

@@ -109,7 +109,11 @@ Ably's edge network is designed to provide low-latency connectivity from virtual

The geographic distribution of these access points is continuously reviewed and optimized based on traffic patterns and customer requirements. As Ably's user base expands into new regions, additional datacenters can be added to maintain optimal performance.

In regions such as China with unique connectivity challenges, Ably has implemented specific strategies to ensure reliable service. These may include partnerships with local providers, alternative routing arrangements, and region-specific optimizations.
#### China connectivity

Ably's service works in China, with customers successfully using the platform to serve users globally, including within China. The global network with over 700 edge locations provides connectivity to Chinese users.

China operates a national firewall that can block access to foreign websites and services without notice. While Ably is not currently aware of any blocks affecting its services, potential firewall changes could impact service availability. Ably has implemented specific strategies to ensure reliable service in China, including partnerships with local providers, alternative routing arrangements, and region-specific optimizations.

### Protocol support and transport optimization

39 changes: 23 additions & 16 deletions src/pages/docs/platform/architecture/index.mdx
@@ -14,19 +14,20 @@ Ably's globally distributed infrastructure forms the foundation of the platform,

Ably characterizes the system across [4 pillars of dependability](https://ably.com/four-pillars-of-dependability):

* **Performance**: Ably focuses on predictability of latencies to provide certainty in uncertain operating conditions.
* Performance: Focuses on predictability of latencies to provide certainty in uncertain operating conditions.
* `<30ms` round trip latency within datacenter (99th percentile)
* `<65ms` global round trip latency (99th percentile)
* **Integrity**: Guarantees for message ordering and delivery.
* Integrity: Guarantees for message ordering and delivery.
* Exactly-once delivery semantics
* Guaranteed message ordering from publishers to subscribers
* Automatic connection recovery with message continuity
* **Reliability**: Fault tolerant architecture at regional and global levels to survive multiple failures without outages.
* Reliability: Fault tolerant architecture at regional and global levels to survive multiple failures without outages.
* 100% message delivery guarantee through multi-region redundancy
* 99.999999% message survivability
* 99.99999999% persisted data survivability
* Edge network failure resolution by the client SDKs within 30s
* Automated routing of all traffic away from an abruptly failed datacenter in less than two minutes
* **Availability**: Meticulously designed to provide continuity of service even in the case of instance or whole datacenter failures.
* Availability: Designed to provide continuity of service even in the case of instance or whole datacenter failures.
* 99.999% global service availability (5 minutes 15 seconds of downtime per year)
* 50% global capacity margin for instant demand surges

@@ -36,12 +36,12 @@ Ably's platform is a global service that supports all realtime messaging and ass

The platform has been designed with the following primary objectives in mind:

* **Horizontal scalability**: As more nodes are added, load is automatically redistributed across the cluster so that global capacity increases linearly with the number of instances Ably runs.
* **No single point of congestion**: As the system scales, there is no single point of congestion for any data path, and data within the system is routed peer-to-peer, ensuring no single component becomes overloaded as traffic scales for an individual app or across the cluster.
* **Fault tolerance**: Faults in the system are expected, and the system must have redundancy at every layer in the stack to ensure availability and reliability.
* **Autonomy**: Each component in the system should be able to operate fully without reliance on a global controller. For example, two isolated data centers should continue to service realtime requests while isolated.
* **Consistent low latencies**: Within data centers, Ably aims for latencies to be in the low 10s of milliseconds and less than 100ms globally. Consistently achieving low latencies requires careful consideration of the placement of data and services across the system as well as prioritisation of the computation performed by each service.
* **Quality of service**: Ably intentionally designs for high QoS targets to enable sophisticated realtime applications that would be impossible on platforms with weaker guarantees.
* Horizontal scalability: As more nodes are added, load is automatically redistributed across the cluster so that global capacity increases linearly with the number of instances.
* No single point of congestion: As the system scales, there is no single point of congestion for any data path, and data within the system is routed peer-to-peer, ensuring no single component becomes overloaded as traffic scales for an individual app or across the cluster.
* Fault tolerance: Faults in the system are expected, and the system must have redundancy at every layer in the stack to ensure availability and reliability.
* Autonomy: Each component in the system should be able to operate fully without reliance on a global controller. For example, two isolated data centers should continue to service realtime requests while isolated.
* Consistent low latencies: Within data centers, latencies are in the low tens of milliseconds and less than 100ms globally. Consistently achieving low latencies requires careful consideration of the placement of data and services across the system as well as prioritization of the computation performed by each service.
* Quality of service: Designed for high QoS targets to enable sophisticated realtime applications that would be impossible on platforms with weaker guarantees.

## Cluster architecture

@@ -51,10 +51,10 @@ Each regional deployment operates independently, handling its own subscriber con

Ably's architecture consists of four primary layers:

* **Routing Layer**: Provides intelligent, latency optimized routing for robust end client connectivity.
* **Gossip Layer**: Distributes network topology information and facilitates service discovery.
* **Frontend Layer**: Handles REST requests and maintains realtime connections (such as WebSocket, Comet and SSE).
* **Core Layer**: Performs all central message processing for channels.
* Routing Layer: Provides intelligent, latency-optimized routing for robust end-client connectivity.
* Gossip Layer: Distributes network topology information and facilitates service discovery.
* Frontend Layer: Handles REST requests and maintains realtime connections (such as WebSocket, Comet and SSE).
* Core Layer: Performs all central message processing for channels.

![Ably Architecture Overview Diagram](../../../../images/content/diagrams/architecture-overview.png)

@@ -133,9 +134,15 @@ Nodes in the core layer are responsible for all channel message processing and p

Messages are persisted in multiple locations to ensure that message availability and continuity are maintained even during individual node or data center failures.

Ably provides a 100% message delivery guarantee through its multi-layered durability strategy:

* Every message is stored in RAM on two or more physically isolated data centers within the receiving region.
* Every message is additionally stored in RAM in at least one other region, bringing the total to at least three copies.
* For persisted messages, storage across three regions is required before the message is deemed successfully stored.
* If durability requirements cannot be met, the message is rejected and the client is notified to retry.

Once a message is acknowledged, it is stored in multiple physical locations, providing statistical guarantees of 99.999999% (8 nines) for message availability and survivability. This redundancy enables Ably to maintain its quality of service guarantees even during infrastructure failures.
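As an illustrative sketch of this acknowledgment behavior, a REST publish only receives a success response once the message has been accepted by the service. The channel name and payload below are placeholders, and the exact response codes should be confirmed against the REST API reference:

```bash
# Publish a message over REST using basic auth with an Ably API key
# (keyName:keySecret). A success (201) response indicates the message has
# been acknowledged; an error response means the client should retry.
curl -X POST "https://rest.ably.io/channels/example-channel/messages" \
  -u "API_KEY" \
  -H "Content-Type: application/json" \
  --data '{"name": "greeting", "data": "hello"}'
```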

Messages are stored in two ways:

* **Ephemeral Storage**: Messages are held for 2 minutes in an in-memory database (Redis). This data is distributed according to Ably's consistent hashing mechanism and relocated when channels move between nodes. This short-term storage enables low-latency message delivery and retrieval and supports features like [connection recovery](/docs/connect/states).
* **Persisted Storage**: Messages can optionally be stored persistently on disk if longer term retention is required. Ably uses a globally distributed and clustered database (Cassandra) for this purpose, deployed across multiple data centers with message data replicated to three regions to ensure integrity and availability even if a region fails.
* Ephemeral Storage: Messages are held for 2 minutes in an in-memory database (Redis). This data is distributed according to the consistent hashing mechanism and relocated when channels move between nodes. This short-term storage enables low-latency message delivery and retrieval and supports features like [connection recovery](/docs/connect/states).
* Persisted Storage: Messages can optionally be stored persistently on disk if longer-term retention is required. Uses a globally distributed and clustered database (Cassandra) for this purpose, deployed across multiple data centers with message data replicated to three regions to ensure integrity and availability even if a region fails. Stored messages can later be retrieved through the [history API](/docs/storage-history/history), as shown in the example below.
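As a minimal sketch of that retrieval, assuming an API key with access to the channel, recent messages can be fetched over REST:

```bash
# Fetch the most recent messages stored for a channel. With persistence
# disabled, history covers roughly the two-minute ephemeral window; with
# persistence enabled, it reads from the longer-term store.
curl "https://rest.ably.io/channels/example-channel/messages?limit=10" \
  -u "API_KEY"
```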
51 changes: 50 additions & 1 deletion src/pages/docs/platform/architecture/performance.mdx
@@ -23,7 +23,7 @@ The first objective is minimizing latency and latency variance for message deliv

The second objective is maximizing single-channel throughput in terms of both message rate and bandwidth. This ensures that even when a channel handles high volumes of traffic, such as during live events or peak usage periods, the system can efficiently process and distribute messages without degradation.

Ably achieves a [**global mean latency of 37ms**](/docs/platform/architecture/latency) to provide the best possible realtime experience to your users.
Ably achieves a [global mean latency of 37ms](/docs/platform/architecture/latency).

Beyond average latency, Ably focuses on the performance of the slowest percentiles of messages (p95, p99, p99.9) to ensure consistent performance for all messages. These tail latencies often reveal performance issues that might be hidden by average measurements.
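As a rough client-side complement to these measurements, request latencies can be sampled from your own location and sorted to see the spread, including the slowest (tail) requests; the endpoint and sample size here are arbitrary:

```bash
# Sample 20 REST round trips and sort them; the largest values give a
# rough view of your local tail latency (each request here includes a
# fresh TLS handshake, so this is not a pure message-delivery latency).
for i in $(seq 1 20); do
  curl -o /dev/null -s -w "%{time_total}\n" https://rest.ably.io/time
done | sort -n
```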

@@ -58,3 +58,52 @@ Ably manages the capacity of all elements of its infrastructure — both message
This proactive capacity management involves predictive scaling where capacity is adjusted ahead of anticipated demand changes, headroom maintenance where systems operate with sufficient margin to absorb spikes, and resource balancing where workloads are distributed to optimize utilization across the infrastructure.

Quality of service mechanisms include traffic prioritization where critical messages receive preferential treatment, fair usage enforcement to prevent any single client from monopolizing resources, graceful degradation so that under extreme load, system behavior remains predictable, and backpressure signaling where clients receive early warnings when approaching limits.

## Debugging performance issues

If you experience slow REST requests, several factors could be contributing:

### Routing verification

Ably uses latency-based routing, but requests may occasionally be routed to a suboptimal datacenter. To check your routing:

```bash
curl http://rest.ably.io/404
```

The 404 response includes a server ID containing the AWS region. Verify this matches your expected closest region.
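To surface just that header without the rest of the response, one option is to dump the response headers and filter for the `x-ably-serverid` header described later in this section:

```bash
# Print only the response headers and filter for the server ID,
# which includes the AWS region that served the request.
curl -s -D - -o /dev/null https://rest.ably.io/time | grep -i x-ably-serverid
```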

### Connection overhead analysis

HTTPS connections require TLS handshakes that can compound latency issues. To measure connection timing:

```bash
curl -o /dev/null -s -w "time_namelookup: %{time_namelookup}\n time_connect: %{time_connect}\n time_appconnect: %{time_appconnect}\n time_pretransfer: %{time_pretransfer}\n time_starttransfer: %{time_starttransfer}\n time_total: %{time_total}\n" https://rest.ably.io/time
```

The `time_appconnect` value shows SSL handshake duration. Compare with the non-SSL endpoint to isolate TLS overhead.
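For the comparison, the same timing fields can be captured against the plain HTTP endpoint, assuming non-TLS access is reachable from your network; `time_appconnect` is reported as 0 when no TLS handshake takes place, so the difference in `time_total` between the two runs approximates the TLS overhead:

```bash
# Baseline timing over plain HTTP (no TLS handshake).
curl -o /dev/null -s -w "time_connect: %{time_connect}\n time_appconnect: %{time_appconnect}\n time_total: %{time_total}\n" http://rest.ably.io/time
```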

### Network access verification

For comprehensive connection diagnostics, use this verbose curl command to analyze the complete TLS handshake and connection process:

```bash
curl -v -s -w "\n time_namelookup: %{time_namelookup}\n \
time_connect: %{time_connect}\n \
time_appconnect: %{time_appconnect}\n \
time_pretransfer: %{time_pretransfer}\n \
time_redirect: %{time_redirect}\n \
time_starttransfer: %{time_starttransfer}\n \
time_total: %{time_total}\n" \
https://rest.ably.io/time
```

This command provides detailed output including:

* TLS handshake steps and certificate verification
* HTTP headers including server ID (`x-ably-serverid`)
* Connection timing breakdown for each phase
* Current server time as verification of successful communication

The timing statistics help identify where connection delays occur, while the verbose TLS output reveals certificate issues or network routing problems.

### REST vs Realtime performance

REST publishes have inherently higher latency than realtime publishes due to per-request authentication and capability checking. For low-latency requirements, realtime client libraries are recommended as they maintain persistent WebSocket connections, performing SSL handshaking and authentication only once rather than per-message.
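One way to see how much of REST latency is per-connection overhead is to issue two requests in a single curl invocation so that the TCP/TLS connection is reused, loosely analogous to what a persistent realtime connection amortizes across many messages:

```bash
# curl prints the -w output once per transfer; the second request reuses
# the connection, so its time_appconnect (TLS handshake) drops to near zero.
curl -s -w "appconnect: %{time_appconnect} total: %{time_total}\n" \
  -o /dev/null https://rest.ably.io/time \
  -o /dev/null https://rest.ably.io/time
```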

For comprehensive performance monitoring, view [Ably's global latency statistics](/docs/platform/architecture/latency).
3 changes: 1 addition & 2 deletions src/pages/docs/protocols/mqtt.mdx
@@ -12,14 +12,13 @@ The Ably MQTT protocol adapter is able to translate back and forth between [MQTT

## When to use the MQTT adapter <a id="when-to-use"/>

The MQTT adapter is our recommended way of interacting with Ably from devices which do not have a native Ably SDK, such as Arduino platforms, C/C++ applications, and so on. Anyone who has previously been using Pubnub or Pusher SDKs for this purpose may want to consider switching to MQTT. Compared to the Pubnub protocol, using MQTT will result in better performance with lower latency. Compared to the Pusher protocol, MQTT will give you connection state recovery.
The MQTT adapter is our recommended way of interacting with Ably from devices which do not have a native Ably SDK, such as Arduino platforms, C/C++ applications, and so on. Anyone who has previously been using PubNub or Pusher SDKs for this purpose may want to consider switching to MQTT. Compared to the PubNub protocol, using MQTT will result in better performance with lower latency. The MQTT adapter typically adds only 1-10ms of latency. Compared to the Pusher protocol, MQTT will give you connection state recovery.

Behind the scenes, the adapter just uses the normal Ably service, so there is no problem with using MQTT and Ably SDKs side by side. You can mix and match as you like; for example, using MQTT on your IoT devices, but using the Ably Realtime API on your servers.
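As a sketch of this side-by-side usage, a device can subscribe to a channel through the adapter with a standard MQTT client. The host, port, and credential format shown here (the API key split at the colon into username and password) are assumptions to be checked against the adapter's connection details:

```bash
# Subscribe to an Ably channel via the MQTT adapter using mosquitto_sub.
# Assumed endpoint: mqtt.ably.io, port 8883 (TLS). Username is the API key
# name (before the colon), password is the API key secret (after the colon).
mosquitto_sub -h mqtt.ably.io -p 8883 --capath /etc/ssl/certs \
  -t 'example-channel' -u 'API_KEY_NAME' -P 'API_KEY_SECRET'
```

A message published to the same channel with an Ably SDK or the REST API is then delivered to this MQTT subscriber.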

MQTT is recommended to interact with Ably when:

* Bandwidth is limited and you want to keep network traffic to a minimum
* You want to keep network traffic to a minimum.

## Known limitations <a id="limitations"/>

2 changes: 1 addition & 1 deletion src/pages/docs/protocols/pubnub.mdx
@@ -8,7 +8,7 @@ languages:

Ably enables migration from PubNub to Ably using its PubNub Adapter. The protocol adapter handles all background translation and only requires an API key change.

Using an adapter introduces some latency and is slower than using an Ably SDK, however the impact is typically in the low milliseconds. Some operations are quick with PubNub, but slower or impossible with Ably, and vice versa.
Using an adapter introduces some latency and is slower than using an Ably SDK. The PubNub adapter can have more variable latency than other adapters because PubNub's protocol is inherently long-polling based, which creates an impedance mismatch with Ably's WebSocket-based architecture. Some operations are quick with PubNub, but slower or impossible with Ably, and vice versa.

Many of the advantages associated with using Ably, such as the use of WebSockets rather than long polling, [continuity guarantees](https://ably.com/four-pillars-of-dependability), and fallback host support are only available when using an Ably SDK. If an [Ably SDK](/docs/sdks) is available in your chosen platform, it is recommended you use that, or plan to transition to it eventually.

2 changes: 1 addition & 1 deletion src/pages/docs/protocols/pusher.mdx
@@ -7,7 +7,7 @@ languages:

Ably enables migration from Pusher to Ably using its Pusher Adapter. The protocol adapter handles all background translation and only requires an API key change.

Using an adapter introduces some latency and is slower than using an Ably SDK, however the impact is typically in the low milliseconds. It will also be slightly slower than using Pusher natively, but only if you are close to whichever Pusher data center used. If you aren't close to the Pusher data center you've chosen, then the extra latency from using the adapter should be more than compensated for by being able to use a data center that is close to you. This is because Ably automatically connects clients to the data center closest to them.
Using an adapter introduces some latency and is slower than using an Ably SDK; however, the impact is typically 1-10ms. It will also be slightly slower than using Pusher natively, but only if you are close to whichever Pusher data center you use. If you aren't close to the Pusher data center you've chosen, then the extra latency from using the adapter should be more than compensated for by being able to use a data center that is close to you. This is because Ably automatically connects clients to the data center closest to them.

The Pusher Adapter provides some of the advantages of Ably, such as inter-regional message federation, however others, such as [continuity guarantees](https://ably.com/four-pillars-of-dependability), fallback host support, and [message history](/docs/storage-history/history) are only available when using an Ably SDK. If an [Ably SDK](/docs/sdks) is available in your chosen platform, it is recommended you use that, or plan to transition to it eventually.
