-
Timeouts
This spec doesn't solve the client-side timeout problem at all. The timeouts defined here not only "won't account for client<>server network latencies", they also don't cover cases where the hardware is down or the network between client and server is disconnected. In those cases the client can wait for the next network packet for hours without a proper client-side timeout (e.g. a mere power flip that reboots the machine would not produce a "connection reset" until another packet arrives on that connection). Another note: a timed-out request should trigger the transaction retry described in RFC 1004.
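To illustrate the kind of client-side deadline that is still needed even with a server-side timeout, here is a rough sketch (the `conn.execute`/`conn.abort()` API and the 5-second figure are hypothetical, not from the spec):

```python
import asyncio

SERVER_TIMEOUT_US = 5_000_000   # hypothetical server-side timeout: 5 seconds
CLIENT_GRACE_S = 2.0            # extra slack for network latency

async def run_with_deadline(conn, query: str):
    # Even if the server enforces SERVER_TIMEOUT_US, the client still needs
    # its own deadline: if the server host loses power or the network is
    # partitioned, neither QueryTimeoutError nor a TCP RST will ever arrive.
    try:
        return await asyncio.wait_for(
            conn.execute(query, timeout_us=SERVER_TIMEOUT_US),  # hypothetical API
            timeout=SERVER_TIMEOUT_US / 1_000_000 + CLIENT_GRACE_S,
        )
    except asyncio.TimeoutError:
        # The connection state is unknown now; drop the socket and retry the
        # transaction as described in RFC 1004 instead of reusing it.
        conn.abort()
        raise
```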
Cancellation
Overall, I'm -1 on this spec on cancellation for two reasons:
I think cancellation is done this way in Postgres because it's basically impossible to detect a closed connection while Postgres is doing disk/CPU work. So here is what I think should be done instead:
-
I think I may have come across a possible source for the increasingly greater IDs while porting the EdgeQL compiler from Python to Rust for edgemorph. Consider this Python snippet from edb's common compiler file:

```python
class SimpleCounter:
    counts: DefaultDict[str, int]

    def __init__(self) -> None:
        self.counts = collections.defaultdict(int)

    def nextval(self, name: str = 'default') -> int:
        self.counts[name] += 1
        return self.counts[name]


class AliasGenerator(SimpleCounter):
    def get(self, hint: str = '') -> str:
        if not hint:
            hint = 'v'
        m = re.search(r'~\d+$', hint)
        if m:
            hint = hint[:m.start()]
        idx = self.nextval(hint)
        alias = f'{hint}~{idx}'
        return alias
```

For the rewrite, part of the process is doing the nitty-gritty work of re-implementing the following:

```rust
use std::collections::HashMap;
use std::ops::AddAssign;
use regex::{Regex, Match};
#[derive(Debug)]
struct Count(u32);

impl Default for Count {
    fn default() -> Count {
        Count(0)
    }
}

impl AddAssign for Count {
    fn add_assign(&mut self, other: Self) {
        self.0 += other.0
    }
}

#[derive(Debug)]
struct AliasGenerator {
    counts: HashMap<String, Count>,
}

trait SimpleCounter {
    fn new() -> Self;
    fn next_val(&mut self, name: &str) -> u32;
}

impl SimpleCounter for AliasGenerator {
    fn new() -> AliasGenerator {
        AliasGenerator { counts: HashMap::<String, Count>::new() }
    }

    fn next_val(&mut self, name: &str) -> u32 {
        let Self { counts } = self;
        counts
            .entry(name.to_string())
            .and_modify(|e| { *e += Count(1) })
            .or_insert(Count(1))
            .0
    }
}
```

until we get to the implementation section of the `impl AliasGenerator` block:

```rust
impl AliasGenerator {
    pub fn get(&mut self, hint: Option<&str>) -> String {
        let m: Option<Match>;
        let index: u32;
        let alias: String;
        // Here is where the implied `if hint:` branch would be in the corresponding Python file:
        if let Some(mut alias_hint) = hint {
            m = Regex::new(r"~\d+$")
                .unwrap()
                .find(&alias_hint);
            match m {
                Some(mat) => {
                    // Mirrors Python's `hint = hint[:m.start()]`: strip the `~N` suffix.
                    alias_hint = &alias_hint[..mat.start()];
                    index = self.next_val(&alias_hint);
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                },
                None => {
                    index = self.next_val("v");
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                }
            }
        }
        // And here is where we'll encounter the equivalent `if not hint` check:
        /*
        (... snip ...)
        */
    }
}
```

If we examine the cases where

```python
if m:
    hint = hint[:m.start()]
idx = self.nextval(hint)
alias = f'{hint}~{idx}'
return alias
```

we have the locals:

```rust
                (... snip ...)
                None => {
                    index = self.next_val("v");
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                }
            }
        }
        // Here is an equivalent `if not hint` check:
        // No need to do anything to `m` since it's already guaranteed to be `None`.
        index = self.next_val("v");
        // But now `self.next_val` bumps the counter for "v" (the stand-in for the
        // empty `hint`), so we create an entirely new alias "v~${count}" each time.
        alias = format!("v~{index}", index=index);
        alias
    }
}
```

Notice: it still makes a call to `self.next_val` on the no-hint path, so the counter keeps growing. In Rust, this behavior can be avoided by completely replacing those last three lines with a static `format!`:

```rust
impl AliasGenerator {
    pub fn get(&mut self, hint: Option<&str>) -> String {
        let m: Option<Match>;
        let index: u32;
        let alias: String;
        if let Some(mut alias_hint) = hint {
            m = Regex::new(r"~\d+$")
                .unwrap()
                .find(&alias_hint);
            match m {
                Some(mat) => {
                    alias_hint = &alias_hint[..mat.start()];
                    index = self.next_val(&alias_hint);
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                },
                None => {
                    index = self.next_val("v");
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                }
            }
        }
        // No hint: return a static alias instead of bumping the counter.
        format!("v~{index}", index=1u32)
    }
}
```

And while I wish this were enough to explain away your troubles, I don't have the familiarity with this repository to know how intertwined certain things are. And yet, I can't help but wonder whether my observation is linked to some more insidious behavior at a lower level of the protocol, like line 644 of edgecon.pyx. What seems more likely is that a logical check like this one exists somewhere in this repository and is causing undefined behavior.
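For what it's worth, here is a tiny self-contained repro of the counting behavior I'm describing (the two classes are copied from the snippet above, minus the class-level annotation):

```python
import collections
import re


class SimpleCounter:
    def __init__(self) -> None:
        self.counts = collections.defaultdict(int)

    def nextval(self, name: str = 'default') -> int:
        self.counts[name] += 1
        return self.counts[name]


class AliasGenerator(SimpleCounter):
    def get(self, hint: str = '') -> str:
        if not hint:
            hint = 'v'
        m = re.search(r'~\d+$', hint)
        if m:
            hint = hint[:m.start()]
        idx = self.nextval(hint)
        alias = f'{hint}~{idx}'
        return alias


gen = AliasGenerator()
print(gen.get(''))        # v~1   -- an empty hint falls back to 'v'
print(gen.get(''))        # v~2   -- and the 'v' counter keeps growing on every call
print(gen.get('foo~7'))   # foo~1 -- the '~7' suffix is stripped; 'foo' gets its own counter
print(gen.get('foo'))     # foo~2
```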
-
Abstract
We propose to add support for cancellation and timeouts directly to the EdgeDB binary protocol, to be handled by the DB server.
This discussion post will later be converted into an RFC.
Proposal
Timeouts
Handling timeouts at the protocol level makes client binding implementations simpler and smaller. It pushes the complexity to one place -- the database server itself. Judging by the state of the code in asyncpg, our Python PostgreSQL driver, implementing timeouts correctly and efficiently is not easy.
We propose to add a special header to the `Execute`, `Optimistic Execute`, `Execute Script`, and `Prepare` client messages. The header will contain a 64-bit unsigned integer value representing the max execution duration in microseconds. If the operation takes longer than the specified time window, the server would abort the query and return a `QueryTimeoutError` exception to the client.

The "pros" of adding centralized query execution timeouts are:

- Clients won't need their own timeout machinery such as `asyncio.wait_for()`. Essentially clients will only need to compute the corresponding timeouts for the `Parse` and `Execute` messages.

The "cons" are:
Along with the per-command timeouts, we propose to introduce system-level and session-level config options to set default max timeouts.
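To make the timeout header concrete, here is a rough sketch of what attaching it to an `Execute` message might look like on the client side. The header code `0xFF10`, the `headers` dict, and `send_execute` are hypothetical; the proposal only fixes the value format (an unsigned 64-bit integer of microseconds):

```python
import struct

# Hypothetical header code for the proposed per-command timeout.
QUERY_TIMEOUT_HEADER = 0xFF10

def timeout_header(max_duration_us: int) -> dict[int, bytes]:
    """Encode the proposed timeout header: an unsigned 64-bit big-endian
    integer holding the max execution duration in microseconds."""
    if not 0 < max_duration_us <= 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("timeout must fit into an unsigned 64-bit integer")
    return {QUERY_TIMEOUT_HEADER: struct.pack(">Q", max_duration_us)}

# Hypothetical usage: attach the header to an Execute message so the server
# aborts the query and replies with QueryTimeoutError after ~2 seconds.
# conn.send_execute(statement_name, headers=timeout_header(2_000_000))
```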
Cancellation
Even with EdgeDB implementing timeout handling at the server level, there are still cases when a client might want to cancel the current query, e.g.:
Before outlining the proposed protocol changes for EdgeDB, it's worth revisiting how cancellation is implemented in PostgreSQL. In short, to cancel the current operation the client must open a new connection to the DB and send a special cancel message, essentially signalling that a certain PostgreSQL worker process should abort whatever operation it is performing. The cancellation either aborts the current operation, making it raise an error, or it has no effect at all, e.g. when the worker process has already finished processing the request. There are a few problems with this approach: opening a new connection can be time-consuming, and the client cannot reliably cancel one particular operation. The latter means that it's not possible to implement both reliable query pipelining and reliable cancellation.
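For reference, this is roughly what the PostgreSQL approach looks like on the wire: the client opens a brand-new TCP connection and sends a single CancelRequest packet with the backend PID and secret key it received at connection start. A minimal sketch, not tied to any particular driver:

```python
import socket
import struct

CANCEL_REQUEST_CODE = 80877102  # fixed code from the PostgreSQL frontend/backend protocol

def pg_cancel(host: str, port: int, backend_pid: int, secret_key: int) -> None:
    """Cancel an in-flight PostgreSQL query by opening a *separate* connection
    and sending a CancelRequest packet; the original connection is untouched."""
    packet = struct.pack("!IIII", 16, CANCEL_REQUEST_CODE, backend_pid, secret_key)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(packet)
    # There is no reply: the cancellation may or may not take effect, and the
    # client cannot target a specific pipelined query -- only "whatever the
    # worker process is doing right now".
```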
The cancellation protocol of PostgreSQL is designed this way to keep the implementation simple. PostgreSQL uses blocking sockets and Unix signals to actually interrupt worker processes. Since EdgeDB itself uses non-blocking sockets and its code is concurrent we can implement a different approach:
- Introduce a new `OPERATION_ID` binary protocol message header that the client can attach to operations it might want to cancel (see the client-side sketch after this list). In the beginning we'll likely limit the header to be valid only for the `Execute`, `Optimistic Execute`, `Execute Script`, and `Prepare` client messages. The header will be a 64-bit integer, preferably unique, but the server will not try to enforce that.
- Introduce a new `CancelRequest` protocol message with a single 64-bit integer in it: the ID of the operation the client wants to cancel.
- Add a system-level config option to control the max size of the server's per-connection read-ahead buffer. The server will use the buffer to read ahead protocol messages and put them in a queue for processing. The buffer will also act as a flow-control mechanism (currently there's none, which has to be fixed anyway).
Essentially, the proposed solution for cancellation in EdgeDB is to use the same network connection to the DB to both make queries and cancel them. A well-behaved client should maintain a unique counter of operations and send a unique ID along with every request. When the client wants to cancel some operation, it would send a `CancelRequest` message along with the operation ID. The server will then do one of the following:

- If the read-ahead buffer has only the `CancelRequest` message and there's no ongoing operation, the request is ignored.
- If the read-ahead buffer already has messages in it, the server will note the requested cancellation ID and keep processing the messages. If there's a message with a matching ID, the server replies to the client with `OperationCancelledError` for all messages up until the buffered `CancelRequest` client message (a sketch of this buffered-cancellation logic follows the list).
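A minimal sketch of how the server might apply those two rules to its read-ahead buffer; the message shapes and names here are invented for illustration, only the buffer, `CancelRequest`, and `OperationCancelledError` behavior comes from the proposal:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    kind: str                      # e.g. "Execute", "Prepare", "CancelRequest"
    op_id: Optional[int] = None    # OPERATION_ID header, if the client attached one


def apply_cancel(read_ahead: List[Message], cancel: Message) -> List[str]:
    """Given the buffered messages that precede a CancelRequest, return the
    reply for each of them according to the proposed rules."""
    if not read_ahead:
        # Rule 1: nothing buffered and nothing running -- the cancel is ignored.
        return []

    # Rule 2: if any buffered message carries the requested ID, everything up
    # to the CancelRequest is answered with OperationCancelledError.
    if any(msg.op_id == cancel.op_id for msg in read_ahead):
        return ["OperationCancelledError"] * len(read_ahead)

    # No matching ID: process the buffered messages normally.
    return ["<normal reply>"] * len(read_ahead)
```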
The proposed design has the following advantages over the PostgreSQL approach: