-
Timeouts
This spec doesn't solve the client-side timeout problem at all. The timeouts defined here not only "won't account for client<>server network latencies", they also don't cover cases where the hardware is down or the network between client and server is disconnected. In those cases the client can wait for the next network packet for hours without a proper client-side timeout (e.g. a mere power flip that reboots the machine would not produce a "connection reset" until another packet arrives on that connection). Another note: a timed-out request should trigger the transaction retry described in RFC 1004.
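To illustrate the kind of client-side deadline that is still needed even with a server-side timeout, here is a rough sketch (the `conn.execute`/`conn.abort()` API and the 5-second figure are hypothetical, not from the spec):

```python
import asyncio

SERVER_TIMEOUT_US = 5_000_000   # hypothetical server-side timeout: 5 seconds
CLIENT_GRACE_S = 2.0            # extra slack for network latency

async def run_with_deadline(conn, query: str):
    # Even if the server enforces SERVER_TIMEOUT_US, the client still needs
    # its own deadline: if the server host loses power or the network is
    # partitioned, neither QueryTimeoutError nor a TCP RST will ever arrive.
    try:
        return await asyncio.wait_for(
            conn.execute(query, timeout_us=SERVER_TIMEOUT_US),  # hypothetical API
            timeout=SERVER_TIMEOUT_US / 1_000_000 + CLIENT_GRACE_S,
        )
    except asyncio.TimeoutError:
        # The connection state is unknown now; drop the socket and retry the
        # transaction as described in RFC 1004 instead of reusing it.
        conn.abort()
        raise
```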
Cancellation
Overall, I'm -1 on this spec on cancellation for two reasons:
I think cancellation is done this way in Postgres because it's basically impossible to detect a closed connection while Postgres is doing disk/CPU work. So here is what I think should be done instead:
-
I think I may have come across a possible source for the increasingly greater IDs while porting the EdgeQL compiler from Python to Rust for edgemorph. Consider this Python snippet from edb's common compiler file:

```python
class SimpleCounter:
    counts: DefaultDict[str, int]

    def __init__(self) -> None:
        self.counts = collections.defaultdict(int)

    def nextval(self, name: str = 'default') -> int:
        self.counts[name] += 1
        return self.counts[name]


class AliasGenerator(SimpleCounter):
    def get(self, hint: str = '') -> str:
        if not hint:
            hint = 'v'
        m = re.search(r'~\d+$', hint)
        if m:
            hint = hint[:m.start()]
        idx = self.nextval(hint)
        alias = f'{hint}~{idx}'
        return alias
```

For the rewrite, part of the process is doing the nitty-gritty work of re-implementing the following:

```rust
use std::collections::HashMap;
use std::ops::AddAssign;
use regex::{Regex, Match};
#[derive(Debug)]
struct Count(u32);

impl Default for Count {
    fn default() -> Count {
        Count(0)
    }
}

impl AddAssign for Count {
    fn add_assign(&mut self, other: Self) {
        self.0 += other.0
    }
}

#[derive(Debug)]
struct AliasGenerator {
    counts: HashMap<String, Count>,
}

trait SimpleCounter {
    fn new() -> Self;
    fn next_val(&mut self, name: &str) -> u32;
}

impl SimpleCounter for AliasGenerator {
    fn new() -> AliasGenerator {
        AliasGenerator { counts: HashMap::<String, Count>::new() }
    }

    fn next_val(&mut self, name: &str) -> u32 {
        let Self { counts } = self;
        counts
            .entry(name.to_string())
            .and_modify(|e| { *e += Count(1) })
            .or_insert(Count(1))
            .0
    }
}
```

until we get to the implementation section of the `impl AliasGenerator` block:

```rust
impl AliasGenerator {
    pub fn get(&mut self, hint: Option<&str>) -> String {
        let m: Option<Match>;
        let index: u32;
        let alias: String;
        // Here is where the implied `if hint:` branch would be in the corresponding Python file:
        if let Some(mut alias_hint) = hint {
            m = Regex::new(r"~\d+$")
                .unwrap()
                .find(&alias_hint);
            match m {
                Some(mat) => {
                    // Mirrors Python's `hint = hint[:m.start()]`: strip the `~N` suffix.
                    alias_hint = &alias_hint[..mat.start()];
                    index = self.next_val(&alias_hint);
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                },
                None => {
                    index = self.next_val("v");
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                }
            }
        }
        // And here is where we'll encounter the equivalent `if not hint` check:
        /*
        (... snip ...)
        */
    }
}
```

If we examine the cases where

```python
if m:
    hint = hint[:m.start()]
idx = self.nextval(hint)
alias = f'{hint}~{idx}'
return alias
```

we have the locals:

```rust
                (... snip ...)
                None => {
                    index = self.next_val("v");
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                }
            }
        }
        // Here is an equivalent `if not hint` check:
        // No need to do anything to `m` since it's already guaranteed to be `None`.
        index = self.next_val("v");
        // But now `self.next_val` bumps the counter for "v" (the stand-in for the
        // empty `hint`), so we create an entirely new alias "v~${count}" each time.
        alias = format!("v~{index}", index=index);
        alias
    }
}
```

Notice: it still makes a call to `self.next_val` on the no-hint path, so the counter keeps growing. In Rust, this behavior can be avoided by completely replacing those last three lines with a static `format!`:

```rust
impl AliasGenerator {
    pub fn get(&mut self, hint: Option<&str>) -> String {
        let m: Option<Match>;
        let index: u32;
        let alias: String;
        if let Some(mut alias_hint) = hint {
            m = Regex::new(r"~\d+$")
                .unwrap()
                .find(&alias_hint);
            match m {
                Some(mat) => {
                    alias_hint = &alias_hint[..mat.start()];
                    index = self.next_val(&alias_hint);
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                },
                None => {
                    index = self.next_val("v");
                    alias = format!("{hint}~{index}", hint=alias_hint, index=index);
                    return alias;
                }
            }
        }
        // No hint: return a static alias instead of bumping the counter.
        format!("v~{index}", index=1u32)
    }
}
```

And while I wish this were enough to explain away your troubles, I don't have the familiarity with this repository to know how intertwined certain things are. And yet, I can't help but wonder whether my observation is linked to some more insidious behavior at a lower level of the protocol, like line 644 of edgecon.pyx. What seems more likely is that a logical check like this one exists somewhere in this repository and is causing undefined behavior.
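For what it's worth, here is a tiny self-contained repro of the counting behavior I'm describing (the two classes are copied from the snippet above, minus the class-level annotation):

```python
import collections
import re


class SimpleCounter:
    def __init__(self) -> None:
        self.counts = collections.defaultdict(int)

    def nextval(self, name: str = 'default') -> int:
        self.counts[name] += 1
        return self.counts[name]


class AliasGenerator(SimpleCounter):
    def get(self, hint: str = '') -> str:
        if not hint:
            hint = 'v'
        m = re.search(r'~\d+$', hint)
        if m:
            hint = hint[:m.start()]
        idx = self.nextval(hint)
        alias = f'{hint}~{idx}'
        return alias


gen = AliasGenerator()
print(gen.get(''))        # v~1   -- an empty hint falls back to 'v'
print(gen.get(''))        # v~2   -- and the 'v' counter keeps growing on every call
print(gen.get('foo~7'))   # foo~1 -- the '~7' suffix is stripped; 'foo' gets its own counter
print(gen.get('foo'))     # foo~2
```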
-
Abstract
We propose to add support for cancellation and timeouts directly to the EdgeDB binary protocol, to be handled by the DB server.
This discussion post will later be converted into an RFC.
Proposal
Timeouts
Handling timeouts at the protocol level makes client binding implementations simpler and smaller. It pushes the complexity to one place -- the database server itself. Judging by the state of the code in asyncpg, our Python PostgreSQL driver, implementing timeouts correctly and efficiently is not easy.
We propose to add a special header to the `Execute`, `Optimistic Execute`, `Execute Script`, and `Prepare` client messages. The header will contain a 64-bit unsigned integer value representing the max execution duration in microseconds. If the operation takes longer than the specified time window, the server would abort the query and return a `QueryTimeoutError` exception to the client.

The "pros" of adding centralized query execution timeouts are:

- Clients won't need their own timeout machinery such as `asyncio.wait_for()`. Essentially clients will only need to compute the corresponding timeouts for the `Parse` and `Execute` messages.

The "cons" are:
Along with the per-command timeouts, we propose to introduce system-level and session-level config options to set default max timeouts.
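To make the timeout header concrete, here is a rough sketch of what attaching it to an `Execute` message might look like on the client side. The header code `0xFF10`, the `headers` dict, and `send_execute` are hypothetical; the proposal only fixes the value format (an unsigned 64-bit integer of microseconds):

```python
import struct

# Hypothetical header code for the proposed per-command timeout.
QUERY_TIMEOUT_HEADER = 0xFF10

def timeout_header(max_duration_us: int) -> dict[int, bytes]:
    """Encode the proposed timeout header: an unsigned 64-bit big-endian
    integer holding the max execution duration in microseconds."""
    if not 0 < max_duration_us <= 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("timeout must fit into an unsigned 64-bit integer")
    return {QUERY_TIMEOUT_HEADER: struct.pack(">Q", max_duration_us)}

# Hypothetical usage: attach the header to an Execute message so the server
# aborts the query and replies with QueryTimeoutError after ~2 seconds.
# conn.send_execute(statement_name, headers=timeout_header(2_000_000))
```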
Cancellation
Even with EdgeDB implementing timeout handling at the server level, there are still cases when a client might want to cancel the current query, e.g.:
Before outlining the proposed protocol changes for EdgeDB, it's worth revisiting how cancellation is implemented in PostgreSQL. In short, to cancel the current operation the client must open a new connection to the DB and send a special cancel message, essentially signalling that a certain PostgreSQL worker process should abort whatever operation it is performing. The cancellation either aborts the current operation, making it raise an error, or it has no effect at all, e.g. when the worker process has already finished processing the request. There are a few problems with this approach: opening a new connection can be time-consuming, and the client cannot reliably cancel one particular operation. The latter means that it's not possible to implement both reliable query pipelining and reliable cancellation.
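For reference, this is roughly what the PostgreSQL approach looks like on the wire: the client opens a brand-new TCP connection and sends a single CancelRequest packet with the backend PID and secret key it received at connection start. A minimal sketch, not tied to any particular driver:

```python
import socket
import struct

CANCEL_REQUEST_CODE = 80877102  # fixed code from the PostgreSQL frontend/backend protocol

def pg_cancel(host: str, port: int, backend_pid: int, secret_key: int) -> None:
    """Cancel an in-flight PostgreSQL query by opening a *separate* connection
    and sending a CancelRequest packet; the original connection is untouched."""
    packet = struct.pack("!IIII", 16, CANCEL_REQUEST_CODE, backend_pid, secret_key)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(packet)
    # There is no reply: the cancellation may or may not take effect, and the
    # client cannot target a specific pipelined query -- only "whatever the
    # worker process is doing right now".
```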
The cancellation protocol of PostgreSQL is designed this way to keep the implementation simple. PostgreSQL uses blocking sockets and Unix signals to actually interrupt worker processes. Since EdgeDB itself uses non-blocking sockets and its code is concurrent we can implement a different approach:
- Introduce a new `OPERATION_ID` binary protocol message header that the client can attach to operations it might want to cancel (see the client-side sketch after this list). In the beginning we'll likely limit the header to be valid only for the `Execute`, `Optimistic Execute`, `Execute Script`, and `Prepare` client messages. The header will be a 64-bit integer, preferably unique, but the server will not try to enforce that.
- Introduce a new `CancelRequest` protocol message with a single 64-bit integer in it: the ID of the operation the client wants to cancel.
- Add a system-level config option to control the max size of the server's per-connection read-ahead buffer. The server will use the buffer to read ahead protocol messages and put them in a queue for processing. The buffer will also act as a flow-control mechanism (currently there's none, which has to be fixed anyway).
Essentially, the proposed solution for cancellation in EdgeDB is to use the same network connection to the DB to both make queries and cancel them. A well-behaved client should maintain a unique counter of operations and send a unique ID along with every request. When the client wants to cancel some operation, it would send a `CancelRequest` message along with the operation ID. The server will then do one of the following:

- If the read-ahead buffer has only the `CancelRequest` message and there's no ongoing operation, the request is ignored.
- If the read-ahead buffer already has messages in it, the server will note the requested cancellation ID and keep processing the messages. If there's a message with a matching ID, the server replies to the client with `OperationCancelledError` for all messages up until the buffered `CancelRequest` client message (a sketch of this buffered-cancellation logic follows the list).
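A minimal sketch of how the server might apply those two rules to its read-ahead buffer; the message shapes and names here are invented for illustration, only the buffer, `CancelRequest`, and `OperationCancelledError` behavior comes from the proposal:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    kind: str                      # e.g. "Execute", "Prepare", "CancelRequest"
    op_id: Optional[int] = None    # OPERATION_ID header, if the client attached one


def apply_cancel(read_ahead: List[Message], cancel: Message) -> List[str]:
    """Given the buffered messages that precede a CancelRequest, return the
    reply for each of them according to the proposed rules."""
    if not read_ahead:
        # Rule 1: nothing buffered and nothing running -- the cancel is ignored.
        return []

    # Rule 2: if any buffered message carries the requested ID, everything up
    # to the CancelRequest is answered with OperationCancelledError.
    if any(msg.op_id == cancel.op_id for msg in read_ahead):
        return ["OperationCancelledError"] * len(read_ahead)

    # No matching ID: process the buffered messages normally.
    return ["<normal reply>"] * len(read_ahead)
```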
The proposed design has the following advantages over the PostgreSQL approach: