get() handles non-existent q_name cost-effectively #15
Conversation
I think it's the wrong way to fix this issue. It might actually just work if you use something like this:

```sql
SET statement_timeout TO <timeout>
QUERY
RESET statement_timeout
```

I think you can fold those into a … You would put that timeout configuration in the …
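A minimal sketch of that SET/RESET pattern from Python, assuming a psycopg2 connection; the DSN and the `pg_sleep(5)` stand-in query are illustrative:

```python
import psycopg2

conn = psycopg2.connect("dbname=example")  # illustrative DSN
with conn.cursor() as cursor:
    cursor.execute("SET statement_timeout TO 1000")  # milliseconds
    try:
        # pg_sleep(5) stands in for the pq get() query; the server
        # cancels it once the 1s timeout above elapses.
        cursor.execute("SELECT pg_sleep(5)")
    except psycopg2.errors.QueryCanceled:
        conn.rollback()  # clear the aborted transaction
    finally:
        cursor.execute("RESET statement_timeout")
```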
@malthe I think I described the issue poorly by using the term "long-running queries". Please take a look at the extra details I posted in the discussion of the issue and let me know what you think. I might be misunderstanding your suggestion, but I don't think aborting long-running queries will solve the high call volume.
I don't think it would be a good idea to set a timeout in this case. The problem is not query execution length; it is how much noise accumulates in the internal central queue that Postgres uses for async notifications. The fix over the … I'm thinking that the only way to scale these calls is to split the queue table into different databases in the same instance, which amounts to sharding by `q_name`. Currently, every backend connected to the same database will listen and wake up on notifications in that database, triggering the query execution. Queries that return no rows add noise to the channels and cause additional query executions.
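A hedged sketch of that sharding idea; pq has no such routing built in, and the queue names and DSNs below are invented for illustration:

```python
import psycopg2

# Hypothetical routing table: each q_name lives in its own database in
# the same Postgres instance, isolating its notification traffic.
SHARDS = {
    "invoices": "dbname=queues_invoices",
    "emails": "dbname=queues_emails",
}

def connect_for_queue(q_name):
    # Workers for one queue connect only to that queue's database and
    # therefore never wake up for the other queues' notifications.
    return psycopg2.connect(SHARDS[q_name])
```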
Okay, I got it now. I think it makes sense to consider using a separate notification channel per (table, `q_name`) pair. Yes, it doesn't scale to many such pairs, but it does give good performance in the case of up to a dozen queues with some idle and some active. Plus it leaves an option for the PostgreSQL developers to improve the performance if there are users that run into a limitation here. Wdyt?
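A small sketch of what a channel per pair could look like from psycopg2; the `<table>_<q_name>` channel naming is an assumption for illustration, not pq's actual scheme:

```python
import psycopg2

conn = psycopg2.connect("dbname=example")  # illustrative DSN
conn.autocommit = True  # LISTEN/NOTIFY outside an explicit transaction
with conn.cursor() as cursor:
    # Consumer: subscribe only to the channel for this worker's queue.
    cursor.execute('LISTEN "queues_invoices"')
    # Producer: notify only that channel when a task is put().
    cursor.execute("SELECT pg_notify(%s, '')", ("queues_invoices",))
```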
@malthe I think that is a good idea (see below), however I still don't know if this should avoid all of the queries for a non-existent `q_name`.

https://github.com/malthe/pq/blob/master/pq/create.sql
https://github.com/malthe/pq/blob/master/pq/__init__.py#L250

This can avoid executing the query for a `q_name` that is being listened to on a different channel. However, as the notification queue is database-based, each listening backend will still be "woken up":

…

After reviewing …
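For reference, a standard psycopg2 listen loop of the kind being discussed (connection string and channel name illustrative); the per-channel filtering happens inside each backend, which is exactly the wakeup cost described above:

```python
import select
import psycopg2

conn = psycopg2.connect("dbname=example")  # illustrative DSN
conn.autocommit = True
with conn.cursor() as cursor:
    cursor.execute('LISTEN "queues_invoices"')  # hypothetical channel

while True:
    # Block until the backend signals a notification (or 5s pass).
    if select.select([conn], [], [], 5.0) == ([], [], []):
        continue  # timeout: nothing arrived, keep waiting
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # Only notifications on our channel reach this point, but the
        # backend was still woken up to do that filtering.
        print("notified on", notify.channel, "pid", notify.pid)
```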
By "backend" here you mean the PostgreSQL server process that handles a single client connection (session) – ? |
@malthe Yes, that extract is from the Postgres code, so it refers to server child processes handling connections.
I don't see how it's a huge problem that the backend (which is a child process) … My suggestion would be to encourage the use of an application-level message queue.

Malthe
@malthe The problem is that the filtering is not done by only one backend; every backend that is listening on at least one channel has an entry in the async array. We know that the design will need to step away from a transactional queue to an app-level queue system in the future, but we need to keep it as it is until then. With the change proposed above (splitting the listeners), I'm pretty confident that the CPU usage will drop considerably in certain scenarios (when you split the queues in the same database).
It seems like there's some less-than-perfect behavior there in the PG codebase. But then again, they have always maintained that you can't really do queues.
```python
    (self.table, self.name)
)
row = cursor.fetchone()
self.has_records = True if row else False
```
`bool(row)`
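A quick check of the reviewer's point that `bool(row)` is equivalent to the conditional expression in the diff:

```python
# For any fetchone() result (None or a row tuple), the conditional
# expression and the direct cast agree.
row = None
assert (True if row else False) == bool(row)
row = ("task-id", "payload")  # a non-empty result tuple
assert (True if row else False) == bool(row)
```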
Hi @malthe -

This is a simple fix intended to address Issue #14 that @3manuek posted.

We discovered some very long-running queries on our database today that were due to a worker trying to `get()` a task for a `q_name` that did not exist in our table. This situation can occur when the rows of a `q_name` are purged from the tasks table and then `put()` occurs very infrequently. If the worker is trying to `get()` repetitively in this scenario, it results in the costly queries that @3manuek described (especially on a big table with many other `q_name`s).

I'm no wizard, but these changes seem safe and introduce little need to refactor the tests. I used `time.sleep(timeout)` in lieu of the unix `select.select(timeout)`, since it didn't seem appropriate to monitor the file descriptor in this specific context.

I hope it works out! If not, I'd love to hear your thoughts on a better approach.
Thanks,
-Matt
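For context, a minimal sketch of the behavior the PR description outlines; `fetch_one_task` and `poll_interval` are hypothetical names, and this is not the actual diff:

```python
import time

def get_task(queue, poll_interval=1.0):
    row = queue.fetch_one_task()   # the SELECT behind pq's get()
    queue.has_records = bool(row)  # flag added by this PR
    if row is None:
        # The q_name has no rows at all, so there is no notification to
        # wait for; a plain sleep is cheaper than select.select() on the
        # connection's file descriptor before the next attempt.
        time.sleep(poll_interval)
        return None
    return row
```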