Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to connect to wrong port (PG post), hangs #370

Closed
matriv opened this issue Jun 27, 2022 · 15 comments
Closed

Attempting to connect to wrong port (PG post), hangs #370

matriv opened this issue Jun 27, 2022 · 15 comments
Assignees

Comments

@matriv
Copy link
Contributor

matriv commented Jun 27, 2022

crash --host "localhost:5433" --verbose  

Runs for long time, without returning any error.

@matriv matriv added the bug label Jun 27, 2022
@amotl
Copy link
Member

amotl commented Jun 27, 2022

Hi Marios,

thank you for reporting this. I can confirm it by using, for example:

nc -l -p 5433
crash --host "localhost:5433" --verbose

Because crash uses timeout=None, the default timeout value for "connect" and "read" will be used, which is "infinite". The requests library behaves equally. There is some discussion around the topic at urllib3/urllib3#655.

In different spots, I observed others to use default timeout values like those:

from urllib3.util.timeout import Timeout
Timeout(connect=2, read=7)
Timeout(connect=3.05, read=27)
Timeout(connect=5.0, read=?)
Timeout(connect=10.0, read=?)

Do you think it is a good idea to also set reasonable non-infinite default timeout values for crash? If so, what do you think which values we should use?

With kind regards,
Andreas.

@matriv
Copy link
Contributor Author

matriv commented Jun 27, 2022

I think yes we should. It seems to me that a user would expect to see and error message if connection cannot be established.
I would say that 5 or even 10 seconds is a reasonable value, and probably it would be nice to be able to configure it on demand.

@proddata
Copy link
Member

Hm, I am wondering how this would affect long-running queries, considering crash uses the http-endpoint

@amotl
Copy link
Member

amotl commented Jun 27, 2022

@proddata: Long-running queries would fall into the category of the read timeout. We should probably leave this set to "infinite" and only adjust the connect timeout?

Otherwise, do you think we should also configure the read timeout to a maximum value by default? If yes, what do think would be an optimal default value? 24 hours?

@mfussenegger
Copy link
Member

+1 for connect timeout. E.g. 5sec by default
-1 for read timeout.

@matriv
Copy link
Contributor Author

matriv commented Jun 28, 2022

Could we also have some "keep-alive" test that can happen in the background, so even if there is a long running query, we can still get an error that the server(s) is/are disconnected?

@proddata
Copy link
Member

proddata commented Jun 28, 2022

Otherwise, do you think we should also configure the read timeout to a maximum value by default? If yes, what do think would be an optimal default value? 24 hours?

I think there should be none. 24h might be even worse

@matriv
Copy link
Contributor Author

matriv commented Jul 1, 2022

Could we also have some "keep-alive" test that can happen in the background, so even if there is a long running query, we can still get an error that the server(s) is/are disconnected?

This would address the issue that a client issues a long-run query and the hosts it's connected go down, but the client still "thinks" the query is still running. WDYT?

@amotl
Copy link
Member

amotl commented Jul 1, 2022

Dear Marios,

sure we can think about such an "availability background check" feature. However, let's track it on behalf of a different issue instead? I think the scope of this one would be to only adjust the connect timeout accordingly. I also think five seconds as a default value, with an option to adjust it, would be fine.

With kind regards,
Andreas.

@matriv
Copy link
Contributor Author

matriv commented Jul 1, 2022

@amotl Of course, I just wanted to ask here and get opinions, since it's somehow relevant, so we can open a separate issue.

@matriv
Copy link
Contributor Author

matriv commented Jul 5, 2022

@amotl Created a new issue for that proposal: #371

@amotl
Copy link
Member

amotl commented Jan 12, 2024

Report

Hi again. GH-421 adds the --timeout command-line argument, for the purpose outlined here. As suggested, it also configures the default connect timeout to five seconds, while still using an infinite read timeout.

On this particular case, at least when emulating a server through nc -l -p 5433, this does not help much, because the client actually connects, but would need to limit the read timeout to not block forever.

You can exercise it like this:

# Blocks, because read timeout is still infinite.
crash --host "localhost:5433" --timeout=0.42
# Does not block, because read timeout is decreased.
crash --host "localhost:5433" --timeout=0.42,0.42

Conclusion

In this way, GH-421 does not improve the situation as reported by @matriv.

Outlook

I have no other idea how this can be improved when not adjusting the default read timeout, which I am also -1 about.

Maybe it would be a good idea to check if the user aims to unintentionally connect to a port in the 543x range, and yield a corresponding WARNING message only, in the same spirit as the Homebrew package manager is doing it:

Figure out how to make error handling and messaging as informative as possible. The more you can help your users help themselves, [the better].

-- https://dev.to/maximwheatley/how-to-build-great-open-source-devtools-with-max-howell-creator-of-homebrew-5eoc

@amotl amotl self-assigned this Jan 12, 2024
@seut
Copy link
Member

seut commented Feb 13, 2024

How are other clients like e.g. psql behave?

One idea could be to do a fast probing query with a short read timeout at the connection instantiation (SELECT 1), but not sure if this is really good (what if the cluster is overloaded? other issues?)

@amotl
Copy link
Member

amotl commented Mar 26, 2024

How do other clients like e.g. psql behave?

It looks like psql behaves in the same way. Running that on macOS, it does not return for the time of writing this.

nc -l -p 5433
time psql postgresql://postgres@localhost:5433/

One idea could be to do a fast probing query with a short read timeout at the connection instantiation (SELECT 1).

That will not work. What would "fast" actually mean? What if this "quick/fast probe check" with a lower timeout than the configured one would be below the value needed to actually make the connection work, because the server is far away, or, as you may be referring to, overloaded, or such?

@amotl amotl removed the bug label Mar 26, 2024
@amotl
Copy link
Member

amotl commented Mar 27, 2024

Final summary and report, cheers.

Problem Report

crash --host "localhost:5433" --verbose  

Runs for long time, without returning any error.

Evaluation

When connecting to a TCP endpoint "that does not talk the right protocol" 1, and partners can't establish a good conversation, so the request sent by the client will not get answered by the server at all, the conversation on this TCP connection will naturally stall. Unfortunately, the network connection will not be terminated, because there are no measures in place that could make that happen, other than client- and server-side timeout settings.

By default, the behaviour is the same with psql, we've confirmed it on localhost both on Linux and macOS.

psql postgresql://postgres@localhost:5433/

To reproduce it with both psql and crash, a server that does not talk the right protocol can easily be emulated using netcat 2.

Conclusion and Thoughts

Therefore, we are closing this ticket. Please re-open if you feel your issue has not been addressed properly. Thank you for GH-421, which probably would not have happened without your request.

Footnotes

  1. Long version: Where an application is bound to the corresponding TCP socket in LISTEN mode, which implements certain application-level protocols.

  2. nc -l -p 5433. It works well for both HTTP and the PostgreSQL wire protocol, because such a dummy server understands neither of both.

@amotl amotl closed this as completed Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants