Fix server hang when trying to stop a server with active connections with event notifications #8432

TreeHunter9
Contributor

@TreeHunter9 TreeHunter9 commented Feb 7, 2025

The hang is caused by the way port_async is handled. When it is created in aux_request(), alloc_port() adds it to the chain of client ports (port_clients), but it is not added to inet_ports. inet_ports is what PortsCleanup::closePorts() uses at server shutdown to close the ports we currently have. So, after all client ports except port_async have been closed, we keep looping over the remaining active ports in select_multi(), hoping to get some data from them. This happens because port_async is still present in the port_clients chain and has port_state = PENDING.
My fix for this problem is to close port_async when its parent port is closed.
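
To make the description above concrete, here is a minimal, self-contained model of the problem and of the idea behind the fix. It is only a sketch: Port, Server, closePort(), closeRegisteredPorts() and pendingPorts() are hypothetical names invented for illustration, not Firebird's real rem_port code or the actual patch. The `clients` vector stands in for the port_clients chain and `registered` for inet_ports.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT Firebird's real types.
enum class PortState { Pending, Closed };

struct Port {
    PortState state = PortState::Pending;
    Port* async = nullptr;  // models port_async: auxiliary event port, if any
};

struct Server {
    std::vector<std::unique_ptr<Port>> clients;  // models the port_clients chain
    std::vector<Port*> registered;               // models inet_ports

    // A regular client port is tracked in both collections.
    Port* addClient() {
        clients.push_back(std::make_unique<Port>());
        Port* port = clients.back().get();
        registered.push_back(port);
        return port;
    }

    // The async port joins the client chain only, as described in the PR text.
    Port* addAsync(Port* parent) {
        clients.push_back(std::make_unique<Port>());
        Port* port = clients.back().get();
        parent->async = port;
        return port;
    }
};

// Close a port; with closeAsyncToo set, also close its async port (the fix idea).
void closePort(Port* port, bool closeAsyncToo) {
    if (closeAsyncToo && port->async != nullptr)
        port->async->state = PortState::Closed;
    port->state = PortState::Closed;
}

// Models shutdown cleanup, which only walks the registered ports.
void closeRegisteredPorts(Server& server, bool closeAsyncToo) {
    for (Port* port : server.registered)
        closePort(port, closeAsyncToo);
}

// Ports the main loop would still wait on after cleanup.
std::size_t pendingPorts(const Server& server) {
    std::size_t count = 0;
    for (const auto& port : server.clients)
        if (port->state == PortState::Pending)
            ++count;
    return count;
}
```

In this model, calling closeRegisteredPorts(server, false) leaves one pending port behind (the async one), which corresponds to the main loop continuing to wait; with closeAsyncToo set to true, no pending ports remain.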

The hang can be reproduced by running example/api/api16, opening one additional connection to the database, and then trying to kill the server: it hangs, unable to accept new connections, and can only be stopped with kill -9.

This bug can also be reproduced in v5.0. I haven't tested older versions, but the code seems to be the same there.

@AlexPeshkoff
Member

@TreeHunter9 Can I take a look at the server stacks in the mentioned state?

The hang can be reproduced by running example/api/api16 and making one more additional connection to the database, then try to kill the server, it will hang with inability to receive new connections, it can only be stopped with kill -9.

@TreeHunter9
Contributor Author

Thread 6 (Thread 0x775a058006c0 (LWP 61179) "firebird"):
#0  0x0000775a0ce98d61 in __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x775a057ffc40, op=393, expected=0, futex_word=0x775a057ffd70) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x775a057ffc40, clockid=0, expected=0, futex_word=0x775a057ffd70) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x775a057ffd70, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x775a057ffc40, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000775a0cea4ce0 in do_futex_wait (sem=sem@entry=0x775a057ffd70, abstime=abstime@entry=0x775a057ffc40, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000775a0cea4d83 in __new_sem_wait_slow64 (sem=0x775a057ffd70, abstime=0x775a057ffc40, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x00005bc3c6137857 in Firebird::SignalSafeSemaphore::tryEnter (this=0x775a057ffd70, seconds=60, milliseconds=60000) at /firebird/src/common/classes/semaphore.cpp:202
#6  0x00005bc3c60e3433 in Worker::wait (this=0x775a057ffd60, timeout=60) at /firebird/src/remote/server/server.cpp:7297
#7  0x00005bc3c60e28a6 in loopThread () at /firebird/src/remote/server/server.cpp:7100
#8  0x00005bc3c610b970 in (anonymous namespace)::ThreadArgs::run (this=0x775a057ffe00) at /firebird/src/common/ThreadStart.cpp:78
#9  0x00005bc3c610ba48 in (anonymous namespace)::threadStart (arg=0x775a0d81f380) at /firebird/src/common/ThreadStart.cpp:94
#10 0x0000775a0ce9ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#11 0x0000775a0cf29c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 1 (Thread 0x775a0d83b7c0 (LWP 61099) "firebird"):
#0  0x0000775a0cf1b4cd in __GI___poll (fds=0x775a0d81b298, nfds=1, timeout=60000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00005bc3c60a18f4 in Select::select (this=0x775a0d81b280, timeout=0x7ffcd3a416d0) at /firebird/src/remote/inet.cpp:454
#2  0x00005bc3c609c031 in select_wait (main_port=0x775a0d80f6d0, selct=0x775a0d81b280) at /firebird/src/remote/inet.cpp:2365
#3  0x00005bc3c609b98b in select_multi (main_port=0x775a0d80f6d0, buffer=0x775a0d813d50 "", bufsize=8192, length=0x7ffcd3a417ba, port=...) at /firebird/src/remote/inet.cpp:2144
#4  0x00005bc3c60b5eb1 in rem_port::select_multi (this=0x775a0d80f6d0, buffer=0x775a0d813d50 "", bufsize=8192, length=0x7ffcd3a417ba, port=...) at /firebird/src/remote/remote.cpp:672
#5  0x00005bc3c60ced79 in SRVR_multi_thread (main_port=0x775a0d80f6d0, flags=2) at /firebird/src/remote/server/server.cpp:1736
#6  0x00005bc3c60f3b7b in main (argc=1, argv=0x7ffcd3a424a0) at /firebird/src/remote/server/os/posix/inet_server.cpp:582

@dyemanov dyemanov merged commit 26ff286 into FirebirdSQL:master Mar 3, 2025
24 checks passed