
Ping/Pong v2 #53

Merged
merged 3 commits into from
May 2, 2022

Conversation

Theldus
Copy link
Owner

@Theldus Theldus commented Apr 30, 2022

Description

In PR #51 an early version of Ping/Pong was introduced by @gloveboxes.

As discussed earlier in issue #50, some modifications would be made for a final version of wsServer PING support. This PR addresses those changes.

This 'v2' changes:

  • Removal of snprintf/strtol for sending the PING id as a string: the PING id is now sent as a number, always encoded as a big-endian 32-bit signed integer. This brings an interesting advantage:

    • RFC 6455 specifies that an unsolicited PONG frame can be sent as a unidirectional heartbeat: with a variable-length string, it is difficult to know whether a received PONG was in response to a PING or not. A number with a known encoding and size makes this check much easier, either by the size (always 4 bytes) or by the content itself.
  • Use of locks for reading/writing the PING and PONG ids, since there may be race conditions between receiving a PONG and sending a PING.

  • Sending pings to a single client or broadcasting them: the 'client' parameter defines this. In addition, the PING and PONG ids are now managed by the server itself, which maintains a private copy for each connected client. This also allows a broadcast to send a different id to each client, according to that client's own counter.

Also, this PR adds a man page (doc/man/man3/ws_ping.3) and an example file, located at examples/ping.


@gloveboxes This PR basically brings together everything we talked about earlier, and it is what I imagine for the final version. Since the initial idea was yours, I would like to hear from you: what do you think? Is there anything that could be improved/removed/etc., or does it seem OK to you?

Hope this eases your problem with clients disconnecting while sleeping =).
Again, thank you very much for your contribution.

@gloveboxes
Copy link
Contributor

gloveboxes commented May 1, 2022

hey @Theldus, there are a couple of deadlock issues.

They show up if the server is sending messages when the client silently disconnects from the server.

When sending, I can see that send() doesn't return at this line (and I can see that locks have been taken):

ret = send(client->client_sock, p, len, flags);

I don't 100% understand your locking strategy, but I can see that a number of locks are taken, and ping hangs trying to take a mutex at this line:

pthread_mutex_lock(&mutex);

Reading the comment in the header:

* block until all content is sent, since _we_ don't use 'O_NONBLOCK'.

I see it's using blocking I/O, so what I think is happening is that ret = send(client->client_sock, p, len, flags); blocks, which causes the deadlocks.

Cheers Dave

@Theldus
Copy link
Owner Author

Theldus commented May 1, 2022

Hey @gloveboxes, thanks for taking a look and testing this PR.

How can I reproduce this? In my tests I had no problems with locks...

I don't 100% understand your locking strategy, but I can see that a number of locks are taken and ping hangs trying to take a mutex at line

My lock 'strategy' boils down to:

  • mutex - Global lock, used whenever iterating over the client list.

  • mtx_state - Per-client lock, used to ensure that the client state is atomically read/set throughout the code.

  • mtx_send - Per-client lock, used to ensure that messages sent to a client will always be serialized, fixes send_all without mutex? #43.

  • mtx_ping - Per-client lock, used to prevent simultaneous access to current_ping_id and last_pong_id.

So you're saying the send() function doesn't return? That's interesting...

Ok, reading the (Linux) man-pages:

When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.

I could do non-blocking sends, but since I want the client to receive the content, I would have to be in some loop trying to send anyway...

What should I do?... I believe send() will not block forever (or is there such a possibility?), so at some point all the 'send_ping_close()' calls will be executed and the lock released again... and then new pings can be sent.

In a way this even makes some sense: new pings are only sent when the previous ones manage to be sent... or does that not make much sense?

@gloveboxes
Copy link
Contributor

gloveboxes commented May 1, 2022

What should I do?... I believe send() will not block forever (or is there such a possibility?), so at some point all the 'send_ping_close()' calls will be executed and the lock released again... and then new pings can be sent.

From some limited searching, it appears the socket will time out after 20 minutes. (https://ubuntuforums.org/showthread.php?t=1565651)

But you can override with

new_sock = accept(sock, (struct sockaddr *)&client, (socklen_t *)&len);
setsockopt(new_sock, SOL_SOCKET, SO_SNDTIMEO, &(struct timeval){1, 0}, sizeof(struct timeval));

In this case a 1 second timeout.

I had a hack at it, and this does solve the problem. The socket send times out, the app continues, ping will be called, and then it will close the socket when the pong becomes stale.

Perhaps the default timeout could be set in the struct ws_events struct...

@Theldus
Copy link
Owner Author

Theldus commented May 1, 2022

From some limited searching, it appears the socket will time out after 20 minutes. (https://ubuntuforums.org/showthread.php?t=1565651)

I see... far from ideal...

But you can override with

new_sock = accept(sock, (struct sockaddr *)&client, (socklen_t *)&len);
setsockopt(new_sock, SOL_SOCKET, SO_SNDTIMEO, &(struct timeval){1, 0}, sizeof(struct timeval));

In this case a 1 second timeout.

I had a hack at it, and this does solve the problem. The socket send times out, the app continues, ping will be called, and then it will close the socket when the pong becomes stale.

Very good. From a quick search here, SO_SNDTIMEO seems to be supported on Linux, Windows, FreeBSD and macOS too, which is exactly what I'm looking for.

Perhaps the default timeout could be set in the struct ws_events evs; struct...

Yes... but ws_events in theory should only store the events. I do plan on adding a structure for the server itself soon (to get rid of global variables), so that's the least important thing.

The only problem I see is that with this approach, the timeout would apply to all send(). Thinking that the user can send large frames (several MB), send() blocking for a few seconds would be normal and even expected...

Ideally the timeout should only apply to sending PINGs, not everything else... Anyway, being able to choose the timeout is better than nothing =).


Idea: Maybe I can add a parameter to ws_sendframe() (not the _txt/_bin versions) to specify the timeout (in milliseconds; 0 disables it) and use it in 'send_all()'.

Also, modify 'send_all()' in such a way that, if a timeout is specified, send() is invoked with MSG_DONTWAIT and, in case of EAGAIN/EWOULDBLOCK, a loop (with a small sleep) retries sending until the time expires.

This way, I can guarantee the sending 'timeout' only for PING frames (or for any specific frame).

The 'ws_sendframe_[txt,bin]' functions would still block as usual.

What do you think?

@gloveboxes
Copy link
Contributor

Ideally the timeout should only apply to sending PINGs, not everything else... Anyway, being able to choose the timeout is better than nothing =).

Hi there - the timeout should apply to all sends.

I'm using libevent to drive the event-driven architecture of my app. Events and tasks give the illusion of multithreading, but without the overhead of threads. Event tasks are typically small units of work, but the important thing is that they must complete, or other events won't get a chance to execute. If a socket send, be it a data frame or a ping frame, doesn't time out, then an event task doesn't complete and the app hangs.

@gloveboxes
Copy link
Contributor

The only problem I see is that with this approach, the timeout would apply to all send(). Thinking that the user can send large frames (several MB), send() blocking for a few seconds would be normal and even expected...

reading the https://linux.die.net/man/3/setsockopt docs for SO_SNDTIMEO

Sets the timeout value specifying the amount of time that an output function blocks because flow control prevents data from being sent. If a send operation has blocked for this time, it shall return with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data is sent. The default for this option is zero, which indicates that a send operation shall not time out. This option stores a timeval structure. Note that not all implementations allow this option to be set.

The important thing here is

amount of time that an output function blocks because flow control prevents data from being sent.

To me this reads as: if everything is operating normally, then SO_SNDTIMEO won't time out long multi-megabyte sends. SO_SNDTIMEO only kicks in when flow control prevents data from being sent.

@gloveboxes
Copy link
Contributor

I had a go at implementing the timeout. I pass in a global timeout; I'm really not convinced you need to be more granular than that...

extern int ws_socket(struct ws_events *evs, uint16_t port, int thread_loop, uint32_t timeout_ms);

see https://github.com/gloveboxes/wsServer/blob/6705a591f891e24b3d17cd9e920f829ad95adb37/include/ws.h#L270

and I set the timeout on the new socket

https://github.com/gloveboxes/wsServer/blob/6705a591f891e24b3d17cd9e920f829ad95adb37/src/ws.c#L1541

Cheers Dave

@Theldus
Copy link
Owner Author

Theldus commented May 2, 2022

@gloveboxes

The important thing here is

amount of time that an output function blocks because flow control prevents data from being sent.

To me this reads as: if everything is operating normally, then SO_SNDTIMEO won't time out long multi-megabyte sends. SO_SNDTIMEO only kicks in when flow control prevents data from being sent.

You're right, again =).

I made a small program to test this theory:

  • I wrote a small server that sends 100 MiB to a connected client
  • Set a timeout of 1 s, as suggested by you
  • Added latency to my network with varying values, between 100 ms and 1100 ms
  • Up to 1000 ms of (fake) latency, all bytes were sent perfectly, although it took about 1 minute. With 1100 ms of latency, the send aborted after just under 9 seconds.

If you want to reproduce:

(You can add latency either on localhost or on the network adapter; just remember that on localhost the latency applies to both sending and receiving. Here I will add it to my network card and test from another computer, with a Gigabit adapter.)

# Download server
$ wget https://gist.github.com/Theldus/19dad093630c33a637872b0bb1f74f5f/raw/\
ed382bcb590337b40b450dcb2f248a8ebc0e91cc/server.c

# Build
$ gcc server.c -o server

# Add 900ms of latency on eth1 (remove with: sudo tc qdisc del dev eth1 root)
$ sudo tc qdisc add dev eth1 root netem delay 900ms

# Run server
./server

# On another computer
$ nc <your-server-address> 8080 | tail -c 10 -

#### After some time you should see on server and client, respectively: ####

# Server output
$ time ./server 
Connected!
Sending buffer...
Sent: Success...

real	1m31,500s
user	0m0,011s
sys	0m0,191s

# Client output
$ nc <your-server-address> 8080 | tail -c 10 -
aaaaaaaaab

tc (Netem) is part of the iproute2 package and should be available by default on distributions like Ubuntu.


I think this is the solution then, thank goodness =).

If you do not mind, I can take your commit and add it to my 'pingpong_v2' branch, so I can credit you for the solution. Otherwise I can add it manually, no worries =).

@Theldus Theldus merged commit d03bf09 into master May 2, 2022