
UDP packets can *not* be received #2

Open
rdlh opened this issue Mar 28, 2023 · 22 comments

@rdlh

rdlh commented Mar 28, 2023

Hi there!

First, thanks for the amazing job you guys have done.

We're having an issue using ZUMBLE on a French server with 250+ simultaneous players.
Sometimes (that's the issue), players get stuck in the default Mumble channel: they can speak, but can't hear others.
After a random number of logins/logouts, it just works.

The only thing we've found after some investigation is this log line, for EVERY stuck player:

UDP packets can *not* be received. Switching to TCP tunnel mode

And seconds later :

UDP packets can be received. Switching to UDP mode.

We're using the latest pma-voice version and FiveM build 6231.

Do you have any idea about how to fix this?

Cc @ChaosQK

@joelwurtz
Contributor

Are there no logs from the zumble process that show why those players can't connect over UDP?
Are you using the latest version as well?

FYI, a client-to-server connection works like this (a minimal sketch of the matching step follows the list):

  • the client connects over TCP
  • the client and server exchange information: the available users/channels are synced, and the server sends the client a crypt setup packet with a key to encrypt UDP packets
  • once the client receives this crypt setup, it connects over UDP
  • the server then tries to match which TCP client this UDP connection relates to by testing the encrypted UDP packet against the various crypt setups it has issued

So normally, in the logs, you would see a client connecting over TCP and then a UDP client connected for that same client (approximately 500ms after the TCP connection).
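
Here's a rough sketch of that matching step, with hypothetical names (try_decrypt stands in for the real OCB2-AES128 check; this is not ZUMBLE's actual API):

use std::collections::HashMap;
use std::net::SocketAddr;

struct CryptState {
    key: [u8; 16],
}

impl CryptState {
    // Stand-in for the real decrypt: returns Some(plaintext) only if the
    // packet authenticates under this client's key.
    fn try_decrypt(&self, packet: &[u8]) -> Option<Vec<u8>> {
        if packet.len() > self.key.len() {
            Some(packet.to_vec())
        } else {
            None
        }
    }
}

struct Client {
    crypt: CryptState,
    udp_addr: Option<SocketAddr>,
}

// Try each client that has no UDP address yet; the first whose crypt state
// decrypts the packet is considered the owner of this UDP "connection".
fn match_udp_client(clients: &mut HashMap<u32, Client>, addr: SocketAddr, packet: &[u8]) -> Option<u32> {
    for (session, client) in clients.iter_mut() {
        if client.udp_addr.is_none() && client.crypt.try_decrypt(packet).is_some() {
            client.udp_addr = Some(addr);
            return Some(*session);
        }
    }
    None
}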

Also, we haven't tested channel communication much, since we don't use channels in our voice system (we only use targets), so maybe the problem comes from that.

@rdlh
Author

rdlh commented Mar 29, 2023

Thanks for your fast answer!

I can send you some anonymised logs if needed to help you find what's going on; feel free to ping me on Discord: Knid#4242.
We pulled the ZUMBLE repo from the oss branch yesterday, before our production release, so I think we're using the latest version.

We tried to implement soz-voip, but it's not compatible with the phone resource we're using, and soz-phone isn't immediately an option for us as we're unfortunately using ESX.

@rdlh
Author

rdlh commented Mar 29, 2023

We're also sometimes seeing this log spamming:

2023-03-29T02:01:24.116090Z ERROR zumble::server::udp: cannot send voice packet to client: no available capacity

Is this an issue we can fix on our end?

@joelwurtz
Contributor

No, we're facing this issue too. It can happen sometimes, and we don't know why: something is blocking the client loop, and I don't see what. As a result, the message queue for that client fills up and this error appears.

I believe there is a race condition when someone disconnects (or something else), but normally those errors should not last long: while it's blocked, the TCP/UDP connection is "frozen", then the client reconnects with a new account and the stuck one is dropped afterwards.
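
For what it's worth, "no available capacity" is how Tokio's bounded channel reports TrySendError::Full, so that log line means the client's message buffer is full. A minimal repro, assuming tokio = "1" with the full feature set:

use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;

#[tokio::main]
async fn main() {
    // Tiny buffer so the demo fills it immediately; zumble's tcp.rs uses 128.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(4);

    // Nothing reads from rx here, standing in for a blocked client loop.
    for i in 0..6 {
        match tx.try_send(vec![0u8; 128]) {
            Ok(()) => println!("queued voice packet {i}"),
            // "no available capacity" is the Display text of this variant.
            Err(TrySendError::Full(_)) => println!("packet {i}: no available capacity"),
            Err(TrySendError::Closed(_)) => break,
        }
    }

    rx.close();
}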

@joelwurtz
Contributor

But this may be related to your issue; I will try to debug it in the coming days so that this error no longer occurs.

@galexrt

galexrt commented Aug 27, 2023

I have no experience in Rust, but reading the Tokio docs suggests it might be all about the channel buffer sizes (maybe a client just takes longer to be sent its messages, as you said, with the timeout/disconnect until the client is dropped).
(I have increased the targets as well, though that should be unrelated to the actual issue.)

I wonder whether the changes below make any sense to someone who knows Rust/ZUMBLE, or whether they're just postponing the problem to a later time (daily restarts FTW?). The patch below makes the situation a bit better for us for now.

Here's the patch diff:

diff --git a/src/client.rs b/src/client.rs
index 4c6d448..04171c0 100644
--- a/src/client.rs
+++ b/src/client.rs
@@ -71,8 +71,8 @@ impl Client {
         publisher: Sender<ClientMessage>,
     ) -> Self {
         let tokens = authenticate.get_tokens().iter().map(|token| token.to_string()).collect();
-        let mut targets = Vec::with_capacity(30);
-        targets.resize_with(30, Default::default);
+        let mut targets = Vec::with_capacity(128);
+        targets.resize_with(128, Default::default);
 
         Self {
             version,
diff --git a/src/handler/voice_packet.rs b/src/handler/voice_packet.rs
index a5c33ad..4d83c97 100644
--- a/src/handler/voice_packet.rs
+++ b/src/handler/voice_packet.rs
@@ -86,8 +86,9 @@ impl Handler for VoicePacket<Clientbound> {
                             Ok(_) => {}
                             Err(err) => {
                                 tracing::error!(
-                                    "error sending voice packet message to {}: {}",
+                                    "error sending voice packet message to {} (capacity: {}): {}",
                                     client_read.authenticate.get_username(),
+                                    client_read.publisher.capacity(),
                                     err
                                 );
                             }
diff --git a/src/server/tcp.rs b/src/server/tcp.rs
index aa81fc6..4e0ed51 100644
--- a/src/server/tcp.rs
+++ b/src/server/tcp.rs
@@ -60,7 +60,7 @@ async fn handle_new_client(acceptor: TlsAcceptor,
     let (version, authenticate, crypt_state) = Client::init(&mut stream, server_version).await.context("init client")?;
 
     let (read, write) = io::split(stream);
-    let (tx, rx) = mpsc::channel(128);
+    let (tx, rx) = mpsc::channel(1024);
 
     let username = authenticate.get_username().to_string();
     let client = {
diff --git a/src/server/udp.rs b/src/server/udp.rs
index 0a781f4..fdc5eb1 100644
--- a/src/server/udp.rs
+++ b/src/server/udp.rs
@@ -151,7 +151,7 @@ async fn handle_packet(mut buffer: BytesMut, size: usize, addr: SocketAddr, prot
                 (Some(client), Some(packet)) => {
                     {
                         tracing::info!(
-                            "UPD connected client {} on {}",
+                            "UDP connected client {} on {}",
                             client.read_err().await?.authenticate.get_username(),
                             addr
                         );
diff --git a/src/state.rs b/src/state.rs
index b2d7449..5bfa20f 100644
--- a/src/state.rs
+++ b/src/state.rs
@@ -176,7 +176,12 @@ impl ServerState {
                 }) {
                     Ok(_) => {}
                     Err(err) => {
-                        tracing::error!("failed to send message to {}: {}", client_read.authenticate.get_username(), err);
+                        tracing::error!(
+                            "failed to send message to {} (capacity: {}): {}",
+                            client_read.authenticate.get_username(),
+                            client_read.publisher.capacity(),
+                            err
+                        );
                     }
                 }
             }

If I can help with debugging this, let me know.

@NariieL
Member

NariieL commented Sep 7, 2023

Small update: on SOZ with 310 players connected simultaneously, we had no VoIP problems.

@NariieL
Member

NariieL commented Sep 20, 2023

Small update²: still no problems with 387 players connected. I hope the problem has been solved on your side. <3

@Korioz

Korioz commented Oct 7, 2023

Hey, we've been using it on a server averaging ~1100 peak players. We hit only one issue: when we reached 1000 players, the sound just cut off for every new connection, as if there were a connection pool limit.

I've applied some tiny patches which helped resolve some issues; you can check them out here.

Anyway, it's still a prototype, but this project is a viable replacement for other external Mumble server solutions.

@TheiLLeniumStudios

@Korioz are you using ZUMBLE with pma-voice, or with soz-voip?

We're having issues on servers that use pma-voice with ZUMBLE: 99% of players are not assigned a channel / the channel doesn't get created for them, so they're placed in the root channel and cannot hear each other at all.

We've also tried explicitly invoking MumbleCreateChannel before assigning a player to it, but it seems to have no effect, which can be confirmed by joining via a desktop Mumble client.

@trong0981

trong0981 commented Jan 17, 2024 via email

@TheiLLeniumStudios

@trong0981 can you explain what you mean by that? Are you saying async is causing issues and there are race conditions because of it?

@joelwurtz
Contributor

I also don't understand what you mean by "unsync thread". If you could give a snippet of what you changed, that would be nice, so we can port it back to this library and everyone can benefit.

@trong0981

trong0981 commented Jan 17, 2024

I'm using Zumble and it's good, but my knowledge of Rust isn't good enough to fix the errors myself. I think this approach can help you fix all the problems still present in Zumble.

There are 2 problems that, if solved, would I think completely resolve the errors people often run into:

  1. Take advantage of UnboundedReceiver & UnboundedSender to avoid having to size the channels used for tokio's multi-threaded processing, and handle the deadlock problem that unbounded channels can cause when there are too many clients or the VPS (DC) is too weak.

  2. Completely delete a chat room once no one is left in it, to avoid memory growth due to too many chat rooms.

@TheiLLeniumStudios

TheiLLeniumStudios commented Jan 17, 2024

Made a bunch of changes over at my fork that have now resolved all the issues:

  • Made channels permanent (this is REQUIRED if you're using channel-based clients). It makes sure channels are not removed on disconnect, so clients can re-use them and won't get stuck in the root channel. (I know this can lead to high memory usage, but in my case with pma-voice, channels are heavily reused and the number of created channels never exceeds the max number of concurrent players that ever joined the server, so it's not a big issue. Tested with 450 players; only about 350MB of memory usage was seen.)
  • Increased the capacity to 4096. I was getting "no available capacity" when the value was set to 256.
  • Reduced error log spam for "channel closed": it's now logged at debug level since it isn't useful.
  • Set up a workflow for publishing container images
  • Set up a workflow for publishing executable binaries for Windows and Linux
  • Added a Helm chart to run it on Kubernetes

With all these changes, Zumble runs pretty stable and I'm not seeing any issues so far.

Also, the reason for making the channels permanent is that channels do get cleaned up when no one is in them, but a channel with the same channel ID CANNOT be recreated afterwards, which is why I just made them permanent. If anyone can help figure out why this happens, that would be awesome. Maybe @joelwurtz has some idea.

I will, however, look at the UnboundedReceiver and UnboundedSender types. I have 0% knowledge of Rust, so I'm not confident I'll be able to fix the actual root cause, which is why I forked the project rather than contributing hotfixes.

Link to my fork: https://github.com/iLLeniumStudios/ZUMBLE

@joelwurtz
Contributor

Unbounded only means there is no longer a limit on the message buffer (when sending/receiving messages between threads). I'm not a fan: if there is a problem consuming messages, the buffer grows without errors (so it can destroy your memory), and it also uses more CPU, since it has to allocate buffer memory on the fly, whereas a fixed buffer is allocated once at start.
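
A quick sketch of that trade-off with Tokio's unbounded variant (assuming tokio = "1"): sends never fail with "full", so a stalled consumer silently accumulates memory:

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Vec<u8>>();

    // Every send succeeds while the receiver is alive: no backpressure,
    // so these ~10 MB just sit in the queue until someone drains them.
    for _ in 0..10_000 {
        tx.send(vec![0u8; 1024]).expect("receiver still alive");
    }

    let mut drained = 0usize;
    while rx.try_recv().is_ok() {
        drained += 1;
    }
    println!("drained {drained} queued packets");
}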

But increasing the capacity, like you have done, should be okay. I presume most of those errors happen when one of the client threads is blocked due to a disconnect (it awaits for some time while writing the message to the client) and a lot of voice messages stack up until there is no space left in the buffer.

I will try to take a look at why a channel cannot be recreated; since we don't use channels, I haven't tested them much (we only use targets, even for proximity).

@TheiLLeniumStudios

Thanks for looking into it @joelwurtz
I suspect it has something to do with the channel not getting removed from the state, or some sort of race condition between removal and creation for the same channel ID. I tried to remove the channel inside the temporary check itself but ran into a bunch of mutability errors that I failed to fix. I'd assume that's why you remove it from the channels state map in the handler itself rather than right after checking that it's no longer needed.

@TheiLLeniumStudios

On the plus side, it seems to be working well even with permanent channels. Here's how the CPU and memory usage look for the past 6 hours, where the player count started at 450 and then went down to 100:
[image: CPU and memory usage graph]

@joelwurtz
Contributor

A channel does not consume a lot of memory, so even if you have, say, 10,000 channels, it should use relatively little.

@TheiLLeniumStudios

TheiLLeniumStudios commented Jan 17, 2024

Makes sense. This issue is not a blocker, by the way, but I'll try to add some debug logs around channel removal to see if I can figure out the problem (even if I can't fix it myself). Once we fix it somehow, I'll be happy to raise a PR with my changes and the workflows back to this repo.

@TheiLLeniumStudios

TheiLLeniumStudios commented Jan 17, 2024

So I tried to debug this, and it looks like the remove-channel statement never gets triggered (when a player disconnects), neither in channel_state: https://github.com/SOZ-Faut-etre-Sub/ZUMBLE/blob/oss/src/handler/channel_state.rs#L84
nor in user_state: https://github.com/SOZ-Faut-etre-Sub/ZUMBLE/blob/oss/src/handler/user_state.rs#L36

I don't know the exact reason, but that is basically what's causing the channels to stick around and not get recreated.
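
The kind of instrumentation I mean looks roughly like this (a hypothetical sketch; the struct fields and function name are illustrative, not ZUMBLE's actual code). If the "checking" line shows up in the logs but the "removed" line never does, the guard is wrong; if neither shows up, the cleanup path is never called at all:

use std::collections::HashMap;

struct Channel {
    temporary: bool,
    listeners: usize,
}

struct ServerState {
    channels: HashMap<u32, Channel>,
}

// Log both sides of the removal branch so the logs show exactly
// where the cleanup stops happening.
fn maybe_remove_channel(state: &mut ServerState, channel_id: u32) {
    if let Some(channel) = state.channels.get(&channel_id) {
        println!(
            "checking channel {channel_id}: temporary={} listeners={}",
            channel.temporary, channel.listeners
        );
        if channel.temporary && channel.listeners == 0 {
            state.channels.remove(&channel_id);
            println!("removed channel {channel_id}");
        }
    }
}

fn main() {
    let mut state = ServerState { channels: HashMap::new() };
    state.channels.insert(7, Channel { temporary: true, listeners: 0 });
    maybe_remove_channel(&mut state, 7);
}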

@FingerlessGlov3s

FingerlessGlov3s commented Oct 15, 2024

@TheiLLeniumStudios

Made a bunch of changes over at my fork that have now resolved all the issues.

Does this fix the issue where, if I toggled my voice chat off and then back on, people could hear me but I couldn't hear them? I was also getting various issues where rejoining the server would fix voice for someone.

Just tried deploying this repo's version of ZUMBLE on a server of 200-300 players, and proximity chat was very hit-and-miss, though radio always worked. Using pma-voice.
