Statsd seems to be dropping metrics probably because of UDP #2044
Comments
Is there any update on this?
If there is an update we will be updating the issue. Do you want to work on it?
Any updates about this issue?
Same here
I've just changed where I run the datadog-agent and changed some OS settings, and it worked for me.
@rodrigotheodoropsv - That's great that you got it working. Can you tell me which settings were changed to get it to work?
We use a VM with Ubuntu, but I don't know the version or more details about it because the DevOps team takes care of it. PS: there are a lot of settings that we use for other things. The relevant setting is net.core.somaxconn = 65535.
Has anyone experienced the same in a containerized environment?
I tried to investigate the issue and my initial hunch is that the problem is not related to UDP itself. It seems that statsd could not handle incoming packets fast enough. I first tried netcat to verify that packets are delivered to the statsd host (neither tcpdump nor a package manager exists on statsd-exporter). I changed the statsd address to port 9127 in the k6 container definition:
I also added some sleep to the command so that it waits until nc is started on statsd-exporter. On statsd-exporter, I started nc. After the end of the test, the request count reported by k6 matches the "k6.http_reqs" count in packets.txt:
On the other hand, when I use statsd as the output address, its own UDP receive metrics report fewer packets than the ones captured by tcpdump on the k6 host:
When I checked the statsd_exporter code (https://github.com/prometheus/statsd_exporter/blob/871e2d8df1da51d4eed27738f4bc079271a09c61/pkg/listener/listener.go#L54-L82), it seems that instead of processing the packet in a separate goroutine/thread, it first parses the packet to events and then sends them to a channel in the UDP-handling goroutine/thread. The parse operation might take a long time depending on the metrics delivered, and this may cause packet drops. It seems that k6 sends large UDP packets:
This is my initial investigation; it may not be 100% correct though :)
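For illustration, here is a minimal, self-contained Go sketch of the read-and-parse-in-one-goroutine pattern described above. It is not the actual statsd_exporter code; the port, channel size, and printing consumer are arbitrary stand-ins.

```go
// Sketch only: the same goroutine that reads from the UDP socket also
// parses the packet, so a slow parse delays the next read and incoming
// datagrams can be dropped by the kernel during bursts.
package main

import (
	"fmt"
	"net"
	"strings"
)

func main() {
	addr, err := net.ResolveUDPAddr("udp", ":9125")
	if err != nil {
		panic(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	events := make(chan []string, 1024)
	go func() { // stand-in for the downstream consumer of parsed events
		for batch := range events {
			fmt.Println("received", len(batch), "metric lines")
		}
	}()

	buf := make([]byte, 65535)
	for {
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			return
		}
		// Parsing happens in the read loop: nothing reads the socket while this runs.
		lines := strings.Split(string(buf[:n]), "\n")
		events <- lines // may also block if the consumer is slow
	}
}
```

While the split and per-line handling run, nothing drains the socket, so a burst of datagrams has to fit into the kernel's receive buffer or be dropped.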
Lowering the flush interval helps not because packets are smaller (tcpdump shows similar packet sizes, i.e. around 1400 bytes) but because packets are not sent in bursts when the interval is smaller. When I compared network captures at 1s and 100ms intervals, they show this (I am summarizing):
So a higher flush interval means more packets for statsd_exporter to process in less time; if it cannot catch up, it drops packets. Before going further, could you please tell me which direction to pursue? The newer statsd (datadog-go) client has a feature called "ClientSideAggregation" which reduces the number of packets sent to statsd by aggregating them on the client side; however, k6's dependency policy explicitly states that updating the datadog-go dependency is not preferred, so I don't think going in that direction is feasible.
I missed the referenced pull request (govuk-one-login/performance-testing#122), which also lowers the flush interval. I also tried changing the statsd_exporter logic to process UDP packets in a different goroutine, which helped a lot (no packet drops occurred over several attempts): https://github.com/kullanici0606/statsd_exporter/tree/enqueu_udp_packets I think lowering the flush interval to 100ms, increasing the buffer size, and changing the statsd_exporter logic a little will be the easiest and best path to a solution.
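As a rough sketch of that enqueue-then-parse idea (illustrative only, not the code from the branch linked above; the queue size and drop-when-full policy are arbitrary assumptions):

```go
// Sketch only: the read loop just copies the datagram onto a buffered
// channel and a separate goroutine does the parsing, so the socket is
// drained as fast as possible.
package main

import (
	"fmt"
	"net"
	"strings"
)

func main() {
	addr, err := net.ResolveUDPAddr("udp", ":9125")
	if err != nil {
		panic(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	packets := make(chan []byte, 4096) // buffered queue of raw datagrams

	go func() { // parser goroutine: expensive work happens off the read path
		for p := range packets {
			lines := strings.Split(string(p), "\n")
			fmt.Println("parsed", len(lines), "metric lines")
		}
	}()

	buf := make([]byte, 65535)
	for {
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			return
		}
		p := make([]byte, n)
		copy(p, buf[:n]) // copy so the read buffer can be reused immediately
		select {
		case packets <- p:
		default: // queue full: drop explicitly instead of blocking the read loop
		}
	}
}
```

Because the read loop only copies bytes and enqueues them, bursts are absorbed by the channel, and any overload shows up as an explicit, countable drop rather than a silent kernel-side drop.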
The version of k6 in the Dockerfile on the forum is 0.32.0, and in v0.33.0 some tags are blocked so that UDP packets are smaller, so upgrading k6 to a version newer than v0.33.0 will definitely help. To sum up: upgrading k6, lowering the flush interval, increasing the buffer size, and changing the statsd_exporter logic will reduce the probability of packet drops. @mstoykov Do you think this issue needs further investigation? I tried implementing part of the client-side aggregation in k6 by decorating / wrapping the statsd client, but it seems that most of the metrics are trends / timing information, which cannot be aggregated easily. Counters are easy, but there is only one counter, so it helped only a little when I tried aggregating it (see lines 87 to 95 at commit 3655065).
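For context, here is a rough sketch of what such a counter-aggregating wrapper could look like. The statsdCounter interface, the type names, and the flush strategy are hypothetical stand-ins, not the actual k6 or datadog-go types, and tags are ignored for simplicity:

```go
// Sketch only: sum Count calls in memory and flush one Count call per
// metric name per interval instead of one UDP packet per call.
package statsdagg

import (
	"sync"
	"time"
)

// statsdCounter is a hypothetical stand-in for the subset of the statsd
// client interface being wrapped.
type statsdCounter interface {
	Count(name string, value int64, tags []string, rate float64) error
}

type aggregatingClient struct {
	inner statsdCounter
	mu    sync.Mutex
	sums  map[string]int64
}

func newAggregatingClient(inner statsdCounter, flushEvery time.Duration) *aggregatingClient {
	a := &aggregatingClient{inner: inner, sums: make(map[string]int64)}
	go func() {
		for range time.Tick(flushEvery) {
			a.flush()
		}
	}()
	return a
}

// Count accumulates the value instead of sending a packet per call.
// Tags are ignored in this simplification.
func (a *aggregatingClient) Count(name string, value int64, tags []string, rate float64) error {
	a.mu.Lock()
	a.sums[name] += value
	a.mu.Unlock()
	return nil
}

func (a *aggregatingClient) flush() {
	a.mu.Lock()
	pending := a.sums
	a.sums = make(map[string]int64)
	a.mu.Unlock()
	for name, sum := range pending {
		_ = a.inner.Count(name, sum, nil, 1) // one call per metric name per interval
	}
}
```

As noted above, this only helps for counters; trend/timing samples would still be sent individually, which limits the benefit.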
@kullanici0606 This issue will likely be moved, along with all other statsd issues, "soon" ™️ to https://github.com/LeonAdato/xk6-output-statsd as part of #2982. I guess v0.33.0 helped a lot, but from what I remember when I last tested it, this will still happen with enough requests.
Yes, even with v0.33.0 the problem occurs sometimes, so I also tried to make some improvements on the statsd_exporter side too: @mstoykov should I stop working on this issue then?
I would recommend either waiting for the final move of the issues or opening an issue in the new repo. The k6 core team will in practice have no connection with this, and as part of that I will not tell anyone how and what to do with the extension :)
Hey @wvikum, we intend to merge it into k6 core in the upcoming releases, and it will probably be the best solution for streaming metrics to a 3rd-party metrics database. Please check if that works for you and with your DataDog integration.
Per @olegbespalov and @javaducky, this issue should probably be part of the StatsD project. Feel free to transfer it here:
Closing in favor of LeonAdato/xk6-output-statsd#29
Discussed in forum thread.
My last comment confirms that:
Possible actions: