-
-
Notifications
You must be signed in to change notification settings - Fork 6
Description
Hi,
I've encountered the following problem: When using slog-loki, if the Loki server is not reachable when sending the first message, the process may halt. This only seems to occur if there is a significant (>= 1 second?) delay between the first two messages. Consider the following code:
package main
import (
"time"
"github.com/grafana/loki-client-go/loki"
slogloki "github.com/samber/slog-loki/v3"
"log/slog"
)
func main() {
config, _ := loki.NewDefaultConfig("http://localhost:3100/loki/api/v1/push")
config.TenantID = "xyz"
client, _ := loki.New(config)
lokiLogHandler := slogloki.Option{Level: slog.LevelDebug, Client: client}.NewLokiHandler()
defer func() {
client.Stop()
}()
lokiLogger := slog.New(lokiLogHandler)
lokiLogger.Error("Message 1")
slog.Info("Message 1 logged")
time.Sleep(1 * time.Second)
lokiLogger.Error("Message 2")
slog.Info("Message 2 logged")
lokiLogger.Error("Message 3")
slog.Info("Message 3 logged")
time.Sleep(3 * time.Second)
slog.Info("Post sleep")
}When running this code without an available Loki instance, I'd expect the application to output the three "Message x logged" info logs to stdout, as well as a lot of "error sending batch, will retry" messages to stderr. However, the application halts for about seven minutes and then continues to run as expected:
$ go run main.go
2024/07/22 22:22:28 INFO Message 1 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]
level=error component=client host=localhost:3100 msg="final error sending batch" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:29:33 INFO Message 2 logged
2024/07/22 22:29:33 INFO Message 3 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:29:36 INFO Post sleep
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]This error only occurs if there is a significant delay between the first two messages. When I remove the first sleep, the expected behaviour occurs.
[...]
lokiLogger.Error("Message 1")
slog.Info("Message 1 logged")
lokiLogger.Error("Message 2")
slog.Info("Message 2 logged")
lokiLogger.Error("Message 3")
slog.Info("Message 3 logged")
[...]Output:
$ go run main.go
2024/07/22 22:33:45 INFO Message 1 logged
2024/07/22 22:33:45 INFO Message 2 logged
2024/07/22 22:33:45 INFO Message 3 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:33:48 INFO Post sleep
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]I haven't yet had time to investigate further edge cases or to look into how grafana/loki-client-go handles message batching, which I think could be related to this issue.
Any ideas on what the cause of this issue may be and how to solve it?
Best regards,
Daniel