Skip to content

Failure to send log message can block the process #18

@danteu

Description

@danteu

Hi,

I've encountered the following problem: When using slog-loki, if the Loki server is not reachable when sending the first message, the process may halt. This only seems to occur if there is a significant (>= 1 second?) delay between the first two messages. Consider the following code:

package main

import (
	"time"

	"github.com/grafana/loki-client-go/loki"
	slogloki "github.com/samber/slog-loki/v3"

	"log/slog"
)

func main() {
	config, _ := loki.NewDefaultConfig("http://localhost:3100/loki/api/v1/push")
	config.TenantID = "xyz"
	client, _ := loki.New(config)

	lokiLogHandler := slogloki.Option{Level: slog.LevelDebug, Client: client}.NewLokiHandler()
	defer func() {
		client.Stop()
	}()
	lokiLogger := slog.New(lokiLogHandler)

	lokiLogger.Error("Message 1")
	slog.Info("Message 1 logged")

	time.Sleep(1 * time.Second)

	lokiLogger.Error("Message 2")
	slog.Info("Message 2 logged")
	lokiLogger.Error("Message 3")
	slog.Info("Message 3 logged")

	time.Sleep(3 * time.Second)
	slog.Info("Post sleep")
}

When running this code without an available Loki instance, I'd expect the application to output the three "Message x logged" info logs to stdout, as well as a lot of "error sending batch, will retry" messages to stderr. However, the application halts for about seven minutes and then continues to run as expected:

$ go run main.go             
2024/07/22 22:22:28 INFO Message 1 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]
level=error component=client host=localhost:3100 msg="final error sending batch" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:29:33 INFO Message 2 logged
2024/07/22 22:29:33 INFO Message 3 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:29:36 INFO Post sleep
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]

This error only occurs if there is a significant delay between the first two messages. When I remove the first sleep, the expected behaviour occurs.

[...]
	lokiLogger.Error("Message 1")
	slog.Info("Message 1 logged")
	lokiLogger.Error("Message 2")
	slog.Info("Message 2 logged")
	lokiLogger.Error("Message 3")
	slog.Info("Message 3 logged")
[...]

Output:

$ go run main.go
2024/07/22 22:33:45 INFO Message 1 logged
2024/07/22 22:33:45 INFO Message 2 logged
2024/07/22 22:33:45 INFO Message 3 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:33:48 INFO Post sleep
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]

I haven't yet had time to investigate further edge cases or to look into how grafana/loki-client-go handles message batching, which I think could be related to this issue.

Any ideas on what the cause of this issue may be and how to solve it?

Best regards,
Daniel

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions