Log forward(er)

Goal

Use the scalyr and datadog logging platforms as drop-in replacements for your existing logstash architecture.

Testing it

You can start a logstash server that forwards its stream to both the datadog and scalyr servers at the same time with the following command:

docker run --rm -p 5050:5050 -e DATADOG_TOKEN=... -e SCALYR_TOKEN=... habx/logfwd:dev

At least one of the two tokens (DATADOG_TOKEN or SCALYR_TOKEN) is required.
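To check that the container receives events, a small client can write one event to the published port. The sketch below assumes events are sent as one JSON object per line, using the @timestamp, @message and @fields names mentioned in this README; the newline framing and the helper names (buildEvent, send) are assumptions made for illustration, not taken from the project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"time"
)

// buildEvent encodes one logstash-style event as a single JSON line.
// The field names come from this README; the trailing newline is an assumed framing.
func buildEvent(timestamp, message string, fields map[string]interface{}) ([]byte, error) {
	event := map[string]interface{}{
		"@timestamp": timestamp,
		"@message":   message,
		"@fields":    fields,
	}
	line, err := json.Marshal(event)
	if err != nil {
		return nil, err
	}
	return append(line, '\n'), nil
}

// send opens a TCP connection to addr and writes the encoded event.
func send(addr string, line []byte) error {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(line)
	return err
}

func main() {
	now := time.Now().UTC().Format(time.RFC3339Nano)
	line, err := buildEvent(now, "hello from the test client", map[string]interface{}{"level": "info"})
	if err == nil {
		// localhost:5050 matches the port published by the docker command.
		err = send("localhost:5050", line)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "logfwd test client:", err)
		os.Exit(1)
	}
}
```

If the event shows up in the scalyr or datadog UI a few seconds later, the forwarding path works end to end.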

Why

Scalyr and datadog push you to use their agents for logging, but:

The scalyr agent is heavy: it consumes a LOT of CPU and RAM
The datadog agent costs a LOT ($15/server)

Logstash has logging clients in all languages; it's a good compromise between simplicity and performance.

How it works

General overview

For each logstash client connected to the server, an independent logging session is created on every enabled output, whether that is scalyr or datadog.

Each logstash event is parsed, converted to a basic logging event, then passed to all the output clients that are enabled.
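The fan-out described above can be sketched as a loop over the enabled outputs. The Event type, the Output interface and memoryOutput are hypothetical names chosen for this illustration; the project's actual types may differ.

```go
package main

import "fmt"

// Event is a hypothetical "basic logging event" produced after parsing.
type Event struct {
	Message string
	Attrs   map[string]interface{}
}

// Output is a hypothetical interface that each enabled backend would implement.
type Output interface {
	Name() string
	Send(ev Event) error
}

// memoryOutput records events in memory; it stands in for a scalyr or datadog client.
type memoryOutput struct {
	name   string
	events []Event
}

func (m *memoryOutput) Name() string { return m.name }

func (m *memoryOutput) Send(ev Event) error {
	m.events = append(m.events, ev)
	return nil
}

// fanOut passes one event to every enabled output and returns how many accepted it.
// A failing output is reported and does not prevent delivery to the others.
func fanOut(ev Event, outputs []Output) int {
	delivered := 0
	for _, out := range outputs {
		if err := out.Send(ev); err != nil {
			fmt.Printf("output %s: %v\n", out.Name(), err)
			continue
		}
		delivered++
	}
	return delivered
}

func main() {
	outputs := []Output{&memoryOutput{name: "scalyr"}, &memoryOutput{name: "datadog"}}
	n := fanOut(Event{Message: "hello"}, outputs)
	fmt.Println("delivered to", n, "outputs")
}
```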

Performed translations

To easily re-use an existing logstash implementation, a few tricks are needed:

@timestamp is converted to scalyr's ts field
@message becomes the message attribute
@fields entries are moved to the root of the attributes (as scalyr doesn't support nested search)
level / levelname fields are converted to scalyr's sev field
source is set to logfwd
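As a rough illustration of the translations listed above, here is a minimal sketch. Several details are assumptions rather than facts from the project: that @timestamp is an RFC 3339 string, that ts is expressed as nanoseconds since the epoch in a string, that level / levelname live inside @fields, that source is stored as an attribute, and the level-to-sev table itself (0 to 6, with 3 meaning info). The names ScalyrEvent, sevByLevel and translate are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// ScalyrEvent is a hypothetical, simplified view of an outgoing scalyr event.
type ScalyrEvent struct {
	TS    string                 // nanoseconds since the epoch, as a string (assumed)
	Sev   int                    // severity, 0 to 6 (assumed scale)
	Attrs map[string]interface{} // flat attributes
}

// sevByLevel is an assumed mapping from common level names to severities.
var sevByLevel = map[string]int{
	"debug": 2, "info": 3, "warn": 4, "warning": 4,
	"error": 5, "fatal": 6, "critical": 6,
}

// translate applies the conversions listed in this section to one parsed event.
func translate(in map[string]interface{}) (ScalyrEvent, error) {
	out := ScalyrEvent{Sev: 3, Attrs: map[string]interface{}{}}

	raw, _ := in["@timestamp"].(string)
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return out, fmt.Errorf("bad @timestamp %q: %w", raw, err)
	}
	out.TS = strconv.FormatInt(t.UnixNano(), 10)

	if msg, ok := in["@message"]; ok {
		out.Attrs["message"] = msg
	}

	// Move @fields entries to the root, except the level fields which become sev.
	fields, _ := in["@fields"].(map[string]interface{})
	for k, v := range fields {
		if k == "level" || k == "levelname" {
			if name, ok := v.(string); ok {
				if sev, known := sevByLevel[strings.ToLower(name)]; known {
					out.Sev = sev
				}
			}
			continue
		}
		out.Attrs[k] = v
	}
	out.Attrs["source"] = "logfwd"
	return out, nil
}

func main() {
	ev, err := translate(map[string]interface{}{
		"@timestamp": "2020-01-01T00:00:00Z",
		"@message":   "payment failed",
		"@fields":    map[string]interface{}{"level": "ERROR", "user": "bob"},
	})
	fmt.Println(ev, err)
}
```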

Env vars

Everything is configured through environment variables.

Logstash input

LISTEN_ADDR (optional): Address to listen on. Defaults to :5050
LOG_ENV (optional): Logging mode. Defaults to prod. Use dev for sort-of-pretty logging in debug level.
LOGSTASH_EVENT_MAX_SIZE (optional): Maximum size of a logstash event. Defaults to 307200 (300 KB)
LOGSTASH_AUTH_KEY (optional): Key to use for authentication. Not set by default
LOGSTASH_AUTH_VALUE (optional): Value expected for the authentication key. Not set by default
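The input settings above could be loaded along these lines. The project lists envconfig as a dependency; this sketch deliberately uses only the standard library so that it stays self-contained, and the InputConfig struct and loadInputConfig function are hypothetical. The authorized helper encodes a guess about how LOGSTASH_AUTH_KEY and LOGSTASH_AUTH_VALUE are used (the event must carry that key with that value); the README does not spell this out.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// InputConfig is a hypothetical struct mirroring the logstash input variables.
type InputConfig struct {
	ListenAddr   string
	LogEnv       string
	EventMaxSize int
	AuthKey      string
	AuthValue    string
}

// loadInputConfig reads the variables through getenv and applies the documented defaults.
// getenv is injected (os.Getenv in production) so the function is easy to test.
func loadInputConfig(getenv func(string) string) (InputConfig, error) {
	cfg := InputConfig{ListenAddr: ":5050", LogEnv: "prod", EventMaxSize: 307200}
	if v := getenv("LISTEN_ADDR"); v != "" {
		cfg.ListenAddr = v
	}
	if v := getenv("LOG_ENV"); v != "" {
		cfg.LogEnv = v
	}
	if v := getenv("LOGSTASH_EVENT_MAX_SIZE"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("LOGSTASH_EVENT_MAX_SIZE: %w", err)
		}
		if n <= 0 {
			return cfg, fmt.Errorf("LOGSTASH_EVENT_MAX_SIZE must be positive, got %d", n)
		}
		cfg.EventMaxSize = n
	}
	cfg.AuthKey = getenv("LOGSTASH_AUTH_KEY")
	cfg.AuthValue = getenv("LOGSTASH_AUTH_VALUE")
	return cfg, nil
}

// authorized reports whether an event passes the key/value check.
// ASSUMPTION: authentication means "the event contains AuthKey with AuthValue".
func authorized(cfg InputConfig, event map[string]interface{}) bool {
	if cfg.AuthKey == "" {
		return true // authentication is not enabled
	}
	v, ok := event[cfg.AuthKey].(string)
	return ok && v == cfg.AuthValue
}

func main() {
	cfg, err := loadInputConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```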

Scalyr output

SCALYR_WRITELOG_TOKEN (enables it): Your scalyr log write token
SCALYR_FIELDS_CONV_MESSAGE (optional): Conversion to apply between logstash and scalyr event attributes
SCALYR_FIELDS_CONV_SESSION (optional): Conversion to apply between logstash events and scalyr log session attributes
SCALYR_SERVER (optional): URL to use for reporting logs. Defaults to https://www.scalyr.com; use https://eu.scalyr.com for a European account
SCALYR_REQUEST_MAX_NB_EVENTS (optional): Maximum number of events to send per request. Defaults to 20
SCALYR_REQUEST_MAX_REQUEST_SIZE (optional): Maximum size of a request. Defaults to 2097152 (2 MB)
SCALYR_REQUEST_MIN_PERIOD (optional): Minimum time between requests (mostly for testing, can also be used to reduce total bandwidth)
SCALYR_QUEUE_SIZE (optional): Buffering queue between logstash and scalyr. Defaults to 1000
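To show how the per-request limits interact, here is one possible way to batch events under a maximum count and a maximum size. This is a sketch of the general technique, not the project's code: the batcher type and its methods are hypothetical, and the minimum period between requests is left out.

```go
package main

import "fmt"

// batcher accumulates encoded events until a count or size limit would be exceeded.
type batcher struct {
	maxEvents int      // plays the role of SCALYR_REQUEST_MAX_NB_EVENTS
	maxBytes  int      // plays the role of SCALYR_REQUEST_MAX_REQUEST_SIZE
	pending   [][]byte // events waiting for the next request
	size      int      // total bytes currently pending
}

// add queues one event. If the pending batch cannot take it without breaking
// a limit, the pending batch is returned for sending and a new one is started.
// An event larger than maxBytes on its own still ends up alone in a batch.
func (b *batcher) add(ev []byte) [][]byte {
	var full [][]byte
	if len(b.pending) > 0 && (len(b.pending) >= b.maxEvents || b.size+len(ev) > b.maxBytes) {
		full = b.flush()
	}
	b.pending = append(b.pending, ev)
	b.size += len(ev)
	return full
}

// flush returns whatever is pending and resets the batcher.
func (b *batcher) flush() [][]byte {
	out := b.pending
	b.pending = nil
	b.size = 0
	return out
}

func main() {
	b := &batcher{maxEvents: 20, maxBytes: 2097152}
	for i := 0; i < 45; i++ {
		if batch := b.add([]byte(fmt.Sprintf(`{"n":%d}`, i))); batch != nil {
			fmt.Println("send request with", len(batch), "events")
		}
	}
	fmt.Println("final flush with", len(b.flush()), "events")
}
```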

Datadog output

DATADOG_TOKEN (enables it): Your datadog token
DATADOG_SERVER (optional): Datadog server. Defaults to intake.logs.datadoghq.com:15506; use tcp-intake.logs.datadoghq.eu:443 for Europe
DATADOG_QUEUESIZE (optional): Queue size. Defaults to 20. As it's a TCP to TCP stream, it can be kept to a low value
DATADOG_FIELDS_CONV_MESSAGE (optional): Conversion of message fields
DATADOG_FIELDS_CONV_TAGS (optional): Conversion of message fields to tags
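For orientation, here is a sketch of what a single line sent to datadog over TCP might look like. It assumes the plain TCP intake convention of prefixing each line with the token followed by a space, and the ddtags attribute holding comma-separated key:value pairs; both reflect my understanding of datadog's intake rather than anything stated in this README, and formatDatadogLine is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// formatDatadogLine builds one newline-terminated line: "<token> <json payload>".
// ASSUMPTION: the token prefix and the ddtags attribute follow datadog's TCP intake conventions.
func formatDatadogLine(token, message string, tags map[string]string) ([]byte, error) {
	pairs := make([]string, 0, len(tags))
	for k, v := range tags {
		pairs = append(pairs, k+":"+v)
	}
	sort.Strings(pairs) // stable output regardless of map iteration order

	payload := map[string]string{"message": message}
	if len(pairs) > 0 {
		payload["ddtags"] = strings.Join(pairs, ",")
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	line := append([]byte(token+" "), body...)
	return append(line, '\n'), nil
}

func main() {
	// "TOKEN" is a placeholder, not a real credential.
	line, err := formatDatadogLine("TOKEN", "user logged in", map[string]string{"env": "prod"})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Print(string(line))
}
```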

Default scalyr conversion

For messages

{
    "@source_host": "hostname",
    "@source_path": "file_path",
    "@message":     "message",
    "@type":        "logstash_type",
    "@source":      "logstash_source",
    "@tags":        "tags"
}

For sessions

{
    "appname": "serverHost",
    "env":     "logfile"
}
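A conversion table like the ones above can be applied as a simple key rename. The sketch below assumes the conversion variables hold a JSON object of the same shape, and that fields without an entry are kept under their original name; neither point is confirmed by the README. parseConversion and convertFields are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseConversion decodes a JSON object mapping logstash field names to output names.
// ASSUMPTION: this is the format expected in the *_FIELDS_CONV_* variables.
func parseConversion(raw string) (map[string]string, error) {
	conv := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &conv); err != nil {
		return nil, fmt.Errorf("invalid conversion table: %w", err)
	}
	return conv, nil
}

// convertFields returns a copy of event with keys renamed according to conv.
// ASSUMPTION: keys missing from conv are copied unchanged.
func convertFields(event map[string]interface{}, conv map[string]string) map[string]interface{} {
	out := make(map[string]interface{}, len(event))
	for k, v := range event {
		if renamed, ok := conv[k]; ok {
			out[renamed] = v
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	conv, err := parseConversion(`{"@source_host": "hostname", "@message": "message"}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	event := map[string]interface{}{"@source_host": "web-1", "@message": "started"}
	fmt.Println(convertFields(event, conv))
}
```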

Dependencies

The dependencies outside the standard library are:

zap for logs
envconfig for config management through environment variables
go.uuid for scalyr session UUID generation

Feedback

Any feedback is welcome.

Possible evolutions

Handling of scalyr's threads. I'm not entirely sure of how we could use them.

Known issues

Could be optimized (but probably handles tens of megabytes per second)
Some logstash fields might not be very well converted
Some logstash fields could be transmitted in the session data to reduce the amount of data being sent
There isn't a single unit test
Each connection can consume a lot of memory (roughly 300KB * 1000 = 300MB), but will likely consume a lot less in standard usage
Needs some refactoring
SSL isn't supported (easy to add)
No clean shutdown: we should stop accepting clients and disconnect existing ones
Only supports TCP. UDP wouldn't be difficult to set up, but scalyr's sessionInfo mechanism would have to be handled by other means
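The per-connection memory figure in the list above is a worst case: a full queue in which every event has the maximum size. Assuming the two factors are the LOGSTASH_EVENT_MAX_SIZE and SCALYR_QUEUE_SIZE defaults (my reading of the 300KB * 1000 estimate), the arithmetic can be checked with a few lines. worstCaseBytes is a hypothetical helper.

```go
package main

import "fmt"

// worstCaseBytes returns the memory held by a full queue of maximum-size events.
func worstCaseBytes(eventMaxSize, queueSize int) int {
	return eventMaxSize * queueSize
}

func main() {
	// Defaults taken from the Env vars section: 307200 bytes per event, 1000 queue slots.
	b := worstCaseBytes(307200, 1000)
	fmt.Printf("%d bytes, about %.0f MiB per connection\n", b, float64(b)/(1024*1024))
}
```

Lowering either variable reduces this bound proportionally.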

License

MIT
