
feat: add SSE streaming via /events #791

Merged
merged 20 commits into master from sse-events-consume on Jun 22, 2024

Conversation

@teleivo (Contributor) commented Jun 13, 2024

This PR lets users stream instance manager events via SSE. An instance manager event can be anything from "Your DB has been saved" to changes to a pod. This PR only implements the event-consuming side; another PR producing the first instance manager event will follow.

Use cases

These use cases are supported:

[image: diagram of the supported use cases]

Note that in case C, resuming only works if the EventSource received an event (with an id) before it lost the connection. Only then can it send the Last-Event-ID HTTP header, which allows us to resume. Otherwise, the user will only get new messages, as in case A.

Architecture

Clients can stream events via HTTP server-sent events (SSE). Our web UI will rely on EventSource to establish and maintain the connection and to deliver the SSE events to callbacks.

The instance manager opens a consumer on a RabbitMQ stream for every user connecting to the HTTP /events endpoint. By default (if no Last-Event-ID HTTP header is sent), new events are relayed from RabbitMQ to the user via SSE. If the Last-Event-ID header is sent, messages from Last-Event-ID+1 onward are sent.
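As a sketch of this logic (helper names are made up for illustration, not the PR's actual code), the offset selection and SSE framing could look like:

```go
package main

import (
	"fmt"
	"strconv"
)

// formatSSE renders one server-sent event frame. Including an id: field lets
// EventSource send it back as the Last-Event-ID header on reconnect.
func formatSSE(id uint64, data string) string {
	return fmt.Sprintf("id: %d\ndata: %s\n\n", id, data)
}

// startOffset picks where a new RabbitMQ stream consumer starts: resume at
// lastEventID+1 if the client sent a Last-Event-ID header, otherwise only
// relay new events.
func startOffset(lastEventID string) (offset uint64, resume bool) {
	if lastEventID == "" {
		return 0, false // no header: relay only new events
	}
	id, err := strconv.ParseUint(lastEventID, 10, 64)
	if err != nil {
		return 0, false // malformed header: fall back to new events
	}
	return id + 1, true
}

func main() {
	fmt.Print(formatSSE(7, "your DB has been saved"))
	off, resume := startOffset("7")
	fmt.Println(off, resume)
}
```

A real handler would read the header via `r.Header.Get("Last-Event-ID")` and attach the consumer at the chosen offset.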

[image: architecture diagram of the SSE event flow]

Infrastructure Changes

We needed to do a couple of things to use RabbitMQ streaming with filtering:

Message Retention

Some napkin math 🔢 first:

At some point we want to push k8s status updates via SSE. I watched k8s pod events for a day across all namespaces and observed the following rates:

  • 10585.5 per day
  • ~441 per hour
  • ~7.4 per minute
  • ~0.12 per second

Note: the numbers were captured during a single day. Activity on other days might be higher and/or grow over time as more users become active.
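The hourly, per-minute, and per-second figures follow from dividing the observed daily count; a quick check:

```go
package main

import "fmt"

// perDay is the k8s pod event count observed across all namespaces in one day.
const perDay = 10585.5

func perHour() float64   { return perDay / 24 }
func perMinute() float64 { return perDay / 24 / 60 }
func perSecond() float64 { return perDay / 86400 }

func main() {
	fmt.Printf("~%.0f per hour, ~%.1f per minute, ~%.2f per second\n",
		perHour(), perMinute(), perSecond())
}
```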

I sent 20974 messages, each with the following RabbitMQ message data (plus the application headers group and kind):

{
  Instance:   "my-instance",
  Status:     "Up",
  Deployment: "my-deployment",
  Message:    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
}

This resulted in 12M on disk in total:

du -sh /var/lib/rabbitmq/mnesia/node/stream/__events_1718855104927203033
12M     /var/lib/rabbitmq/mnesia/rabbit@im-rabbitmq-rabbitmq-update-0.im-rabbitmq-rabbitmq-update-headless.instance-manager-feature.svc.cluster.local/stream/__events_1718855104927203033

RabbitMQ allows us to set retention policies for max segment size, max stream size and max age:

  • The retention policies are mainly evaluated when a new segment is created. Segments should thus not be too big, as otherwise the policies would rarely be evaluated, but also not too small, as that causes overhead for the broker.
  • Both max age and max stream length need to be reached for deletion to occur.
  • Unlike with queues, we can change these via policies later on without having to delete the stream; policies take precedence over stream arguments (rabbitmq/rabbitmq-server#3087).

https://www.cloudamqp.com/blog/rabbitmq-streams-and-replay-features-part-3-limits-and-configurations-for-streams-in-rabbitmq.html
https://groups.google.com/g/rabbitmq-users/c/TQG_nE2m4GQ

We decided we want to keep messages for 1h. Our types of messages take up little space (20974 messages ≈ 12 MB). Even storing 12000 MB worth of messages would not cause us harm on disk; that would amount to ~21 million messages, and reaching that volume within 1h would indicate a much bigger problem. This, and the fact that max age and max stream size are combined with a logical AND, is why we only pick the max age retention.

We picked

  1. a max segment size of 1MB (which holds ~20974÷12 ≈ 1748 messages)
  2. a max retention time of 1h
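The napkin math behind these picks can be sanity-checked directly from the measured 20974 messages ≈ 12 MB:

```go
package main

import "fmt"

const (
	messages = 20974 // messages sent in the experiment
	totalMB  = 12.0  // resulting on-disk stream size in MB
)

func bytesPerMessage() float64 { return totalMB * 1024 * 1024 / messages }

func messagesPerMB() float64 { return messages / totalMB }

// messagesIn extrapolates how many messages fit in mb megabytes.
func messagesIn(mb float64) float64 { return mb / totalMB * messages }

func main() {
	fmt.Printf("~%.0f bytes/message\n", bytesPerMessage())
	fmt.Printf("~%.0f messages per 1 MB segment\n", messagesPerMB())
	fmt.Printf("~%.1fM messages in 12000 MB\n", messagesIn(12000)/1e6)
}
```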

Testing

We have an integration test with 2 users streaming events. The users are in a shared group, and one of them is additionally in an exclusive group. We then test the routing logic: users only get events they should see. We also test resuming after a connection is cancelled.
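The routing rule under test boils down to a group-membership check; as an illustrative sketch (names are made up, not the PR's actual code):

```go
package main

import "fmt"

// shouldDeliver mirrors the routing rule: an event carries a group, and a
// user only receives events for groups they are a member of.
func shouldDeliver(eventGroup string, userGroups []string) bool {
	for _, g := range userGroups {
		if g == eventGroup {
			return true
		}
	}
	return false
}

func main() {
	sharedOnly := []string{"shared"}
	both := []string{"shared", "exclusive"}
	fmt.Println(shouldDeliver("shared", sharedOnly))  // both users see shared-group events
	fmt.Println(shouldDeliver("exclusive", sharedOnly)) // not a member: filtered out
	fmt.Println(shouldDeliver("exclusive", both))
}
```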

Docker issues

We switched our tests to the Bitnami RabbitMQ Docker image (https://github.com/bitnami/containers/tree/main/bitnami/rabbitmq), as this is what we use in the cluster.

When connecting to a stream we pass a URI, but the clients then ask the RabbitMQ nodes for their host/port and use that to stream (see https://www.rabbitmq.com/blog/2021/07/23/connecting-to-streams). This is configured via advertised_host/advertised_port in RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS or rabbitmq.conf. This config cannot be changed at runtime.

Our tests run outside of Docker while RabbitMQ runs in a container. The advertised_port is 5552 by default. If we relied on Docker picking a random port, our Go tests would not be able to connect, so we expose the fixed port 5552. We set advertised_host to localhost, as our host is not able to resolve the Docker container name or IP (at least not without more ⛑️).

The Bitnami image does not allow setting RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS as an env var, so we need to mount a config file.
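The mounted file could look roughly like this (a sketch assuming the rabbitmq_stream plugin's standard stream.advertised_host and stream.advertised_port config keys):

```
# rabbitmq.conf mounted into the test container: fixed port 5552 and
# localhost so Go tests running outside Docker can connect to the stream
stream.advertised_host = localhost
stream.advertised_port = 5552
```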

Gnomock does not allow mounting a file 😓 and also uses the official RabbitMQ image. We thus use testcontainers, which gives us more control over the Docker container. We only use testcontainers for RabbitMQ.


@teleivo teleivo force-pushed the sse-events-consume branch from bc1b14a to 894fb7f Compare June 13, 2024 09:01
@teleivo teleivo changed the title feat: sse events feat: add SSE streaming via /events Jun 13, 2024
@teleivo teleivo force-pushed the sse-events-consume branch 4 times, most recently from 2f45396 to 81905c8 Compare June 13, 2024 10:16
@teleivo teleivo force-pushed the sse-events-consume branch from 84a6baa to c0dba9f Compare June 15, 2024 04:55
@teleivo teleivo added the "deploy" label (used to toggle deploying PR branches to the "feature" env) Jun 17, 2024
@teleivo teleivo force-pushed the sse-events-consume branch from 9eb98f8 to 6a4d641 Compare June 17, 2024 08:17
@teleivo teleivo removed the "deploy" label Jun 17, 2024
@teleivo teleivo force-pushed the sse-events-consume branch 3 times, most recently from 5f64987 to 687bf7b Compare June 18, 2024 12:55
@teleivo teleivo added and removed the "deploy" label Jun 18, 2024
@teleivo teleivo force-pushed the sse-events-consume branch 8 times, most recently from adfa48e to 659bfcb Compare June 19, 2024 06:54
@teleivo teleivo requested a review from tonsV2 June 19, 2024 07:21
@teleivo teleivo force-pushed the sse-events-consume branch 3 times, most recently from 07a6a0e to da30ecd Compare June 19, 2024 08:38
teleivo and others added 4 commits June 20, 2024 09:54
We originally used slogin but switched away from it due to some limitations.
We forgot to get the request id from the context via our own function; it
was thus not found in the context.
@teleivo teleivo marked this pull request as ready for review June 21, 2024 03:27
@teleivo teleivo requested a review from radnov June 21, 2024 09:36
teleivo added 4 commits June 21, 2024 11:42
it's not needed
use matchUnfiltered, as the consumer should not get messages that have no group
the predicates do return false on error, but it looks odd that we do not return
@teleivo teleivo requested a review from tonsV2 June 21, 2024 11:27
@radnov (Contributor) left a comment:
🚀

@teleivo teleivo enabled auto-merge (squash) June 22, 2024 03:50
@teleivo teleivo disabled auto-merge June 22, 2024 03:50
@teleivo teleivo merged commit e941b0a into master Jun 22, 2024
7 checks passed
@teleivo teleivo deleted the sse-events-consume branch June 22, 2024 03:50