Platform for hosting realtime, stateful servers with zero downtime deployment and horizontal scaling on Kubernetes
The primary goal of Discovery is to solve two important problems in the deployment of real-time stateful servers:
- Zero-downtime deployment.
- Horizontal scaling.
These are non-trivial because:
- Server communication is stateful (mostly WebSocket rather than the HTTPS REST model).
- State is most often stored in memory, because it is ephemeral and must be fetched, manipulated, and relayed very quickly.
- Because of the properties above, a rolling deployment or horizontal scaling, as done for normal stateless apps, is not feasible in k8s.
- A rolling deployment deletes the old pods, and load balancers can route connections to the wrong pods.
Discovery aims to solve this by acting as a platform on top of Kubernetes.
Examples of real-time stateful servers are game servers, chat servers, etc.
- The state of a stateful server, such as a game server, can't be mapped to a relational database.
- As mentioned above, state has to be fetched and manipulated continuously, which is not a use case for Postgres.
- Even though state can be stored in Redis, since it is a KV db, we can't allow servers to be shut down while users are connected, for the reasons below:
  - The reconnection spike will be huge.
  - Servers of this type are most often "alive", meaning their processes are always running some code (for example, game timers or background simulations) even without user intervention.
  - Deleting or shutting down servers has an adverse effect on the client side too.
Added advantages of building on Kubernetes:
- We can distribute our servers across regions, which reduces latency; real-time servers are latency-sensitive.
- We can continue to use stateful methods, but can also use persistent storage such as PostgreSQL when needed, for example for user login.
- We leverage an open-source, almost industry-standard deployment solution.
- Zero downtime deployment.
- Horizontally scalable.
- Network reconnections always route to correct server.
- Built-in dashboard for deployment operations.
- APIs that can be run from iex repl for deployments.
- Can deploy language agnostic servers.
- Code as the source of truth.
The goal is that current users should never experience downtime while we update our stateful servers.
- Discovery deploys server_v1 to k8s.
- Client_A and Client_B ask Discovery for the latest URL via HTTPS; Discovery returns server_v1's URL.
- Client_A and Client_B then connect directly to server_v1 over a WebSocket connection. All further messages and events are sent without Discovery, like a normal WebSocket app.
- When a stateful session is over, a client can repeat from step 2 (asking Discovery for the latest URL).
- Discovery deploys server_v2 after some time.
- Client_A and Client_B are still communicating with server_v1.
- New clients Client_C and Client_D ask Discovery for the latest URL via HTTPS; Discovery returns server_v2's URL.
- Client_C and Client_D then connect directly to server_v2 over a WebSocket connection. All further messages and events are sent without Discovery, like a normal WebSocket app.
- When a stateful session is over, clients can repeat from step 2.
- As we can see, new upgrades don't shut down older deployments.
- Eventually there will be no connections left on server_v1, and Discovery then shuts it down gracefully.
- All new stateful sessions connect to the new deployment.
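The flow above can be sketched as a small in-memory model. This is only an illustration of the idea, not Discovery's actual implementation: the `Registry` and `Deployment` classes, their method names, and the connection counting are all hypothetical, and the strings `server_v1` and `server_v2` stand in for real URLs.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """One running version of an app (hypothetical model, not Discovery's own)."""
    url: str
    connections: int = 0


@dataclass
class Registry:
    """Tracks the deployments of a single app, oldest first."""
    deployments: list = field(default_factory=list)

    def deploy(self, url):
        # A new deployment is added next to the older ones; nothing is shut down.
        self.deployments.append(Deployment(url))

    def get_endpoint(self):
        # New sessions are always pointed at the most recent deployment.
        return self.deployments[-1].url

    def connect(self, url):
        self._find(url).connections += 1

    def disconnect(self, url):
        self._find(url).connections -= 1

    def drain(self):
        # Remove older deployments that have no connections left.
        # The latest deployment is always kept, even when it is idle.
        if not self.deployments:
            return []
        latest = self.deployments[-1]
        removed = [d.url for d in self.deployments
                   if d is not latest and d.connections == 0]
        self.deployments = [d for d in self.deployments
                            if d is latest or d.connections > 0]
        return removed

    def _find(self, url):
        return next(d for d in self.deployments if d.url == url)


registry = Registry()
registry.deploy("server_v1")
url_a = registry.get_endpoint()      # Client_A is told to use server_v1
registry.connect(url_a)

registry.deploy("server_v2")
url_c = registry.get_endpoint()      # Client_C is told to use server_v2
registry.connect(url_c)

removed_while_busy = registry.drain()   # server_v1 still serves Client_A
registry.disconnect(url_a)              # Client_A's session ends
removed_after_drain = registry.drain()  # server_v1 can now be shut down
```

The important design point is that `deploy` never touches existing deployments; only `drain` removes them, and only once they are idle.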
- docker login
- minikube start --driver=docker (starts minikube)
- kubectl config use-context minikube (sets kubectl to use the minikube cluster)
- minikube addons enable ingress (sets up ingress-nginx)
- Clone the repo
- mix setup
- iex -S mix phx.server
The dashboard is called Bridge
Bridge url
- An app is our stateful server.
- An app's name acts as its universal label in Discovery.
- Each app has a dedicated URL in Discovery.
- Clients use this dedicated URL to get the app's endpoint (as specified in the Approach section).
- When deploying an app, we have to specify the Docker image name (the image must be public as of now).
- Bridge shows each app's deployment logs and activities.
- Deployment CRUD operations on an app are available as button clicks.
- The client hits the Discovery API and gets the latest server endpoint URL.
  GET - http://localhost:4000/api/get-endpoint?app_name=nightwatch
  RESPONSE - { "endpoint": "nightwatch.minikube.com/9d00cc18" }
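As a rough sketch of the client side of this call, the snippet below builds the request URL, parses the response shown above, and derives a WebSocket URL from it. The helper names are hypothetical, and prefixing the endpoint with `ws://` is an assumption; the real client may build its connection URL differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DISCOVERY_BASE = "http://localhost:4000"  # taken from the example request above


def endpoint_request_url(app_name, base=DISCOVERY_BASE):
    """Build the GET URL for the app's dedicated Discovery endpoint."""
    return f"{base}/api/get-endpoint?{urlencode({'app_name': app_name})}"


def parse_endpoint(body):
    """Extract the "endpoint" field from Discovery's JSON response."""
    return json.loads(body)["endpoint"]


def websocket_url(endpoint, scheme="ws"):
    # Assumption: the returned endpoint carries no scheme, so the client adds one.
    return f"{scheme}://{endpoint}"


def fetch_endpoint(app_name):
    """Ask a running Discovery instance for the latest endpoint of an app."""
    with urlopen(endpoint_request_url(app_name), timeout=5) as response:
        return parse_endpoint(response.read().decode("utf-8"))
```

With Discovery running locally, `websocket_url(fetch_endpoint("nightwatch"))` would give the address to open the WebSocket connection against.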
- For the demo, we use a bare-bones MMO game client and server.
- Disclaimer: these projects were not made for SpawnFest; we are only integrating Discovery into them.
- Players can traverse the world in four directions.
- They can attack others by clicking the attack button.
- If any other player is within a radius of 1 grid cell in any direction, that player dies.
- A player respawns 5 seconds after dying.
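The demo game's rules can be sketched as follows. This is not the watchex server code: the function names are hypothetical, and treating "1 grid in any direction" as including diagonals (and the attacker's own cell) is an assumption.

```python
RESPAWN_DELAY_SECONDS = 5  # from the rule above


def in_attack_range(attacker_pos, target_pos):
    # Assumption: the radius covers the 8 neighbouring cells and the same cell
    # (Chebyshev distance of at most 1).
    dx = abs(attacker_pos[0] - target_pos[0])
    dy = abs(attacker_pos[1] - target_pos[1])
    return max(dx, dy) <= 1


def resolve_attack(attacker_id, positions):
    """Return the ids of all other players killed by this attack, sorted."""
    attacker_pos = positions[attacker_id]
    return sorted(
        player_id
        for player_id, pos in positions.items()
        if player_id != attacker_id and in_attack_range(attacker_pos, pos)
    )


def respawn_time(died_at):
    """Time (in seconds) at which a player who died at `died_at` comes back."""
    return died_at + RESPAWN_DELAY_SECONDS


positions = {"a": (2, 2), "b": (3, 3), "c": (5, 2)}
killed = resolve_attack("a", positions)
```

In the demo, the 0.1.6 build drops the respawn step, so only the attack resolution would still apply there.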
We recorded a gameplay session to demo Discovery's zero-downtime deployment feature. The demo is explained below by timestamp.
- [0:00 - 0:15] Created a new app, watchex (a Phoenix server using WebSockets).
- [0:16 - 1:56] Deployed watchex with madclaws/watchex:0.1.5.
  - The madclaws/watchex:0.1.5 build has the normal gameplay described above.
  - Players attack, die, and respawn.
  - Both clients run in normal Chrome tabs.
- [1:57 - 3:32] Made a new deployment with madclaws/watchex:0.1.6.
  - 0.1.6 is our upgrade to the server, which removes the respawn feature from the game.
  - Opened 2 new clients in incognito tabs.
  - Players attack and die, but are not respawned, as expected.
- [3:33 - 4:12] Revisited the old clients, which were running in normal Chrome tabs.
  - These clients can still respawn, i.e. they are still connected to the old server build.
- [4:13 - end] Reloaded the old clients (normal Chrome tabs).
  - Now when they die, they are not respawned.
  - That is, they are connected to the latest deployment.
Summary: a new upgrade to the system does not shut down the old deployments and does not affect existing clients. MISSION ACCOMPLISHED
server respawn code in 0.1.5 build
removed respawn code in 0.1.6 build
Client hitting Discovery for latest server endpoint
- Moving Discovery off minikube and trying it on EKS.
- Automatic zombie deployment cleanup.
- More functionalities in Bridge.