The application demonstrates how event sourcing could be implemented in Scala. I believe this is the most controversial part of the application's architecture, because event sourcing can be done in many ways, with different trade-offs, and people who do it one way often consider the other ways wrong.
Event sourcing is the idea that some journal of events is your source of truth. Everything else is just derived from that journal - SQL tables, MongoDB documents, Neo4j nodes, etc - all of that is just a result of applying updates to the empty database where each update is generated from an event.
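To make the idea concrete, here is a minimal sketch in Scala with invented names (`AccountEvent`, `BalancesProjection`, etc. are not from the actual application): the journal is the source of truth, and a read model is just a fold over it.

```scala
// Minimal sketch: the journal is the source of truth,
// and a read model is derived by folding events over an empty state.
sealed trait AccountEvent
final case class AccountOpened(accountId: String) extends AccountEvent
final case class MoneyDeposited(accountId: String, amount: BigDecimal) extends AccountEvent

object BalancesProjection {
  // Derive the "current balances" read model purely from the journal.
  def project(journal: List[AccountEvent]): Map[String, BigDecimal] =
    journal.foldLeft(Map.empty[String, BigDecimal]) {
      case (balances, AccountOpened(id)) =>
        balances.updated(id, BigDecimal(0))
      case (balances, MoneyDeposited(id, amount)) =>
        balances.updated(id, balances.getOrElse(id, BigDecimal(0)) + amount)
    }
}
```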
Why should we use the event sourcing approach? When each action made by the user is immediately executed against the databases, external services and so on, the implementation is trivial. As a matter of fact, it can also scale really well:
- design a monolith
- write only to some primary database
- perform reads against several read replicas
- throw Redis or another cache in front of the most used queries, or precompute some results and store them in ElasticSearch or something similar
- you end up close to a setup that StackOverflow used in 2013 to handle a lot of traffic
The thing is, the more writes you have, and the more different databases you use to serve specific queries, the harder it gets to keep everything updated. Suddenly you have to perform a careful ballet to coordinate all those updates.
At some point you notice that you would prefer to write to one place, then do some computations triggered by changes, and have the results updated in several places. So you introduce proper Read and Write models. If you pick your SQL database as the Write model, you have an issue. How do you know that you caught every update, so that each change was turned into a corresponding change in all the Read model databases? Polling the DB might miss some updates if two of them happened almost at the same time. A better solution is Change Data Capture, which generates a log of all changes committed to the database. This log can then be read, and for each entry you can perform an update of the Read models.
This way you are basically generating events and projecting them into Read models. The thing is, these events are not designed by you; they are generated by some plugin connected to your DB. If you want control over these events, it is better to emit them yourself and treat the SQL server as yet another projection target.
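Treating every database as "yet another projection target" could look roughly like this hypothetical sketch; the event, traits, and class names are all placeholders, and the real client calls are omitted:

```scala
// Hypothetical sketch: once we emit events ourselves, every database is just
// another implementation of the same projection interface.
sealed trait UserEvent
final case class UserRegistered(userId: String, email: String) extends UserEvent

trait ProjectionTarget {
  def apply(event: UserEvent): Unit
}

final class PostgresProjection extends ProjectionTarget {
  def apply(event: UserEvent): Unit = event match {
    case UserRegistered(id, email) => () // e.g. INSERT INTO users (id, email) VALUES (...)
  }
}

final class ElasticSearchProjection extends ProjectionTarget {
  def apply(event: UserEvent): Unit = event match {
    case UserRegistered(id, email) => () // e.g. index a document with the same data
  }
}
```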
If all the databases - Postgres, ElasticSearch, Neo4j, etc. - are just projection targets, we can design the mechanism so that each event can be applied to them safely, avoiding some headaches (see the sketch after this list):
- if an event appears twice in the log, we ignore the second appearance
- each event should contain all the information needed to make the projection possible, so that if some service we relied on gets deleted, changes its API, or becomes inaccessible in some other way, we can still recover the database by taking a database snapshot from some point and reapplying the events emitted since that moment
- we might decide to change our Read model in a backward- and forward-incompatible way, because the current design gets in our way more than it helps - if we have all the events since the beginning of history, we can just design a new projection, run it, and delete the old database without any absurd migrations
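A sketch of these rules, again with invented names: each event carries a unique ID and all the data needed for its projection, so duplicates can be skipped and the whole projection can be re-run from scratch.

```scala
// Each event has a unique ID and is self-contained, so the projection
// can ignore duplicates and be replayed against an empty database.
final case class InvoiceIssued(eventId: String, invoiceId: String, amount: BigDecimal)

final class InvoiceProjection {
  // In a real system both of these would live in the target database,
  // e.g. a processed_events table next to the projected tables.
  private var seen: Set[String] = Set.empty
  private var totals: Map[String, BigDecimal] = Map.empty

  def apply(event: InvoiceIssued): Unit =
    if (seen.contains(event.eventId)) () // second appearance in the log: ignore
    else {
      seen += event.eventId
      totals = totals.updated(event.invoiceId, event.amount)
    }
}
```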
That design imposes 2 requirements:
- events are immutable - once they are committed, they cannot be edited or replaced or dropped (this is a simplification - see Versioning in an Event Sourced System by Greg Young for the complete picture)
- projections should be idempotent - you have to carefully design each projection so that even if you drop everything and start from scratch, you end up with the same outcome, even when the events concern e.g. a payment through a payment service which no longer exists
The second part is tricky. You cannot simply create e.g. a `SendEmailRequested` event and make its projection send an email - rerunning that projection would send the email again. A `PaymentRequested` event could perform the payment twice. That's why an event has to describe something that already happened: `EmailSent`. `PaymentCreated`.
So these events should be emitted after the action was executed. It is helpful to distinguish Events from Commands. A Command is something that you enqueue to request some action; then you perform it - perhaps by calling some external service - and the result is immortalized as an Event.
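A hedged sketch of that Command/Event split (the names and the `emailService` function are invented for illustration): the handler performs the side effect first and only then records the result as an Event.

```scala
// A Command expresses an intention; an Event records a fact that already happened.
sealed trait Command
final case class SendEmail(to: String, subject: String, body: String) extends Command

sealed trait Event
final case class EmailSent(to: String, subject: String, messageId: String) extends Event

object CommandHandler {
  // Perform the side effect first, then immortalize the result as an Event.
  // emailService stands in for a real client and returns a message ID.
  def handle(command: Command, emailService: (String, String, String) => String): Event =
    command match {
      case SendEmail(to, subject, body) =>
        val messageId = emailService(to, subject, body)
        EmailSent(to, subject, messageId)
    }
}
```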
This requires you to carefully design Commands. Sometimes you need a retry mechanism in them. Sometimes they need to call an external service which takes time. Usually waiting for the result would be impractical; in the worst case it could block the API with a lot of requests waiting for their Commands to finish. So it often makes sense to apply Command Query Responsibility Segregation (CQRS). When you submit the Command, you might learn that the result will be available later under a specific ID, and the request returns immediately. The Command lands on some queue and will be performed by some process at a later time.
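A minimal sketch of the "return an ID immediately, execute later" flow; the in-memory queue and UUID-based IDs here are placeholders for a real broker and ID scheme, not what the application actually uses.

```scala
import java.util.UUID
import scala.collection.mutable

final case class CreatePayment(orderId: String, amount: BigDecimal)
final case class Accepted(commandId: String) // what the API returns right away

object CommandQueue {
  private val pending = mutable.Queue.empty[(String, CreatePayment)]

  // The API endpoint only enqueues the Command and returns immediately.
  def submit(command: CreatePayment): Accepted = {
    val commandId = UUID.randomUUID().toString
    pending.enqueue(commandId -> command)
    Accepted(commandId)
  }

  // Some background process picks Commands up at a later time.
  def pollNext(): Option[(String, CreatePayment)] =
    if (pending.isEmpty) None else Some(pending.dequeue())
}
```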
As a result you end up with Commands as the Write model, and Events projections as Read models. This introduces some complexity:
- Commands aren't immediately executed - eventual consistency
- you have to implement the logic separately:
- executing Commands and generating Events
- each Event's projection into a separate Read model
- there is a lot of repetition when moving data from a Command to an Event to the target database
At the same time it solves a lot of problems:
- even the wildest change to the Read model can be implemented without a hacky, error-prone migration
- adding a new Read model requires just adding another projection - each projection runs separately and separately tracks up to which journal offset it has been applied, so we can have several projections which are up to date and apply new events as they arrive, while a new projection catches up from the beginning of time (see the sketch after this list)
- if we protect events from deletion, no database `DROP` will be irreversible
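Per-projection offset tracking could look roughly like this sketch; the names and signatures are assumptions made for the example, not the application's actual API.

```scala
// Each projection remembers the last journal offset it applied, so existing
// projections only consume new entries while a fresh one replays from zero.
final case class JournalEntry(offset: Long, payload: String)

trait OffsetStore {
  def load(projectionName: String): Long // last applied offset, 0 if none yet
  def save(projectionName: String, offset: Long): Unit
}

object ProjectionRunner {
  def run(
      name: String,
      offsets: OffsetStore,
      readAfter: Long => List[JournalEntry], // journal entries after the given offset
      applyEntry: JournalEntry => Unit
  ): Unit =
    readAfter(offsets.load(name)).foreach { entry =>
      applyEntry(entry)
      offsets.save(name, entry.offset) // persist progress after each applied entry
    }
}
```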
Both Commands and Events are posted on Kafka. I guess it's not the best store for this purpose, but it is quite a popular setup, so it makes sense to demonstrate it: Events describing the same entity share the same partition key, so they should end up in the same consumer within a consumer group, which helps avoid race conditions (where an older event would arrive for processing after a newer one).
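The partitioning boils down to using the entity ID as the record key, roughly as in the sketch below; the topic name, serializers, and JSON payload are assumptions made for this example.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducerSketch {
  private val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  private val producer = new KafkaProducer[String, String](props)

  // Same key => same partition => the same consumer within a consumer group,
  // which keeps events for one entity in order.
  def publish(entityId: String, eventJson: String): Unit =
    producer.send(new ProducerRecord[String, String]("events", entityId, eventJson))
}
```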
So far there is little to do in Command handling, so at best we could argue that it enables scalability. Events, on the other hand, can have more than one projection.
The boring part of turning one piece of data into a very similar (but not identical!) piece of data is handled by Chimney.
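A small sketch of the kind of boilerplate Chimney removes (the case classes here are invented, not the application's actual types): fields with matching names and types are copied automatically, and only the extra field has to be supplied by hand.

```scala
import io.scalaland.chimney.dsl._

final case class SendEmail(to: String, subject: String, body: String)
final case class EmailSent(to: String, subject: String, body: String, messageId: String)

object Transformations {
  // to, subject and body are copied automatically; messageId is provided explicitly.
  def toEvent(command: SendEmail, messageId: String): EmailSent =
    command
      .into[EmailSent]
      .withFieldConst(_.messageId, messageId)
      .transform
}
```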
I am aware that the way I implemented events is far from perfect and that in certain cases idempotency would be broken, but for a demo it should be good enough. Securing it would complicate the code quite a bit, and TBH I have seen worse systems in production that were good enough. Still, treat my code as an introduction to the idea rather than a gold standard.