KOJO-127 | StateTransitionLogger LADR #74 (base: 4.x)

# Kōjō State Transition Logger
* Start Date: 2019-08-08
* Author: Przemyslaw Mucha (przemyslaw.mucha@55places.com)

## Summary
This document proposes a design for Kōjō logic that will emit a log message for every state transition of a job (e.g. from `waiting` to `working`).

## Problem
* Why are we doing this?
    * To gain visibility into the lifecycle of any given job
* What use cases does it support?
    * Forensic analysis of a job that didn't behave as expected
    * Monitoring how much work is being done (more accurately than polling `kojo_job`)
    * Monitoring how much time is spent in each state

## Proposed Solution
Job state transitions are persisted to the RDBMS in `Neighborhoods\Kojo\State\Service::applyRequest()`.
To that logic we will add another `INSERT`, into a new table: `kojo_job_state_transitions` (working title).
The schema of that table will include all job information (ID, type, etc.) as well as process information (execution environment, PID, etc.).
Those two statements will be wrapped in a database transaction to ensure that nothing is written to `kojo_job_state_transitions` unless the actual transition succeeds.

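As a rough illustration, the transactional wrap inside `applyRequest()` might look like the following sketch. Only the `kojo_job_state_transitions` table name comes from this proposal; the column names and the standalone helper function are hypothetical:

```php
<?php
// Hypothetical sketch of the transactional write inside applyRequest().
// Column names and this helper are assumptions; only the table name
// `kojo_job_state_transitions` (working title) comes from the proposal.

declare(strict_types=1);

function applyRequestSketch(\PDO $pdo, array $job, array $process): void
{
    $pdo->beginTransaction();
    try {
        // 1. The existing state transition on kojo_job.
        $update = $pdo->prepare(
            'UPDATE kojo_job
                SET previous_state = assigned_state,
                    assigned_state = :next_state
              WHERE kojo_job_id = :job_id'
        );
        $update->execute([
            ':next_state' => $job['next_state_request'],
            ':job_id'     => $job['kojo_job_id'],
        ]);

        // 2. The new INSERT recording the transition event, including
        //    both job and process information.
        $insert = $pdo->prepare(
            'INSERT INTO kojo_job_state_transitions
                    (kojo_job_id, job_type_code, old_state, new_state,
                     process_id, execution_environment, recorded_at_datetime)
             VALUES (:job_id, :type, :old, :new, :pid, :env, NOW())'
        );
        $insert->execute([
            ':job_id' => $job['kojo_job_id'],
            ':type'   => $job['type_code'],
            ':old'    => $job['assigned_state'],
            ':new'    => $job['next_state_request'],
            ':pid'    => $process['pid'],
            ':env'    => $process['execution_environment'],
        ]);

        // Commit both writes atomically: no transition row is visible
        // unless the actual transition succeeded.
        $pdo->commit();
    } catch (\Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```
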
We will add another first-class process type to the Kōjō complement (alongside e.g. `Server` and `Root`) called the `StateTransitionLogger` (working title).
The `StateTransitionLogger` will be a child of the `Root` process, and will most closely resemble a `Worker` process.
There will only be one `StateTransitionLogger` per Kōjō cluster, in the same way that there is only one `Worker` acting as the `Maintainer`.
`Root` processes will be responsible for babysitting the status of the `StateTransitionLogger` (by inserting `command.addProcess('state_transition_logger')` messages into the "publication" Redis list if nothing is holding the mutex).
Once the `StateTransitionLogger` is instantiated, it will poll the `kojo_job_state_transitions` table continuously.
For each transition event the `StateTransitionLogger` pulls into memory, it will emit a message and then delete that row.
Because a row is only deleted after its message has been emitted, this guarantees at-least-once delivery of transition messages.

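The core of that process, a poll-emit-delete loop, might look like the following minimal sketch (assuming a PSR-3 logger as the message sink, and the same hypothetical column names, batch size, and poll interval as above):

```php
<?php
// Hypothetical poll-emit-delete loop for the StateTransitionLogger process.
// Only the table name comes from this proposal; the column names, batch size,
// PSR-3 message sink, and poll interval are illustrative assumptions.

declare(strict_types=1);

use Psr\Log\LoggerInterface;

function pollTransitions(\PDO $pdo, LoggerInterface $logger): void
{
    $delete = $pdo->prepare(
        'DELETE FROM kojo_job_state_transitions WHERE kojo_job_state_transition_id = :id'
    );

    while (true) {
        $rows = $pdo->query(
            'SELECT * FROM kojo_job_state_transitions
              ORDER BY recorded_at_datetime ASC
              LIMIT 100'
        )->fetchAll(\PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            // Emit first, delete second: a crash between the two steps means
            // the row is re-emitted on restart, giving at-least-once delivery.
            $logger->info('kojo.job.state_transition', $row);
            $delete->execute([':id' => $row['kojo_job_state_transition_id']]);
        }

        if ($rows === []) {
            usleep(250000); // back off briefly while the table is empty
        }
    }
}
```
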
## Backward Incompatible Changes
If there are any assumptions within Kōjō about the process hierarchy, they could be violated by the addition of a new process type, but this is unlikely.
Otherwise it's a purely additive modification to Kōjō.

## Example 1
1. There exists a dynamically scheduled job with `previous_state: 'new'`, `assigned_state: 'waiting'`, `next_state_request: 'working'`, and `work_at_datetime < NOW()`
1. A recently spawned `Worker` process selects that job to work
1. That `Worker` process reaches `Neighborhoods\Kojo\Foreman::_updateJobAsWorking()` and invokes `Neighborhoods\Kojo\State\Service::applyRequest()` to transition the job from `waiting` to `working`
1. The job, process, and transition information are inserted into `kojo_job_state_transitions`
1. The `Worker` process continues execution and hands over control to userspace
1. Concurrently, the `StateTransitionLogger` process for that cluster (not necessarily in the same execution environment) queries `kojo_job_state_transitions` and pulls that transition information into memory
1. The `StateTransitionLogger` emits a message, deletes the row, and moves on to the next transition event
1. Concurrently, our logging infrastructure consumes the emitted messages

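To make step 7 concrete, the emitted message for this example might carry a payload like the following sketch. The LADR does not pin down a message format, so the channel name, field names, and values here are all illustrative assumptions:

```php
<?php
// Hypothetical payload for the message emitted in step 7 of Example 1.
// Field names mirror the schema sketched earlier; none are fixed by the LADR.

$transitionMessage = [
    'kojo_job_id'           => 12345,                 // illustrative job ID
    'job_type_code'         => 'example_dynamic_job', // illustrative type code
    'old_state'             => 'waiting',
    'new_state'             => 'working',
    'process_id'            => 31337,                 // PID of the Worker
    'execution_environment' => 'worker-host-01',
    'recorded_at_datetime'  => '2019-08-08 12:00:00',
];

// e.g. $logger->info('kojo.job.state_transition', $transitionMessage);
```
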
## Future Scope
Part of the design intent for the `StateTransitionLogger` is to delegate more one-per-cluster responsibilities to first-class processes.
This is in contrast to the typical Kōjō pattern of requiring every newly spawned `Worker` process to attempt to perform these responsibilities.
If this implementation of the `StateTransitionLogger` is successful, it would be desirable to refactor out the `Maintainer`, `Scheduler`, etc. responsibilities in the same way.

## Drawbacks
There are multiple unknowns in this design that could make it infeasible to implement, in which case we would have to fall back to the approach outlined in Alternative 3.

## Unresolved Questions
* Should we log when a job is created (i.e. scheduled)?

> **Review comment:** On the surface I would answer "yes". When an execution cluster is constrained on resources or limited by process pool size, there may be jobs that never exit the `waiting` state. Is there an implementation detail that makes logging on creation difficult?

> **Reply:** After we talked through the implementation a little bit last week, I realized that including job creation is easier than not including it.

## Alternatives
1. `Neighborhoods\Kojo\State\Service::applyRequest()` emits the message itself when the transition happens
    1. There exists a dynamically scheduled job with `previous_state: 'new'`, `assigned_state: 'waiting'`, `next_state_request: 'working'`, and `work_at_datetime < NOW()`
    1. A recently spawned `Worker` process selects that job to work
    1. That `Worker` process reaches `Neighborhoods\Kojo\Foreman::_updateJobAsWorking()` and invokes `Neighborhoods\Kojo\State\Service::applyRequest()` to transition the job from `waiting` to `working`
    1. `Neighborhoods\Kojo\State\Service::applyRequest()` emits the transition message itself
    1. The `Worker` process continues execution and hands over control to userspace
    1. Userspace overrides this process's `\PDO` connection using `Neighborhoods\Kojo\Api\V1\RDBMS\Connection\Service::usePDO()`
    1. Userspace begins a transaction and issues a `complete_success` request via the Kōjō API (which causes a message to be emitted)
    1. Userspace rolls back the transaction and issues a `complete_failed` request (which causes a contradictory message to be emitted)
1. `kojo_job_state_transitions` is populated via triggers on `kojo_job` (a sketch of such a trigger follows this list)
    1. There exists a dynamically scheduled job with `previous_state: 'new'`, `assigned_state: 'waiting'`, `next_state_request: 'working'`, and `work_at_datetime < NOW()`
    1. A recently spawned `Worker` process selects that job to work
    1. That `Worker` process reaches `Neighborhoods\Kojo\Foreman::_updateJobAsWorking()` and invokes `Neighborhoods\Kojo\State\Service::applyRequest()` to transition the job from `waiting` to `working`
    1. Once `Neighborhoods\Kojo\State\Service::applyRequest()` updates `kojo_job`, a trigger writes the old and new row information to `kojo_job_state_transitions`
    1. The `Worker` process continues execution and hands over control to userspace
    1. Concurrently, the `StateTransitionLogger` process for that cluster (not necessarily in the same execution environment) queries `kojo_job_state_transitions` and pulls that transition information into memory
    1. The `StateTransitionLogger` emits a message, deletes the row, and moves on to the next transition event
        1. Unfortunately, in this scenario, there's no process information available, so only job information is emitted
        1. Without process information, we can't determine whether there's a systemic issue on a particular execution environment
    1. Concurrently, our logging infrastructure consumes the emitted messages
1. Make the `StateTransitionLogger` another responsibility of each process when it starts up (vs. a first-class process type)
    1. There are no flaws with this approach per se, but we are of the opinion that continuing to add responsibilities to newly spawned `Worker` processes is unsustainable and results in less deterministic behavior than first-class, single-responsibility processes

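A minimal sketch of the Alternative 2 trigger, assuming a MySQL-style RDBMS and the same hypothetical column names as the earlier sketches (created here via `\PDO` to stay in PHP), might be:

```php
<?php
// Hypothetical MySQL trigger for Alternative 2, created via \PDO.
// Column names are the same illustrative assumptions as in the earlier
// sketches. Note the drawback described above: a trigger can only see the
// kojo_job row, so no process information can be recorded.

declare(strict_types=1);

function createTransitionTrigger(\PDO $pdo): void
{
    $pdo->exec(<<<'SQL'
CREATE TRIGGER kojo_job_state_transitions_after_update
AFTER UPDATE ON kojo_job
FOR EACH ROW
BEGIN
    -- Only record rows where the state actually changed.
    IF OLD.assigned_state <> NEW.assigned_state THEN
        INSERT INTO kojo_job_state_transitions
                (kojo_job_id, job_type_code, old_state, new_state, recorded_at_datetime)
        VALUES  (NEW.kojo_job_id, NEW.type_code, OLD.assigned_state, NEW.assigned_state, NOW());
    END IF;
END
SQL);
}
```
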
## Rejected Features
As mentioned above, refactoring all `Worker` process responsibilities is outside the scope of implementing this "prototype" process type.

## References
* [LDR Google calendar](https://calendar.google.com/calendar?cid=NTVwbGFjZXMuY29tX3JrNG12NzFnYzEwNDhwZ3EwcWptMDZidGdjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20)

---

> **Review comment:** @mucha55 how are you planning on using PDO to accomplish this with user-land transactions? Testing `PDO::inTransaction()`?

> **Reply:** Doctrine DBAL supports "nested" transactions, but even without them this is fine: if a userspace transaction is rolled back, the job state never changes, which means nothing should be written to `kojo_job_state_transitions` anyway.

> **Reply:** Per an offline discussion, we have to use `\PDO` for managing the transaction, since a Doctrine Connection isn't aware of what happens in the `\PDO` connection that it was instantiated with.

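As an illustration of that resolution (elaborating on the reviewer's `PDO::inTransaction()` suggestion; the helper and its guard logic are assumptions, not the committed design), managing the transaction through `\PDO` directly might look like:

```php
<?php
// Hypothetical guard illustrating the review discussion: Kōjō manages its
// transaction through \PDO directly, and only opens a new transaction when
// userspace has not already started one on the shared connection.

declare(strict_types=1);

function withTransaction(\PDO $pdo, callable $writes): void
{
    // \PDO::inTransaction() reflects the true connection state, which a
    // Doctrine Connection wrapping this \PDO instance cannot see.
    $ownsTransaction = !$pdo->inTransaction();

    if ($ownsTransaction) {
        $pdo->beginTransaction();
    }

    try {
        $writes($pdo);
        if ($ownsTransaction) {
            $pdo->commit();
        }
        // If userspace owns the transaction, its later commit or rollback
        // decides the fate of both the state change and the transition row.
    } catch (\Throwable $e) {
        if ($ownsTransaction) {
            $pdo->rollBack();
        }
        throw $e;
    }
}
```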