State management #4
Comments
I'm not familiar enough with the different serialization steps to know which is best, so I will defer on that until I read more. If the framework can save data in a manner that is agnostic to everything except the measurement environment that it is writing to, then I think that's a fine option. I think shared state is likely needed. I also agree that measurements should only ever read shared variables and let the controller update them. I don't think this will arise, but if in the future the framework (maybe we can refer to it as the controller) is working in parallel, we'll have to think about how to handle that.
(Agreed on parallelization. In current conception this could perhaps be handled simply with file locks….)
If we've decided to run the scheduler as a service, it will have to interact with some internal record anyway. In my mind, saving state for modules is totally possible. I think we still have to decide how the controller and modules interact with state. In other words, if some state variable determines whether or not a test runs, does the controller pass that variable to the module and let it decide, or does the controller make that call on its own?

To give a concrete example, we regulate the execution of speed tests based on how much data has been consumed in a month. Assuming there is some user-defined cap, either the controller or the module will compare the cap with the total data consumption. If the controller handles it and sees that we've hit the cap, we need to make sure the test is marked as complete and doesn't get shoved into the retry queue. I don't have a complete picture of how that would work because we haven't created the scheduler/controller yet, but it's something to keep in mind. If the module handles it, it would have to exit with a special exit code to indicate that it should not be re-run.

The other thing to keep in mind is that the state variable needs to be reset periodically. This probably means designing the state variable as a triple
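To make the controller-side option above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `MonthlyCounter` class, the `COMPLETE` and `RETRY` labels, and `run_speedtest` are hypothetical names, and the counter is only one possible shape for a periodically reset state variable, not necessarily the one the comment had in mind.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical outcome labels; the thread has not settled on names or exit codes.
COMPLETE = "complete"  # finished, or skipped on purpose: never retried
RETRY = "retry"        # failed: eligible for the retry queue


@dataclass
class MonthlyCounter:
    """A state variable that resets itself whenever the month changes."""
    value: int = 0
    period: str = ""  # e.g. "2022-07"

    def current(self, today: date) -> int:
        key = today.strftime("%Y-%m")
        if key != self.period:  # a new month has started: reset
            self.period = key
            self.value = 0
        return self.value

    def add(self, amount: int, today: date) -> None:
        self.value = self.current(today) + amount


def run_speedtest(counter, cap, today, test):
    """Controller-side check: once the cap is hit, skip and mark complete."""
    if counter.current(today) >= cap:
        return COMPLETE  # deliberately skipped, so it must not be retried
    try:
        consumed = test()  # the test reports how much data it used
    except Exception:
        return RETRY
    counter.add(consumed, today)
    return COMPLETE
```

The point of the sketch is only that a cap-induced skip and a genuine failure take different paths, so the skipped test never reaches the retry queue.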
Yes, I think recent conversations pushing the framework to operate as a service – or anyway maintaining its own internal state to some degree – only support this use-case of maintaining state on behalf of modules (even though these are separate concerns).

Regarding how we might use state, I imagine two options.

One, the basic feature, is outlined by this Issue: state is generated by modules, recorded by the framework, and made available again to modules during execution. This might cover a great many use-cases. (And, yes, a module should certainly not be re-run in this case. I have imagined that re-runs must be opted into – e.g. with the exit code

Another option, perhaps a further enhancement, is to provide configuration hooks, e.g. a "should this module actually run?" hook, evaluated by the framework right before executing the module:

ping:
  schedule: 0 */6 * * *
  unless: state.thats_enough

I.e. perhaps this expression is evaluated with a reference,

For framework-wide monthly bandwidth consumption, this is annoyingly complex, but nonetheless conceivable and a useful proof of concept:

speedtest-ookla:
  schedule: 0 */6 * * *
  unless:
    STATE |
    values |
    sum(
      "bandwidth.consumption.%s" | format(datetime.today().strftime('%Y-%m')),
      start=0
    ) >= 1024

(The expression – which I'm imagining as a Jinja templating library expression – needn't be written precisely that way; but, I think this helps to make it as clear as possible.)

Anyway,

{
"ping": {"thats_enough": false},
"speedtest-ookla": {
"bandwidth": {
"consumption": {
"2022-05": 204,
"2022-06": 201,
"2022-07": 20
}
}
}
}

I.e. speedtest modules may simply accumulate their consumption by the current month, as a matter of course, as a feature, e.g.:

from datetime import date

state = …  # acquire state from framework
consumption_now = …  # calculate test iteration's consumption
month_key = date.today().strftime('%Y-%m')
# per-month consumption mapping, created on first use
consumption_total = state.setdefault('bandwidth', {}).setdefault('consumption', {})
consumption_total[month_key] = consumption_total.get(month_key, 0) + consumption_now
# write state back to framework
…

And so this way, we can enforce that modules may only write to their own state, but may read from each other's. So long as speedtest modules follow a common convention of writing their bandwidth consumption to the key path

All that said:
(And certainly there should be shared utilities for use by Python-language modules: #12.)
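As a rough illustration of the framework-wide check discussed in this comment, the following plain-Python sketch sums the current month's `bandwidth.consumption` entry across every module's state and compares it to a cap. It stands in for the templated `unless` expression rather than implementing it; the function names and the second module, `speedtest-other`, are hypothetical.

```python
from datetime import date


def monthly_consumption(state, today=None):
    """Sum each module's bandwidth.consumption entry for the current month."""
    month_key = (today or date.today()).strftime("%Y-%m")
    total = 0
    for module_state in state.values():
        # Modules that do not track bandwidth simply contribute nothing.
        consumption = module_state.get("bandwidth", {}).get("consumption", {})
        total += consumption.get(month_key, 0)
    return total


def should_skip(state, cap, today=None):
    """Equivalent in spirit to the `unless` hook: True means do not run."""
    return monthly_consumption(state, today) >= cap


state = {
    "ping": {"thats_enough": False},
    "speedtest-ookla": {
        "bandwidth": {"consumption": {"2022-05": 204, "2022-06": 201, "2022-07": 20}},
    },
    # Hypothetical second module following the same key-path convention.
    "speedtest-other": {
        "bandwidth": {"consumption": {"2022-07": 1010}},
    },
}

skip_in_july = should_skip(state, cap=1024, today=date(2022, 7, 15))
skip_in_june = should_skip(state, cap=1024, today=date(2022, 6, 15))
```

Because the check only reads state, it fits the rule that modules write to their own state and read from each other's.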
See: internet-equity/fate#19.
Some measurement modules might require some form of state to be preserved across iterations.
(For example, speed tests might track their consumed bandwidth to avoid over-consumption.)
Questions:
➞ Most simply – serialized at {venv prefix} or {user prefix} or /var/lib/netrics + /state/{measurement} – or something; but, unclear what's best.
➞ Could simply provide above path to measurement process via its environ (NETRICS_STATE) and let measurement go for it.
➞ Could in same manner provide path to a FIFO into which state data has been written, and from which framework will update state (moderating access to actual state file/whatever).
➞ If needed, but with moderation, could make global state available but read-only. (In this case, if implemented as above, would perhaps need either FIFO or better just a PIPE [os.pipe] – framework can provide the read descriptor of the pipe to the measurement, containing the aggregate state.)
(This might, say, allow multiple separate measurements which care about bandwidth consumption to each record their own in a consistent manner and abide by the total they aggregate from shared state.)
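A minimal sketch of two of the ideas above, assuming a POSIX system: the framework passes the measurement's own state path via `NETRICS_STATE`, and hands it read-only aggregate state through the read end of an `os.pipe`. The variable `NETRICS_STATE_GLOBAL_FD`, the JSON encoding, and the inline child script are assumptions made for illustration; nothing here reflects a settled design.

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for a measurement module; a real one would be its own executable.
CHILD = r"""
import json, os
# Read-only aggregate state arrives on an inherited pipe descriptor.
with os.fdopen(int(os.environ["NETRICS_STATE_GLOBAL_FD"])) as pipe:
    shared = json.load(pipe)
# The module's own state lives at the path named by NETRICS_STATE.
path = os.environ["NETRICS_STATE"]
with open(path) as fd:
    own = json.load(fd)
own["seen_total"] = sum(shared.values())
with open(path, "w") as fd:
    json.dump(own, fd)
"""


def run_measurement(own_state_path, shared_state):
    """Run the child with its state path and a pipe holding shared state."""
    read_fd, write_fd = os.pipe()
    env = dict(os.environ,
               NETRICS_STATE=own_state_path,
               NETRICS_STATE_GLOBAL_FD=str(read_fd))  # hypothetical name
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            env=env, pass_fds=(read_fd,))
    os.close(read_fd)  # the child holds its own copy of the read end
    with os.fdopen(write_fd, "w") as pipe:
        json.dump(shared_state, pipe)  # closing the pipe signals end-of-data
    return proc.wait()


with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "speedtest.json")
    with open(state_path, "w") as fd:
        json.dump({}, fd)
    exit_code = run_measurement(state_path, {"speedtest-a": 204, "speedtest-b": 20})
    with open(state_path) as fd:
        result = json.load(fd)
```

The child can only read the aggregate because it never receives the write end of the pipe. In this sketch it still writes its own state file directly; the FIFO variant above would instead have the framework mediate that write.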