Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Blog post] Final project: A DSL for Studying Metastability #425

Merged
merged 7 commits into from
Dec 23, 2023

Conversation

alifarahbakhsh
Copy link
Contributor

[Closes #395]

Copy link
Owner

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool! Thanks for the extremely clear writeup; I think you did a great job introducing the background, and the example really shows off where you would like to go with the DSL. It would, of course, have been even more impressive if you had somehow gotten the whole interpreter working far enough to actually execute the programs you have evaluated on paper, but I think this is an impressive start regardless.

I just have a few minor suggestions within!


1. An agent has a finite input queue and a finite output queue.
2. An agent has finite receiving and sending rates on its input and output queues, respectively.
3. An agent has a finite processing rate, where "processing" means taking something from the input queue and putting it into the output queue (note that there is no content to processing).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "no content to processing", do you mean that the only thing an agent can do with a message it receives is to put it, unchanged, into its output queue? It cannot, for example, produce N output messages for each input message?

EDIT: Never mind! I can see now that you have generalized this model in the paragraphs below. Probably just ignore this comment, but I'm leaving it here because I think it points out an interesting constraint on this notional baseline model.

content/blog/2023-12-11-A DSL for Metastability.md Outdated Show resolved Hide resolved
1. Every "operation" in this DSL is an action. The meaning of the term "operation" will be clearer in the next section.
2. The "+" operator for an action designates nondeterministic choice in general, and that is the intended long-term purpose here. For now, I have limited it to deterministic branching based on the value of a predicate.
3. The "*" operator is our favorite Kleene star. The intended long-term behavior is repeated application of an action. In practice, just like branching, it will be guarded by a predicate for now (I did not want to clutter the star - it's so beautiful!).
4. $\texttt{put}$ only accepts state variables with a request for their value as it's second argument. I could not find a neat way of expressing this in the grammar (yes, I'm not good with grammars).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Typically, this constraint would not be expressed in the grammar—it's the sort of thing you'd express with something like a type system or some other static rule.

The environment is not expressed by the DSL, and is assumed to operate based on pure choice: each "action" for the environment is just an arbitrary state transformation subject to the limitations imposed on the environment.
These limitations are expressed by specifying the variables that the environment can touch.
For instance, the code for some agent can be wrapped with a giant branching action with a guard $\texttt{not crashed}$, and a $\texttt{nop}$ for the else clause.
The environment can then choose to crash and uncrash the agent however it likes.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conception of the environment makes sense to me, but I don't quite understand what a "trigger" is… does that terminology matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! To explain this I would have to unpack our definition of metastability, which is based on temporal logic, and it would take a lot of space. That is why I decided to leave it out. In summary, there exists a predicate P over environment actions, and a trigger is an initial period of the environment going crazy, until it "eventually" satisfies "always" P.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great. Maybe it be worth just briefly mentioning that you are also interested in defining metastability formally so you can build an automatic detector, but not going into detail about what that definition actually is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

content/blog/2023-12-11-A DSL for Metastability.md Outdated Show resolved Hide resolved
# Implementation
I decided to implement this DSL in Rust.
The main reason was that I wanted to learn Rust, and after spending countless hours wrestling with the borrow-checker, I can confirm that one should not toy around with Rust.
I am, by the way, still wrestling as we speak.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a serious challenge! But people often find that it can be worth the struggle in the end…

content/blog/2023-12-11-A DSL for Metastability.md Outdated Show resolved Hide resolved
content/blog/2023-12-11-A DSL for Metastability.md Outdated Show resolved Hide resolved

1. The ownership model of Rust, in tandem with the borrow-checker, is a totally new way of thinking about code for me. Things that are pretty straightforward in other languages become complicated. This problem is exacerbated by the fact that I find it convenient to pack stuff together into a struct, and it is not trivial to me at all how to use a struct's methods on itself with a careful observation of ownership, specially in recursive calls.
2. As I mentioned earlier, I have decided to make the DSL look like NetKAT. My hunch for this was that the general story in our setting looks very similar to the one studied by NetKAT: requests (packets) travelling around, and getting processed by agents (routers) based on their type (port, switch, etc). However, the semantics are currently drastically different. The important abstraction in NetKAT is a packet, and the semantics revolves around a packet: the network is a mapping of a packet to a set of packets. In our DSL, in contrast, the semantics is much more similar to the classic small-step semantics of, say, IMP, where you have a state (or a store), and you apply some operations on it. My hope was to be able to use something similar to the equational inference system baked into NetKAT, via Kleene algebras, to study metastability. I have not yet seen a clear connection.
3. On the long run, one would hope to have a performant interpreter, i.e., one that can interpret large systems in a short span of time. The crux of the difficulty lies in parallelizing the execution in the right way. While there are straightforward ways of attempting this, I have not yet found a good way of doing it.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed; writing parallel interpreters can be really hard. One thing you might consider, in the long term, is to sacrifice parallelism entirely for the straightforward interpreter mode. Instead, you can exploit parallelism at the level of your model checker, i.e., the exploration of a large space of inputs. That exploration can be much more naturally amenable to parallelism.

1. The ownership model of Rust, in tandem with the borrow-checker, is a totally new way of thinking about code for me. Things that are pretty straightforward in other languages become complicated. This problem is exacerbated by the fact that I find it convenient to pack stuff together into a struct, and it is not trivial to me at all how to use a struct's methods on itself with a careful observation of ownership, specially in recursive calls.
2. As I mentioned earlier, I have decided to make the DSL look like NetKAT. My hunch for this was that the general story in our setting looks very similar to the one studied by NetKAT: requests (packets) travelling around, and getting processed by agents (routers) based on their type (port, switch, etc). However, the semantics are currently drastically different. The important abstraction in NetKAT is a packet, and the semantics revolves around a packet: the network is a mapping of a packet to a set of packets. In our DSL, in contrast, the semantics is much more similar to the classic small-step semantics of, say, IMP, where you have a state (or a store), and you apply some operations on it. My hope was to be able to use something similar to the equational inference system baked into NetKAT, via Kleene algebras, to study metastability. I have not yet seen a clear connection.
3. On the long run, one would hope to have a performant interpreter, i.e., one that can interpret large systems in a short span of time. The crux of the difficulty lies in parallelizing the execution in the right way. While there are straightforward ways of attempting this, I have not yet found a good way of doing it.
4. The previous point is exacerbated by the fact that one would also like to have a model-checker that would use the interpreter. Ideally, the model-checker would receive a given system and produce a trace "ending" in a bad self-sustaining loop caused by a trigger. Following the maxim "immature optimization is the root of all evil", I have not spent much time thinking about this yet, but I am expecting some serious issues down the road.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One big thing that occurs to me would be required for this: a way to detect metastability. This would require operationalizing a definition of metastability that you can check by looking at the current state, or perhaps the trace of recent states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. We have realized that, to make things even harder, metastabilty might actually be a hyperproperty and not a property.

Copy link
Owner

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Nice work again.

@sampsyo sampsyo merged commit 0597c59 into sampsyo:2023fa Dec 23, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Designing a cloud DSL for analyzing metastable failures
3 participants