Explicit cyclic foreign vars #164
Comments
Could you provide a concrete example where such a quasi-cyclic situation happens? I wonder if you could refactor your problem into 3 processes I think that foreign vars with something like |
I don't know the internals but I thought - maybe naively - you could probably just ignore the variables with
Fair enough
A - A process for a tree (here leaf) architecture - in practice all sorts of properties
A.leaf_area (out) --> B
In order to compute
Of course we could just merge it all together in one process but that is where we come from and we would like to split it up in modules/processes. |
Would it be an option to declare |
Ok, a fat apology: I forgot to mention that circular imports were in my way when I first tried to get it working ;) This time I tried a combination of xs.group in A (to avoid explicit linking and a circular import of C) and an 'inout' in C with the same group. So, moving everything into groups plus 'inout' is a solution, but I lose a bit of your nice model visualization/inspection features since I obscure where the variable comes from. But that's worth it! Thank you for your outstanding support! |
No worries at all!
You're welcome! |
Sorry, I need to bring this up again. I am still thinking about another way to get these "cyclic" dependencies working (this is indeed one of our three pain points, apart from sparse and growing index variables). I sketched a little proof of concept. Here is a notebook: https://github.com/jvail/xarray-simlab/blob/seemingly-cyclic/notebooks/seemingly_cyclic.ipynb
I am not entirely happy with the current workaround (declare an 'inout' rather than an 'out' variable if you want to let previously run processes use it) because it requires delivering an input for each of them, although initializing them in the initialize func would be sufficient; i.e. I cannot just use intent 'out'. Also, as far as I can see from the code where you compute the dependencies, you are filtering for 'out' variables to determine process deps and sorting. So simply allowing a "pseudo cyclic" variable with intent 'out' won't work because process ordering would be ambiguous. Therefore, I think, another property is required (for the sake of clarity I called it "seemingly_cyclic") to explicitly exclude this dependency from being evaluated.
It seems clear from my tests that if there is such a dependency, the producer must declare a global name (or group; that's a bit confusing I have to say, a group is always global I guess) and the consumer must use a global ref (group) in any case, to avoid cycles at module import level. Therefore I added the property "seemingly_cyclic" only to "global_ref". I also added a dashed arrow to the dot graph to visualize this dependency.
I cannot see why this might open a "Pandora's box" and introduce chaos, but I may, most certainly, lack knowledge about the mechanics of your lib.
P.S.: I will try to make a suggestion for #163 as well but that one is really tricky, bear with me. |
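To make the ordering argument concrete, here is a minimal sketch in plain Python (standard library `graphlib`, not xarray-simlab code). It assumes a three-process chain A -> B -> C where A also reads a value produced by C. The names `deps`, `seemingly_cyclic` and `ordering` are hypothetical and only illustrate what excluding a flagged dependency from the sort would mean.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical process graph: each key depends on the processes in its value set.
# "A" reads a value produced by "C" (the seemingly cyclic link), while "C"
# depends on "B", which in turn depends on "A".
deps = {"A": {"C"}, "B": {"A"}, "C": {"B"}}

# Hypothetical set of (consumer, producer) links flagged as seemingly cyclic.
seemingly_cyclic = {("A", "C")}


def ordering(deps, ignored=frozenset()):
    """Return a process order, skipping the flagged (consumer, producer) links."""
    filtered = {
        proc: {d for d in proc_deps if (proc, d) not in ignored}
        for proc, proc_deps in deps.items()
    }
    return list(TopologicalSorter(filtered).static_order())


# Without the flag, no valid ordering exists.
try:
    ordering(deps)
    has_cycle = False
except CycleError:
    has_cycle = True

# With the flagged link ignored, the remaining graph is a plain chain.
order = ordering(deps, ignored=seemingly_cyclic)
```

With the flagged link ignored, the sort is well defined again; what value A actually sees at the first step is a separate question that the sketch does not address.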
I kept playing a bit with this idea and tested if we could use it with xs.foreign as well. If we split it into base classes that implement the different aspects of a process, I can get around the cyclic import issue and we could arrive at a pretty "pluggable", modular model design (in our use case): i.e. just replace a process with some different math as long as it provides the same output. The advantage of xs.foreign is that the dependency is explicit, whereas "global_ref" hides it. That makes the model more readable, I think. |
Thanks for sharing your notebooks, it gives a clear overview of the problem. I admit that the 'inout' variable with a default value workaround is not ideal in the case where you just would use TBH, I'm reluctant to implement "pseudo cyclic" variables in xarray-simlab for the main reason that it breaks a fundamental concept of this framework, i.e., that the process workflow may be represented as a DAG. Allowing one exception like this is opening the door to more exceptions, which may end up with something conceptually very hard to understand. It also adds a parameter to |
Sure, understood and accepted - won't bring it up again - however :) , out of curiosity...
I don't understand why it would break it because you have it already (global/group + 'inout'). I think such a change would just bring this option to the surface and improve it because an input is no longer required. Is there a fundamental difference between input_vars values and values set in initialize? Isn't it both just the state at step -1?
No, only to |
Ah I see, you're right. Probably it's not clear enough that the value of a 'inout' variable declared in a process should be updated in a different simulation stage in that case (that was clear only in my mind so far :-) ). An 'inout' This implicit condition is a bit unfortunate, actually, and I don't see how we could properly inspect the process classes to check this condition when building a new model :-/. Your suggestion of a "seemingly_cyclic" reference at least makes the intention a bit clearer, but it doesn't solve this implicit condition which still applies here. I'm not very comfortable with adding another way to allow this. |
D'accord. Fair enough.
Wow - that's subtle. A bit hard to follow, I find. I think you could put a bit more faith in the modeler who is supposed to know what he is doing. Nevertheless - I regard this as settled and remain quiet now and hope you keep up your excellent work! :) |
Thanks! Please continue to give helpful feedback, it's very much appreciated! In this case it made me realize that we can actually break the DAG in the current version! |
:) All right. Though no guarantees on the "helpfulness". Why not think about the DAG this way:
Taking the time dimension into consideration. If you then go from C to A the cycle disappears. It is just offset in time. The interpretation of the DAG might be too "flat", static. This is what actually happens with 'inout' - it is just not transparent in the visualization. |
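The "offset in time" reading can be checked with a small sketch, again in plain Python with `graphlib` rather than xarray-simlab. It assumes the same hypothetical chain A -> B -> C within one step, plus a link from C at step t - 1 to A at step t. Nodes are (process, step) pairs, so the unrolled graph has no cycle.

```python
from graphlib import TopologicalSorter

n_steps = 3

# Hypothetical dependencies inside a single time step: A -> B -> C.
within_step = {"A": set(), "B": {"A"}, "C": {"B"}}

# Unroll the graph over time: every node is a (process, step) pair.
unrolled = {}
for step in range(n_steps):
    for proc, proc_deps in within_step.items():
        unrolled[(proc, step)] = {(dep, step) for dep in proc_deps}
    if step > 0:
        # A at this step reads what C produced at the previous step.
        unrolled[("A", step)].add(("C", step - 1))

# static_order() would raise graphlib.CycleError if the unrolled graph had a cycle.
order = list(TopologicalSorter(unrolled).static_order())
```

The resulting order runs A, B, C for step 0, then A, B, C for step 1, and so on, which is the same sequence a per-step loop would execute.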
I remain a bit confused by this, I think there should be some clear documentation as to what is the best way to have this quasi-cyclic behaviour. Like: what is the correct way to have a variable that is updated on every timestep?
However, AFAIK this is not checked (and indeed I could assign values in the Maybe this should be for the model designer, and proper documentation is just what is needed? |
I agree this is a major limitation in xarray-simlab. I currently use other kinds of hacks in fastscape to circumvent this because computing all uplift and/or erosion processes from the topography at time t may not be the best / most stable solution compared to the chained application of those processes, one after another, each updating the topography. Also, I've never been 100% happy with how output snapshots are saved between I think that the cleanest way to overcome this limitation would be to allow user-defined process ordering and relax some constraints on variable intent, e.g., allow 'inout' in multiple processes for the same variable. This would be highly welcome, but this would require a big refactoring and design effort. An open question is also how can we support both automated (any DAG) and user-defined (single-chain DAG) process ordering? Combining both approaches in the same model would be challenging, but I still like the possibility to just add a process to a model and let xarray-simlab figure out where it could be inserted in the DAG. That's very convenient when we want to extend a model "at the edges" (e.g., for fastscape adding processes that simulate limit conditions such as climate or tectonics). |
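The difference between computing all processes from the topography at time t and chaining them can be shown with a toy example. The sketch below is not fastscape code: it assumes a constant uplift rate and a made-up linear erosion law, and the function names are hypothetical.

```python
def simultaneous(h, uplift_rate, erosion_coef, dt):
    """Both processes read the topography at time t; their changes are summed."""
    d_uplift = uplift_rate * dt
    d_erosion = -erosion_coef * h * dt
    return h + d_uplift + d_erosion


def chained(h, uplift_rate, erosion_coef, dt):
    """Uplift updates the topography first; erosion then acts on the result."""
    h = h + uplift_rate * dt
    h = h - erosion_coef * h * dt
    return h


h0 = 100.0
h_sim = simultaneous(h0, uplift_rate=1.0, erosion_coef=0.1, dt=1.0)
h_chain = chained(h0, uplift_rate=1.0, erosion_coef=0.1, dt=1.0)
```

The two results differ because, in the chained version, erosion sees the already uplifted surface. The chained form is also the one that needs a fixed, user-visible process order.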
Aah, now that explains all the
So, the conclusion here is that the
Oof, that indeed sounds like a lot of work... Honestly, I think this limitation is already overcome with proper use of the
So this could also be accomplished by allowing changing intent of
That then puts the hmm... on second thought, this may become more complicated. Also, I'm not sure if I completely understand what you want to accomplish. From looking at fastscape, it seems that you want to be able to change whether tectonics and erosion are done sequentially or simultaneously, and for that change the ordering in the graphs. This should be another (good) issue, I think? |
On another note, I do really like what @jvail did with the dotted arrows. It may be nice to have references to an |
Current problem

Let me show a simple example:

    import xsimlab as xs

    @xs.process
    class ProcessA:
        foo = xs.variable(intent='inout')

    @xs.process
    class ProcessB:
        foo = xs.foreign(ProcessA, 'foo', intent='in')

Currently it is possible to do:

    model = xs.Model({'a': ProcessA, 'b': ProcessB})

Let's say that both

What we may also want to do:

    @xs.process
    class ProcessC:
        foo = xs.foreign(ProcessA, 'foo', intent='inout')

But this is not possible either:

Proposal: user-defined dependencies

This could look like:

    model = xs.Model(
        {'a': ProcessA, 'b': ProcessB, 'c': ProcessC},
        dependencies=[('a', 'b'), ('c', 'a')]
    )

Where the

When we know the order of all processes in the model (here

    model = xs.Model([('b', ProcessB), ('a', ProcessA), ('c', ProcessC)])
    model = xs.Model.from_ordered_collection({'b': ProcessB, 'a': ProcessA, 'c': ProcessC})
    model = xs.SequentialModel({'b': ProcessB, 'a': ProcessA, 'c': ProcessC})
    model = xs.Model({'b': ProcessB, 'a': ProcessA, 'c': ProcessC}, dependencies='strict')

With user-defined dependencies, we could allow

Unless one needs to clear some resources at each time step, we could get rid of

    xs.create_setup(
        model=model,
        clocks={'clock': ...},
        input_vars={'a__foo': ...},
        output_vars={'b__foo': 'clock', 'a__foo': 'clock', 'c__foo': 'clock'}
    )

There are some potential challenges, though:
|
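As a rough illustration of how a list of user-defined dependencies could drive the process order, here is a standalone sketch using the standard library `graphlib`. It is not the proposed xarray-simlab implementation. It assumes each tuple reads as (process, depends_on), which is only inferred from the `b`, `a`, `c` order used in the alternatives above; `order_processes` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def order_processes(process_names, dependencies):
    """Sort process names so that each one runs after the processes it depends on.

    `dependencies` is a list of (process, depends_on) pairs; this reading of the
    tuples is an assumption, not something the proposal spells out.
    """
    sorter = TopologicalSorter({name: set() for name in process_names})
    for process, depends_on in dependencies:
        sorter.add(process, depends_on)
    # Raises graphlib.CycleError if the user-defined dependencies contradict
    # each other.
    return list(sorter.static_order())


order = order_processes(["a", "b", "c"], [("a", "b"), ("c", "a")])
```

Under that assumption the two tuples already pin down a single chain, `b`, then `a`, then `c`. With fewer tuples the sorter would be free to place the unconstrained processes anywhere valid, which corresponds to the automatic case.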
Not exactly. A better illustration is the SurfaceToErode process and the SurfaceAfterTectonics process that inherits from it. This creates additional arrays on the grid that could be avoided with user-defined process dependencies. |
Great to see the discussion ongoing here. Just some observations while we are porting our model to simlab:
Not so sure about this. You could maybe create a new |
dotted arrows
So I added an option
the other discussion (which is not really the thread title)
this became a bit too long....
So I can for now see two examples, that would be solved by allowing the user to order processes:
TL;DR
I made a proof-of-concept PR, that is exhibited in the example notebook.
Actually, both cases are basically the same: if an `inout` variable is declared, it does not have automatic dependencies. Therefore, the user can add them. But to still allow for automatic dependency sorting, we use a dictionary such as
`{'process_name__var_name':'dependent_process_name'}` for determining dependencies. Then, only the dependency for that variable is removed (so other processes can be dependencies as well). Strictly speaking, this is not really necessary, and this removal can actually be made redundant when graph reduction is implemented (#120).
having a chain of processes that are executed sequentially, in a user-defined order. ->
|
This is implemented in #177, but as of now, it does not check if
Actually, this is even more subtle:
both
apart from implementation, how do we ensure even in the
where an
Yes, in the current implementation, where the user specifies dependencies as a
I let the |
Yes, exactly! The user must add them, except in the two following cases where this isn't required:
I like the idea of using a dictionary for user-defined dependencies. We could even allow lists of process names as dict values. However I don't understand why we need the variable name in dict keys. Could you expand on why you skip those variables when retrieving the process dependencies in #177 please?
I'm afraid this won't be enough for all cases. Suppose we have a
Hmm how
This should not be harder to debug if we change how output variable snapshots are saved, like I suggest just before and/or after the execution of every process during the |
We need the variable name, since a process can depend on multiple other processes (with different variables). The variable name is used to create a dictionary of e.g.:
where we find that I preferred this method over a |
I don't think it's necessary to skip it. The topological sorting algorithm will yield consistent process ordering with or without |
The checking algorithm works now! (I added a transitive reduction, since it was easy.) Both require a descendants/deps dict to work. That would also be very useful in drawing the dotted arrows. @benbovy what are your thoughts on adding that to the model class? EDIT: I have kept it in the
in fact, we never have to check for
done that, maybe a set is better? it needs one conversion less here, but maybe users are not as familiar with sets?
added that |
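For readers unfamiliar with the term, a transitive reduction on a deps dict can be sketched as follows. This is a generic pure-Python version, not the code from #177. It assumes the graph is acyclic and that `deps` maps each process name to the set of processes it directly depends on; both helper names are hypothetical.

```python
def transitive_closure(deps):
    """Map every process to all processes it depends on, directly or not."""
    closure = {}

    def visit(node):
        if node not in closure:
            closure[node] = set()  # placeholder; assumes the graph is acyclic
            reached = set()
            for dep in deps.get(node, ()):
                reached.add(dep)
                reached |= visit(dep)
            closure[node] = reached
        return closure[node]

    for node in deps:
        visit(node)
    return closure


def transitive_reduction(deps):
    """Drop direct dependencies that are already implied by a longer path."""
    closure = transitive_closure(deps)
    reduced = {}
    for node, direct in deps.items():
        implied = set()
        for dep in direct:
            implied |= closure[dep]
        reduced[node] = set(direct) - implied
    return reduced


deps = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
reduced = transitive_reduction(deps)
```

Here `c` depends on `a` both directly and through `b`, so the direct `c -> a` link is dropped while the execution order stays the same.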
Great! I commented on #177.
Yep, internally coercing a sequence (tuple, list, etc.) passed through the API into a set seems good to me. |
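The coercion could be as small as the sketch below. It assumes the dictionary format discussed earlier in the thread, with values that are either a single process name or a sequence of names; `normalize_dependencies` is a hypothetical helper, not a function from xarray-simlab or #177.

```python
def normalize_dependencies(user_deps):
    """Coerce each value (a name or a sequence of names) into a set of names."""
    normalized = {}
    for key, value in user_deps.items():
        if isinstance(value, str):
            # A bare string is one process name, not a sequence of characters.
            value = [value]
        normalized[key] = set(value)
    return normalized


deps = normalize_dependencies({"a__foo": "b", "c__foo": ["a", "b", "a"]})
```

Handling the bare-string case first matters, because calling `set("ab")` directly would split the name into single characters.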
Sorry to open up a new 'battleground' - the last thing that haunts me for now:
What do you, @benbovy, think about the following idea, use case:
Let's say we have two processes `A` and `B`. `A` has a var `x` that is a foreign variable in `B` and `B` has a variable `y` that is a foreign variable in `A`. This won't work because you cannot derive the DAG in that case.
What if I annotate the foreign variable `B.y` in `A` with something like `is_cyclic=True`? That would mean:
`A.y`, when building the process graph, the ordering
`B -> y -> A` relation in the visualization (maybe with a `step - 1` label, dotted line...)
`NaN` in step = 0 if y is not `inout` in B or the value set in `B.initialize`
It just happens that we have a lot of these "cycles" but typically it is just the value from step `time - 1`. So only quasi-cyclic.
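The intended runtime behaviour can be sketched without any framework. The loop below is a hypothetical illustration, not xarray-simlab code: process A computes `x` from the `y` of the previous step, process B then computes `y` from `x`, and the formulas are arbitrary placeholders. The `y_initial` argument stands in for a value set in `B.initialize` or given as input.

```python
import math


def run(n_steps, y_initial=math.nan):
    """Run A then B at every step; A reads B's `y` from the previous step."""
    y_prev = y_initial  # what A sees at step 0
    history = []
    for _ in range(n_steps):
        x = 2.0 * y_prev   # process A: uses y from step - 1
        y = 0.5 * x + 1.0  # process B: uses x from the current step
        history.append((x, y))
        y_prev = y
    return history


with_init = run(3, y_initial=1.0)
without_init = run(1)
```

Without an initial value, A starts from `NaN` and, in this toy example, the `NaN` then propagates through every later step. That is why the step 0 value needs to come from somewhere.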