Discussion: emirge's role #75
I think that version B is just version A with some default environments included. For example, one for people who intend to develop packages, and one for those who just want to run stuff.
Edit: I agree with everything you just said, @majosm, about the functionality and motivation of emirge. However, the motivation behind our continued discussions about the functionality isn't whether emirge can be used as an external or more general tool, but whether it remains general enough to support one of my use cases - and I think it's an important one. What does a person do when there is a need to run multiple simulations at the same time with different versions of the code? This is often faced by our main end-user (e.g. the person doing the prediction). It is my understanding that, since only one version of each package can be used at the same time in any given environment, this situation requires multiple installations of mirgecom, each in its own environment. Is that true?
No, that doesn't cover it at all. Suppose I'm working on something that requires branch Defaults don't cover this. We either encode this in git or manually pass these |
There is imho a simpler third option: Telling users to run |
I
We agree on this, @majosm . I am not arguing against continuing to use version control with emirge, and being able to modify the "requirements.txt" file there, and check that in as a branch. To me VersionB still looks like VersionA, but with source control. I've never advocated that any of our files should be taken out of source control - and I missed that about your initial description of VersionA. My real issue is dealing with the case where a single user needs to run multiple versions of the code at the same time on the same machine. I just need emirge to have enough functionality to allow that use case. That's the case behind all of my issues with emirge. emirge@master currently works for me - I just want to retain some of its behaviors. |
Ok, cool. Slack comments seemed to suggest otherwise, but I'm glad we're on the same page now.
Does that come from this part:
? To me, this isn't the approach ABATE/TEESD should be taking in a post-#72 world. Part of what I was trying to get at above and in Slack is that emirge's (or whatever "top-level" package we decide on) Can ABATE/TEESD be adjusted such that its projects can be branches of emirge (or whatever "top-level") instead of mirgecom? It would be nice if we could create a branch in emirge with a specified set of package versions, then just tell TEESD, "Hey, go test this emirge branch". |
Yes. That is how it works now, and how I would like for it to continue working. I misspoke before, each project is a different branch of emirge, not mirgecom. But each version of emirge that I test is associated with a particular branch of mirgecom. I was using emirge@teesd for that - but since all the emirge changes, I have switched to emirge@master and specify the --branch option to get a particular branch. I'm ok if this changes, it is sort of orthogonal to the troubles I have. As long as emirge@some_branch can install mirgecom@a-branch - any branch I need, then I'm fine with that. If I limit the testing to one branch, this all works fine. If I need more than one version of the code to be running at the same time on the same platform (e.g. like during prediction or when testing changes or testing multiple branches) - that's when the trouble starts for me. The feature of emirge that allows the multiple simultaneous installs to work is its ability to install each mirgecom installation into its own environment using a common conda. Current emirge does this. I seek to retain this behavior, and I don't think this feature runs afoul of our collective vision for emirge |
I believe that's true. |
Uhh, I think this would be hard to do. How would you keep emirge's requirements.txt in sync with whatever branch X of mirgecom's requirements.txt describes?
Then I would like if emirge could (continue to) do that.
I think I can just check out emirge@xyz, and know that it install(s) the version of mirgecom that I need. Can't I? |
Ok, I think the current version of PR #72 can still do that.
I don't think there is currently a way to do this automatically, and I struggle to imagine even a theoretical way how to do that automatically. The manual way to do this (either in emirge@master or with PR #72) is to manually select a branch of mirgecom, or just tell emirge to install mirgecom@master, run |
Wouldn't it amount to just checking out emirge, make a branch, edit the requirements.txt to be the version of mirgecom that I want and then just check that back in as a branch of emirge? What would stop me from doing that? |
Are you going to create a branch in emirge for every branch in mirgecom? That seems... not good... |
Naw, just the ones I want to share or be able to recreate remotely. Like if I need you or Matt to look at something (for the umpteenth time), then I can just check in an emirge branch that will install exactly the packages you need to recreate my environment. Look at it, run it, then blow it away. |
That seems like a very labor-intensive process and would also duplicate the requirements in two different files, like I mentioned above.
I don't follow that it is labor intensive, or the duplication concern. I'm not saying there would be a new requirements file - just that the requirements file can differ between emirge branches, like any other file can differ between branches. |
Regarding the labor intensity: Imagine you create a branch X in mirgecom, and have to create a corresponding branch X in emirge. Some days later, you change the requirements in mirgecom's branch X, and have to remember to make the corresponding changes in emirge's branch X. Regarding the duplication: instead of the relatively coarse-grained dependencies tracked in the #72 PR, you now need to track every change to mirgecom requirements.txt (whether to the master branch or another branch) in a second file that is outside mirgecom (emirge's requirements.txt). |
This seems like the required amount of labor to me. Iff I want/need emirge to be able to install my branch directly, then yes - I make a branch of emirge that does it (simply by editing emirge/requirements.txt) and then check that branch in. In many cases (including for our personal individual use), simply checking out emirge@master and installing will be enough, because we can easily (some of us more easily than others) switch between our branches for our current environment. No need to make an emirge branch for that.
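For illustration, the "edit emirge/requirements.txt on a branch" step could be scripted roughly as below. This is only a sketch: `pin_branch` is a hypothetical helper, and it assumes each tracked package appears as a pip VCS line ending in `@BRANCH#egg=NAME`, which may not match the file's actual layout.

```shell
# Rewrite the branch pinned for one package in a pip requirements file.
# ASSUMPTION: the package appears on a line of the form
#   --editable git+https://host/org/repo.git@BRANCH#egg=NAME
# Lines without an "@BRANCH" part are left untouched.
pin_branch() (
    file=$1 pkg=$2 branch=$3
    tmp=$(mktemp) || exit 1
    # Replace whatever sits between '@' and '#egg=NAME' at the end of a line.
    # '|' is the sed delimiter, so branch names may contain '/'.
    sed "s|@[^@#]*#egg=${pkg}\$|@${branch}#egg=${pkg}|" "$file" > "$tmp" || exit 1
    # Copy back instead of mv so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)
```

Something like `pin_branch requirements.txt mirgecom my-feature`, followed by an ordinary commit on a new emirge branch, would then record the environment. Branch names containing `|` or `&` would need escaping first.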
Gotcha. Right, I think you'd want to clone emirge several times and install each into a separate environment. If #72 doesn't currently allow this, it should be fixed up to do so.
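A rough sketch of what "clone emirge several times and install each into a separate environment" could look like follows. It only prints a plan rather than running anything; `plan_installs` and the `emirge-*` / `ceesd-*` naming are made up for illustration, and the actual emirge installer invocation is left as a comment because its arguments depend on the emirge version.

```shell
# Print (not run) the commands for one isolated install per emirge branch.
# Each branch gets its own clone directory and its own conda environment,
# so several versions can coexist on one machine.
plan_installs() (
    repo=$1; shift
    for branch in "$@"; do
        # Turn the branch name into something safe for directory/env names.
        name=$(printf '%s' "$branch" | tr -c 'A-Za-z0-9_.-' '_')
        echo "git clone --branch $branch $repo emirge-$name"
        echo "conda create --yes --name ceesd-$name"
        echo "# then run emirge's installer inside emirge-$name, targeting ceesd-$name"
    done
)
```

For example, `plan_installs <emirge-url> eu_branch1 master` prints two independent clone-and-environment recipes that share the same conda installation.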
I don't see it as being very labor intensive. Keep in mind, this type of duplication already exists, e.g.: if you want to use a different branch of loopy, you already need to modify Also, emirge's |
Ok, fair enough.
We'll need to duplicate changes in mirgecom's requirements.txt to emirge's requirements.txt. |
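If the two files did have to be kept in step, a small check could at least flag drift. The sketch below is hypothetical (`missing_requirements` is not an existing emirge or mirgecom tool) and compares exact lines only, so two lines that name the same package with different branches count as different.

```shell
# List requirement lines present in the first file but missing from the second.
# Comment and blank lines in the first file are ignored.
# Exit status follows grep: 0 if something is missing, 1 if nothing is.
missing_requirements() (
    src=$1 dst=$2
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$src" |
        grep -F -x -v -f "$dst"
)
```

Running `missing_requirements mirgecom/requirements.txt emirge/requirements.txt` would print whatever mirgecom asks for that emirge's file does not mention verbatim.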
I don't think we need to modify meshmode's etc.
Certainly, and PR #72 already has such a facility, by using |
Oh good, then we don't need to change mirgecom's either, we can just change emirge's. 😄
I'm not really talking about optional dev packages, I was thinking more along the lines of things we would use in our simulations that fall outside of the scope of mirgecom as a solver-focused package. |
Not really, since mirgecom's requirements.txt should reflect accurate requirements as well. Not everyone will use emirge to install mirgecom.
Ok, we could add another file, |
Nor will everyone use mirgecom to install meshmode or grudge. 🙂 To clarify one thing: I don't think every dependency of mirgecom should go in emirge's
Why? |
Sure, but wouldn't that be the responsibility of the meshmode etc. package then? Just to clarify: If you need a different loopy branch due to a change in mirgecom, changing the requirements.txt in mirgecom should be enough; you normally don't need to change the requirements.txt in other packages that also happen to require loopy.
Right, and that's the situation with #72 currently (minus the setup.py vs. requirements.txt shenanigans), right?
To make them easier to install (optionally)? |
This might call for a few model use cases - especially the ones we all think we want and/or need. Maybe we can tweak our use cases and the proposed solution, i.e. PR #72 (if any tweaking is needed), until they meet up. I have some expectation that my use case may be part of what needs "debugged" here. But here it is:
Multiple working installs for the end-user.
The end-user has some additional packages to install to their environment specified by Install 1:
Some explaining of the arguments:
After #72 and the removal of those options - the user must follow either: Now the end-user has a fully working mirgecom@branch1 installed on the platform using Install 2
After #72, the user will have used one of the manual install paths, MIP1, or MIP2 or some combo thereof. I contend that MIP2 is superior because: Now the user wants to take his simulations or a portion of those say from Quartz to Lassen, or share his setup with others (which he does). Having followed MIP2, then he can just check in his emirge changes in a branch (e.g. emirge@eu_branch1). In this sense, MIP2 seems superior even to emirge@master. Additionally if the user wants to automate this process later, it is easier because the automaton can just checkout emirge@eu_branch1 and "hit go", instead of scripting manual steps. |
Yeah, and so apart from occasional hacks to me this doesn't seem like it's creating too much duplication or making it very labor-intensive to maintain.
(I was thinking more non-optional packages.) Anyway, the point I was trying to make with these two lines of discussion is that we're trying not to treat mirgecom as the "head honcho" package. The idea is for the mirgecom package to have a targeted role (similar to, e.g., grudge), with the possibility that there may be other packages we create in CEESD that sit at the same (or higher?) level in the dependency hierarchy. |
Personally, I'd like to see the conda installation/environment creation done in a separate (optional) script, if possible.
I'm leaning towards saying this should go away...
Question: does the install script do anything extra with these beyond
I think you would want to switch branches before installing, right?
It doesn't, as far as I know. We would need a script for that I think. |
I've been advocating for making the conda step separate for some time. It is a convenience at best, and just in the way most of the time.
Indeed, this goes away with #72.
No, this is just convenience. Consider that if you wanted to automate this, then first you'd need to get emirge, install the ceesd env to get conda and a new compatible environment, then script installing extra stuff above-and-beyond ceesd requirements into the new conda/env. This is more difficult and error-prone than you might think. If the install script has the option to put in extra stuff just by listing packages - then this makes it much easier to automate environment customization.
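To make the automation point concrete, here is a sketch of the "extra packages" step when the environment is named explicitly instead of relying on whichever environment is active. `extras_commands` and the environment name are assumptions for illustration; the function only prints the commands, using conda's `-n` option and `conda run`.

```shell
# Print the commands that add extra packages to a named conda environment.
# Naming the environment explicitly (-n) avoids depending on which
# environment happens to be active when a script runs.
extras_commands() (
    env=$1 conda_pkgs=$2 pip_pkgs=$3
    # Word splitting of the space-separated package lists is intentional.
    for p in $conda_pkgs; do
        echo "conda install --yes -n $env $p"
    done
    for p in $pip_pkgs; do
        echo "conda run -n $env python -m pip install $p"
    done
)
```

A call such as `extras_commands ceesd "clinfo" "pytest pudb"` shows what a script would still need to know on its own (the conda in use and the environment name) if the install script did not accept the package lists directly.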
As @matthiasdiener has been saying - we can just install from master, then go in and switch the branch of mirgecom. Before installing, I have no mirgecom directory to go into to switch the branch. Manual install path 2 is the one where I switch the branch in the emirge/requirements.txt before installing.
This is kind of what I meant by "doing something extra" (maybe poor wording), i.e., it's not exactly equivalent to something like:

    ./install.sh <args>
    for x in <extra conda packages>; do
        conda install "$x"
    done
    for y in <extra pip packages>; do
        pip install "$y"
    done

Instead, there's some additional processing going on inside the script that makes having those options worthwhile. Right?
Ahhh ok, I misunderstood what MIP1 was doing. Got it now. |
I meant to say, no, there is nothing special about allowing these extra options. It only makes it more convenient for adding stuff to the environment in the install process. The The thing that makes these options nice is that they allow the user to insert additional things that the emirge install script can easily put into the CEESD environment on-the-fly. Without the options, I need to extract knowledge about which conda to use and which environment name to add the packages to, and then ensure that my additional install scripts pick up the right environment settings. On the command line, this is a trivial thing to do, but in scripts it is sort of cumbersome - and introduces yet another set of scripts to run to set up testing environments, etc. Since these options do no harm, and provide a useful function (useful to me), I'd say keep them.
One other thing that you asked about earlier, @majosm, and that I've just experienced again today, reminding me why it was there: You asked me something like "why do you check out your own mirgecom since emirge has just done so already when it installed mirgecom?". I gave the wrong answer when I said I don't need to anymore. When ABaTe does its Continuous mode building, it checks the repo for updates every 5-10 minutes. It does this just by doing The Continuous issue was never solved. But we subvert it by just not doing Continuous-type ABaTe testing. Nightlies will be enough for the production compute platforms and we'll let the GitHub CI do all the continuous testing. |
After some back-and-forth on Slack, I noticed we still have some differing thoughts on emirge, so I wanted to start up one final (hah) discussion in an attempt to iron this out.
In emirgecom, we (tentatively) agreed that emirge should no longer parse mirgecom's requirements.txt. However, there seem to still be two views on what role emirge should play:
View A: emirge is a tool that installs an environment (for our purposes, a CEESD environment).
View B: emirge is a CEESD environment.
In View A, emirge would take as parameters (some of these are optional): 1) a conda environment name, 2) a list of conda packages, 3) a requirements.txt, and 4) an installation directory. It would create the conda environment, install the conda packages in that environment, and install the pip packages in the specified installation directory. Everyone would use the master branch and construct their own requirements.txt for whatever package versions they want to use. These would exist outside of emirge (i.e., not tracked). (@MTCam please let me know (or edit this directly) if I've gotten any of this wrong.)
View B is much like View A, but a given branch in emirge would encode a set of CEESD package versions via a single tracked requirements.txt. Users would maintain different environments by creating branches with different versions of this file. Installation would be performed once*, with CEESD packages going into the emirge directory. Switching between environments would be done by: 1) checking out a different branch of emirge, and 2) running a helper script to go into the package directories and check out the branches specified in emirge's requirements.txt. (No conda environment switching is done as there is only one environment needed.)
(* Multiple installations are still supported via separate clones, as with anything else.)
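The helper script in step 2 does not exist yet; a minimal sketch of what it might do is below. It assumes (hypothetically) that each tracked package appears in requirements.txt as a pip VCS line ending in `@BRANCH#egg=NAME` and was cloned into a directory called `NAME`. The function only prints the `git` commands so the plan can be inspected first.

```shell
# Print one "git checkout" per branch-pinned VCS line in a requirements file.
# ASSUMPTIONS: lines end in "@BRANCH#egg=NAME", and each package lives in
# a directory called NAME under the given root (default: current directory).
# Lines without a pinned branch (e.g. plain "numpy") are skipped.
checkout_commands() (
    reqs=$1 root=${2:-.}
    sed -n 's|.*@\([^@#]*\)#egg=\([A-Za-z0-9_.-]*\)$|\2 \1|p' "$reqs" |
    while read -r name branch; do
        echo "git -C $root/$name checkout $branch"
    done
)
```

After `git checkout` of a different emirge branch, piping the output of `checkout_commands requirements.txt` to `sh` would move every package directory to the branches that emirge branch records.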
I lean towards View B. I think there is a need for something to sit at the top level and keep track of our soon-to-be many different development environments, and now that it looks like mirgecom isn't going to fill that role anymore it has left a bit of a void. I don't think the approach in View A alone can be made to deal with the issues discussed in #53, and manually passing around requirements.txt files when someone wants to share an environment or move to a different machine sounds like a mess.
As I understand it, the primary motivation for View A is that this could become something useful in a more general sense (i.e., to install things other than just a CEESD environment). This may be true, and I don't want to discourage that from being explored; but I don't think it needs to be emirge that does this, per se. We can extract that functionality into a separate package (with a more appropriate name; there isn't really anything "mirge" about it when it's installing something else) and then have emirge depend on it.
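For concreteness, the generic View A tool could boil down to something like the sketch below, with all four inputs passed explicitly. `make_env_plan` is a hypothetical name, and the function prints the commands instead of running them; pip's `--src` option is used here as one possible way to direct editable VCS checkouts into the installation directory.

```shell
# Hypothetical View A entry point: every input is an explicit parameter.
#   $1 conda environment name   $2 space-separated conda packages (may be empty)
#   $3 path to a requirements.txt   $4 directory for editable checkouts
make_env_plan() (
    env_name=$1 conda_pkgs=$2 reqs=$3 install_dir=$4
    # Create the environment, optionally with conda packages in one step.
    echo "conda create --yes --name $env_name${conda_pkgs:+ $conda_pkgs}"
    # Install the pip requirements inside that environment.
    echo "conda run -n $env_name python -m pip install --src $install_dir -r $reqs"
)
```

Under View B, emirge itself would then just call such a tool with its own tracked requirements.txt.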
Thoughts?