-
One other point in favour of the shadow dir approach comes from Snakemake's new provenance behaviour. If This is especially relevant for folks installing their snakebids apps to temporary, local scratch directories, where the paths may change constantly. I imagine this use case is not uncommon, especially on the cluster. And ideally, our official policy would not be that "for this app to work, this snakemake feature must be disabled". I would thus argue that even if we don't go with the shadow dir, we should find another solution to this. (I'll raise a dedicated issue for this in case there are any other ideas.)
-
Thanks for writing this up! It seems like a reasonable avenue to explore but I do think the details will matter a lot. I think solving the provenance issue is the most important impact of this proposal. We certainly don't want users to need to disable useful snakemake features to use Snakebids, and we don't want useless reruns as a result of our architecture. I think the increased complexity is something we'll need to work hard to address if we implement this. The existence (and location) of a temporary shadow dir will probably not be obvious to an inexperienced user and will make workflow debugging a lot harder in the absence of good logs. My wishlist on this topic would be (in order of implementation complexity):
A couple of other questions (that may reveal errors in how I'm understanding or thinking about this proposal):
-
I came across a good reason not to use shadow dirs: Snakemake has a number of flags allowing the output of various secondary outputs or the configuration of in-app parameters (e.g. I can see two primary workarounds:
Of course, this is also the source of some current weirdness: right now if you make a report, the path will be relative to whatever output dir you provide, which is unintuitive. But that behaviour, at least, can be documented. All in all, this issue seems critical and I don't currently see a way around it. But I'd love to hear other ideas that might resolve it.
-
Ahh, this reminded me of a conversation about VTK versions at OHBM... 😅 On that note, for the short term, do we need to temporarily constrain the Snakemake version to avoid any potential issues (not sure what version this was introduced in)? I can understand why we wouldn't want to constrain versions, but I also think this would give us time to implement any features / fixes as necessary for a given version. To your first point, I agree - don't think we should enforce absolute paths. I think we had similar discussions when implementing Just for clarification, you mentioned the new provenance a couple of posts ago with
-
Ah, good catch that there are Snakemake-provided ways to introduce files relative to the working directory that aren't known by the workflow developer ahead of time. I forget whether anything like this was ever suggested, but maybe we could inspect the shadow directory, create a directory
-
Been working with snakemake shadow dirs lately, and was wondering if this idea might find useful application for snakebids bidsapps. I've spent a bit of time thinking it out, and it's definitely not a "no-brainer let's do it", as it comes with one or two critical consequences for app design, but I thought I'd lay it out as I see it.
It's already been discussed and decided that in bidsapp mode, the snakemake working directory should be changed to the user's output dir (see #61). The problem created here is that paths relative to the snakemake directory can no longer be resolved. The current workaround is for developers to use `workflow.basedir + "path"` or an equivalent every time they reference a path in their workflow.

However, given that `results`, `config`, and `.snakemake` are all already privileged folder names in a snakemake app, it should be possible to do a "shallow shadow" of the snakemake directory, meaning a symlink to every top level folder and file. Along with these snakemake folders, symlinks for `results`, `config`, and `.snakemake` would point to the user's output directory. So the shadow dir would look something like this, where `--->` points to a symlink destination:

This approach offers two primary advantages:

`workflow.basedir`, a semi-frequent source of confusion.

The primary disadvantage is that any root level files or directories created in the workflow will be saved to the shadow_dir and will not persist. Two obvious examples would be the `logs/` and `benchmarks/` folders, both of which should be saved to the output directory. To work around this, we would have to register any such folders in the `run.py` file so that Snakebids can make the relevant symlinks ahead of time. The API might look something like this:

So it's trading one annoyance for another. In defense of the "binding" approach described above, it centralizes the workflow modifications to one line. One no longer needs to remember to use `workflow.basedir` for every relative path in the workflow. In fact, arguably, a well-behaved workflow will only output files to the `results` (via `config['output_dir']`), `logs`, and `benchmarks` dirs, and the above line could be put in the boilerplate app, so friction could be reduced to nearly 0 for devs without special needs.

On the other hand, it increases the complexity of Snakebids (more things that can break, etc), and the effort required to implement is perhaps not worth the potential payoff. So I'm not completely sold on the idea.
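To make the shallow shadow and the folder registration a bit more concrete, here is a minimal Python sketch of the symlinking step. It is only an illustration, not the Snakebids API: the function name `make_shallow_shadow` and its `bound` parameter are hypothetical, and it assumes each redirected name maps to a same-named subfolder of the output dir and that the shadow dir starts out empty on a POSIX filesystem.

```python
from pathlib import Path

# Folder names redirected to the user's output dir. These three come from
# the post; the exact layout inside the output dir is an assumption.
PRIVILEGED = ("results", "config", ".snakemake")


def make_shallow_shadow(snakemake_dir, output_dir, shadow_dir, bound=()):
    """Hypothetical helper: build a shallow shadow of snakemake_dir.

    `bound` lists extra root-level folders (e.g. "logs", "benchmarks")
    that should persist in the output dir instead of the shadow dir.
    """
    snakemake_dir = Path(snakemake_dir).resolve()
    output_dir = Path(output_dir).resolve()
    shadow_dir = Path(shadow_dir)
    shadow_dir.mkdir(parents=True, exist_ok=True)

    redirected = set(PRIVILEGED) | set(bound)

    # Redirected names point into the output dir, created on demand.
    for name in sorted(redirected):
        target = output_dir / name
        target.mkdir(parents=True, exist_ok=True)
        (shadow_dir / name).symlink_to(target, target_is_directory=True)

    # Every other top-level entry links back to the workflow source, so
    # relative paths such as "resources/..." resolve from the shadow dir.
    for entry in snakemake_dir.iterdir():
        if entry.name not in redirected:
            link = shadow_dir / entry.name
            link.symlink_to(entry, target_is_directory=entry.is_dir())

    return shadow_dir
```

Under these assumptions, a call like `make_shallow_shadow(app_dir, out_dir, tmp_shadow, bound=("logs", "benchmarks"))` would be the single registration point, and Snakemake would then be run with the shadow dir as its working directory.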
Alternative Approach
One similar (but less good I think) idea in the same vein of "binding" or "registration" is to allow devs to make symlinks to root level snakemake_dir directories in the output directory. For example, one could "bind" `resources` and that would make a `resources` symlink right in the output_dir which would point to the `resources` directory in the snakemake dir. That way, any relative paths in the workflow pointing into `resources` would continue to work. When the app finishes, these symlinks would be deleted.

The problem here is that should the app get interrupted, the symlinks would remain in the output_dir as extra garbage. This approach is thus not clean like the above.