Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow scripts to be run instead of notebooks #93

Open
TeaganKing opened this issue Apr 10, 2024 · 10 comments
Open

Allow scripts to be run instead of notebooks #93

TeaganKing opened this issue Apr 10, 2024 · 10 comments

Comments

@TeaganKing
Copy link
Collaborator

We'd like to be able to run a script rather than just a notebook with CUPiD.

A good test of this would be to run a sea ice script instead of a sea ice notebook.

This will require saving figures generated from the notebook and reading them in with a standard naming convention to an html file that could show the figures. See #16 for the html component of this.

@mnlevy1981
Copy link
Collaborator

@dabail10 started to work on this and ran into a major stumbling block -- the ploomer script runner doesn't offer an obvious way to change environments before running the script. Dave's example was being run in cupid-dev, which doesn't have the necessary analysis packages installed. Lev will dig into the ploomer documentation and then I'll work with Dave to get a fix in place.

@rmshkv
Copy link
Contributor

rmshkv commented Apr 18, 2024

As a first pass, having looked through the Ploomber documentation, it doesn't look like there's a supported way to specify an environment for the ScriptRunner task. I'd recommend starting with looking at the documentation for ScriptRunner. If we wanted to go the route of putting in a PR to Ploomber or implementing our own ScriptRunner class, it actually looks like it might be a pretty straightforward update to their code, but that might be out of scope, in which case we should put together a more hacky solution in our own code.

@TeaganKing
Copy link
Collaborator Author

TeaganKing commented May 14, 2024

Just for reference, the function which includes ScriptRunner is create_ploomber_script_task.

In the ScriptRunner documentation, it looks like we should be able to specify the environment in the configuration file using something like this, adapted for the script that you have (and I don't entirely remember the structure that you had in the config file thus far):

tasks:
  - source: script.py
  - product: output.txt
  - env: cupid-analysis

So, specifying 'env' in an additional line should ideally work to allow for using the cupid-analysis environment. Maybe we can test it out tomorrow?

@rmshkv
Copy link
Contributor

rmshkv commented May 14, 2024

Hmm, maybe I missed something - could you point me to where in the documentation or Ploomber source code that feature is mentioned?

@TeaganKing
Copy link
Collaborator Author

Apologies, I may have jumped ahead without enough research on this... I saw this in an example, but after another look, I'm now thinking that the example was not accurate and it might not actually work in the format that I commented in the code snippet above. @rmshkv did you already have a particular place in the Ploomber code you were looking at to update?

@TeaganKing TeaganKing linked a pull request May 15, 2024 that will close this issue
@TeaganKing
Copy link
Collaborator Author

It looks like the kernelspec_name in NotebookRunner is set roughly here. Perhaps we could add an environment to this part of ScriptRunner?

Or, if we prefer to do something from our own side, could we possibly try something like this from within the script? I'm not sure what the best method is, but wanted to toss out some ideas.

import subprocess
subprocess.run("conda activate cupid-analysis", shell=True, check=True, executable='/bin/bash')

For reference, I'm including the NotebookRunner documentation and ScriptRunner documentation.

@TeaganKing
Copy link
Collaborator Author

TeaganKing commented May 15, 2024

From @mnlevy1981 , we could also possibly use something like os.system("conda run -n cupid-analysis python script.py") in run.py to replace line 254 (call to create_ploomber_script_task.

@rmshkv
Copy link
Contributor

rmshkv commented May 15, 2024

Some rough thoughts: I haven't tried any of these particular solutions, but in general, I think the first attempt should be along the lines of modifying Ploomber's code or implementing our own script runner task code if we don't want to contribute to that code base. It'd be nice to keep the interface of specifying the environment in config.yml rather than having to add code to the script itself, and it would also be best to keep the script running mechanism as a Ploomber task, since that task creation step isn't actually where the script code is being run (that happens later when the DAG is built, which takes into account dependencies between tasks if those exist, including between script and notebook tasks).

@TeaganKing
Copy link
Collaborator Author

TeaganKing commented Aug 20, 2024

I'm working on updates to ploomber to address this.

@TeaganKing
Copy link
Collaborator Author

TeaganKing commented Aug 20, 2024

One other note, however: with the context of a potential new environment that includes not only the dependencies for running the CUPiD commands but also for executing notebooks (and scripts), perhaps this issue would be resolved by a combined environment anyways. That said, the feature of being able to specify a non-default kernel is still useful, although perhaps not as important as getting the script functionality initially implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants