
Is there a limit to the tree length? Some commands are not executed while similar ones are #38

Open
stemangiola opened this issue Oct 4, 2020 · 18 comments


@stemangiola

I have a makeflow file with ~17K commands. Some of the rules at the root of the tree, such as this one,

dev/test_simulation/input__slope_0.5__foreignProp_0.8__S_30__whichChanging_1__run_2.rds:
        Rscript ~/PhD/deconvolution/ARMET/dev/test_simulation_makeflow_pipeline/create_input.R 0.5 0.8 30 1 2 dev/test_simulation/input__slope_0.5__foreignProp_0.8__S_30__whichChanging_1__run_2.rds

are not executed for some reason, while other combinations of parameters are. I don't understand why.
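For context, each rule has a Make-like layout: a target file, a colon (source files, if any, would follow it), and a tab-indented command. With no sources listed, a rule like the one above has no dependencies, which is why it sits at the root of the tree. A minimal Python sketch of how such rules can be generated from one parameter combination, assuming the file-name pattern of the example rule; `input_path` and `make_rule` are hypothetical helpers, not part of makeflow:

```python
# Hypothetical helper (not part of makeflow): renders one rule in makeflow's
# Make-like layout, "targets: sources" followed by a tab-indented command.
# The script path and file-name pattern are copied from the rule above.
SCRIPT = "~/PhD/deconvolution/ARMET/dev/test_simulation_makeflow_pipeline/create_input.R"


def input_path(slope, foreign_prop, s, which_changing, run):
    """Target file name that encodes every parameter value."""
    return (
        f"dev/test_simulation/input__slope_{slope}__foreignProp_{foreign_prop}"
        f"__S_{s}__whichChanging_{which_changing}__run_{run}.rds"
    )


def make_rule(slope, foreign_prop, s, which_changing, run):
    """Return the rule text: 'target:' (no sources) and a tab-indented command."""
    target = input_path(slope, foreign_prop, s, which_changing, run)
    args = " ".join(str(v) for v in (slope, foreign_prop, s, which_changing, run))
    return f"{target}:\n\tRscript {SCRIPT} {args} {target}\n"


print(make_rule(0.5, 0.8, 30, 1, 2))
```

Calling `make_rule(0.5, 0.8, 30, 1, 2)` reproduces the rule quoted above, with a tab before `Rscript`.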

@stemangiola
Author

As you can see, I have a few holes in my benchmark:

[image: benchmark results]

The workflow hangs and does not submit any more jobs. If I interrupt it and start again, it hangs at "starting workflow".

@btovar
Member

btovar commented Oct 5, 2020 via email

@btovar
Member

btovar commented Oct 5, 2020

Stefano, which command line are you using to run the workflows?

When you say you are changing parameters, are you also changing cores, memory, etc., or only parameters of your tasks?

@stemangiola
Author

Each block of tests is run with different resources, depending on which algorithm is being tested.

Here is the command:

makeflow -T slurm -j 100 --do-not-save-failed-output test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow

@btovar
Member

btovar commented Oct 5, 2020

Could you send me the log.out file from:
makeflow -T slurm -j 100 --do-not-save-failed-output test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow > log.out 2>&1

@stemangiola
Author

parsing dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow...
local resources: 32 cores, 193277 MB memory, 148722940 MB disk
max running remote jobs: 100
max running local jobs: 100
checking dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow for consistency...
dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow has 38880 rules.
recovering from log file dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow.makeflowlog...
checking for old running or failed jobs...
checking files for unexpected changes... (use --skip-file-check to skip this step)
starting workflow....

and then it hangs forever.

@btovar
Member

btovar commented Oct 5, 2020

I forgot to add the -dall debug flag, sorry about that:

makeflow -dall -T slurm -j 100 --do-not-save-failed-output test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow > log.out 2>&1

@stemangiola
Author

log.zip

@btovar
Copy link
Member

btovar commented Oct 5, 2020

Stefano, could you also send me dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow.batchlog?

@stemangiola
Author

I don't have a batchlog; I have rerun the whole workflow. I think one of the issues (not consistent) is that I increased the combinations in the makefile after the workflow had completed, and some of the new benchmarks did not execute.

It is common to execute the whole workflow and then try some more parameter combinations.

@btovar
Copy link
Member

btovar commented Oct 7, 2020

Stefano, something that just occurred to me. Are you re-running the makeflow in place without a cleaning operation in between? It could be that makeflow is getting confused by a mismatch between the previous execution log and a newly modified makeflow.

@stemangiola
Author

That is probably the case. But does cleaning lead to the deletion of the dependencies that have already completed? Of course, if I delete the log, everything gets deleted when makeflow is called again.

@btovar
Member

btovar commented Oct 7, 2020

Yes, they will be deleted. A safer mode of operation in this case is to not modify the original file, but instead write the updates to differently named makeflow files. Then you can execute each update in sequence.
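This suggestion can be sketched as follows, assuming each rule is identified by its target file: keep the original makeflow file untouched and write only the rules whose targets are not yet covered into a new, differently named file. The helper `render_update`, the script `make.R`, and the file name `update_01.makeflow` are hypothetical, used only for illustration:

```python
def render_update(all_rules, previous_targets):
    """Render makeflow text containing only rules not covered by earlier files.

    all_rules: dict mapping each target file to the command that builds it
               (hypothetical representation of the full parameter grid).
    previous_targets: set of targets already defined in earlier makeflow files.
    Returns (makeflow_text, new_targets), preserving the order of all_rules.
    """
    new_targets = [t for t in all_rules if t not in previous_targets]
    # One "target:" line plus a tab-indented command per rule, blank line between rules.
    blocks = [f"{t}:\n\t{all_rules[t]}\n" for t in new_targets]
    return "\n".join(blocks), new_targets


# Hypothetical example: two targets were built before, one is new.
rules = {
    "out/a.rds": "Rscript make.R a out/a.rds",
    "out/b.rds": "Rscript make.R b out/b.rds",
    "out/c.rds": "Rscript make.R c out/c.rds",
}
text, added = render_update(rules, previous_targets={"out/a.rds", "out/b.rds"})
print(added)
print(text)
```

Saved as, for example, `update_01.makeflow`, the delta file can be run with the same command line as before (`makeflow -T slurm -j 100 update_01.makeflow`), and the next batch of combinations would go into `update_02.makeflow`, so the original file and its log are never modified.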

@stemangiola
Author

I understand, but this is not always possible in a combinatorial scenario.

library(tidyr)  # provides expand_grid()

expand_grid(
	slope = c(-2, -1, -.5, .5, 1, 2), 
	foreign_prop = c(0, 0.5, 0.8),
	S = c(30, 60, 90),
	which_changing = 1:16,
	run = 1:5,
	method = c("ARMET", "cibersort", "llsr", "epic")
)

I can extend the parameter space here arbitrarily with no effort. It would be great if makeflow could update the log file with the new dependencies and just add them to the tree.

Otherwise, makeflow would only be suitable for static workflows.
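The grid above is a Cartesian product, so the number of combinations is the product of the number of values per parameter: 6 × 3 × 3 × 16 × 5 × 4 = 17,280, consistent with the ~17K commands mentioned at the top of the thread. The sketch below rebuilds the grid in Python with `itertools.product` (a stand-in for tidyr's `expand_grid`, not the author's code) and then applies a hypothetical extension, `run = 1:6`, to show where new combinations land when rules are listed in grid order:

```python
import itertools

# Parameter values copied from the expand_grid() call above.
GRID = {
    "slope": [-2, -1, -0.5, 0.5, 1, 2],
    "foreign_prop": [0, 0.5, 0.8],
    "S": [30, 60, 90],
    "which_changing": list(range(1, 17)),  # 1:16 in R
    "run": list(range(1, 6)),              # 1:5 in R
    "method": ["ARMET", "cibersort", "llsr", "epic"],
}


def combinations(grid):
    """Cartesian product of all parameter values; the last key varies fastest."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


combos = combinations(GRID)
print(len(combos))  # 6 * 3 * 3 * 16 * 5 * 4 = 17280

# Hypothetical extension: one more run per combination (1:6 instead of 1:5).
extended = dict(GRID, run=list(range(1, 7)))
old = {tuple(c.values()) for c in combos}
new_positions = [
    i for i, c in enumerate(combinations(extended)) if tuple(c.values()) not in old
]
print(len(new_positions), new_positions[0])  # 3456 20
```

The first new combination sits at position 20 of 20,736, near the top of the list. If the generated makeflow follows the same ordering, any extension other than appending a value to the end of the slowest-varying parameter (`slope` here) places new rules between existing ones rather than at the bottom of the file.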

@btovar
Copy link
Member

btovar commented Oct 8, 2020

I think that just appending new rules may be workable, with the understanding that removing a rule, or changing a previously executed rule will result in failure. Would that be something helpful to your use case?
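The append-only contract proposed here can be stated as a check. The sketch below is a hypothetical pre-flight helper, not an existing makeflow feature, and models rules as ordered (target, command) pairs: previous rules must appear first, unchanged and in their original order, and new rules may only follow them. Flagging a rule inserted in the middle is an assumption of this sketch, chosen because the thread reports such rules being ignored; `make.R` is a placeholder script name.

```python
def check_append_only(old_rules, new_rules):
    """Compare two ordered lists of (target, command) pairs.

    Returns a list of problems; an empty list means new_rules is an
    unchanged copy of old_rules followed only by appended rules.
    """
    problems = []
    new_commands = dict(new_rules)  # target -> command in the edited file
    for i, (target, command) in enumerate(old_rules):
        if target not in new_commands:
            problems.append(f"removed: {target}")
        elif new_commands[target] != command:
            problems.append(f"changed: {target}")
        elif i >= len(new_rules) or new_rules[i][0] != target:
            # Reordered, or a new rule was inserted before it.
            problems.append(f"moved: {target}")
    return problems


old = [("a.rds", "Rscript make.R a"), ("b.rds", "Rscript make.R b")]
print(check_append_only(old, old + [("c.rds", "Rscript make.R c")]))
print(check_append_only(old, [old[0], ("c.rds", "Rscript make.R c"), old[1]]))
```

The first call reports no problems (a pure append); the second reports `moved: b.rds`, because the new rule was inserted before an existing one.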

@stemangiola
Author

Yes. Usually, when benchmarking, we want to increase the number of combinations. We don't need to delete rules, as we can ignore already executed dependencies, and we would eliminate rules in a separate run if needed.

The issue is that if I now add rules to an existing makefile (with a log), the only ones that execute are the new ones at the bottom; the new ones in the middle are ignored. This mixed behaviour seems unintended rather than by design.

@btovar
Member

btovar commented Oct 8, 2020

Stefano, thanks for your input! Let me discuss it with the team.
