tar_make throws error after completing pipeline with crew_controller_slurm #53

Open
11 tasks done
monicathieu opened this issue Mar 3, 2025 · 2 comments
@monicathieu

Prework

  • Read and agree to the Contributor Code of Conduct and contributing guidelines.
  • Confirm that your issue is a genuine bug in the crew.cluster package itself and not a user error, known limitation, or issue from another package that crew.cluster depends on.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

When I run a targets pipeline with a Slurm controller, an error is thrown after the pipeline finishes. It doesn't prevent a clean pipeline from running all the way through, but it does trigger the tar_make() error reporting after "ended pipeline" is printed. It looks like a data.frame() error thrown when targets calls crew_summary() on the Slurm controller after finishing the pipeline. Maybe it has something to do with newly expected (or newly deprecated) arguments for crew_controller_slurm()?

Most of the time, I'm okay with ignoring the error because the pipeline has actually worked. Occasionally, though, a target does throw a real error that I need to debug. In those cases, the crew_summary() error is still the last error thrown by the pipeline, so that is the error captured and reported in tar_meta(). To capture the target error, I then have to re-run without crew, using tar_make(use_crew = FALSE), so that the last error thrown by the pipeline is the target error.

Thank you for your help!

Reproducible example

Targets script below. Note that some of the Slurm parameters are specific to our own cluster, like the account and partition names.

library(targets)

# Set target options:
tar_option_set(
  packages = c("tibble"),
  controller = crew.cluster::crew_controller_slurm(
    workers = 2,
    seconds_idle = 30,
    options_cluster = crew.cluster::crew_options_slurm(
      verbose = TRUE,
      script_lines = "#SBATCH --account=default",
      log_output = "/home/%u/log/crew_log_%A.out",
      log_error = "/home/%u/log/crew_log_%A.err",
      memory_gigabytes_required = 1,
      cpus_per_task = 1,
      time_minutes = 29,
      partition = "short"
    )
  )
)

tar_source()

sparse <- tar_target(
  name = test_target,
  command = tibble(x = 1:5)
)

list(
  sparse
)

tar_make() call, using the SLURM controller:

tar_make()

tar_make() output below

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

tar_source(): these files do not exist: R
▶ dispatched target test_target
Submitted batch job 76395
● completed target test_target [0.018 seconds, 176 bytes]
▶ ended pipeline [7.959 seconds]
Warning messages:
1: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
2: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
3: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
4: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
5: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
6: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
7: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
8: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
9: seconds_interval was deprecated on 2025-01-17 (version 0.10.2.9003). Alternative: none (no longer used). 
10: Unknown or uninitialised column: `worker`. 
Error:
! targets::tar_make() error

── Debugging ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    • tar_errored()
    • tar_meta(fields = any_of("error"), complete_only = TRUE)
    • tar_workspace()
    • tar_workspaces()

── How to ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    • Debug: https://books.ropensci.org/targets/debugging.html
    • Help: https://books.ropensci.org/targets/help.html

── Last error message ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    arguments imply differing number of rows: 1, 0

── Last error traceback ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    base::tryCatch(base::withCallingHandlers({ NULL base::saveRDS(base::do.c...
    tryCatchList(expr, classes, parentenv, handlers)
    tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
    doTryCatch(return(expr), name, parentenv, handler)
    tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
    tryCatchOne(expr, names, parentenv, handlers[[1L]])
    doTryCatch(return(expr), name, parentenv, handler)
    base::withCallingHandlers({ NULL base::saveRDS(base::do.call(base::do.ca...
    base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/tmp/R...
    base::do.call(base::do.call, base::c(base::readRDS("/tmp/RtmptJZ1uy/call...
    (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
    (function (targets_function, targets_arguments, options, envir = NULL, s...
    tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
    tryCatchList(expr, classes, parentenv, handlers)
    tryCatchOne(expr, names, parentenv, handlers[[1L]])
    doTryCatch(return(expr), name, parentenv, handler)
    withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
    targets::tar_callr_inner_try(targets_function = targets_function, target...
    do.call(targets_function, targets_arguments)
    (function (pipeline, path_store, names_quosure, shortcut, reporter, seco...
    crew_init(pipeline = pipeline, meta = meta_init(path_store = path_store)...
    self$run_crew()
    self$finalize_crew()
    crew_summary(self$controller)
    data_frame(controller = summary$controller, worker = summary$worker, sec...
    data.frame(..., stringsAsFactors = FALSE)
    stop(gettextf("arguments imply differing number of rows: %s", paste(uniq...
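
The last frames of the traceback end in base R's data.frame(). A minimal sketch of how that function produces this exact message, assuming (the traceback does not confirm this) that one of the summary columns arrives with length zero while another has length one. The column names and values here are illustrative only, not taken from crew or targets internals:

```r
# Illustrative only: one column of length 1 and one of length 0,
# mimicking the "1, 0" row counts in the error message above.
result <- tryCatch(
  data.frame(
    controller = "example_controller", # hypothetical value, length 1
    worker = integer(0),               # hypothetical zero-length column
    stringsAsFactors = FALSE
  ),
  error = function(e) conditionMessage(e)
)

print(result)
```

In an English locale, `result` holds the same "arguments imply differing number of rows: 1, 0" message that tar_make() reports as the last error.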

Expected result

Ideally, this error would not be thrown! I have confirmed that running the same pipeline with tar_make(use_crew = FALSE) completes without this error, so it's not pipeline-related. I have also reproduced this error on my machine (Ubuntu server) in multiple targets pipelines.

Diagnostic information

  • A reproducible example.
  • Session info, available through sessionInfo() or reprex(si = TRUE).
  • A stack trace from traceback() or rlang::trace_back().
  • The SHA-1 hash of the GitHub commit of crew.cluster currently installed. packageDescription("crew.cluster")$GithubSHA1 shows you this.

Session info:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] targets_1.10.0

loaded via a namespace (and not attached):
 [1] base64url_1.4      jsonlite_1.8.9     compiler_4.4.1     renv_1.0.11        promises_1.3.2     tidyselect_1.2.1   Rcpp_1.0.14        xml2_1.3.7        
 [9] callr_3.7.6        later_1.4.1        yaml_2.3.10        fastmap_1.2.0      R6_2.5.1           igraph_2.1.4       nanonext_1.5.1     knitr_1.49        
[17] htmlwidgets_1.6.4  visNetwork_2.1.2   backports_1.5.0    tibble_3.2.1       mirai_2.1.0        pillar_1.10.1      rlang_1.1.5        xfun_0.50         
[25] cli_3.6.3          withr_3.0.2        magrittr_2.0.3     ps_1.8.1           digest_0.6.37      processx_3.8.5     rstudioapi_0.17.1  crew_1.0.0        
[33] secretbase_1.0.4   lifecycle_1.0.4    vctrs_0.6.5        evaluate_1.0.3     glue_1.8.0         data.table_1.16.4  codetools_0.2-20   getip_0.1-4       
[41] crew.cluster_0.3.4 tools_4.4.1        pkgconfig_2.0.3    htmltools_0.5.8.1 
@wlandau
Owner

wlandau commented Mar 3, 2025

The infrastructure of crew and targets changed a lot recently, and not all versions are mutually compatible. Would you please try again with the latest releases?

  • targets 1.10.1
  • crew.cluster 0.3.4
  • crew 1.0.0
  • mirai 2.1.0
  • nanonext 1.5.1
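
One way to check an installation against the list above is sketched below. It treats the listed versions as minimums, which is an assumption: the comment only asks for the latest releases. The helper name `is_current` and the `status` data frame are made up for this example.

```r
# Versions copied from the list above; treating them as minimums is an assumption.
required <- c(
  targets = "1.10.1",
  crew.cluster = "0.3.4",
  crew = "1.0.0",
  mirai = "2.1.0",
  nanonext = "1.5.1"
)

# Look up each installed version; NA if the package is not installed.
installed <- vapply(
  names(required),
  function(pkg) {
    tryCatch(
      as.character(utils::packageVersion(pkg)),
      error = function(e) NA_character_
    )
  },
  character(1)
)

# Hypothetical helper: TRUE when the installed version is at least the required one.
is_current <- function(installed, required) {
  !is.na(installed) && utils::compareVersion(installed, required) >= 0
}

status <- data.frame(
  package = names(required),
  required = unname(required),
  installed = unname(installed),
  stringsAsFactors = FALSE
)
status$ok <- unname(mapply(is_current, status$installed, status$required))

print(status)
```

Any row with `ok` equal to FALSE points at a package that is missing or older than the listed version.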

@monicathieu
Author

Ahhh! That fixed it--I didn't realize all of the packages had such recent interdependent updates. Thank you so much for your prompt help!
