tar_make throws error after completing pipeline with crew_controller_slurm #53

monicathieu opened this issue Mar 3, 2025 · 2 comments
When I run a targets pipeline with a Slurm controller, an error is thrown after the pipeline finishes. It doesn't stop the pipeline from running all the way through when the targets themselves are clean, but it does trigger the tar_make() error reporting right after "ended pipeline" is printed. It looks like a data.frame() error thrown when targets calls crew_summary() on the Slurm controller after finishing the pipeline. Maybe it has something to do with newly expected (or newly deprecated) arguments for crew_controller_slurm()?
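For readers unfamiliar with this failure mode, here is a minimal base-R sketch of one plausible mechanism, consistent with the "Unknown or uninitialised column: `worker`" warning and the traceback in the output below. It is not taken from the targets or crew source: `controller_summary` is a hypothetical stand-in for a summary object that lacks a `worker` field, and the real summary object may be structured differently.

```r
# Hypothetical stand-in for a controller summary that has no `worker` field.
controller_summary <- list(controller = "my_controller")

# `$` on a missing element quietly returns NULL instead of failing.
worker <- controller_summary$worker
print(is.null(worker))

# data.frame() cannot combine a length-1 column with a NULL (length-0) one,
# so the call fails. The error is captured here instead of stopping the script.
msg <- tryCatch(
  data.frame(
    controller = controller_summary$controller,
    worker = worker,
    stringsAsFactors = FALSE
  ),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happens, the captured message should read like the "Last error message" in the report: one column has one row and the other has none.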

Most of the time, I'm okay with ignoring the error because the pipeline has actually worked. Occasionally, though, my pipeline throws a real error that I need to debug. In those cases, the crew_summary() error is still technically the last error thrown by the pipeline, so THAT'S the error captured and reported in tar_meta(). To capture the actual target error, I then have to re-run without crew, using tar_make(use_crew = FALSE), so that the last error thrown by the pipeline is the target error.

Thank you for your help!

Reproducible example

Targets script below. Note that some of the Slurm parameters are specific to our own cluster, like the account and partition names.


library(targets)

# Set target options:
tar_option_set(
  packages = c("tibble"),
  controller = crew.cluster::crew_controller_slurm(
    workers = 2,
    seconds_idle = 30,
    options_cluster = crew.cluster::crew_options_slurm(
      verbose = TRUE,
      script_lines = "#SBATCH --account=default",
      log_output = "/home/%u/log/crew_log_%A.out",
      log_error = "/home/%u/log/crew_log_%A.err",
      memory_gigabytes_required = 1,
      cpus_per_task = 1,
      time_minutes = 29,
      partition = "short"
    )
  )
)

# A single trivial target is enough to trigger the error.
sparse <- tar_target(
  name = test_target,
  command = tibble(x = 1:5)
)

# The script must end with the list of targets.
list(sparse)


tar_make() call, using the SLURM controller:


tar_make() output below

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

tar_source(): these files do not exist: R
▶ dispatched target test_target
Submitted batch job 76395
● completed target test_target [0.018 seconds, 176 bytes]
▶ ended pipeline [7.959 seconds]
Warning messages:
1: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
2: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
3: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
4: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
5: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
6: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
7: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
8: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
9: seconds_interval was deprecated on 2025-01-17 (version Alternative: none (no longer used). 
10: Unknown or uninitialised column: `worker`. 
! targets::tar_make() error

── Debugging ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    • tar_errored()
    • tar_meta(fields = any_of("error"), complete_only = TRUE)
    • tar_workspace()
    • tar_workspaces()

── How to ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    • Debug:
    • Help:

── Last error message ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    arguments imply differing number of rows: 1, 0

── Last error traceback ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    base::tryCatch(base::withCallingHandlers({ NULL base::saveRDS(base::do.c...
    tryCatchList(expr, classes, parentenv, handlers)
    tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
    doTryCatch(return(expr), name, parentenv, handler)
    tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
    tryCatchOne(expr, names, parentenv, handlers[[1L]])
    doTryCatch(return(expr), name, parentenv, handler)
    base::withCallingHandlers({ NULL base::saveRDS(
    base::saveRDS(, base::c(base::readRDS("/tmp/R..., base::c(base::readRDS("/tmp/RtmptJZ1uy/call...
    (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
    (function (targets_function, targets_arguments, options, envir = NULL, s...
    tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
    tryCatchList(expr, classes, parentenv, handlers)
    tryCatchOne(expr, names, parentenv, handlers[[1L]])
    doTryCatch(return(expr), name, parentenv, handler)
    withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
    targets::tar_callr_inner_try(targets_function = targets_function, target..., targets_arguments)
    (function (pipeline, path_store, names_quosure, shortcut, reporter, seco...
    crew_init(pipeline = pipeline, meta = meta_init(path_store = path_store)...
    data_frame(controller = summary$controller, worker = summary$worker, sec...
    data.frame(..., stringsAsFactors = FALSE)
    stop(gettextf("arguments imply differing number of rows: %s", paste(uniq...

Expected result

Ideally, this error would not be thrown! I have confirmed that running the same pipeline with tar_make(use_crew = FALSE) completes without this error, so it's not pipeline-related. I have also reproduced the error on my machine (Ubuntu server) in multiple targets pipelines.

Diagnostic information

Session info:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/ 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] targets_1.10.0

loaded via a namespace (and not attached):
 [1] base64url_1.4      jsonlite_1.8.9     compiler_4.4.1     renv_1.0.11        promises_1.3.2     tidyselect_1.2.1   Rcpp_1.0.14        xml2_1.3.7        
 [9] callr_3.7.6        later_1.4.1        yaml_2.3.10        fastmap_1.2.0      R6_2.5.1           igraph_2.1.4       nanonext_1.5.1     knitr_1.49        
[17] htmlwidgets_1.6.4  visNetwork_2.1.2   backports_1.5.0    tibble_3.2.1       mirai_2.1.0        pillar_1.10.1      rlang_1.1.5        xfun_0.50         
[25] cli_3.6.3          withr_3.0.2        magrittr_2.0.3     ps_1.8.1           digest_0.6.37      processx_3.8.5     rstudioapi_0.17.1  crew_1.0.0        
[33] secretbase_1.0.4   lifecycle_1.0.4    vctrs_0.6.5        evaluate_1.0.3     glue_1.8.0         data.table_1.16.4  codetools_0.2-20   getip_0.1-4       
[41] crew.cluster_0.3.4 tools_4.4.1        pkgconfig_2.0.3    htmltools_0.5.8.1 

wlandau commented Mar 3, 2025

The infrastructure of crew and targets changed a lot recently, and not all versions are mutually compatible. Would you please try again with the latest releases?

  • targets 1.10.1
  • crew.cluster 0.3.4
  • crew 1.0.0
  • mirai 2.1.0
  • nanonext 1.5.1
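A quick way to compare installed packages against this list is sketched below in base R. The helper names `installed_version()` and `meets_minimum()` are hypothetical and not part of targets or crew. The `reported` vector copies the versions from the session info in the original post.

```r
# Minimum versions, copied from the list above.
required <- c(
  targets = "1.10.1",
  crew.cluster = "0.3.4",
  crew = "1.0.0",
  mirai = "2.1.0",
  nanonext = "1.5.1"
)

# Hypothetical helper: installed version as a string, or NA if not installed.
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Hypothetical helper: TRUE for each package whose version meets the minimum.
meets_minimum <- function(installed, required) {
  vapply(names(required), function(pkg) {
    have <- installed[[pkg]]
    !is.na(have) && utils::compareVersion(have, required[[pkg]]) >= 0
  }, logical(1))
}

# Versions reported in the session info of the original post.
reported <- c(
  targets = "1.10.0",
  crew.cluster = "0.3.4",
  crew = "1.0.0",
  mirai = "2.1.0",
  nanonext = "1.5.1"
)
outdated <- names(required)[!meets_minimum(reported, required)]
print(outdated)

# The same check against whatever is installed on the current machine.
installed <- vapply(names(required), installed_version, character(1))
print(names(required)[!meets_minimum(installed, required)])
```

Applied to the reported session info, the check flags only targets, which was at 1.10.0 rather than 1.10.1.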

monicathieu commented

Ahhh! That fixed it. I didn't realize all of the packages had such recent interdependent updates. Thank you so much for your prompt help!

