
Workflow running out of memory #570

Open
spitfiredd opened this issue Nov 17, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@spitfiredd

Describe the Bug

Worker processes are not spawned with enough memory and do not scale; as a result, Nextflow errors with exit status 137 (out of memory).

Steps to Reproduce

name: foo
schemaVersion: 1
workflows:
  foo:
    type:
      language: nextflow
      version: dsl2
    sourceURL: workflows/foo
contexts:
  dev:
    instanceTypes:
      - "r5.large"
    engines:
      - type: nextflow
        engine: nextflow

Child processes are spawning with 1 vCPU and 1024 MiB of memory.
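
A minimal nextflow.config sketch of one way to raise these defaults globally, assuming the AGC Nextflow engine honors a nextflow.config shipped with the workflow source (values are illustrative):

// nextflow.config -- hypothetical global defaults, not tuned values
process {
    cpus   = 2        // default vCPUs for every process
    memory = '4 GB'   // default memory for every process
}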

Relevant Logs

Main Process

2022-11-17T14:00:01.866-08:00	Version: 22.04.3 build 5703
2022-11-17T14:00:01.866-08:00	Created: 18-05-2022 19:22 UTC
2022-11-17T14:00:01.866-08:00	System: Linux 4.14.294-220.533.amzn2.x86_64
2022-11-17T14:00:01.866-08:00	Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.16.1+9-LTS
2022-11-17T14:00:01.866-08:00	Encoding: UTF-8 (ANSI_X3.4-1968)
2022-11-17T14:00:01.866-08:00	Process: 47@ip-redacted.compute.internal [redacted]
2022-11-17T14:00:01.866-08:00	CPUs: 2 - Mem: 2 GB (1.5 GB) - Swap: 2 GB (2 GB)
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.780 [main] WARN com.amazonaws.util.Base64 - JAXB is unavailable. Will fallback to SDK implementation which may be less performant.If you are using Java 9+, you will need to include javax.xml.bind:jaxb-api as a dependency.
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.799 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): redacted
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.799 [main] DEBUG nextflow.Session - Work-dir: redacted
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.799 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /root/.nextflow/assets/redacted/bin
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.871 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[AwsBatchExecutor]
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.886 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.954 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.975 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 3; maxThreads: 1000
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:58.123 [main] DEBUG nextflow.Session - Session start invoked
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:59.049 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution

Child Process

2022-11-17T14:00:01.867-08:00	Essential container in task exited - OutOfMemoryError: Container killed due to memory usage
2022-11-17T14:00:01.867-08:00	Command executed:
2022-11-17T14:00:01.867-08:00	fastp     -i USDA_soil_C35-5-1_1.fastq.gz     -I USDA_soil_C35-5-1_2.fastq.gz     -o "USDA_soil_C35-5-1.trim.R1.fq.gz"     -O "USDA_soil_C35-5-1.trim.R2.fq.gz"     --length_required 50     -h "USDA_soil_C35-5-1.html"     -w 16
2022-11-17T14:00:01.867-08:00	Command exit status:
2022-11-17T14:00:01.867-08:00	137
2022-11-17T14:00:01.867-08:00	Command output:
2022-11-17T14:00:01.867-08:00	(empty)
2022-11-17T14:00:01.867-08:00	Command error:
2022-11-17T14:00:01.867-08:00	  .command.sh: line 2:   188 Killed                  fastp -i USDA_soil_C35-5-1_1.fastq.gz -I USDA_soil_C35-5-1_2.fastq.gz -o "USDA_soil_C35-5-1.trim.R1.fq.gz" -O "USDA_soil_C35-5-1.trim.R2.fq.gz" --length_required 50 -h "USDA_soil_C35-5-1.html" -w 16

Expected Behavior

Spawn processes with enough memory, or scale them.

Actual Behavior

Container ran out of memory

Additional Context

Ran the workflow with the following command: agc workflow run foo --context dev

Operating System: Linux
AGC Version: 1.5.1
Was AGC setup with a custom bucket: no
Was AGC setup with a custom VPC: no

@spitfiredd added the bug label Nov 17, 2022
@biofilos

I am seeing similar behavior with Cromwell. I give a task 64 GB. In AWS Batch, I see the following warning next to the memory information:

Configuration conflict
This value was submitted using containerOverrides.memory which has been deprecated and was not used as an override. Instead, the MEMORY value found in the job definition’s resourceRequirements key was used. More information about the deprecated key can be found in the AWS Batch API documentation.

I see an "Essential container in task exited" error. However, when I click on the job definition, it appears to have 8 GB of allocated memory.
Is there a different way to specify memory?

@vvalleru
Contributor

Thanks for reporting this issue. Is this an issue with the 1.5.2 release as well?

@biofilos

biofilos commented Dec 1, 2022

It is still an issue with v1.5.2 (Cromwell).

@markjschreiber
Contributor

@spitfiredd The child processes are spawned with a default of 1 vCPU and 1024 MiB of memory. If tasks need more memory or CPU, you would typically request these via the cpus and memory process directives (https://www.nextflow.io/docs/latest/process.html#cpus and https://www.nextflow.io/docs/latest/process.html#memory).
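
For example, a DSL2 sketch of a fastp process with explicit directives (the process name, channel shapes, and resource values are illustrative, and the context's instanceTypes must be large enough to satisfy the requests):

// Illustrative only: request the resources fastp actually needs
process FASTP {
    cpus 4
    memory '8 GB'

    input:
    tuple val(sample), path(reads)

    output:
    path "*.trim.*.fq.gz"

    script:
    """
    fastp -i ${reads[0]} -I ${reads[1]} \\
        -o "${sample}.trim.R1.fq.gz" -O "${sample}.trim.R2.fq.gz" \\
        --length_required 50 -h "${sample}.html" -w ${task.cpus}
    """
}

Using -w ${task.cpus} keeps the fastp thread count in step with the vCPUs actually requested, instead of hard-coding -w 16 against a 1 vCPU allocation as in the logs above.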

@markjschreiber
Contributor

@biofilos AGC is currently using an older version of Cromwell. This older version uses the deprecated call to AWS Batch, hence the error. In our next release we will update the version of Cromwell used.

As a possible workaround, you might consider deploying a miniwdl context to run the WDL.
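
For reference, a miniwdl context in agc-project.yaml looks roughly like this (the context name is illustrative):

contexts:
  miniwdlDev:
    instanceTypes:
      - "r5.large"
    engines:
      - type: wdl
        engine: miniwdl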
