
Workflow running out of memory #570

Open
spitfiredd opened this issue Nov 17, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@spitfiredd

Describe the Bug

Worker processes are not spawned with enough memory and do not scale; as a result, Nextflow errors with exit status 137 (out of memory).

Steps to Reproduce

name: foo
schemaVersion: 1
workflows:
  foo:
    type:
      language: nextflow
      version: dsl2
    sourceURL: workflows/foo
contexts:
  dev:
    instanceTypes:
      - "r5.large"
    engines:
      - type: nextflow
        engine: nextflow

Child processes are spawning with 1 vCPU and 1024 MiB of memory.
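
A minimal nextflow.config sketch of one way to raise these defaults globally, assuming the AGC Nextflow engine honors a nextflow.config shipped with the workflow source (values are illustrative):

// nextflow.config -- hypothetical global defaults, not tuned values
process {
    cpus   = 2        // default vCPUs for every process
    memory = '4 GB'   // default memory for every process
}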

Relevant Logs

Main Process

2022-11-17T14:00:01.866-08:00	Version: 22.04.3 build 5703
2022-11-17T14:00:01.866-08:00	Created: 18-05-2022 19:22 UTC
2022-11-17T14:00:01.866-08:00	System: Linux 4.14.294-220.533.amzn2.x86_64
2022-11-17T14:00:01.866-08:00	Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.16.1+9-LTS
2022-11-17T14:00:01.866-08:00	Encoding: UTF-8 (ANSI_X3.4-1968)
2022-11-17T14:00:01.866-08:00	Process: 47@ip-redacted.compute.internal [redacted]
2022-11-17T14:00:01.866-08:00	CPUs: 2 - Mem: 2 GB (1.5 GB) - Swap: 2 GB (2 GB)
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.780 [main] WARN com.amazonaws.util.Base64 - JAXB is unavailable. Will fallback to SDK implementation which may be less performant.If you are using Java 9+, you will need to include javax.xml.bind:jaxb-api as a dependency.
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.799 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): redacted
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.799 [main] DEBUG nextflow.Session - Work-dir: redacted
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.799 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /root/.nextflow/assets/redacted/bin
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.871 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[AwsBatchExecutor]
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.886 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.954 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:57.975 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 3; maxThreads: 1000
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:58.123 [main] DEBUG nextflow.Session - Session start invoked
2022-11-17T14:00:01.866-08:00	Nov-17 21:53:59.049 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution

Child Process

2022-11-17T14:00:01.867-08:00	Essential container in task exited - OutOfMemoryError: Container killed due to memory usage
2022-11-17T14:00:01.867-08:00	Command executed:
2022-11-17T14:00:01.867-08:00	fastp     -i USDA_soil_C35-5-1_1.fastq.gz     -I USDA_soil_C35-5-1_2.fastq.gz     -o "USDA_soil_C35-5-1.trim.R1.fq.gz"     -O "USDA_soil_C35-5-1.trim.R2.fq.gz"     --length_required 50     -h "USDA_soil_C35-5-1.html"     -w 16
2022-11-17T14:00:01.867-08:00	Command exit status:
2022-11-17T14:00:01.867-08:00	137
2022-11-17T14:00:01.867-08:00	Command output:
2022-11-17T14:00:01.867-08:00	(empty)
2022-11-17T14:00:01.867-08:00	Command error:
2022-11-17T14:00:01.867-08:00	  .command.sh: line 2:   188 Killed                  fastp -i USDA_soil_C35-5-1_1.fastq.gz -I USDA_soil_C35-5-1_2.fastq.gz -o "USDA_soil_C35-5-1.trim.R1.fq.gz" -O "USDA_soil_C35-5-1.trim.R2.fq.gz" --length_required 50 -h "USDA_soil_C35-5-1.html" -w 16

Expected Behavior

Spawn processes with enough memory, or scale them.

Actual Behavior

Container ran out of memory

Additional Context

Ran the workflow with the following command: agc workflow run foo --context dev

Operating System: Linux
AGC Version: 1.5.1
Was AGC setup with a custom bucket: no
Was AGC setup with a custom VPC: no

@spitfiredd added the bug label Nov 17, 2022
@biofilos

I am seeing similar behavior with Cromwell. I give a task 64 GB. In AWS Batch, I see the following warning next to the memory information:

Configuration conflict
This value was submitted using containerOverrides.memory which has been deprecated and was not used as an override. Instead, the MEMORY value found in the job definition’s resourceRequirements key was used. More information about the deprecated key can be found in the AWS Batch API documentation.

I see an "Essential container in task exited" error. However, when I click on the job definition, it appears to have 8 GB of allocated memory.
Is there a different way to specify memory?

@vvalleru
Contributor

Thanks for reporting this issue. Is this an issue with the 1.5.2 release as well?

@biofilos

biofilos commented Dec 1, 2022

It is still an issue with v1.5.2 (Cromwell).

@markjschreiber
Contributor

@spitfiredd The child processes are spawned with a default of 1 vCPU and 1024 MiB of memory. If tasks need more memory or CPU, you would typically request these via the cpus and memory process directives (https://www.nextflow.io/docs/latest/process.html#cpus and https://www.nextflow.io/docs/latest/process.html#memory).
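
For example, a DSL2 sketch of a fastp process with explicit directives (the process name, channel shapes, and resource values are illustrative, and the context's instanceTypes must be large enough to satisfy the requests):

// Illustrative only: request the resources fastp actually needs
process FASTP {
    cpus 4
    memory '8 GB'

    input:
    tuple val(sample), path(reads)

    output:
    path "*.trim.*.fq.gz"

    script:
    """
    fastp -i ${reads[0]} -I ${reads[1]} \\
        -o "${sample}.trim.R1.fq.gz" -O "${sample}.trim.R2.fq.gz" \\
        --length_required 50 -h "${sample}.html" -w ${task.cpus}
    """
}

Using -w ${task.cpus} keeps the fastp thread count in step with the vCPUs actually requested, instead of hard-coding -w 16 against a 1 vCPU allocation as in the logs above.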

@markjschreiber
Contributor

@biofilos AGC is currently using an older version of Cromwell. This older version uses the deprecated call to AWS Batch, hence the error. In our next release we will update the version of Cromwell used.

As a possible workaround, you might consider deploying a miniwdl context to run the WDL.
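
For reference, a miniwdl context in agc-project.yaml looks roughly like this (the context name is illustrative):

contexts:
  miniwdlDev:
    instanceTypes:
      - "r5.large"
    engines:
      - type: wdl
        engine: miniwdl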
