Repcred runs out of memory #37

Open
bcorrie opened this issue Apr 15, 2024 · 8 comments

bcorrie commented Apr 15, 2024

I ran this on a repertoire of 4M sequences and it ran out of memory. Would you expect this, given that Repcred was set to downsample? This ran on a compute node with 8 GB of memory, so the job must have used more than that for a significant amount of time.

This is where the output got to:

1/53
2/53 [global-options]
3/53
4/53 [input-parameters]
5/53
6/53 [unnamed-chunk-1]

See #35 for details of scalability/performance testing.

  • 4,000,000 sequences, Sample ID: p1974_d60, Failed
ssnn-airr (Contributor) commented:

Is this in ipa1? I am using this command and I only get 4579 sequences. I found the repertoire_id using the Gateway.

curl -k -s --data '{"filters":{"op":"=","content":{"field":"repertoire_id", "value":"60"}}, "format":"tsv"}' https://ipa1.ireceptor.org/airr/v1/rearrangement > p1974_d60.tsv


bcorrie commented Apr 16, 2024

Sorry, that is on ipa3.ireceptor.org. Unfortunately, on our old repositories our repertoire_id fields are not unique, so this type of confusion can happen.

$ curl -k -s --data '{"filters":{"op":"=","content":{"field":"repertoire_id", "value":"60"}},"facets":"repertoire_id"}' https://ipa3.ireceptor.org/airr/v1/rearrangement
{
    "Info": DELETED
    "Facet": [
        {
            "repertoire_id": "60",
            "count": 3992474
        }
    ]
}

Also, unfortunately, on the Gateway there is no easy way to see which of the IPAs this repertoire is on. If the repertoire_id was unique, then you could search them all and it would only show up on one of them. This is an issue we need to address...
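In the meantime, a rough workaround is to just ask every repository. Here is a minimal sketch, reusing the facets query above; the hostname list only includes the two repositories mentioned in this thread, so extend it to cover the rest of the IPAs:

# Sketch only: hostnames are the two mentioned in this issue; add the other
# IPA repositories as appropriate. Asks each one how many rearrangements it
# holds for repertoire_id 60.
for host in ipa1.ireceptor.org ipa3.ireceptor.org; do
  echo "== ${host} =="
  curl -k -s --data '{"filters":{"op":"=","content":{"field":"repertoire_id","value":"60"}},"facets":"repertoire_id"}' "https://${host}/airr/v1/rearrangement"
  echo
done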

ssnn-airr (Contributor) commented:

And I ran out of patience. It takes forever to run the CDR3_Chimera_Check chunk. I need to figure out where the issue is. I will keep you posted.


bcorrie commented May 1, 2024

When I am running these jobs, repcred is reporting that it is downsampling:

Warning message:
In normalizePath(opt$OUTDIR) :
  path[1]="ipa1.ireceptor.org/370/370_repcred_report": No such file or directory

Running repcred
|- Repertoire:
|  /scratch/ireceptorgw/gateway-clean/jobs/c1dd2cf7-25e8-4647-9090-ce0b8040beee-007/gateway_analysis/ipa1.ireceptor.org/370/370.tsv
|- Reference germline(s):
|  
|- Downsample:
|  TRUE
|- Output dir:
|  /scratch/ireceptorgw/gateway-clean/jobs/c1dd2cf7-25e8-4647-9090-ce0b8040beee-007/gateway_analysis/ipa1.ireceptor.org/370/370_repcred_report
|- Output format:
|  all 



processing file: _main.Rmd
Killed
slurmstepd: error: Detected 1 oom_kill event in StepId=30273871.batch. Some of the step tasks have been OOM Killed.

So it is either running out of memory while downsampling, or maybe one of the analysis steps isn't downsampling?

The last job reported this before it was killed for exceeding memory limits.

IR-INFO: Running Repcred on ipa1.ireceptor.org/370/370.tsv - Tue Apr 30 04:32:58 PM PDT 2024
1/63                          
2/63 [global-options]         
3/63                          
4/63 [input-parameters]       
5/63                          
6/63 [unnamed-chunk-1]        
IR-ERROR: Repcred failed on file ipa1.ireceptor.org/370/370.tsv
IR-INFO: Done running Repcred on ipa1.ireceptor.org/370/370.tsv - Tue Apr 30 04:36:28 PM PDT 2024
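To narrow down whether it is the downsampling itself or a later analysis chunk that blows up, one thing I can try is wrapping the run with GNU time to capture peak memory. Rough sketch below; the Rscript line is just a placeholder for however the Gateway actually invokes repcred:

# Placeholder invocation -- substitute the actual repcred command the Gateway runs.
/usr/bin/time -v Rscript repcred.R <repcred arguments> 2> repcred_mem.log
# Peak RSS (reported in kB) shows how much memory the R process actually needed.
grep "Maximum resident set size" repcred_mem.log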


bcorrie commented May 1, 2024

Hmm, this failed when running with 15 GB of memory, so maybe this is a bug of some kind. It seems odd that 2M sequences work fine in 8 GB but 4M fails with 15 GB. I am re-running with 30 GB to confirm.


bcorrie commented May 1, 2024

Looks like my job isn't getting the memory allocation I think it is... so ignore my comment about it failing with 15 GB. I still need to test.


bcorrie commented May 1, 2024

It looks like 4M sequences require about 12 GB of memory, which is why it failed at 8 GB. If I run with 30 GB it works fine, and one of the job summary tools reports over 11 GB of memory used.
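For reference, the working run requests the memory explicitly in the Slurm batch script. A minimal sketch (the job name, time limit, and repcred invocation are placeholders):

#!/bin/bash
#SBATCH --job-name=repcred_370     # placeholder job name
#SBATCH --mem=30G                  # explicit request; the 4M-sequence run peaked over 11 GB
#SBATCH --time=04:00:00            # placeholder time limit

# Placeholder -- substitute the actual repcred command the Gateway runs.
Rscript repcred.R <repcred arguments>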


bcorrie commented May 1, 2024

The largest repertoire in the ADC is 16M annotations, so this would presumably require a very large amount of memory if usage scales linearly, and based on my quick testing it does seem to be close to linear.
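A quick back-of-the-envelope under that linear-scaling assumption, using the ~12 GB for 4M sequences figure from my runs:

# Assumes memory scales roughly linearly with sequence count (approximation from the runs above).
awk 'BEGIN { gb_per_seq = 12 / 4e6; printf "16M sequences ~= %.0f GB\n", gb_per_seq * 16e6 }'
# -> 16M sequences ~= 48 GB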
