moar memory
mcovarr committed Nov 4, 2024
1 parent dd50fda commit 6df0554
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
@@ -145,7 +145,7 @@ Once the VAT table is created and a tsv is exported, the AoU research workbench
- Specify a `split_intervals_disk_size_override` of 1000 (GiB).
- For `GvsExtractCallsetPgenMerged` only, specify a `scatter_count` of 25000. This overrides the default of 30000, which this particular interval list inflates into too many shards for Cromwell.
- For both PGEN and VCF extracts of ACAF only:
-  - Specify an `extract_overhead_memory_override_gib` of 5 (GiB, up from the default of 3 GiB).
+  - Specify an `extract_overhead_memory_override_gib` of 10 (GiB, up from the default of 3 GiB).
- Specify a `y_bed_weight_scaling` of 8 (up from the default of 4); a sample inputs JSON covering these overrides appears after this list.
- When re-running the extract workflow with call caching enabled, it is necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Because of the way call caching works in Cromwell (the `memory` runtime attribute is not part of the call caching hashes), the value of the `memory` runtime attribute of a task can be edited _in the WDL_ without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter: changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run (see the WDL sketch after this list). Most extract shards should finish on the first re-run attempt, but a few stragglers will likely OOM and automatically re-run with more memory.
- If you want to collect the monitoring logs from a large number of `Extract` shards, the `summarize_task_monitor_logs.py` script will not work if the task is scattered too widely. Use the `summarize_task_monitor_logs_from_file.py` script instead; it takes a FOFN of GCS paths rather than a space-separated series of localized files (see the example after this list).
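
For reference, a minimal sketch of a Cromwell inputs JSON applying the overrides above. The input names and values are taken from this document, but the `GvsExtractCallsetPgenMerged.`-style namespacing is an assumption about how this workflow declares its inputs:

```json
{
  "GvsExtractCallsetPgenMerged.split_intervals_disk_size_override": 1000,
  "GvsExtractCallsetPgenMerged.scatter_count": 25000,
  "GvsExtractCallsetPgenMerged.extract_overhead_memory_override_gib": 10,
  "GvsExtractCallsetPgenMerged.y_bed_weight_scaling": 8
}
```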
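A minimal WDL sketch of the call-caching-safe memory edit described above. The task body is a placeholder, not the actual `ExtractTask` / `PgenExtractTask` implementation:

```wdl
version 1.0

task ExtractTask {
  input {
    # Do NOT change this input: workflow inputs are part of Cromwell's
    # call caching hashes, so editing it invalidates the cache.
    Int memory_gib = 3
  }
  command <<<
    echo "placeholder for the real extract command"
  >>>
  runtime {
    # The `memory` runtime attribute is NOT part of the call caching
    # hashes, so this literal can be bumped (e.g. from the memory_gib
    # input to "50 GiB") without re-running cached shards.
    memory: "50 GiB"
    docker: "ubuntu:22.04"
  }
}
```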
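And a hypothetical example of building the FOFN for `summarize_task_monitor_logs_from_file.py`. The bucket path below is made up, and the assumption that the script takes the FOFN path as its sole argument should be checked against the script's usage:

```bash
# Collect monitoring log paths for all Extract shards into a FOFN.
# The gs:// path is hypothetical; use the workflow's execution directory.
gsutil ls 'gs://fc-example-bucket/submissions/SUBMISSION_ID/GvsExtractCallsetPgenMerged/**/monitoring.log' > monitoring_logs.fofn

# Assumed invocation: the script is documented to take a FOFN of GCS paths.
python summarize_task_monitor_logs_from_file.py monitoring_logs.fofn
```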
