Description
two additional small user-oriented feature requests for bambam-omf:
parallelism: Option
a CLI argument passed in as part of OmfOperation::Network. it can be used in two locations:
rg_chunk_size and file_concurrency_limit
for the operation we are performing, i believe we want file_concurrency_limit to be the same value as the overall task parallelism (one "file" per thread). our ideal rg_chunk_size is also likely n_tasks / n_threads, so we should probably compute that one programmatically.
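a rough sketch of how both values could be derived from the single parallelism argument. this is only an illustration: the IoLimits struct and io_limits function are hypothetical names, the Option<usize> type for parallelism is an assumption, and so are rounding the division up and falling back to the machine's available parallelism when no value is given.

```rust
use std::thread;

/// Hypothetical container for the two values derived from `parallelism`.
#[derive(Debug, PartialEq, Eq)]
pub struct IoLimits {
    pub file_concurrency_limit: usize,
    pub rg_chunk_size: usize,
}

/// Derive both limits from one `parallelism` value (assumed to be Option<usize>).
/// `n_tasks` is the number of row group tasks known after the metadata phase.
pub fn io_limits(parallelism: Option<usize>, n_tasks: usize) -> IoLimits {
    // Treat None or 0 as "not set" and fall back to the machine's parallelism.
    let n_threads = parallelism.filter(|&p| p > 0).unwrap_or_else(|| {
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)
    });

    // n_tasks / n_threads, rounded up so no task is left without a chunk,
    // and never smaller than 1 so an empty task list cannot yield a zero chunk size.
    let rg_chunk_size = ((n_tasks + n_threads - 1) / n_threads).max(1);

    IoLimits {
        // one "file" per thread
        file_concurrency_limit: n_threads,
        rg_chunk_size,
    }
}
```

with 100 row group tasks and parallelism of 8, this gives a file_concurrency_limit of 8 and an rg_chunk_size of 13.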
task parallelism
when building the io_runtime in OvertureMapsCollector::collect_from_path. this can be specified using the tokio runtime Builder, which takes the pool size through worker_threads on a multi-thread runtime.
progress bar over row groups
using a progress bar without a "total" value isn't very helpful, but with the refactor in #88 we have broken this into two phases. the first phase retrieves the row group metadata as a Vec<RowGroupTask>, so its length gives us the total. we can then set up an async progress bar to show progress over these tasks. this requires an Arc<Mutex<Bar>> and a short critical section to .update(1) the bar before we call rgt.build_stream(). there are other examples in bambam or compass's repos that show this in action.
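a minimal sketch of the shared-bar pattern, using only the standard library so it stands alone. everything here is a stand-in: Bar is a toy counter rather than the real progress bar type, RowGroupTask and build_stream are placeholders with made-up bodies, and OS threads take the place of the async tasks on the io_runtime. the point is only the shape of the critical section.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy stand-in for the real progress bar: a counter against a known total.
pub struct Bar {
    total: usize,
    done: usize,
}

impl Bar {
    pub fn new(total: usize) -> Self {
        Bar { total, done: 0 }
    }

    /// Advance the bar by `n` steps, capped at the total.
    pub fn update(&mut self, n: usize) {
        self.done = (self.done + n).min(self.total);
        eprint!("\rrow groups: {}/{}", self.done, self.total);
    }

    pub fn done(&self) -> usize {
        self.done
    }
}

/// Placeholder for the row group task produced by the metadata phase.
pub struct RowGroupTask {
    pub id: usize,
}

impl RowGroupTask {
    /// Placeholder body; the real method would build the record stream.
    pub fn build_stream(&self) -> usize {
        self.id
    }
}

/// Runs every task, ticking the shared bar once per task.
/// Returns (tasks processed, final bar count).
pub fn run_with_progress(tasks: Vec<RowGroupTask>) -> (usize, usize) {
    // The total is known up front because phase one already produced the task list.
    let bar = Arc::new(Mutex::new(Bar::new(tasks.len())));

    let handles: Vec<_> = tasks
        .into_iter()
        .map(|rgt| {
            let bar = Arc::clone(&bar);
            thread::spawn(move || {
                {
                    // Critical section: hold the lock only for the update.
                    let mut guard = bar.lock().expect("progress bar mutex poisoned");
                    guard.update(1);
                } // guard dropped here, before the expensive work starts
                rgt.build_stream()
            })
        })
        .collect();

    let processed = handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .count();
    let done = bar.lock().expect("progress bar mutex poisoned").done();
    (processed, done)
}
```

in the async version the same scoping matters more: the guard should be dropped before any .await so the lock is never held across a suspension point. spawning one thread per task is only for the sketch; the real code would bound concurrency with the file_concurrency_limit.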