
bambam-omf download features #91

@robfitzgerald


two additional small user-oriented feature requests for bambam-omf:

parallelism: Option

a CLI argument passed in as part of OmfOperation::Network. it can be used in two locations:

rg_chunk_size and file_concurrency_limit

for the operation we are performing, i believe we want file_concurrency_limit to be the same value as the overall task parallelism (one "file" per thread). our ideal rg_chunk_size is also likely n_tasks / n_threads, so we should probably compute that one programmatically.

task parallelism

the same value applies when building the io_runtime in OvertureMapsCollector::collect_from_path. the pool size can be set on the tokio runtime's Builder (its worker_threads setting).
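a minimal sketch of how the two values could be derived from the one argument. everything named here is hypothetical: ParallelismPlan and plan_parallelism are not existing bambam-omf items, the argument is assumed to be an Option<usize>, and falling back to the machine's available parallelism when it is unset is an assumption as well. the division is rounded up so that no task is left outside a chunk.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical container for the settings derived from the CLI argument.
#[derive(Debug, PartialEq)]
struct ParallelismPlan {
    /// thread count, also usable as the io_runtime pool size
    n_threads: usize,
    /// one "file" per thread
    file_concurrency_limit: usize,
    /// row groups handled per chunk
    rg_chunk_size: usize,
}

/// Derive the settings from an optional user-supplied parallelism
/// (assumed to be Option<usize>) and the number of row group tasks.
fn plan_parallelism(parallelism: Option<usize>, n_tasks: usize) -> ParallelismPlan {
    // Assumption: when the argument is missing or zero, use the number of
    // threads the OS reports as available, or 1 if that cannot be determined.
    let n_threads = parallelism.filter(|&p| p > 0).unwrap_or_else(|| {
        thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1)
    });

    // n_tasks / n_threads, rounded up, and never below 1.
    let rg_chunk_size = n_tasks.div_ceil(n_threads).max(1);

    ParallelismPlan {
        n_threads,
        file_concurrency_limit: n_threads,
        rg_chunk_size,
    }
}
```

with 100 row group tasks and a parallelism of 8, this yields a file_concurrency_limit of 8 and an rg_chunk_size of 13.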

progress bar over row groups

using a progress bar without a "total" value isn't very helpful, but with the refactor in #88 we have broken this into two phases. the first phase retrieves the row group metadata as a Vec<RowGroupTask>, so the total is known up front. we can then set up an async progress bar to show progress over these tasks. this requires an Arc<Mutex<Bar>> and a critical section to .update(1) the bar before we call rgt.build_stream(). there are other examples in bambam or compass's repos that show this in action.
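a rough sketch of the shared-bar pattern described above, using only the standard library so it stands alone. the assumptions are loud ones: Progress is a hypothetical stand-in for the real Bar type, plain u32 values stand in for RowGroupTask, and std threads stand in for the tokio tasks. only the locking pattern is meant to carry over.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a progress bar with a known total.
struct Progress {
    done: usize,
    total: usize,
}

impl Progress {
    fn new(total: usize) -> Self {
        Progress { done: 0, total }
    }

    /// Advance the bar by n steps, capped at the total, and redraw it.
    fn update(&mut self, n: usize) {
        self.done = (self.done + n).min(self.total);
        eprint!("\r{}/{} row groups", self.done, self.total);
    }
}

/// Process placeholder tasks on n_threads workers, returning the final count
/// shown on the bar. The u32 values stand in for RowGroupTask.
fn process_row_groups(tasks: Vec<u32>, n_threads: usize) -> usize {
    // The total is known because the tasks were collected in an earlier phase.
    let bar = Arc::new(Mutex::new(Progress::new(tasks.len())));
    let chunk_size = tasks.len().div_ceil(n_threads.max(1)).max(1);

    let handles: Vec<_> = tasks
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let bar = Arc::clone(&bar);
            thread::spawn(move || {
                for _task in chunk {
                    {
                        // Critical section: hold the lock only for the update.
                        let mut guard = bar.lock().expect("progress mutex poisoned");
                        guard.update(1);
                    } // lock released here, before any slow work begins
                    // The real code would call rgt.build_stream() at this point.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let done = bar.lock().expect("progress mutex poisoned").done;
    done
}
```

keeping the lock scoped to the update alone means the workers never wait on each other while a stream is being built.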
