[Question]: Why does nvshmem_fence default to using nvshmemi_fence<NVSHMEMI_THREADGROUP_THREAD>()? #61

### Question

Does using `NVSHMEMI_THREADGROUP_THREAD` as the default scope cause redundant work? Specifically, when `nvshmem_fence()` is called by every thread of a **warp** or **block**, each thread executes `nvshmemi_ibgda_fence()` with `index_in_scope == 0` and `scope_size == 1`, so each one iterates over all **DCIs and RC QPs** and issues its own `ibgda_quiet(qp)` calls. Could this lead to significant performance overhead?
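To make the concern concrete, here is a small host-side C++ model of the pattern described above. It is only a sketch, not the NVSHMEM source: the strided loop, the helper names `quiet_calls_for_thread` and `total_quiet_calls`, and the QP count of 8 are all assumptions made for illustration. It counts how many `ibgda_quiet(qp)` calls would be issued in total when every thread acts as its own scope, compared with a scope that spans the whole warp.

```cpp
#include <cassert>

// Hypothetical model of one thread's share of the fence loop. The strided
// iteration is an assumption based on the index_in_scope / scope_size pair
// mentioned in the question; it is not copied from NVSHMEM.
constexpr int quiet_calls_for_thread(int index_in_scope, int scope_size, int num_qps) {
    int calls = 0;
    for (int qp = index_in_scope; qp < num_qps; qp += scope_size) {
        ++calls;  // stands in for one ibgda_quiet(qp) call
    }
    return calls;
}

// Sums the calls issued by num_threads threads that all enter the fence.
// per_thread_scope == true  models NVSHMEMI_THREADGROUP_THREAD
//                           (index_in_scope == 0, scope_size == 1).
// per_thread_scope == false models a scope covering all num_threads threads.
constexpr int total_quiet_calls(int num_threads, bool per_thread_scope, int num_qps) {
    int total = 0;
    for (int t = 0; t < num_threads; ++t) {
        const int index_in_scope = per_thread_scope ? 0 : t;
        const int scope_size = per_thread_scope ? 1 : num_threads;
        total += quiet_calls_for_thread(index_in_scope, scope_size, num_qps);
    }
    return total;
}

// Worked example with an assumed 8 QPs and a 32-thread warp.
inline void check_redundancy_model() {
    assert(total_quiet_calls(32, true, 8) == 256);  // every thread walks all 8 QPs
    assert(total_quiet_calls(32, false, 8) == 8);   // the QPs are split across the warp
}
```

Under these assumptions, a warp in which all 32 threads call the fence issues 32 times as many quiet calls as a warp-scoped split would. Whether that matters in practice depends on how expensive `ibgda_quiet(qp)` is when a QP has nothing outstanding, which this model does not capture.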
