Allreduce performance optimization and correctness fix #360

Open
wants to merge 14 commits into main from

Commits on Sep 16, 2024

  1. clipping fp16/bf16 addition

    chhwang committed Sep 16, 2024
    a8c30c0
  2. rccl-tests pass

    chhwang committed Sep 16, 2024
    780f0f8
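The first commit clips fp16/bf16 addition. Below is a minimal host-side C++ sketch of the idea. It assumes that "clipping" means saturating the sum to the largest finite fp16 magnitude (65504) so that it does not overflow to infinity. The sum is computed in float because standard C++ before C++23 has no half type. `clipToHalfRange` and `clippedHalfAdd` are hypothetical names, not code from this PR.

```cpp
#include <algorithm>

// Largest finite IEEE 754 binary16 value: (2 - 2^-10) * 2^15 = 65504.
// (bf16 would need its own bound, roughly 3.39e38.)
constexpr float kHalfMax = 65504.0f;

// Hypothetical helper: clamp a value into the finite fp16 range so that a
// later narrowing conversion to half cannot produce +/-inf.
// NaN passes through unchanged, because every comparison with NaN is false.
inline float clipToHalfRange(float v) {
  return std::clamp(v, -kHalfMax, kHalfMax);
}

// Hypothetical helper: add two fp16 operands (held here as float) with saturation.
inline float clippedHalfAdd(float a, float b) {
  return clipToHalfRange(a + b);
}
```

On the device, a similar clamp could be written with the `__hmin`/`__hmax` half-precision intrinsics. Treat this as an illustration of saturation, not as the PR's implementation.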

Commits on Sep 25, 2024

  1. align with msccl clipping

    chhwang committed Sep 25, 2024
    c6cf400
  2. revert

    chhwang committed Sep 25, 2024
    d929d25
  3. final barrier for allreduce8

    chhwang committed Sep 25, 2024
    dee8fe2
  4. this is weird

    chhwang committed Sep 25, 2024
    abe69b8

Commits on Sep 26, 2024

  1. apps/nccl: 16B LLPacket for allreduce7

    This fixes the data-correctness issue in RCCL allreduce
    for half-precision datatypes.
    nusislam committed Sep 26, 2024
    ca6741c
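The Sep 26 commit switches allreduce7 to a 16-byte LL (low-latency) packet. The sketch below models such a packet on the host, assuming a layout like mscclpp's `LLPacket`: two 32-bit data words, each paired with a 32-bit flag. On a GPU the whole packet is written with a single 16-byte store, so a reader that sees both flags at the expected value also sees both data words. This host code only models the layout and the flag check, not the GPU store atomicity. `LLPacket16`, `writePacket`, and `tryReadPacket` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical 16-byte LL packet: each 32-bit payload word is paired with a
// 32-bit flag. On a GPU it would be written with one 16-byte store.
struct alignas(16) LLPacket16 {
  uint32_t data1;
  uint32_t flag1;
  uint32_t data2;
  uint32_t flag2;
};
static_assert(sizeof(LLPacket16) == 16, "LL packet must be exactly 16 bytes");

// Writer side: stamp both payload words with the current flag value.
inline void writePacket(LLPacket16& p, uint32_t d1, uint32_t d2, uint32_t flag) {
  p = LLPacket16{d1, flag, d2, flag};
}

// Reader side: the payload is valid only if both flags match the expected value.
// Returns std::nullopt while the packet is stale (e.g. from a previous round).
inline std::optional<std::pair<uint32_t, uint32_t>> tryReadPacket(const LLPacket16& p,
                                                                  uint32_t flag) {
  if (p.flag1 != flag || p.flag2 != flag) return std::nullopt;
  return std::make_pair(p.data1, p.data2);
}
```

Using a new flag value on each round lets the same buffer be reused without clearing it. A stale packet simply fails the flag check.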

Commits on Sep 30, 2024

  1. apps/nccl: performance optimization for allreduce7

    Add unroll and non-temporal store
    nusislam committed Sep 30, 2024
    6484dce
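The Sep 30 commit combines loop unrolling with non-temporal stores. The host-side sketch below covers only the unrolling part, using a hypothetical `reduceUnrolled4` and an arbitrary unroll factor of 4. In GPU kernels `#pragma unroll` usually plays this role. A non-temporal store (for example clang's `__builtin_nontemporal_store`, available to HIP code) additionally hints that the written data need not stay in cache. It is device-specific, so it is not shown here.

```cpp
#include <cstddef>

// Hypothetical elementwise reduction out[i] = a[i] + b[i], manually unrolled by 4.
// The main loop issues four independent load/add/store groups per iteration;
// the tail loop handles the remaining n % 4 elements.
inline void reduceUnrolled4(const float* a, const float* b, float* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    out[i + 0] = a[i + 0] + b[i + 0];
    out[i + 1] = a[i + 1] + b[i + 1];
    out[i + 2] = a[i + 2] + b[i + 2];
    out[i + 3] = a[i + 3] + b[i + 3];
  }
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```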

Commits on Oct 1, 2024

  1. 01e105b

Commits on Oct 2, 2024

  1. Update allgather.hpp

    chhwang authored Oct 2, 2024
    ea4f77d
  2. 99b5997

Commits on Oct 5, 2024

  1. f9def85

Commits on Oct 14, 2024

  1. apps/nccl: allgather tuning

    nusislam committed Oct 14, 2024
    cdbb2de

Commits on Oct 22, 2024

  1. 45d40a5