Add tile-based API #145

Binyang2014 · 2023-07-26T11:41:43Z

Provide a tile-based API:

void put2D(uint64_t dstOffset, uint64_t srcOffset, uint32_t width, uint32_t height)
void put2DWithSignal(uint64_t dstOffset, uint64_t srcOffset, uint32_t width, uint32_t height)

To support this, add a new structure fields2D in ChannelTrigger. In this structure, the 64-bit size is replaced by two 32-bit fields (a 32-bit width and a 32-bit height). A multiDimensionFlag is also added to fields2D to distinguish it from the fields structure.
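As a rough illustration of the packing described above, here is a minimal sketch that stores a 32-bit width and a 32-bit height in the word that otherwise holds the 64-bit size. `Trigger128`, `make1D`, `make2D`, the accessor names, and the position of the flag bit are all assumptions made for this example; the actual ChannelTrigger layout in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit trigger made of two 64-bit words (layout assumed for illustration).
struct Trigger128 {
  uint64_t fst;  // 1D: 64-bit size. 2D: width in the low 32 bits, height in the high 32 bits.
  uint64_t snd;  // Carries offsets, ids, and flags in a real trigger; only the flag is modeled.
};

static_assert(sizeof(Trigger128) == 16, "the trigger must stay 128 bits");

// Assumed position of the multi-dimension flag: top bit of the second word.
constexpr uint64_t kMultiDimensionFlag = 1ULL << 63;

// Builds a 1D trigger that carries a plain 64-bit size.
inline Trigger128 make1D(uint64_t size) { return Trigger128{size, 0}; }

// Builds a 2D trigger: width and height share the 64 bits used by size in the 1D case.
inline Trigger128 make2D(uint32_t width, uint32_t height) {
  uint64_t packed = static_cast<uint64_t>(width) | (static_cast<uint64_t>(height) << 32);
  return Trigger128{packed, kMultiDimensionFlag};
}

// Tells the consumer which interpretation of the first word applies.
inline bool is2D(const Trigger128& t) { return (t.snd & kMultiDimensionFlag) != 0; }

inline uint32_t widthOf(const Trigger128& t) {
  assert(is2D(t));
  return static_cast<uint32_t>(t.fst & 0xFFFFFFFFULL);
}

inline uint32_t heightOf(const Trigger128& t) {
  assert(is2D(t));
  return static_cast<uint32_t>(t.fst >> 32);
}
```

The flag lets the consumer decide how to read the first word without growing the trigger beyond 128 bits.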

Example of using the tile-based API:
When setting up the connections, call channelService->addPitch first:

  for (int r = 0; r < worldSize; r++) {
    if (r == rank) {
      continue;  // no connection to self
    }
    std::shared_ptr<mscclpp::Connection> conn;
    // Peers on the same node use CUDA IPC unless IB-only mode is requested
    if ((rankToNode(r) == rankToNode(gEnv->rank)) && !useIbOnly) {
      conn = communicator->connectOnSetup(r, 0, mscclpp::Transport::CudaIpc);
    } else {
      conn = communicator->connectOnSetup(r, 0, ibTransport);
    }
    connections[r] = conn;
    communicator->sendMemoryOnSetup(recvBufRegMem, r, 0);
    auto remoteMemory = communicator->recvMemoryOnSetup(r, 0);
    communicator->setup();

    // Register the channel together with its (destination, source) pitch
    mscclpp::SemaphoreId cid = channelService->add2DChannel(conn, std::pair<size_t, size_t>(dstPitch, srcPitch));
    communicator->setup();
  }

Then use the put2D API in the kernel:

if (threadIdx.x == 0) proxyChan.put2DWithSignal(offset, width * sizeof(int), height);
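To make the roles of width, height, and pitch concrete, the following is a CPU-only reference for what a tile copy of this shape is expected to do. It is a sketch, not the library's implementation: `referencePut2D` is a hypothetical helper, and it assumes that the width and both pitches are given in bytes, which matches the `width * sizeof(int)` argument in the call above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical CPU reference for a tile copy. widthBytes, dstPitch, and srcPitch
// are assumed to be in bytes; height is the number of rows.
inline void referencePut2D(uint8_t* dst, size_t dstPitch, const uint8_t* src, size_t srcPitch,
                           size_t widthBytes, size_t height) {
  // A row cannot be wider than the distance between consecutive rows.
  assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
  for (size_t row = 0; row < height; ++row) {
    // Each row starts one pitch after the previous one; only widthBytes are copied.
    std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
  }
}
```

When the width equals both pitches, the rows are contiguous and the loop copies the same bytes as a single 1D copy of `widthBytes * height` bytes.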

Binyang2014 · 2023-08-07T06:18:30Z

cudaMemcpy2DAsync seems slower than cudaMemcpyAsync for 1D data. This needs investigation.

saeedmaleki · 2023-09-28T02:57:09Z

src/proxy_channel.cc

@@ -29,6 +29,16 @@ MSCCLPP_API_CPP SemaphoreId ProxyService::buildAndAddSemaphore(Communicator& com
  return semaphores_.size() - 1;
 }

+MSCCLPP_API_CPP SemaphoreId ProxyService::buildAndAddSemaphore(Communicator& communicator,


This doesn't make much sense to me. Why do we need an extra way of building a semaphore? We only need to provide a 2D write over 1D arrays, so just a 2D write is enough, right?

Binyang2014 · 2023-09-28

Because we need to set the pitch/stride for the channel (the name semaphore is not accurate here). We don't pass the stride through the put2D API because our trigger is only 128 bits, and there are no spare bits for it.
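The trade-off in this reply can be sketched as a host-side table: the pitch is registered once per channel, and each trigger carries only the width and height, so the stride never has to fit in the 128-bit trigger. `PitchTable` and its methods are hypothetical names for this example and are not the PR's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-channel pitch table (names are illustrative, not the library's API).
class PitchTable {
 public:
  // Registers a (dstPitch, srcPitch) pair for a new channel and returns its id.
  size_t add(std::pair<size_t, size_t> pitch) {
    pitches_.push_back(pitch);
    return pitches_.size() - 1;
  }

  // Looks up the pitch pair when a 2D trigger for this channel is handled.
  const std::pair<size_t, size_t>& pitchOf(size_t channelId) const {
    assert(channelId < pitches_.size());
    return pitches_[channelId];
  }

 private:
  std::vector<std::pair<size_t, size_t>> pitches_;
};
```

The cost of this design is that a channel is tied to one pitch pair; a transfer with a different stride would need another registered channel.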

Binyang2014 added 5 commits July 25, 2023 09:16

add tile api

6449e7c

bug fix

ca0115e

add test

4cda05c

add UT

6d18f7b

add doc string

72b87d9

chhwang mentioned this pull request Jul 26, 2023

MSCCL++ v0.3.0 Release Plan #89

Closed

chhwang marked this pull request as ready for review July 26, 2023 12:47

Binyang2014 added 3 commits July 27, 2023 11:36

Merge branch 'main' into binyli/tile-api

4b9e709

address comments

59e15c8

doc string

b9ec5a6

Binyang2014 force-pushed the binyli/tile-api branch from ae230ed to b9ec5a6 Compare July 27, 2023 09:40

Binyang2014 requested review from saeedmaleki and chhwang July 28, 2023 02:33

update

a54e6a7

Binyang2014 force-pushed the binyli/tile-api branch from b82a86a to a54e6a7 Compare July 28, 2023 03:04

Binyang2014 added 3 commits August 18, 2023 05:35

merge main

05d35cd

fix

774a010

clean up

b40cda8

chhwang mentioned this pull request Aug 16, 2023

MSCCL++ v0.4.0 Release Plan (Released) #160

Closed

saeedmaleki suggested changes Sep 28, 2023
