Skip to content

Conversation

@UESTC-AHao
Copy link
Contributor

@UESTC-AHao UESTC-AHao commented Oct 21, 2025

Purpose

What this PR does / why we need it?

This feature introduces GPU Direct Storage (GDS) support to enable direct data transfer between GPU memory and storage, bypassing CPU memory as an intermediate buffer. This significantly reduces memory bandwidth bottlenecks and improves KV cache loading/offloading performance.

Modifications

Does this PR introduce any user-facing change?

A new parameter, useDirect, has been added to the service-startup command. When set to true, GDS transfer is enabled and the data path is Device Memory <-> Storage (direct). When set to false, the path becomes Device Memory <-> Host Memory <-> Storage, as shown below:

"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
"ucm_connector_config": {
"storage_backends": "/home/nfs",
"useDirect" : true

Test

How was this patch tested?

embed without GDS:
img_v3_02r9_6b47c813-3ecc-4a9f-9b85-8f027bdee16g

embed with GDS:
img_v3_02r9_5d4e7d1e-d099-44b4-a75b-b53189f3f03g

fetch without GDS:
img_v3_02r9_e8fe7199-9ad2-4e45-98d4-54e49f16865g

fetch with GDS:
img_v3_02r9_6050b6b2-34b7-40ae-86e0-1b42f7c9d45g

qyh111
qyh111 previously approved these changes Oct 31, 2025
UC_INFO("GDS driver initialized successfully");
} else {
UC_ERROR("GDS driver initialized unsuccessfully");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling is insufficient, failures need to be visible to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Received,I'll fix it.

}
Status S2DSync(int fd, void* address, const size_t length, const size_t fileOffset, const size_t devOffset) override
{
return Status::OK();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return Status::Unsupported();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

bool DirectStorageQueue::Init(Device& device)
{
if (this->deviceId_ < 0) { return true; }
DeviceFactory::Setup(useDirect);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface should be invoked in Setup(), not in Init().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

U are right,I'll fix it.

Status DirectStorageQueue::S2D(Task::Shard& shard, const Device& device) {
auto path = this->layout_->DataFilePath(shard.block, false);
int fd = -1;
auto status = HandlePool<std::string, int>::Instance().Get(path, fd,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use a singleton, use a member variable instead for HandlePool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the current circumstances, singletons in processes created by fork are hanging. I will fix this.

int fd = -1;
auto status = HandlePool<std::string, int>::Instance().Get(path, fd,
[&path](int& newFd) -> Status {
newFd = open(path.c_str(), O_RDONLY | O_DIRECT, 0644);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use file interface in file.h

Copy link
Contributor Author

@UESTC-AHao UESTC-AHao Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Received.

int fd = -1;
auto status = HandlePool<std::string, int>::Instance().Get(path, fd,
[&path](int& newFd) -> Status {
newFd = open(path.c_str(), O_WRONLY | O_CREAT | O_DIRECT, 0644);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When are the handles cached in HandlePool closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use member variables and release resources in the destructor.

for (size_t i = 0; i < streamNumber; i++) {
auto q = std::make_shared<DirectStorageQueue>();
status =
q->Setup(deviceId, ioSize, bufferNumber, &this->failureSet_, layout, timeoutMs, useDirect);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DirectStorageQueue is always direct, and any other queue is never direct, the useDirect flag is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok,I'll fix it.

q->Setup(deviceId, ioSize, bufferNumber, &this->failureSet_, layout, timeoutMs, useDirect);
if (status.Failure()) { break; }
this->queues_.emplace_back(std::move(q));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the if and else branches contain duplicated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Received.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants