Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docs] GEMM 优化专题 — Warp tile #24

Merged
merged 6 commits into from
Feb 18, 2024

Conversation

AndSonder
Copy link
Collaborator

No description provided.

本文中我们将会在第一个循环, 第二个循环之前加入一个 warp tiling 的循环, 以此来提高访存效率。Warp 是 GPU 硬件中的一个概念, 一个 warp 由 32 个线程组成, 这 32 个线程会被分配到一个 SM 中的一个 warp scheduler 中, warp scheduler 会负责调度这 32 个线程的执行。一个 SM 上可能会有多个 warp scheduler, 下图是一个 SM 中 warp scheduler 的示意图:

![picture 1](images/81ad9eb1ae1f6ac40f63798a57c63c10363a84784f1b7314ee0e90ed860fd6b9.png)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这张图片有2MB,可以尝试压缩下

3. Thread Tile: 指令可以在同一个 CUDA 核心上并行执行(即指令级并行性,又称 ILP)

:::note
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的note渲染有一些问题

@Aurelius84 Aurelius84 merged commit 4e16f5d into PaddleJitLab:develop Feb 18, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants