mangguo321
Contributor

Details:

  • Implement X-Attention, which runs in the pre-inference stage of the attention pipeline, before PagedAttention, and generates sparse attention blocks to accelerate long-prompt inference
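
For readers unfamiliar with block-sparse attention, the following is a minimal sketch of how a pre-inference pass might decide which key blocks each query block attends to. It is not taken from this PR: the function name `select_blocks`, the `threshold` parameter, and the cumulative-mass selection rule are all assumptions made for illustration, and the block importance scores are assumed to have been computed already by some cheaper estimate.

```python
import math


def select_blocks(block_scores, threshold=0.9):
    """Hypothetical helper: pick key blocks per query block.

    block_scores[q][k] is an assumed importance score of key block k
    for query block q (square matrix, one row per query block).
    Returns, for each query block, the sorted key-block indices to keep.
    """
    selected = []
    for q_idx, row in enumerate(block_scores):
        # Causal constraint: a query block only sees key blocks up to itself.
        visible = row[: q_idx + 1]
        # Softmax over the visible block scores (shifted for stability).
        peak = max(visible)
        weights = [math.exp(s - peak) for s in visible]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Add blocks in descending importance until the kept mass
        # reaches the threshold (assumed selection rule).
        order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
        kept, mass = [], 0.0
        for k in order:
            kept.append(k)
            mass += probs[k]
            if mass >= threshold:
                break
        selected.append(sorted(kept))
    return selected


# Three query blocks, three key blocks; higher score = more important.
scores = [
    [0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [5.0, 0.0, 5.0],
]
mask = select_blocks(scores, threshold=0.9)
```

In a design like this, the resulting per-query-block index lists would be handed to the attention kernel so that it skips every block not listed; a lower `threshold` keeps fewer blocks and trades accuracy for speed.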

Tickets:

mangguo321 requested review from a team as code owners September 16, 2025 03:33
github-actions bot added the "category: CPU" (OpenVINO CPU plugin) and "category: build" (OpenVINO cmake script / infra) labels Sep 16, 2025
mangguo321 marked this pull request as draft September 16, 2025 03:34