Conversation

@lantudou

Previously, the code used torch.linalg.svd to compute the full singular value decomposition of the weight matrix. This proved computationally expensive and memory-intensive, especially since only the top-rank components are actually used.

This PR replaces it with torch.svd_lowrank, utilizing a randomized algorithm to approximate the dominant singular values efficiently.

Changes:
- Switched to torch.svd_lowrank for faster decomposition.
- Set niter=4 and q=10 (oversampling) to strike a good balance between speed and accuracy.
- Adjusted the tensor transposition logic to match svd_lowrank's output format (it returns V rather than Vh).
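As a rough sketch of the swap (the matrix size and rank below are illustrative, not values from this repo), note the extra transpose needed because svd_lowrank returns V while torch.linalg.svd returns Vh:

```python
import torch

torch.manual_seed(0)

rank = 16
# Illustrative low-rank weight matrix; real layer shapes are model-dependent.
W = torch.randn(1024, rank) @ torch.randn(rank, 1024)

# Old path: full SVD, returns U, S, Vh.
U_f, S_f, Vh_f = torch.linalg.svd(W, full_matrices=False)

# New path: randomized low-rank SVD; q = rank + oversampling, niter as in the PR.
U, S, V = torch.svd_lowrank(W, q=rank + 10, niter=4)

# svd_lowrank returns V rather than Vh, hence the .T here.
approx = U[:, :rank] @ torch.diag(S[:rank]) @ V[:, :rank].T
exact = U_f[:, :rank] @ torch.diag(S_f[:rank]) @ Vh_f[:rank, :]

# For an (approximately) low-rank matrix the two reconstructions agree closely.
rel_err = torch.linalg.norm(approx - exact) / torch.linalg.norm(exact)
print(rel_err.item())
```

The randomized algorithm only sketches q directions of the input, so its cost scales with q rather than with the full matrix dimension.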

Performance Impact

In my local environment, this optimization eliminates a major bottleneck during quantization:

- Low-rank creation latency: dropped from ~5 s to ~100 ms.
- Total runtime: the overall calibration and quantization process is approximately 6x faster.
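The latency difference can be sanity-checked with a rough micro-benchmark (the matrix size and q here are made up for illustration; the PR's numbers come from real model layers):

```python
import time

import torch

torch.manual_seed(0)
W = torch.randn(2048, 2048)  # illustrative size only; real layers vary

start = time.perf_counter()
torch.linalg.svd(W, full_matrices=False)  # old path: full SVD
full_secs = time.perf_counter() - start

start = time.perf_counter()
torch.svd_lowrank(W, q=26, niter=4)  # new path: e.g. rank 16 + 10 oversampling
lowrank_secs = time.perf_counter() - start

print(f"full SVD: {full_secs:.2f}s  low-rank: {lowrank_secs:.3f}s")
```

Exact numbers depend on hardware, BLAS backend, and layer shapes, but the randomized path should be faster by one to two orders of magnitude at these sizes.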

This PR supersedes #111. I've resubmitted a clean version to resolve the conflicts caused by recent breaking changes in the main branch. Apologies for the inconvenience.
