Conversation

@lantudou

Previously, the code used torch.linalg.svd to compute the full singular value decomposition of the weight matrix. This proved computationally expensive and memory-intensive, especially since only the top-rank components are actually used.

This PR replaces it with torch.svd_lowrank, utilizing a randomized algorithm to approximate the dominant singular values efficiently.

Changes:
- Switched to torch.svd_lowrank for faster decomposition.
- Set niter=4 and q=10 (oversampling) to strike a good balance between speed and accuracy.
- Adjusted the tensor transposition logic to match svd_lowrank's output format (it returns V rather than Vh).
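As a rough sketch of the swap (the matrix size and rank below are illustrative, not values from this repo), note the extra transpose needed because svd_lowrank returns V while torch.linalg.svd returns Vh:

```python
import torch

torch.manual_seed(0)

rank = 16
# Illustrative low-rank weight matrix; real layer shapes are model-dependent.
W = torch.randn(1024, rank) @ torch.randn(rank, 1024)

# Old path: full SVD, returns U, S, Vh.
U_f, S_f, Vh_f = torch.linalg.svd(W, full_matrices=False)

# New path: randomized low-rank SVD; q = rank + oversampling, niter as in the PR.
U, S, V = torch.svd_lowrank(W, q=rank + 10, niter=4)

# svd_lowrank returns V rather than Vh, hence the .T here.
approx = U[:, :rank] @ torch.diag(S[:rank]) @ V[:, :rank].T
exact = U_f[:, :rank] @ torch.diag(S_f[:rank]) @ Vh_f[:rank, :]

# For an (approximately) low-rank matrix the two reconstructions agree closely.
rel_err = torch.linalg.norm(approx - exact) / torch.linalg.norm(exact)
print(rel_err.item())
```

The randomized algorithm only sketches q directions of the input, so its cost scales with q rather than with the full matrix dimension.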

Performance Impact

In my local environment, this optimization eliminates a major bottleneck during quantization:

- Low-rank creation latency: dropped from ~5 s to ~100 ms.
- Total runtime: the overall calibration and quantization process is approximately 6x faster.
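The latency difference can be sanity-checked with a rough micro-benchmark (the matrix size and q here are made up for illustration; the PR's numbers come from real model layers):

```python
import time

import torch

torch.manual_seed(0)
W = torch.randn(2048, 2048)  # illustrative size only; real layers vary

start = time.perf_counter()
torch.linalg.svd(W, full_matrices=False)  # old path: full SVD
full_secs = time.perf_counter() - start

start = time.perf_counter()
torch.svd_lowrank(W, q=26, niter=4)  # new path: e.g. rank 16 + 10 oversampling
lowrank_secs = time.perf_counter() - start

print(f"full SVD: {full_secs:.2f}s  low-rank: {lowrank_secs:.3f}s")
```

Exact numbers depend on hardware, BLAS backend, and layer shapes, but the randomized path should be faster by one to two orders of magnitude at these sizes.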

This PR supersedes #111. I've resubmitted a clean version to resolve the conflicts caused by recent breaking changes in the main branch. Apologies for the inconvenience.
