I tested flashinfer==0.2.0 with the vLLM unit tests. During prefill, the precision mismatch seems unacceptable (~0.04 absolute error), and all prefill tests fail on the tensor-mismatch assertion (logs below).
Even when I wrote the naive reference attention in fp32, the issue persisted and the difference between the two results remained unchanged.
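For context, the fp32 baseline I mean is along these lines. This is a minimal sketch, not the exact test code; the `[seq_len, num_heads, head_dim]` layout and the function name are my own choices for illustration:

```python
import torch

def naive_attention_fp32(q, k, v, scale):
    """Causal attention reference computed entirely in fp32,
    so accumulation error in the baseline itself is ruled out.
    q, k, v: [seq_len, num_heads, head_dim]."""
    q, k, v = q.float(), k.float(), v.float()
    # [num_heads, seq_len_q, seq_len_k] attention scores
    scores = torch.einsum("qhd,khd->hqk", q, k) * scale
    # causal mask for prefill: each query attends only to earlier keys
    seq_len = q.shape[0]
    mask = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    scores.masked_fill_(mask, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum("hqk,khd->qhd", probs, v)
```

Even against this fully-fp32 reference, the ~0.04 gap on the flashinfer prefill output did not shrink.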
Env:
torch==2.3.0+cu121
vllm==0.6.4
CUDA version: 12.1
Test file adapted from the vllm codebase at test/kernels/test_flashinfer.py
Full test code at https://gist.github.com/Dr-Left/ec3336767ae860a964501d9f0fbb35c0
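To illustrate why a ~0.04 absolute error trips the assertion: `torch.testing.assert_close` uses default fp16 tolerances of rtol=1e-3, atol=1e-5, which an error this size exceeds by more than an order of magnitude. A hypothetical standalone reproduction (not the actual test tensors):

```python
import torch

# Mimic the observed ~0.04 absolute error on an fp16 output tensor.
ref = torch.ones(8, dtype=torch.float16)
out = ref + 0.04

try:
    # Default fp16 tolerances: rtol=1e-3, atol=1e-5 -> allowed error
    # around 1e-3 per element here, far below 0.04.
    torch.testing.assert_close(out, ref)
    print("match within tolerance")
except AssertionError:
    print("mismatch: exceeds default fp16 tolerances")
```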
Failing test logs: