Flashinfer==0.2.0 precision error when tested on vLLM unit tests #736

Open
Dr-Left opened this issue Jan 13, 2025 · 0 comments
Dr-Left commented Jan 13, 2025

I tested flashinfer==0.2.0 against the vLLM unit tests. During prefill, the precision mismatch seems unacceptable (~0.04 absolute error), and all prefill tests failed on the tensor-mismatch assertion (logs below).

Even when I rewrote the naive reference attention in fp32, the issue was still there and the difference between the two results remained unchanged.
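For context, a minimal sketch of the kind of fp32 naive reference I used (the exact reference implementation lives in the gist below; function name and tensor layout here are illustrative, assuming `(seq, heads, head_dim)` tensors):

```python
import torch

def naive_attention_fp32(q, k, v, scale):
    # Upcast everything to fp32, compute softmax(Q K^T * scale) V,
    # then cast back to the input dtype for comparison against the kernel.
    q32, k32, v32 = q.float(), k.float(), v.float()
    attn = torch.einsum("qhd,khd->hqk", q32, k32) * scale  # per-head logits
    attn = torch.softmax(attn, dim=-1)
    out = torch.einsum("hqk,khd->qhd", attn, v32)
    return out.to(q.dtype)
```

Even with this fully-fp32 reference, the ~0.04 absolute gap against the flashinfer output persisted, which suggests the discrepancy is not accumulation error in the fp16 reference.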

Env:
torch==2.3.0+cu121
vllm==0.6.4
cuda version: 12.1

Test file adapted from the vLLM codebase at tests/kernels/test_flashinfer.py

Full test code at https://gist.github.com/Dr-Left/ec3336767ae860a964501d9f0fbb35c0

Test failed logs:

============================= test session starts ==============================
platform linux -- Python 3.9.21, pytest-8.3.4, pluggy-1.5.0 -- ~/.conda/envs/vllm/bin/python
cachedir: .pytest_cache
rootdir: ~/draft_dp/vllm
configfile: pyproject.toml
plugins: anyio-4.7.0
collecting ... collected 480 items / 336 deselected / 144 selected

vllm/tests/kernels/test_flashinfer.py::test_flashinfer_prefill_with_paged_kv[None-dtype0-16-128-num_heads0-seq_lens0] 2025-01-13 08:20:38,634 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-01-13 08:21:26,992 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-01-13 08:21:27,021 - INFO - flashinfer.jit: Loading JIT ops: page
2025-01-13 08:21:38,031 - INFO - flashinfer.jit: Finished loading JIT ops: page
FAILED [  0%]

        qo_indptr = [0]
        kv_indptr = [0]
        kv_indices = []
        kv_last_page_lens = []
        for i in range(num_seqs):
            seq_len = kv_lens[i]
            assert seq_len > 0
            num_blocks = (seq_len + block_size - 1) // block_size
            kv_indices.extend(block_tables[i, :num_blocks])
            kv_indptr.append(kv_indptr[-1] + num_blocks)
            kv_last_page_len = seq_len % block_size
            if kv_last_page_len == 0:
                kv_last_page_len = block_size
            kv_last_page_lens.append(kv_last_page_len)
            qo_indptr.append(qo_indptr[-1] + query_lens[i])

        qo_indptr = torch.tensor(qo_indptr, dtype=torch.int32)
        kv_indptr = torch.tensor(kv_indptr, dtype=torch.int32)
        kv_indices = torch.tensor(kv_indices, dtype=torch.int32)
        kv_last_page_lens = torch.tensor(kv_last_page_lens, dtype=torch.int32)
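To make the paged-KV metadata construction above concrete, here is the same loop run with made-up sequence lengths (the `kv_lens`/`query_lens` values are illustrative, not the ones vLLM's test parametrizes):

```python
# Same metadata logic as the test, with illustrative lengths.
block_size = 16
kv_lens = [16, 17]      # second sequence spills one token onto a second page
query_lens = [16, 17]

qo_indptr, kv_indptr, kv_last_page_lens = [0], [0], []
for seq_len, q_len in zip(kv_lens, query_lens):
    num_blocks = (seq_len + block_size - 1) // block_size  # ceil division
    kv_indptr.append(kv_indptr[-1] + num_blocks)
    last = seq_len % block_size
    # A full final page is reported as block_size, not 0.
    kv_last_page_lens.append(block_size if last == 0 else last)
    qo_indptr.append(qo_indptr[-1] + q_len)

print(kv_indptr)         # [0, 1, 3]
print(kv_last_page_lens) # [16, 1]
print(qo_indptr)         # [0, 16, 33]
```

The `% block_size == 0` special case matters: a sequence that exactly fills its last page must report a last-page length of `block_size`, not 0.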

        workspace_buffer = torch.empty(128 * 1024 * 1024, dtype=torch.int8)
        torch.set_default_device("cpu")
        wrapper = flashinfer.BatchPrefillWithPagedKVCacheWrapper(
            workspace_buffer, "NHD", backend="fa3")
        wrapper.begin_forward(
            qo_indptr,
            kv_indptr,
            kv_indices,
            kv_last_page_lens,
            num_query_heads,
            num_kv_heads,
            head_size,
            block_size,
            q_data_type=dtype,
            kv_data_type=dtype,
        )
        ...
                                    scale=scale,
                                    soft_cap=soft_cap)
>       torch.testing.assert_close(output, ref_output, atol=1e-2, rtol=1e-2), \
            f"{torch.max(torch.abs(output - ref_output))}"
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 1776 / 276480 (0.6%)
E       Greatest absolute difference: 0.0408935546875 at index (1, 1, 23) (up to 0.01 allowed)
E       Greatest relative difference: 453.25 at index (1, 0, 118) (up to 0.01 allowed)

vllm/tests/kernels/test_flashinfer.py:269: AssertionError
----------------------------- Captured stderr call -----------------------------
2025-01-13 08:20:38,634 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-01-13 08:21:26,992 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-01-13 08:21:27,021 - INFO - flashinfer.jit: Loading JIT ops: page
2025-01-13 08:21:38,031 - INFO - flashinfer.jit: Finished loading JIT ops: page
=============================== warnings summary ===============================
vllm/vllm/connections.py:8
  ~/draft_dp/vllm/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
  No module named 'vllm._version'
    from vllm.version import __version__ as VLLM_VERSION

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED vllm/tests/kernels/test_flashinfer.py::test_flashinfer_prefill_with_paged_kv[None-dtype0-16-128-num_heads0-seq_lens0] - AssertionError: Tensor-likes are not close!

Mismatched elements: 1776 / 276480 (0.6%)
Greatest absolute difference: 0.0408935546875 at index (1, 1, 23) (up to 0.01 allowed)
Greatest relative difference: 453.25 at index (1, 0, 118) (up to 0.01 allowed)
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
=========== 1 failed, 336 deselected, 1 warning in 61.75s (0:01:01) ============
yzh119 self-assigned this Jan 13, 2025