When I applied VisionZip to qwen2.5-vl-7b and compared it against the original implementation, I found that although the number of visual tokens decreased, inference was no faster and GPU memory usage did not drop. In fact, GPU memory usage actually increased beyond the uncompressed baseline. The main cause is the following code.
# For the last vision block, the modified forward also returns the attention
# logits (and keys) so that the visual tokens can be ranked for pruning.
if layer_num == len_blocks - 1:
    hidden_states, logits, attn_key = blk(
        hidden_states,
        cu_seqlens=cu_seqlens_now,
        position_embeddings=position_embeddings,
        return_logits=True,
    )
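One way to see why memory grows: asking the block to return full attention logits forces it to materialize a tensor that is quadratic in the (pre-pruning) visual token count, which memory-efficient attention kernels normally never store. A minimal back-of-the-envelope sketch (the head count, token count, and fp16 element size here are illustrative assumptions, not values read from the qwen2.5-vl-7b config):

```python
def attn_logits_bytes(num_tokens: int, num_heads: int = 16, dtype_bytes: int = 2) -> int:
    # An explicit attention-logits tensor has shape [num_heads, num_tokens, num_tokens],
    # so its size is quadratic in the number of visual tokens BEFORE pruning.
    return num_heads * num_tokens * num_tokens * dtype_bytes

# e.g. a high-resolution image producing ~5000 patch tokens (assumed figure):
extra = attn_logits_bytes(5000)
print(f"{extra / 2**20:.0f} MiB")  # prints "763 MiB"
```

This extra allocation happens before any tokens are dropped, which is consistent with peak memory rising even though the later language-model stage sees fewer visual tokens.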