Thank you for your excellent work on VisionZip. I have two questions:
- FastV prunes R% of tokens from layer K onwards. Does the table below use the average number of tokens across all decoder layers, or is it calculated differently?
- Beyond the latency improvements, did you measure the theoretical FLOPs reduction for the different pruning approaches? If so, I would greatly appreciate seeing how it was calculated.
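
To make both questions concrete, here is a rough sketch of how I would compute these numbers myself. Everything in it is my own assumption rather than your method: the function names are hypothetical, only visual tokens are counted, the first K layers are assumed to see all tokens, and the per-layer cost `4nd^2 + 2n^2d + 2ndm` (n tokens, hidden size d, FFN intermediate size m) is an estimate that may differ from what you used.

```python
def kept_tokens(n: int, r: float) -> int:
    """Tokens remaining after pruning a fraction r of n tokens (assumed rounding)."""
    return round(n * (1 - r))


def avg_tokens(n: int, k: int, r: float, num_layers: int) -> float:
    """Average token count per decoder layer.

    Assumption: layers 0..k-1 process all n tokens, and the remaining
    num_layers - k layers process only the kept tokens.
    """
    kept = kept_tokens(n, r)
    return (k * n + (num_layers - k) * kept) / num_layers


def layer_flops(n: int, d: int, m: int) -> int:
    """Assumed per-layer estimate: attention projections + attention + FFN."""
    return 4 * n * d**2 + 2 * n**2 * d + 2 * n * d * m


def flops_ratio(n: int, k: int, r: float, num_layers: int, d: int, m: int) -> float:
    """Theoretical FLOPs of the pruned model relative to the unpruned model."""
    kept = kept_tokens(n, r)
    full = num_layers * layer_flops(n, d, m)
    pruned = k * layer_flops(n, d, m) + (num_layers - k) * layer_flops(kept, d, m)
    return pruned / full


if __name__ == "__main__":
    # Illustrative values only: 576 visual tokens, 32 layers, K=2, R=50%,
    # d=4096, m=11008 (sizes I am assuming for a 7B LLaMA-style decoder).
    print(avg_tokens(576, k=2, r=0.5, num_layers=32))
    print(flops_ratio(576, k=2, r=0.5, num_layers=32, d=4096, m=11008))
```

If the table instead reports the token count after pruning, or the FLOPs were derived differently, a short note on the convention would clear this up.
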
Thank you for your time and for considering these questions.