📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Pre-built wheels that erase Flash Attention 3 installation headaches.
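A minimal sketch of how such a wheel might be consumed, with a graceful fallback: it probes for a FlashAttention-3 build and otherwise uses PyTorch's built-in fused attention. The `flash_attn_interface` import path and the tuple-shaped return value are assumptions about the FA3 beta packaging, not guarantees about these wheels.

```python
import torch
import torch.nn.functional as F

try:
    # Import path used by recent FlashAttention-3 builds (an assumption:
    # wheel layouts and module names can vary between distributions).
    from flash_attn_interface import flash_attn_func
except ImportError:
    flash_attn_func = None

def attention(q, k, v):
    """q, k, v: (batch, seq_len, num_heads, head_dim)."""
    if flash_attn_func is not None:
        out = flash_attn_func(q, k, v)
        # Some builds return (out, softmax_lse); keep only the output.
        return out[0] if isinstance(out, tuple) else out
    # Fallback: PyTorch's fused SDPA expects (batch, heads, seq, dim).
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    return F.scaled_dot_product_attention(q, k, v).transpose(1, 2)
```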
Qwen-Image-Edit-2509-LoRAs-Fast is a Gradio web application that uses the Qwen/Qwen-Image-Edit-2509 model from Hugging Face for fast, user-friendly image editing.
Toy Flash Attention implementation in torch
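For reference, here is a minimal sketch of the tiled online-softmax idea behind Flash Attention in plain PyTorch. The block size and shapes are illustrative assumptions; a real kernel fuses these loops on-chip instead of materializing tiles in Python.

```python
import torch

def toy_flash_attention(q, k, v, block_size=64):
    """Tiled attention with an online softmax, the core trick of
    Flash Attention. Shapes: q, k, v are (seq_len, head_dim)."""
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.zeros_like(q)
    # Running max and normalizer for the online softmax.
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)

    for start in range(0, seq_len, block_size):
        kb = k[start:start + block_size]           # key tile (B, d)
        vb = v[start:start + block_size]           # value tile (B, d)
        scores = (q @ kb.T) * scale                # (seq_len, B)

        block_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, block_max)
        # Rescale previously accumulated results to the new running max.
        correction = torch.exp(row_max - new_max)
        p = torch.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max

    return out / row_sum

# Sanity check against the naive quadratic implementation.
q, k, v = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax((q @ k.T) * 32 ** -0.5, dim=-1) @ v
assert torch.allclose(toy_flash_attention(q, k, v), ref, atol=1e-5)
```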
Demonstration of the Qwen/Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters for advanced single- and multi-image editing. Supports 7 specialized LoRAs, including photo-to-anime, multi-angle camera control, pose transfer (Any-Pose), upscaling, style transfer, light migration, and manga tone. Features fast inference (4 steps by default).
Demonstration of the Qwen/Qwen-Image-Edit-2511 model, specialized in object manipulation via lazy-loaded LoRA adapters. Supports adding or removing specific elements (e.g., logos, accessories, clothing) in single- or multi-image inputs while preserving lighting, realism, and background details. Features precise prompt control and fast inference.
Demonstration of the Qwen/Qwen-Image-Edit-2509 model, enhanced with lazy-loaded LoRA adapters for specialized image editing tasks such as texture application, object fusion, material transfer, and light migration. Uses a fused Lightning LoRA for rapid inference (4 steps by default); a sketch of the lazy-loading pattern follows below.
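These demos share one pattern: load the base edit model once, then attach task-specific LoRAs only when first requested. Below is a minimal sketch of that lazy-loading pattern with diffusers. The adapter registry, repo IDs, and weight filenames are placeholders, and the actual Spaces may wire this up differently.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Load the base editing pipeline once; DiffusionPipeline auto-resolves the
# concrete pipeline class for the checkpoint in recent diffusers releases.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical adapter registry: repo IDs and filenames are placeholders.
LORAS = {
    "photo-to-anime": ("some-org/qwen-edit-anime-lora", "anime.safetensors"),
    "upscale": ("some-org/qwen-edit-upscale-lora", "upscale.safetensors"),
}
_loaded = set()

def apply_lora(name: str) -> None:
    """Lazy-load a LoRA on first use, then make it the active adapter."""
    if name not in _loaded:
        repo_id, weight_name = LORAS[name]
        pipe.load_lora_weights(repo_id, weight_name=weight_name,
                               adapter_name=name)
        _loaded.add(name)
    pipe.set_adapters([name])

apply_lora("photo-to-anime")
result = pipe(image=load_image("input.png"),
              prompt="turn the photo into anime style",
              num_inference_steps=4).images[0]  # few-step Lightning-style preset
```

Fusing a Lightning-style LoRA into the base weights (`pipe.fuse_lora()`) trades this flexibility for lower per-call overhead, which is presumably why the fixed-task demo above fuses while the multi-LoRA demos load lazily.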