
Awesome research works for Efficient AI

A curated list of research works on efficient AI systems, methods, and applications.

Efficient AI Systems on Mobile and Edge Devices

Efficient Inference using Heterogeneous Processors (e.g., CPU, GPU, NPU)

  • [MobiCom 2024] Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices [paper]
  • [SenSys 2023] Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU [paper]
  • [MobiSys 2023] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors [paper]
  • [ATC 2023] Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices [paper]
  • [IPSN 2023] PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators [paper]
  • [SenSys 2022] BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference [paper]
  • [MobiSys 2022] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors [paper]
  • [MobiSys 2022] CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices [paper]
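
Most of these systems partition a model into sub-graphs and run them on different processors concurrently, so end-to-end latency tracks the slowest branch rather than the sum of all branches. A minimal sketch of that branch-parallel idea, with hypothetical `run_on_cpu`/`run_on_gpu` functions standing in for backend-specific runtimes:

```python
# Sketch of branch-parallel inference across heterogeneous processors.
# run_on_cpu / run_on_gpu are hypothetical stand-ins for backend-specific
# runtimes (e.g., a CPU thread pool and a GPU/NPU delegate).
import time
from concurrent.futures import ThreadPoolExecutor

def run_on_cpu(branch_input):
    time.sleep(0.010)            # pretend: 10 ms CPU sub-graph
    return "cpu_features"

def run_on_gpu(branch_input):
    time.sleep(0.008)            # pretend: 8 ms GPU sub-graph
    return "gpu_features"

def merge(a, b):
    return (a, b)                # placeholder for the fusion step

def parallel_inference(x):
    # Launch both branches concurrently; end-to-end latency approaches
    # max(branch latencies) instead of their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_out = pool.submit(run_on_cpu, x)
        gpu_out = pool.submit(run_on_gpu, x)
        return merge(cpu_out.result(), gpu_out.result())

print(parallel_inference("frame"))
```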

Adaptive Inference for Optimized Resource Utilization

  • [MobiCom 2024] Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices [paper]
  • [MobiSys 2023] OmniLive: Super-Resolution Enhanced 360° Video Live Streaming for Mobile Devices [paper]
  • [MobiSys 2023] HarvNet: Resource-Optimized Operation of Multi-Exit Deep Neural Networks on Energy Harvesting Devices [paper]
  • [MobiCom 2022] NeuLens: Spatial-based Dynamic Acceleration of Convolutional Neural Networks on Edge [paper]
  • [MobiCom 2021] Flexible High-resolution Object Detection on Edge Devices with Tunable Latency [paper]
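
Several of these works (HarvNet in particular) build on multi-exit networks: classifiers attached at intermediate layers let inference stop early once a prediction is already confident. A minimal PyTorch sketch of confidence-thresholded early exit; the toy architecture and the 0.9 threshold are illustrative, not taken from any of the papers:

```python
import torch
import torch.nn as nn

class TwoExitNet(nn.Module):
    """Toy multi-exit CNN: an early classifier after the first block."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.exit1 = nn.Linear(16 * 4 * 4, num_classes)
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.exit2 = nn.Linear(32, num_classes)

    def forward(self, x, threshold=0.9):
        h = self.block1(x)
        logits1 = self.exit1(h.flatten(1))
        conf = logits1.softmax(dim=-1).max(dim=-1).values
        if conf.item() >= threshold:     # assumes batch size 1 for simplicity
            return logits1               # early exit: block2 never runs
        h = self.block2(h)
        return self.exit2(h.flatten(1))

net = TwoExitNet().eval()
with torch.no_grad():
    out = net(torch.randn(1, 3, 32, 32))
```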

On-device LLM

  • [ASPLOS 2025] Empowering 1000 Tokens/Second On-device LLM Prefilling with mllm-NPU [paper] [code]
  • [arXiv 2024] PowerInfer-2: Fast Large Language Model Inference on a Smartphone [paper]
  • [MobiCom 2024] MELTing point: Mobile Evaluation of Language Transformers [paper] [code]
  • [MobiCom 2024] Mobile Foundation Model as Firmware [paper] [code]
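
On-device LLM systems generally treat the two inference phases differently: prefill processes the whole prompt in one compute-bound pass (the phase mllm-NPU accelerates), while decode emits one token at a time and is bound by memory bandwidth. A schematic of that split; `DummyModel` is a hypothetical stub, not any runtime's actual API:

```python
# Schematic prefill/decode split; DummyModel is a hypothetical stub that
# stands in for a real on-device runtime.
import random

class DummyModel:
    def forward(self, tokens, kv_cache):
        kv_cache = (kv_cache or []) + tokens     # the "cache" grows per token
        logits = [random.random() for _ in range(100)]
        return kv_cache, logits

def generate(model, prompt_tokens, max_new_tokens=8):
    # Prefill: the whole prompt in one parallel, compute-bound pass.
    kv_cache, logits = model.forward(prompt_tokens, kv_cache=None)
    out = []
    for _ in range(max_new_tokens):
        next_tok = max(range(len(logits)), key=logits.__getitem__)  # greedy
        out.append(next_tok)
        # Decode: one token per step, memory-bandwidth-bound.
        kv_cache, logits = model.forward([next_tok], kv_cache=kv_cache)
    return out

print(generate(DummyModel(), prompt_tokens=[1, 2, 3]))
```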

On-device Training and Model Adaptation

  • [SenSys 2024] AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments [paper]
  • [SenSys 2023] EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge [paper]
  • [MobiCom 2023] Cost-effective On-device Continual Learning over Memory Hierarchy with Miro [paper]
  • [MobiCom 2023] AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments [paper]
  • [MobiSys 2023] ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection [paper]
  • [SenSys 2023] On-NAS: On-Device Neural Architecture Search on Memory-Constrained Intelligent Embedded Systems [paper]
  • [MobiCom 2022] Mandheling: Mixed-Precision On-device DNN Training with DSP Offloading [paper]
  • [MobiSys 2022] Memory-Efficient DNN Training on Mobile Devices [paper]
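
A recurring mechanism in this group (ElasticTrainer most directly) is updating only a chosen subset of tensors so that backward-pass memory and compute fit the device budget. A minimal PyTorch sketch of the freezing machinery; the papers' actual contribution is the runtime policy that picks the subset, which is reduced to a hard-coded choice below:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Freeze everything, then re-enable gradients only for the last layer.
# Real systems choose the trainable subset with a runtime cost model.
for p in model.parameters():
    p.requires_grad = False
for p in model[4].parameters():
    p.requires_grad = True

optim = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                        lr=1e-2)
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                 # gradients flow only into the unfrozen layer
optim.step()
```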

Edge-Cloud Collaborative Inference

  • [MobiSys 2024] ARISE: High-Capacity AR Offloading Inference Serving via Proactive Scheduling [paper]
  • [MobiSys 2024] CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent Execution [paper]
  • [MobiCom 2023] AccuMO: Accuracy-Centric Multitask Offloading in Edge-Assisted Mobile Augmented Reality [paper]
  • [MobiSys 2022] DeepMix: Mobility-aware, Lightweight, and Hybrid 3D Object Detection for Headsets [paper]
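
The common pattern in edge-cloud collaboration is a per-frame policy: answer locally when a cheap on-device model suffices, and offload only the hard cases. A toy sketch of a confidence-gated policy; all names and the threshold here are illustrative:

```python
# Sketch of a per-frame offloading policy: serve confident results locally,
# ship hard frames to an edge server. All names are illustrative.
def infer(frame, local_model, offload_to_server, conf_threshold=0.8):
    label, conf = local_model(frame)          # cheap on-device model
    if conf >= conf_threshold:
        return label                          # local result is good enough
    return offload_to_server(frame)           # pay network latency for accuracy

# Toy stand-ins so the sketch runs end to end.
local = lambda f: ("cat", 0.55)
server = lambda f: "tabby cat"
print(infer("frame0", local, server))
```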

Profilers

  • [SenSys 2023] nnPerf: Demystifying DNN Runtime Inference Latency on Mobile Platforms [paper]
  • [MobiSys 2021] nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices [paper]
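
Both profilers go far beyond wall-clock timing (per-op and per-kernel attribution, latency prediction), but they presuppose the same measurement hygiene: warm up first, then aggregate over many runs. A minimal sketch of that discipline:

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    # Warm-up runs absorb one-time costs (JIT, caches, frequency scaling).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)   # ms
    return statistics.median(samples), statistics.stdev(samples)

med, sd = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median {med:.3f} ms, stdev {sd:.3f} ms")
```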

Efficient AI Serving Systems

LLM Serving

  • [SOSP 2024] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU [paper]
  • [SOSP 2023] Efficient Memory Management for Large Language Model Serving with PagedAttention [paper]
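
PagedAttention manages the KV cache the way an OS manages virtual memory: each sequence's cache lives in fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks, so no sequence needs a contiguous, over-allocated buffer. A bookkeeping-only sketch of that structure (no attention math; sizes are illustrative):

```python
# Simplified block-table bookkeeping in the spirit of PagedAttention.
BLOCK_SIZE = 16                      # tokens per KV block

class KVBlockAllocator:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))      # physical block ids
        self.block_tables = {}                   # seq_id -> [block ids]
        self.lengths = {}                        # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:                  # last block full: map a new one
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def lookup(self, seq_id, pos):
        # Logical position -> (physical block, offset), like a page table.
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

alloc = KVBlockAllocator(num_blocks=64)
for _ in range(20):
    alloc.append_token("seq0")
print(alloc.lookup("seq0", 17))      # -> (second block, offset 1)
```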

Live ML Serving

  • [NSDI 2024] Vulcan: Automatic Query Planning for Live ML Analytics [paper]

Efficient AI Methods

Elastic Neural Networks

  • [ICML 2024] FLEXTRON: Many-in-One Flexible Large Language Model [paper]
  • [CVPR 2023 Highlight] Stitchable Neural Networks [paper] [code]
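
Elastic networks expose many accuracy/latency operating points from a single set of weights. Flextron and Stitchable Neural Networks do this with learned routing and stitching layers; the sketch below shows only the simplest form of the idea, width elasticity over shared weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticLinear(nn.Module):
    """Linear layer that can run with only a prefix of its output units."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, width=1.0):
        k = max(1, int(self.weight.shape[0] * width))
        # Slice the same shared weights; smaller k => cheaper matmul.
        return F.linear(x, self.weight[:k], self.bias[:k])

layer = ElasticLinear(128, 256)
x = torch.randn(4, 128)
print(layer(x, width=1.0).shape)   # torch.Size([4, 256]) full model
print(layer(x, width=0.25).shape)  # torch.Size([4, 64])  sub-model
```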

Efficient LLMs/VLMs

  • [ICML 2024] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases [paper] [code]
  • [arXiv 2023] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices [paper] [code]

Quantization

  • [MLSys 2024 Best Paper] AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration [paper] [code]
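
AWQ's core observation is that a small fraction of weight channels matters disproportionately because those channels meet large activations; scaling them up before low-bit rounding, and folding the inverse scale into the activations, protects them at no extra bit cost. A simplified per-channel sketch of that idea; the actual method additionally searches the scaling exponent and quantizes in groups:

```python
import torch

def quantize_rtn(w, n_bits=4):
    # Symmetric round-to-nearest with a per-output-row scale.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def awq_style_quantize(w, act_mag, alpha=0.5, n_bits=4):
    # w: [out, in]; act_mag: mean |activation| per input channel, [in].
    # Scale salient input channels up before rounding; at runtime the
    # inverse scale moves into the activations: y = (x / s) @ (w * s).T
    s = act_mag.clamp(min=1e-8) ** alpha
    s = s / s.mean()                   # keep overall magnitude stable
    w_q = quantize_rtn(w * s, n_bits)
    return w_q, s

w = torch.randn(8, 16)
act_mag = torch.rand(16) * 4
w_q, s = awq_style_quantize(w, act_mag)
x = torch.randn(2, 16)
y_approx = (x / s) @ w_q.t()           # approximates y_exact = x @ w.t()
```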

Pruning and Compression

  • [CVPR 2023] DepGraph: Towards Any Structural Pruning [paper] [code]
  • [ICML 2023] Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming [paper] [code]
  • [NeurIPS 2022] Structural Pruning via Latency-Saliency Knapsack [paper] [code]
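
Structured pruning removes whole filters or channels so the pruned model stays dense and needs no sparse kernels; DepGraph automates finding which layers must be pruned together, and the knapsack formulation trades saliency against measured latency savings. A bare-bones L1-norm filter-pruning sketch for a single conv layer (coupled downstream layers would need their input channels sliced with the returned indices):

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv, keep_ratio=0.5):
    """Keep the conv filters with the largest L1 norm; return a smaller layer."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    saliency = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 per filter
    keep = saliency.topk(n_keep).indices.sort().values
    new = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new.bias.copy_(conv.bias[keep])
    return new, keep   # `keep` tells the *next* layer which inputs remain

conv = nn.Conv2d(3, 16, 3, padding=1)
small, kept = prune_conv_filters(conv)
print(small.weight.shape)   # torch.Size([8, 3, 3, 3])
```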

Efficient Vision Transformer (ViT)

  • [ICLR 2023 Notable Top 5%] Token Merging: Your ViT but Faster [paper] [code]
  • [ICCV 2023] Rethinking Vision Transformers for MobileNet Size and Speed [paper] [code]
  • [ICCV 2023] EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [paper] [code]
  • [CVPR 2023] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer [paper] [code]
  • [CVPR 2022 Oral] PoolFormer: MetaFormer Is Actually What You Need for Vision [paper] [code]
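
Token Merging (ToMe) reduces the token count between transformer blocks by averaging similar tokens instead of discarding them. A simplified sketch of the matching step; ToMe's actual bipartite soft matching also tracks token sizes for properly weighted averages, which is omitted here:

```python
import torch

def token_merge(x, r):
    """x: [N, D] tokens; merge r tokens by averaging into similar partners."""
    a, b = x[::2], x[1::2]                       # alternating bipartite split
    a_n = torch.nn.functional.normalize(a, dim=-1)
    b_n = torch.nn.functional.normalize(b, dim=-1)
    sim = a_n @ b_n.t()                          # cosine similarity [|A|, |B|]
    best_val, best_idx = sim.max(dim=-1)         # each A token's best B match
    merge_a = best_val.topk(r).indices           # the r most redundant A tokens
    keep_mask = torch.ones(a.shape[0], dtype=torch.bool)
    keep_mask[merge_a] = False
    b = b.clone()
    for i in merge_a.tolist():                   # fold each merged A token into B
        j = best_idx[i].item()
        b[j] = (b[j] + a[i]) / 2
    return torch.cat([a[keep_mask], b], dim=0)   # N - r tokens remain

x = torch.randn(16, 64)
print(token_merge(x, r=4).shape)                 # torch.Size([12, 64])
```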