
Awesome research works for Efficient AI

A curated list of research works on efficient AI systems, methods, and applications.

Efficient AI Systems on Mobile and Edge Devices

Efficient Inference using Heterogeneous Processors (e.g., CPU, GPU, NPU)

  • [MobiCom 2024] Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices [paper]
  • [SenSys 2023] Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU [paper]
  • [MobiSys 2023] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors [paper]
  • [ATC 2023] Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices [paper]
  • [IPSN 2023] PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators [paper]
  • [SenSys 2022] BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference [paper]
  • [MobiSys 2022] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors [paper]
  • [MobiSys 2022] CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices [paper]
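
Most of these systems partition a model into sub-graphs and run them on different processors concurrently, so end-to-end latency tracks the slowest branch rather than the sum of all branches. A minimal sketch of that branch-parallel idea, with hypothetical `run_on_cpu`/`run_on_gpu` functions standing in for backend-specific runtimes:

```python
# Sketch of branch-parallel inference across heterogeneous processors.
# run_on_cpu / run_on_gpu are hypothetical stand-ins for backend-specific
# runtimes (e.g., a CPU thread pool and a GPU/NPU delegate).
import time
from concurrent.futures import ThreadPoolExecutor

def run_on_cpu(branch_input):
    time.sleep(0.010)            # pretend: 10 ms CPU sub-graph
    return "cpu_features"

def run_on_gpu(branch_input):
    time.sleep(0.008)            # pretend: 8 ms GPU sub-graph
    return "gpu_features"

def merge(a, b):
    return (a, b)                # placeholder for the fusion step

def parallel_inference(x):
    # Launch both branches concurrently; end-to-end latency approaches
    # max(branch latencies) instead of their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_out = pool.submit(run_on_cpu, x)
        gpu_out = pool.submit(run_on_gpu, x)
        return merge(cpu_out.result(), gpu_out.result())

print(parallel_inference("frame"))
```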

Adaptive Inference for Optimized Resource Utilization

  • [MobiCom 2024] Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices [paper]
  • [MobiSys 2023] OmniLive: Super-Resolution Enhanced 360° Video Live Streaming for Mobile Devices [paper]
  • [MobiSys 2023] HarvNet: Resource-Optimized Operation of Multi-Exit Deep Neural Networks on Energy Harvesting Devices [paper]
  • [MobiCom 2022] NeuLens: Spatial-based Dynamic Acceleration of Convolutional Neural Networks on Edge [paper]
  • [MobiCom 2021] Flexible High-resolution Object Detection on Edge Devices with Tunable Latency [paper]
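
Several of these works (HarvNet in particular) build on multi-exit networks: classifiers attached at intermediate layers let inference stop early once a prediction is already confident. A minimal PyTorch sketch of confidence-thresholded early exit; the toy architecture and the 0.9 threshold are illustrative, not taken from any of the papers:

```python
import torch
import torch.nn as nn

class TwoExitNet(nn.Module):
    """Toy multi-exit CNN: an early classifier after the first block."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.exit1 = nn.Linear(16 * 4 * 4, num_classes)
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.exit2 = nn.Linear(32, num_classes)

    def forward(self, x, threshold=0.9):
        h = self.block1(x)
        logits1 = self.exit1(h.flatten(1))
        conf = logits1.softmax(dim=-1).max(dim=-1).values
        if conf.item() >= threshold:     # assumes batch size 1 for simplicity
            return logits1               # early exit: block2 never runs
        h = self.block2(h)
        return self.exit2(h.flatten(1))

net = TwoExitNet().eval()
with torch.no_grad():
    out = net(torch.randn(1, 3, 32, 32))
```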

On-device LLM

  • [ASPLOS 2025] Empowering 1000 Tokens/Second On-device LLM Prefilling with mllm-NPU [paper] [code]
  • [arXiv 2024] PowerInfer-2: Fast Large Language Model Inference on a Smartphone [paper]
  • [MobiCom 2024] MELTing point: Mobile Evaluation of Language Transformers [paper] [code]
  • [MobiCom 2024] Mobile Foundation Model as Firmware [paper] [code]
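
On-device LLM systems generally treat the two inference phases differently: prefill processes the whole prompt in one compute-bound pass (the phase mllm-NPU accelerates), while decode emits one token at a time and is bound by memory bandwidth. A schematic of that split; `DummyModel` is a hypothetical stub, not any runtime's actual API:

```python
# Schematic prefill/decode split; DummyModel is a hypothetical stub that
# stands in for a real on-device runtime.
import random

class DummyModel:
    def forward(self, tokens, kv_cache):
        kv_cache = (kv_cache or []) + tokens     # the "cache" grows per token
        logits = [random.random() for _ in range(100)]
        return kv_cache, logits

def generate(model, prompt_tokens, max_new_tokens=8):
    # Prefill: the whole prompt in one parallel, compute-bound pass.
    kv_cache, logits = model.forward(prompt_tokens, kv_cache=None)
    out = []
    for _ in range(max_new_tokens):
        next_tok = max(range(len(logits)), key=logits.__getitem__)  # greedy
        out.append(next_tok)
        # Decode: one token per step, memory-bandwidth-bound.
        kv_cache, logits = model.forward([next_tok], kv_cache=kv_cache)
    return out

print(generate(DummyModel(), prompt_tokens=[1, 2, 3]))
```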

On-device Training and Model Adaptation

  • [SenSys 2024] AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments [paper]
  • [SenSys 2023] EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge [paper]
  • [MobiCom 2023] Cost-effective On-device Continual Learning over Memory Hierarchy with Miro [paper]
  • [MobiCom 2023] AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments [paper]
  • [MobiSys 2023] ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection [paper]
  • [SenSys 2023] On-NAS: On-Device Neural Architecture Search on Memory-Constrained Intelligent Embedded Systems [paper]
  • [MobiCom 2022] Mandheling: Mixed-Precision On-device DNN Training with DSP Offloading [paper]
  • [MobiSys 2022] Memory-Efficient DNN Training on Mobile Devices [paper]
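
A recurring mechanism in this group (ElasticTrainer most directly) is updating only a chosen subset of tensors so that backward-pass memory and compute fit the device budget. A minimal PyTorch sketch of the freezing machinery; the papers' actual contribution is the runtime policy that picks the subset, which is reduced to a hard-coded choice below:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Freeze everything, then re-enable gradients only for the last layer.
# Real systems choose the trainable subset with a runtime cost model.
for p in model.parameters():
    p.requires_grad = False
for p in model[4].parameters():
    p.requires_grad = True

optim = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                        lr=1e-2)
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                 # gradients flow only into the unfrozen layer
optim.step()
```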

Edge-Cloud Collaborative Inference

  • [MobiSys 2024] ARISE: High-Capacity AR Offloading Inference Serving via Proactive Scheduling [paper]
  • [MobiSys 2024] CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent Execution [paper]
  • [MobiCom 2023] AccuMO: Accuracy-Centric Multitask Offloading in Edge-Assisted Mobile Augmented Reality [paper]
  • [MobiSys 2022] DeepMix: Mobility-aware, Lightweight, and Hybrid 3D Object Detection for Headsets [paper]
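
The common pattern in edge-cloud collaboration is a per-frame policy: answer locally when a cheap on-device model suffices, and offload only the hard cases. A toy sketch of a confidence-gated policy; all names and the threshold here are illustrative:

```python
# Sketch of a per-frame offloading policy: serve confident results locally,
# ship hard frames to an edge server. All names are illustrative.
def infer(frame, local_model, offload_to_server, conf_threshold=0.8):
    label, conf = local_model(frame)          # cheap on-device model
    if conf >= conf_threshold:
        return label                          # local result is good enough
    return offload_to_server(frame)           # pay network latency for accuracy

# Toy stand-ins so the sketch runs end to end.
local = lambda f: ("cat", 0.55)
server = lambda f: "tabby cat"
print(infer("frame0", local, server))
```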

Profilers

  • [SenSys 2023] nnPerf: Demystifying DNN Runtime Inference Latency on Mobile Platforms [paper]
  • [MobiSys 2021] nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices [paper]
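
Both profilers go far beyond wall-clock timing (per-op and per-kernel attribution, latency prediction), but they presuppose the same measurement hygiene: warm up first, then aggregate over many runs. A minimal sketch of that discipline:

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    # Warm-up runs absorb one-time costs (JIT, caches, frequency scaling).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)   # ms
    return statistics.median(samples), statistics.stdev(samples)

med, sd = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median {med:.3f} ms, stdev {sd:.3f} ms")
```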

Efficient AI Serving Systems

LLM Serving

  • [SOSP 2024] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU [paper]
  • [SOSP 2023] Efficient Memory Management for Large Language Model Serving with PagedAttention [paper]
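
PagedAttention manages the KV cache the way an OS manages virtual memory: each sequence's cache lives in fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks, so no sequence needs a contiguous, over-allocated buffer. A bookkeeping-only sketch of that structure (no attention math; sizes are illustrative):

```python
# Simplified block-table bookkeeping in the spirit of PagedAttention.
BLOCK_SIZE = 16                      # tokens per KV block

class KVBlockAllocator:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))      # physical block ids
        self.block_tables = {}                   # seq_id -> [block ids]
        self.lengths = {}                        # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:                  # last block full: map a new one
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def lookup(self, seq_id, pos):
        # Logical position -> (physical block, offset), like a page table.
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

alloc = KVBlockAllocator(num_blocks=64)
for _ in range(20):
    alloc.append_token("seq0")
print(alloc.lookup("seq0", 17))      # -> (second block, offset 1)
```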

Live ML Serving

  • [NSDI 2024] Vulcan: Automatic Query Planning for Live ML Analytics [paper]

Efficient AI Methods

Elastic Neural Networks

  • [ICML 2024] FLEXTRON: Many-in-One Flexible Large Language Model [paper]
  • [CVPR 2023 Highlight] Stitchable Neural Networks [paper] [code]
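
Elastic networks expose many accuracy/latency operating points from a single set of weights. Flextron and Stitchable Neural Networks do this with learned routing and stitching layers; the sketch below shows only the simplest form of the idea, width elasticity over shared weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticLinear(nn.Module):
    """Linear layer that can run with only a prefix of its output units."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, width=1.0):
        k = max(1, int(self.weight.shape[0] * width))
        # Slice the same shared weights; smaller k => cheaper matmul.
        return F.linear(x, self.weight[:k], self.bias[:k])

layer = ElasticLinear(128, 256)
x = torch.randn(4, 128)
print(layer(x, width=1.0).shape)   # torch.Size([4, 256]) full model
print(layer(x, width=0.25).shape)  # torch.Size([4, 64])  sub-model
```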

Efficient LLMs/VLMs

  • [ICML 2024] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases [paper] [code]
  • [arXiv 2023] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices [paper] [code]

Quantization

  • [MLSys 2024 Best Paper] AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration [paper] [code]
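
AWQ's core observation is that a small fraction of weight channels matters disproportionately because those channels meet large activations; scaling them up before low-bit rounding, and folding the inverse scale into the activations, protects them at no extra bit cost. A simplified per-channel sketch of that idea; the actual method additionally searches the scaling exponent and quantizes in groups:

```python
import torch

def quantize_rtn(w, n_bits=4):
    # Symmetric round-to-nearest with a per-output-row scale.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def awq_style_quantize(w, act_mag, alpha=0.5, n_bits=4):
    # w: [out, in]; act_mag: mean |activation| per input channel, [in].
    # Scale salient input channels up before rounding; at runtime the
    # inverse scale moves into the activations: y = (x / s) @ (w * s).T
    s = act_mag.clamp(min=1e-8) ** alpha
    s = s / s.mean()                   # keep overall magnitude stable
    w_q = quantize_rtn(w * s, n_bits)
    return w_q, s

w = torch.randn(8, 16)
act_mag = torch.rand(16) * 4
w_q, s = awq_style_quantize(w, act_mag)
x = torch.randn(2, 16)
y_approx = (x / s) @ w_q.t()           # approximates y_exact = x @ w.t()
```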

Pruning and Compression

  • [CVPR 2023] DepGraph: Towards Any Structural Pruning [paper] [code]
  • [ICML 2023] Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming [paper] [code]
  • [NeurIPS 2022] Structural Pruning via Latency-Saliency Knapsack [paper] [code]
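
Structured pruning removes whole filters or channels so the pruned model stays dense and needs no sparse kernels; DepGraph automates finding which layers must be pruned together, and the knapsack formulation trades saliency against measured latency savings. A bare-bones L1-norm filter-pruning sketch for a single conv layer (coupled downstream layers would need their input channels sliced with the returned indices):

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv, keep_ratio=0.5):
    """Keep the conv filters with the largest L1 norm; return a smaller layer."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    saliency = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 per filter
    keep = saliency.topk(n_keep).indices.sort().values
    new = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new.bias.copy_(conv.bias[keep])
    return new, keep   # `keep` tells the *next* layer which inputs remain

conv = nn.Conv2d(3, 16, 3, padding=1)
small, kept = prune_conv_filters(conv)
print(small.weight.shape)   # torch.Size([8, 3, 3, 3])
```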

Efficient Vision Transformer (ViT)

  • [ICLR 2023 Notable Top 5%] Token Merging: Your ViT but Faster [paper] [code]
  • [ICCV 2023] Rethinking Vision Transformers for MobileNet Size and Speed [paper] [code]
  • [ICCV 2023] EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [paper] [code]
  • [CVPR 2023] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer [paper] [code]
  • [CVPR 2022 Oral] PoolFormer: MetaFormer Is Actually What You Need for Vision [paper] [code]
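
Token Merging (ToMe) reduces the token count between transformer blocks by averaging similar tokens instead of discarding them. A simplified sketch of the matching step; ToMe's actual bipartite soft matching also tracks token sizes for properly weighted averages, which is omitted here:

```python
import torch

def token_merge(x, r):
    """x: [N, D] tokens; merge r tokens by averaging into similar partners."""
    a, b = x[::2], x[1::2]                       # alternating bipartite split
    a_n = torch.nn.functional.normalize(a, dim=-1)
    b_n = torch.nn.functional.normalize(b, dim=-1)
    sim = a_n @ b_n.t()                          # cosine similarity [|A|, |B|]
    best_val, best_idx = sim.max(dim=-1)         # each A token's best B match
    merge_a = best_val.topk(r).indices           # the r most redundant A tokens
    keep_mask = torch.ones(a.shape[0], dtype=torch.bool)
    keep_mask[merge_a] = False
    b = b.clone()
    for i in merge_a.tolist():                   # fold each merged A token into B
        j = best_idx[i].item()
        b[j] = (b[j] + a[i]) / 2
    return torch.cat([a[keep_mask], b], dim=0)   # N - r tokens remain

x = torch.randn(16, 64)
print(token_merge(x, r=4).shape)                 # torch.Size([12, 64])
```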