- 24_Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve [paper]
- 24_Fast Distributed Inference Serving for Large Language Models [paper]
- 24_OSDI_ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models [paper] [code]
- 24_Splitwise: Efficient Generative LLM Inference Using Phase Splitting [paper]
- 23_NeurIPS_Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [paper] [code]
- 23_ICML_FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU [paper] [code]
- 23_SOSP_Efficient Memory Management for Large Language Model Serving with PagedAttention [paper] [code]
- 22_NSDI_Orca: A Distributed Serving System for Transformer-Based Generative Models [paper]
- 24_Eurosys_Atlas: Hybrid Cloud Migration Advisor for Interactive Microservices [paper] [code]
- 22_SoCC_SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management [paper]
- 22_ASPLOS_SOL: Safe On-Node Learning in Cloud Platforms [paper]
- 21_ICML_Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [paper]
- 20_ICML_Inductive-bias-driven Reinforcement Learning for Efficient Schedules in Heterogeneous Clusters [paper]
- 17_NSDI_CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics [paper] [code]
- 24_Eurosys_Erlang: Application-Aware Autoscaling for Cloud Microservices [paper] [[code]](https://github.com/vigsachi/erlang)
- 24_SC_Fast and Efficient Scaling for Microservices with SurgeGuard [paper]
- 24_NSDI_Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices [paper] [code]
- 23_ATC_AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems [paper] [code]
- 22_Eurosys_DeepRest: Deep Resource Estimation for Interactive Microservices [paper] [code]
- 22_SoCC_DeepScaling: Microservices AutoScaling for Stable CPU Utilization in Large Scale Cloud Systems [paper]
- 22_SoCC_The Power of Prediction: Microservice Auto Scaling via Workload Learning [paper]
- 22_TCC_Microscaler: Cost-Effective Scaling for Microservice Applications in the Cloud with an Online Learning Approach [paper]
- 22_HPDC_Practical Efficient Microservice Autoscaling with QoS Assurance [paper]
- 22_ICWS_HRA: An Intelligent Holistic Resource Autoscaling Framework for Multi-service Applications [paper]
- 21_SoCC_SHOWAR: Right-Sizing And Efficient Scheduling of Microservices [paper]
- 20_OSDI_FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices [paper]
- 20_Eurosys_Autopilot: Workload Autoscaling at Google [paper]
- 20_ICWS_A-SARSA: A Predictive Container Auto-Scaling Algorithm Based on Reinforcement Learning [paper]
- 19_Eurosys_GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [paper]
- 19_ICWS_Microscaler: Automatic Scaling for Microservices with an Online Learning Approach [paper]
- 19_CloudCom_Learning Predictive Autoscaling Policies for Cloud-hosted Microservices Using Trace-driven Modeling [paper]
- 19_Cloud_Horizontal and Vertical Scaling of Container-Based Applications Using Reinforcement Learning [paper]
- 23_ICPP_DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems [paper]
- 23_WWW_DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services [paper]