Resource schedule

LLM

24_Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve [paper]
24_Fast Distributed Inference Serving for Large Language Models [paper]
24_OSDI_ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models [paper] [code]
24_Splitwise: Efficient Generative LLM Inference Using Phase Splitting [paper]
23_NeurIPS_Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [paper] [code]
23_ICML_FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU [paper] [code]
23_SOSP_Efficient Memory Management for Large Language Model Serving with PagedAttention [paper] [code]
22_OSDI_Orca: A Distributed Serving System for Transformer-Based Generative Models [paper]

Resource Allocation

24_EuroSys_Atlas: Hybrid Cloud Migration Advisor for Interactive Microservices [paper] [code]
22_SoCC_SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management [paper]
22_ASPLOS_SOL: Safe On-Node Learning in Cloud Platforms [paper]
21_ICML_Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [paper]
20_ICML_Inductive-bias-driven Reinforcement Learning for Efficient Schedules in Heterogeneous Clusters [paper]
17_NSDI_CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics [paper] [code]

Autoscale

24_EuroSys_Erlang: Application-Aware Autoscaling for Cloud Microservices [paper] [code](https://github.com/vigsachi/erlang)
24_SC_Fast and Efficient Scaling for Microservices with SurgeGuard [paper]
24_NSDI_Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices [paper] [code]
23_ATC_AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems [paper] [code]
22_EuroSys_DeepRest: Deep Resource Estimation for Interactive Microservices [paper] [code]
22_SoCC_DeepScaling: Microservices AutoScaling for Stable CPU Utilization in Large Scale Cloud Systems [paper]
22_SoCC_The Power of Prediction: Microservice Auto Scaling via Workload Learning [paper]
22_TCC_Microscaler: Cost-effective scaling for microservice applications in the cloud with an online learning approach [paper]
22_HPDC_Practical Efficient Microservice Autoscaling with QoS Assurance [paper]
22_ICWS_HRA: An Intelligent Holistic Resource Autoscaling Framework for Multi-service Applications [paper]
21_SoCC_SHOWAR: Right-Sizing And Efficient Scheduling of Microservices [paper]
20_OSDI_FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices [paper]
20_EuroSys_Autopilot: Workload Autoscaling at Google [paper]
20_ICWS_A-SARSA: A Predictive Container Auto-Scaling Algorithm Based on Reinforcement Learning [paper]
19_EuroSys_GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [paper]
19_ICWS_Microscaler: Automatic scaling for microservices with an online learning approach [paper]
19_CloudCom_Learning Predictive Autoscaling Policies for Cloud-hosted Microservices Using Trace-driven Modeling [paper]
19_Cloud_Horizontal and vertical scaling of container-based applications using reinforcement learning [paper]

Job Schedule

19_EuroSys_GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [paper]

Power Control

23_ICPP_DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems [paper]
23_WWW_DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services [paper]
