Resource schedule

LLM

  • 24_Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve [paper]
  • 24_Fast Distributed Inference Serving for Large Language Models [paper]
  • 24_OSDI_ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models [paper] [code]
  • 24_Splitwise: Efficient Generative LLM Inference Using Phase Splitting [paper]
  • 23_NeurIPS_Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [paper] [code]
  • 23_ICML_FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU [paper] [code]
  • 23_SOSP_Efficient Memory Management for Large Language Model Serving with PagedAttention [paper] [code]
  • 22_NSDI_Orca: A Distributed Serving System for Transformer-Based Generative Models [paper]

Resource Allocation

  • 24_EuroSys_Atlas: Hybrid Cloud Migration Advisor for Interactive Microservices [paper] [code]
  • 22_SoCC_SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management [paper]
  • 22_ASPLOS_SOL: Safe On-Node Learning in Cloud Platforms [paper]
  • 21_ICML_Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [paper]
  • 20_ICML_Inductive-bias-driven Reinforcement Learning for Efficient Schedules in Heterogeneous Clusters [paper]
  • 17_NSDI_CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics [paper] [code]

Autoscale

  • 24_EuroSys_Erlang: Application-Aware Autoscaling for Cloud Microservices [paper] [[code]](https://github.com/vigsachi/erlang)
  • 24_SC_Fast and Efficient Scaling for Microservices with SurgeGuard [paper]
  • 24_NSDI_Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices [paper] [code]
  • 23_ATC_AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems [paper] [code]
  • 22_EuroSys_DeepRest: Deep Resource Estimation for Interactive Microservices [paper] [code]
  • 22_SoCC_DeepScaling: Microservices AutoScaling for Stable CPU Utilization in Large Scale Cloud Systems [paper]
  • 22_SoCC_The Power of Prediction: Microservice Auto Scaling via Workload Learning [paper]
  • 22_TCC_Microscaler: Cost-Effective Scaling for Microservice Applications in the Cloud with an Online Learning Approach [paper]
  • 22_HPDC_Practical Efficient Microservice Autoscaling with QoS Assurance [paper]
  • 22_ICWS_HRA: An Intelligent Holistic Resource Autoscaling Framework for Multi-service Applications [paper]
  • 21_SoCC_SHOWAR: Right-Sizing And Efficient Scheduling of Microservices [paper]
  • 20_OSDI_FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices [paper]
  • 20_EuroSys_Autopilot: Workload Autoscaling at Google [paper]
  • 20_ICWS_A-SARSA: A Predictive Container Auto-Scaling Algorithm Based on Reinforcement Learning [paper]
  • 19_EuroSys_GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [paper]
  • 19_ICWS_Microscaler: Automatic Scaling for Microservices with an Online Learning Approach [paper]
  • 19_CloudCom_Learning Predictive Autoscaling Policies for Cloud-hosted Microservices Using Trace-driven Modeling [paper]
  • 19_Cloud_Horizontal and Vertical Scaling of Container-Based Applications Using Reinforcement Learning [paper]

Job Schedule

  • 19_EuroSys_GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [paper]

Power Control

  • 23_ICPP_DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems [paper]
  • 23_WWW_DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services [paper]