
DeepSeek on EKS Demo

Project Description

This project demonstrates deploying and managing DeepSeek AI models on Amazon Elastic Kubernetes Service (EKS). It serves several distilled models, including DeepSeek-R1-Distill-Qwen-14B, DeepSeek-R1-Distill-Qwen-32B, and DeepSeek-R1-Distill-Llama-8B, using vLLM for efficient inference. The project also covers node pool management with Karpenter, performance testing, and a web interface for interacting with the models.

Components

Deployments

  1. Qwen-14B Deployment (qwen-14b-deployment.yaml)

    • Deploys the DeepSeek-R1-Distill-Qwen-14B model
    • Uses vLLM for serving
    • Configured with GPU support
  2. Qwen-32B Deployment (qwen-32b-deployment.yaml)

    • Deploys the DeepSeek-R1-Distill-Qwen-32B model
    • Similar configuration to Qwen-14B, but with adjusted resource requirements
  3. DeepSeek-R1-Distill-Llama-8B Deployment (deployment.yaml)

    • Deploys the DeepSeek-R1-Distill-Llama-8B model
    • Uses vLLM for serving
    • Configured with GPU support
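As an illustrative sketch (not the exact manifest in this repo), a vLLM deployment for one of these models typically looks like the following; the image tag, model ID, and resource values here are assumptions to adapt to your environment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-llama-8b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-llama-8b
  template:
    metadata:
      labels:
        app: deepseek-llama-8b
    spec:
      containers:
        - name: vllm
          # Pin a specific image tag in practice instead of "latest"
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
            - "--max-model-len"
            - "8192"
          ports:
            - containerPort: 8000   # vLLM's OpenAI-compatible API port
          resources:
            limits:
              nvidia.com/gpu: "1"   # requests one GPU from the node
```

The larger Qwen models follow the same shape with a different `--model` argument and higher GPU/memory limits.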

Node Pools

  1. GPU Node Pool (nodepool.yaml)

    • Configures Karpenter for managing GPU-enabled nodes
    • Supports various GPU instance types (g5, g6, g6e, p5, p4)
    • Includes both spot and on-demand instances
  2. ML Accelerator Node Pool (nodepool.yaml)

    • Configures Karpenter for managing nodes with AWS Inferentia and Trainium
    • Supports instance families: inf1, inf2, trn1, trn1n
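A hedged sketch of what the GPU node pool configuration looks like; the exact schema varies with the installed Karpenter version, and the `EC2NodeClass` name and GPU limit below are illustrative assumptions:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        # Restrict provisioning to the GPU instance families listed above
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "g6", "g6e", "p4", "p5"]
        # Allow both spot and on-demand capacity
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      # Keep non-GPU workloads off these expensive nodes
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    nvidia.com/gpu: 8
```

The ML accelerator pool would use the same structure with `inf1`, `inf2`, `trn1`, and `trn1n` in the instance-family requirement.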

Performance Testing

  • GenAI Performance Tool (genai-perf.yaml, prompts.sh)
    • Deploys a Triton Inference Server for performance testing
    • Includes scripts for running performance profiles on the deployed models
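One way to run genai-perf against a deployed model is a one-off Kubernetes Job using the Triton SDK image, which bundles the tool. This is a sketch, not the manifest in this repo; the image tag, flags, service name, and model ID are assumptions:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: genai-perf
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: genai-perf
          # The Triton SDK image ships the genai-perf CLI
          image: nvcr.io/nvidia/tritonserver:24.12-py3-sdk
          command: ["genai-perf", "profile"]
          args:
            - "-m"
            - "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
            - "--service-kind"
            - "openai"
            - "--endpoint-type"
            - "chat"
            - "--url"
            - "http://deepseek-llama-8b:8000"  # hypothetical in-cluster service name
```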

User Interface

  • Open-WebUI (open-webui.yaml)
    • Deploys a web-based user interface for interacting with the DeepSeek AI models
    • Connects to the deployed vLLM services
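Open-WebUI discovers OpenAI-compatible backends through environment variables. A minimal sketch of the relevant container spec; the service name is hypothetical and should match your vLLM Service:

```yaml
containers:
  - name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    env:
      - name: OPENAI_API_BASE_URL
        value: "http://deepseek-llama-8b:8000/v1"  # hypothetical vLLM service
      - name: OPENAI_API_KEY
        value: "dummy"  # vLLM does not require a real key by default
    ports:
      - containerPort: 8080
```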

Setup and Usage

  1. Prerequisites

    • Amazon EKS cluster
    • kubectl configured to access your cluster
    • Karpenter installed and configured
  2. Node Pool Configuration

    • Apply the node pool configurations:

      kubectl apply -f nodepool.yaml
  3. Deployment

    • Apply the deployment YAML files:

      kubectl apply -f qwen-14b-deployment.yaml
      kubectl apply -f qwen-32b-deployment.yaml
      kubectl apply -f deployment.yaml
  4. Performance Testing

    • Deploy the Triton Inference Server:

      kubectl apply -f genai-perf.yaml
    • Use the prompts.sh script to run performance tests

  5. User Interface

    • Deploy the Open-WebUI:

      kubectl apply -f open-webui.yaml
    • Access the UI through the exposed service
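Once a model deployment is running and its service is reachable (for example via kubectl port-forward to local port 8000), it can be queried through vLLM's OpenAI-compatible API. A minimal Python sketch; the local URL and model ID are assumptions to replace with your own values:

```python
import json

# Assumed local endpoint after:
#   kubectl port-forward svc/<your-vllm-service> 8000:8000
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model, prompt, max_tokens=256):
    """Return the JSON body expected by an OpenAI-compatible chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "Explain KV-cache reuse in one sentence.",
)
print(json.dumps(body, indent=2))
```

POST this body to `VLLM_URL` with any HTTP client (for example `curl -d @body.json`) to get a completion back from the model.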

Project Structure

.
├── deepseek-using-vllm-on-eks
│   ├── chatbot-ui
│   │   ├── application
│   │   │   ├── app.py
│   │   │   ├── Dockerfile
│   │   │   └── requirements.txt
│   │   └── manifests
│   │       ├── deployment.yaml
│   │       └── ingress-class.yaml
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── main.tf
│   ├── manifests
│   │   ├── deepseek-deployment-gpu.yaml
│   │   └── gpu-nodepool.yaml
│   └── README.md
├── ec2nodepool.yaml
├── genai-perf.yaml
├── k8s-manifest
│   ├── genai-perf
│   │   ├── genai-perf-2409.yaml
│   │   └── genai-perf-2412.yaml
│   ├── priority-class.yaml
│   ├── sglang
│   │   ├── llama-8b-sglang.yaml
│   │   └── qwen-32b-sglang.yaml
│   └── vllm
│       ├── llama-8b-vllm.yaml
│       ├── qwen-14b-deployment.yaml
│       └── qwen-32b-deployment.yaml
├── nodepool.yaml
├── open-webui.yaml
├── prompts.sh
└── README.md

Contributing

Contributions to this project are welcome. Please refer to the CONTRIBUTING.md file for guidelines.

License

This project is licensed under [LICENSE_NAME]. Please see the LICENSE file for details.
