[feat] Add Vertex AI compatible prediction route for /generate #3866

Open
wants to merge 2 commits into main


KCFindstr

Motivation

Google Cloud Vertex AI Online Prediction imposes requirements on the prediction route and on the request and response formats. This PR lets SGLang Docker containers be served directly on Google Cloud Vertex AI without modification (unless advanced features such as multi-node serving are needed).

Modifications

I added a Vertex AI compatible route for the /generate API that uses Vertex-specific request and response formats. The route is mounted dynamically, based on the AIP_PREDICT_ROUTE environment variable.
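
For reviewers unfamiliar with Vertex AI, here is a minimal, framework-free sketch of the idea, not the code in this PR. It assumes the usual Vertex AI custom-container envelope (`{"instances": [...], "parameters": {...}}` in, `{"predictions": [...]}` out). The names `resolve_predict_route`, `handle_vertex_predict`, and `build_routes` are hypothetical, and `generate` stands in for whatever callable backs /generate.

```python
import os
from typing import Any, Callable, Dict, Mapping, Optional

# Hypothetical types for this sketch only; they are not SGLang APIs.
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]
GenerateFn = Callable[[Dict[str, Any], Dict[str, Any]], Any]


def resolve_predict_route(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the route from AIP_PREDICT_ROUTE, or None if it is unset or blank."""
    env = os.environ if env is None else env
    route = env.get("AIP_PREDICT_ROUTE", "").strip()
    return route or None


def handle_vertex_predict(body: Dict[str, Any], generate: GenerateFn) -> Dict[str, Any]:
    """Unwrap a Vertex-style request, call generate per instance, wrap the results."""
    instances = body.get("instances")
    if not isinstance(instances, list) or not instances:
        raise ValueError("request body must contain a non-empty 'instances' list")
    # Assumption: 'parameters' is optional and shared by every instance.
    parameters = body.get("parameters") or {}
    predictions = [generate(instance, parameters) for instance in instances]
    return {"predictions": predictions}


def build_routes(
    generate: GenerateFn, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Handler]:
    """Build a path -> handler table; the Vertex route exists only when configured."""
    routes: Dict[str, Handler] = {}
    route = resolve_predict_route(env)
    if route is not None:
        routes[route] = lambda body: handle_vertex_predict(body, generate)
    return routes
```

Outside Vertex AI the variable is unset, so `build_routes` returns an empty table and existing deployments see no new route.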

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Add unit tests as outlined in the Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.

    The new API route does not provide additional value to non-Vertex users. I will provide examples on Vertex AI.

  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.

    N/A

  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

@KCFindstr KCFindstr marked this pull request as ready for review February 26, 2025 03:28