
Conversation

@ehhuang (Contributor) commented on Sep 30, 2025

# What does this PR do?

#3462 made it possible to start the llama stack server with uvicorn, which supports spawning multiple workers.

This PR enables launching more than one worker from `llama stack run` (the worker-count parameter itself will be added in a follow-up PR, to keep this PR focused on simplification) by removing the old way of launching the stack server and consolidating all launching through `uvicorn.run` only.
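
For illustration, here is a minimal sketch of what consolidating on `uvicorn.run` looks like, assuming a hypothetical app factory named `create_app` (the actual entrypoint and module layout in llama-stack may differ):

```python
# Minimal sketch: launching an ASGI app via uvicorn.run with multiple workers.
# `create_app` is an assumed factory name, not necessarily the real entrypoint.
import uvicorn

def main(port: int = 8321, workers: int = 1) -> None:
    uvicorn.run(
        # With workers > 1, uvicorn needs an import string (not an app object)
        # so each worker process can re-import the application itself.
        "llama_stack.core.server.server:create_app",
        factory=True,        # treat the target as an app factory
        host="0.0.0.0",
        port=port,
        workers=workers,     # >1 spawns multiple worker processes
    )

if __name__ == "__main__":
    main()
```

Because `uvicorn.run` becomes the single launch path, adding a worker-count parameter to `llama stack run` later only needs to thread one extra argument through.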

## Test Plan

Ran `llama stack run starter`
CI

@meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Sep 30, 2025
@ehhuang force-pushed the pr3625 branch 3 times, most recently from 7343031 to 70736e8 on October 3, 2025 00:09
@ehhuang changed the title from "remove main" to "chore: use uvicorn to start llama stack server everywhere" on Oct 3, 2025
@ehhuang changed the title from "chore: use uvicorn to start llama stack server everywhere" to "chore!: use uvicorn to start llama stack server everywhere" on Oct 3, 2025
@ehhuang force-pushed the pr3625 branch 5 times, most recently from efcfdd0 to a22534c on October 3, 2025 04:57
@ehhuang ehhuang marked this pull request as ready for review October 3, 2025 05:16
@leseb (Collaborator) left a comment

Love it! But it's missing a ton of --env cleanup in the docs and other places:

CONTRIBUTING.md:uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
docs/docs/building_applications/tools.mdx:--env TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY}
docs/docs/building_applications/tools.mdx:    --env WOLFRAM_ALPHA_API_KEY=${WOLFRAM_ALPHA_API_KEY}
docs/docs/contributing/index.mdx:uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
docs/docs/contributing/new_api_provider.mdx: typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass these via the `--env` flag to the test command.
docs/docs/distributions/building_distro.mdx:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/building_distro.mdx:  --env OLLAMA_URL=http://host.docker.internal:11434
docs/docs/distributions/building_distro.mdx:* `--env INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the model to use for inference
docs/docs/distributions/building_distro.mdx:* `--env OLLAMA_URL=http://host.docker.internal:11434`: Configures the URL for the Ollama service
docs/docs/distributions/building_distro.mdx:usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
docs/docs/distributions/building_distro.mdx:  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
docs/docs/distributions/configuration.mdx:- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.
docs/docs/distributions/configuration.mdx:llama stack run --config run.yaml --env API_KEY=sk-123 --env BASE_URL=https://custom-api.com
docs/docs/distributions/remote_hosted_distro/watsonx.md:  --env WATSONX_API_KEY=$WATSONX_API_KEY \
docs/docs/distributions/remote_hosted_distro/watsonx.md:  --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
docs/docs/distributions/remote_hosted_distro/watsonx.md:  --env WATSONX_BASE_URL=$WATSONX_BASE_URL
docs/docs/distributions/self_hosted_distro/dell.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/dell.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env SAFETY_MODEL=$SAFETY_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/dell.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/dell.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env SAFETY_MODEL=$SAFETY_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
docs/docs/distributions/self_hosted_distro/dell.md:  --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md:  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md:  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
docs/docs/distributions/self_hosted_distro/nvidia.md:  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
docs/docs/distributions/self_hosted_distro/nvidia.md:  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
docs/docs/distributions/self_hosted_distro/nvidia.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL
docs/docs/getting_started/detailed_tutorial.mdx:  --env OLLAMA_URL=http://host.docker.internal:11434
docs/docs/getting_started/detailed_tutorial.mdx:  --env OLLAMA_URL=http://localhost:11434
docs/getting_started_llama4.ipynb:        "        f\"uv run --with llama-stack llama stack run meta-reference-gpu --image-type venv --env INFERENCE_MODEL={model_id}\",\n",
docs/zero_to_hero_guide/README.md:      --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/zero_to_hero_guide/README.md:      --env SAFETY_MODEL=$SAFETY_MODEL \
docs/zero_to_hero_guide/README.md:      --env OLLAMA_URL=$OLLAMA_URL
llama_stack/cli/stack/run.py:            "--env",
llama_stack/cli/stack/run.py:                    run_args.extend(["--env", f"{key}={value}"])
llama_stack/core/build.py:            "--env-name",
llama_stack/core/build_venv.sh:  echo "Usage: $0 --env-name <env_name> --normal-deps <pip_dependencies> [--external-provider-deps <external_provider_deps>] [--optional-deps <special_pip_deps>]"
llama_stack/core/build_venv.sh:  echo "Example: $0 --env-name mybuild --normal-deps 'numpy pandas scipy' --external-provider-deps 'foo' --optional-deps 'bar'"
llama_stack/core/build_venv.sh:    --env-name)
llama_stack/core/build_venv.sh:        echo "Error: --env-name requires a string value" >&2
llama_stack/core/build_venv.sh:  echo "Error: --env-name and --normal-deps are required." >&2
llama_stack/core/server/server.py:        "--env",
llama_stack/core/start_stack.sh:  echo "Usage: $0 <env_type> <env_path_or_name> <port> [--config <yaml_config>] [--env KEY=VALUE]..."
llama_stack/core/start_stack.sh:    --env)
llama_stack/core/start_stack.sh:        env_vars="$env_vars --env $2"
llama_stack/core/start_stack.sh:        echo -e "${RED}Error: --env requires a KEY=VALUE argument${NC}" >&2
llama_stack/distributions/dell/doc_template.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md:  --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md:  --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/dell/doc_template.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md:  --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md:  --env SAFETY_MODEL=$SAFETY_MODEL \
llama_stack/distributions/dell/doc_template.md:  --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
llama_stack/distributions/dell/doc_template.md:  --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/dell/doc_template.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md:  --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md:  --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/dell/doc_template.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md:  --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md:  --env SAFETY_MODEL=$SAFETY_MODEL \
llama_stack/distributions/dell/doc_template.md:  --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
llama_stack/distributions/dell/doc_template.md:  --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/meta-reference-gpu/doc_template.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
llama_stack/distributions/meta-reference-gpu/doc_template.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama_stack/distributions/meta-reference-gpu/doc_template.md:  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
llama_stack/distributions/meta-reference-gpu/doc_template.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
llama_stack/distributions/meta-reference-gpu/doc_template.md:  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama_stack/distributions/meta-reference-gpu/doc_template.md:  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
llama_stack/distributions/nvidia/doc_template.md:  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
llama_stack/distributions/nvidia/doc_template.md:  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
llama_stack/distributions/nvidia/doc_template.md:  --env INFERENCE_MODEL=$INFERENCE_MODEL
llama_stack/env.py:            f"\n3. Pass directly to pytest: pytest --env {key}=your-key"
scripts/install.sh:      --env OLLAMA_URL="http://ollama-server:${OLLAMA_PORT}")
tests/README.md:- Any API keys you need to use should be set in the environment, or can be passed in with the --env option.
tests/integration/README.md:- `--env`: set environment variables, e.g. --env KEY=value. this is a utility option to set environment variables required by various providers.
tests/integration/conftest.py:    env_vars = config.getoption("--env") or []
tests/integration/conftest.py:    parser.addoption("--env", action="append", help="Set environment variables, e.g. --env KEY=value")
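
The conftest lines above register `--env` as an append-style option; as a rough sketch (illustrative only, not the repo's exact code), repeated KEY=VALUE pairs can be applied to the process environment like this:

```python
# Sketch: apply repeated --env KEY=VALUE options to os.environ.
# Mirrors the append-style option shown in conftest.py above; the exact
# llama-stack implementation may differ.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument(
    "--env",
    action="append",
    metavar="KEY=VALUE",
    help="Set environment variables, e.g. --env KEY=value",
)
args = parser.parse_args(["--env", "OLLAMA_URL=http://localhost:11434"])

for pair in args.env or []:
    key, sep, value = pair.partition("=")
    if not sep or not key:
        raise ValueError(f"--env requires KEY=VALUE, got: {pair!r}")
    os.environ[key] = value  # make the value visible to providers/tests
```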

@ehhuang changed the title from "chore!: use uvicorn to start llama stack server everywhere" to "chore: use uvicorn to start llama stack server everywhere" on Oct 3, 2025
@ehhuang (Contributor, Author) commented on Oct 3, 2025

@leseb I've moved the --env deprecation out of this PR for ease of reviewing :)

@leseb (Collaborator) commented on Oct 6, 2025

> @leseb I've moved the --env deprecation out of this PR for ease of reviewing :)

Makes sense!

@leseb (Collaborator) left a comment

Pulled and tested locally with a few OpenAI chat completion calls. Thanks!

@leseb merged commit 426cac0 into llamastack:main on Oct 6, 2025
43 checks passed
leseb added a commit to leseb/llama-stack-k8s-operator that referenced this pull request Oct 6, 2025
0.3.0 with llamastack/llama-stack#3625 forces us
to use "llama stack run" and the server module doesn't execute the
server anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
leseb added a commit to leseb/llama-stack-distribution that referenced this pull request Oct 10, 2025
The next 0.3.0 version will have a different way to run the server. The server module does not run the server anymore; it now happens via

```
llama stack run <path to config>
```

For more details llamastack/llama-stack#3625

Signed-off-by: Sébastien Han <seb@redhat.com>
leseb added a commit to opendatahub-io/llama-stack-distribution that referenced this pull request Oct 10, 2025
The next 0.3.0 version will have a different way to run the server. The server module does not run the server anymore; it now happens via

```
llama stack run <path to config>
```

For more details llamastack/llama-stack#3625

## Summary by CodeRabbit

- Chores
  - Updated container entrypoint to launch via the CLI, providing more consistent startup behavior across environments.
  - Preserves existing configuration (same run YAML path); no action required from users.
  - May result in slightly different log format and signal handling during startup/shutdown.
  - No changes to user-facing APIs or features.

leseb added a commit to llamastack/llama-stack-k8s-operator that referenced this pull request Oct 16, 2025
0.3.0 with llamastack/llama-stack#3625 forces us
to use "llama stack run" and the server module doesn't execute the
server anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
VaishnaviHire pushed a commit to VaishnaviHire/llama-stack-k8s-operator that referenced this pull request Oct 17, 2025
0.3.0 with llamastack/llama-stack#3625 forces us
to use "llama stack run" and the server module doesn't execute the
server anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 90365e7)