chore: use uvicorn to start llama stack server everywhere #3625
Conversation
Force-pushed from 7343031 to 70736e8
Force-pushed from efcfdd0 to a22534c
Love it! But it's missing a ton of `--env` from the docs and other places? (A sketch of how the flag folds into the environment follows the grep results below.)
CONTRIBUTING.md:uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
docs/docs/building_applications/tools.mdx:--env TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY}
docs/docs/building_applications/tools.mdx: --env WOLFRAM_ALPHA_API_KEY=${WOLFRAM_ALPHA_API_KEY}
docs/docs/contributing/index.mdx:uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
docs/docs/contributing/new_api_provider.mdx: typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass these via the `--env` flag to the test command.
docs/docs/distributions/building_distro.mdx: --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/building_distro.mdx: --env OLLAMA_URL=http://host.docker.internal:11434
docs/docs/distributions/building_distro.mdx:* `--env INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the model to use for inference
docs/docs/distributions/building_distro.mdx:* `--env OLLAMA_URL=http://host.docker.internal:11434`: Configures the URL for the Ollama service
docs/docs/distributions/building_distro.mdx:usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
docs/docs/distributions/building_distro.mdx: --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
docs/docs/distributions/configuration.mdx:- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.
docs/docs/distributions/configuration.mdx:llama stack run --config run.yaml --env API_KEY=sk-123 --env BASE_URL=https://custom-api.com
docs/docs/distributions/remote_hosted_distro/watsonx.md: --env WATSONX_API_KEY=$WATSONX_API_KEY \
docs/docs/distributions/remote_hosted_distro/watsonx.md: --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
docs/docs/distributions/remote_hosted_distro/watsonx.md: --env WATSONX_BASE_URL=$WATSONX_BASE_URL
docs/docs/distributions/self_hosted_distro/dell.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md: --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md: --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/dell.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md: --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md: --env SAFETY_MODEL=$SAFETY_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md: --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
docs/docs/distributions/self_hosted_distro/dell.md: --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/dell.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md: --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md: --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/dell.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md: --env DEH_URL=$DEH_URL \
docs/docs/distributions/self_hosted_distro/dell.md: --env SAFETY_MODEL=$SAFETY_MODEL \
docs/docs/distributions/self_hosted_distro/dell.md: --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
docs/docs/distributions/self_hosted_distro/dell.md: --env CHROMA_URL=$CHROMA_URL
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md: --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md: --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
docs/docs/distributions/self_hosted_distro/nvidia.md: --env NVIDIA_API_KEY=$NVIDIA_API_KEY
docs/docs/distributions/self_hosted_distro/nvidia.md: --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
docs/docs/distributions/self_hosted_distro/nvidia.md: --env INFERENCE_MODEL=$INFERENCE_MODEL
docs/docs/getting_started/detailed_tutorial.mdx: --env OLLAMA_URL=http://host.docker.internal:11434
docs/docs/getting_started/detailed_tutorial.mdx: --env OLLAMA_URL=http://localhost:11434
docs/getting_started_llama4.ipynb: " f\"uv run --with llama-stack llama stack run meta-reference-gpu --image-type venv --env INFERENCE_MODEL={model_id}\",\n",
docs/zero_to_hero_guide/README.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
docs/zero_to_hero_guide/README.md: --env SAFETY_MODEL=$SAFETY_MODEL \
docs/zero_to_hero_guide/README.md: --env OLLAMA_URL=$OLLAMA_URL
llama_stack/cli/stack/run.py: "--env",
llama_stack/cli/stack/run.py: run_args.extend(["--env", f"{key}={value}"])
llama_stack/core/build.py: "--env-name",
llama_stack/core/build_venv.sh: echo "Usage: $0 --env-name <env_name> --normal-deps <pip_dependencies> [--external-provider-deps <external_provider_deps>] [--optional-deps <special_pip_deps>]"
llama_stack/core/build_venv.sh: echo "Example: $0 --env-name mybuild --normal-deps 'numpy pandas scipy' --external-provider-deps 'foo' --optional-deps 'bar'"
llama_stack/core/build_venv.sh: --env-name)
llama_stack/core/build_venv.sh: echo "Error: --env-name requires a string value" >&2
llama_stack/core/build_venv.sh: echo "Error: --env-name and --normal-deps are required." >&2
llama_stack/core/server/server.py: "--env",
llama_stack/core/start_stack.sh: echo "Usage: $0 <env_type> <env_path_or_name> <port> [--config <yaml_config>] [--env KEY=VALUE]..."
llama_stack/core/start_stack.sh: --env)
llama_stack/core/start_stack.sh: env_vars="$env_vars --env $2"
llama_stack/core/start_stack.sh: echo -e "${RED}Error: --env requires a KEY=VALUE argument${NC}" >&2
llama_stack/distributions/dell/doc_template.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md: --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md: --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/dell/doc_template.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md: --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md: --env SAFETY_MODEL=$SAFETY_MODEL \
llama_stack/distributions/dell/doc_template.md: --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
llama_stack/distributions/dell/doc_template.md: --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/dell/doc_template.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md: --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md: --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/dell/doc_template.md: --env INFERENCE_MODEL=$INFERENCE_MODEL \
llama_stack/distributions/dell/doc_template.md: --env DEH_URL=$DEH_URL \
llama_stack/distributions/dell/doc_template.md: --env SAFETY_MODEL=$SAFETY_MODEL \
llama_stack/distributions/dell/doc_template.md: --env DEH_SAFETY_URL=$DEH_SAFETY_URL \
llama_stack/distributions/dell/doc_template.md: --env CHROMA_URL=$CHROMA_URL
llama_stack/distributions/meta-reference-gpu/doc_template.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
llama_stack/distributions/meta-reference-gpu/doc_template.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama_stack/distributions/meta-reference-gpu/doc_template.md: --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
llama_stack/distributions/meta-reference-gpu/doc_template.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
llama_stack/distributions/meta-reference-gpu/doc_template.md: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama_stack/distributions/meta-reference-gpu/doc_template.md: --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
llama_stack/distributions/nvidia/doc_template.md: --env NVIDIA_API_KEY=$NVIDIA_API_KEY
llama_stack/distributions/nvidia/doc_template.md: --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
llama_stack/distributions/nvidia/doc_template.md: --env INFERENCE_MODEL=$INFERENCE_MODEL
llama_stack/env.py: f"\n3. Pass directly to pytest: pytest --env {key}=your-key"
scripts/install.sh: --env OLLAMA_URL="http://ollama-server:${OLLAMA_PORT}")
tests/README.md:- Any API keys you need to use should be set in the environment, or can be passed in with the --env option.
tests/integration/README.md:- `--env`: set environment variables, e.g. --env KEY=value. this is a utility option to set environment variables required by various providers.
tests/integration/conftest.py: env_vars = config.getoption("--env") or []
tests/integration/conftest.py: parser.addoption("--env", action="append", help="Set environment variables, e.g. --env KEY=value")
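For context, `--env` is a simple repeated `KEY=VALUE` option. Below is a minimal sketch, not the project's actual code, of how such flags can be folded into the process environment before the server is launched; the helper and argument names are illustrative.

```python
# Illustrative sketch only: fold repeated `--env KEY=VALUE` flags into the
# environment of the current process before starting the server.
import argparse
import os


def parse_env_pairs(pairs: list[str]) -> dict[str, str]:
    """Turn ['FOO=bar', 'BAZ=qux'] into {'FOO': 'bar', 'BAZ': 'qux'}."""
    env: dict[str, str] = {}
    for pair in pairs:
        if "=" not in pair:
            raise ValueError(f"--env expects KEY=VALUE, got: {pair!r}")
        key, value = pair.split("=", 1)
        env[key] = value
    return env


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # action="append" mirrors how the integration-test conftest registers --env.
    parser.add_argument(
        "--env",
        action="append",
        default=[],
        help="Environment variables in KEY=VALUE form; may be repeated.",
    )
    args = parser.parse_args()
    os.environ.update(parse_env_pairs(args.env))
```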
@leseb I've removed the `--env` deprecation from this PR for ease of reviewing :)
Makes sense!
Pulled and tested locally with a few OpenAI chat completion calls. Thanks!
0.3.0 with llamastack/llama-stack#3625 forces us to use `llama stack run`, and the server module doesn't execute the server anymore. Signed-off-by: Sébastien Han <seb@redhat.com>
The next 0.3.0 version will have a different way to run the server: the server module no longer runs the server; it now happens via `llama stack run <path to config>`. For more details see llamastack/llama-stack#3625. Signed-off-by: Sébastien Han <seb@redhat.com>
0.3.0 with llamastack/llama-stack#3625 forces us to use `llama stack run`, and the server module doesn't execute the server anymore. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit 90365e7)
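To illustrate the migration these commits describe, here is a hedged Python sketch of what a container entrypoint might do now: hand off to the `llama stack run` CLI instead of invoking the server module directly. The config path is a placeholder and this is not the project's actual entrypoint.

```python
# Illustrative only: a stand-in entrypoint showing the migration described
# above. The run.yaml path is a placeholder.
import os
import sys

RUN_YAML = sys.argv[1] if len(sys.argv) > 1 else "/app/run.yaml"  # hypothetical default path

# Old (no longer starts the server as of 0.3.0):
#   python -m llama_stack.core.server.server ...
# New: exec the CLI, which now owns server startup via uvicorn.
os.execvp("llama", ["llama", "stack", "run", RUN_YAML])
```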
What does this PR do?
#3462 allows using uvicorn to start the llama stack server, which supports spawning multiple workers.
This PR enables us to launch more than one worker from `llama stack run` (the parameter will be added in a follow-up PR; this one stays focused on simplifying) by removing the old way of launching the stack server and consolidating on launching via `uvicorn.run` only.

Test Plan
Ran `llama stack run starter` and CI.
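For reference, a minimal, self-contained sketch of what launching an ASGI app via `uvicorn.run` with multiple workers looks like; the app, module name, port, and worker count are assumptions for illustration, not the project's actual values.

```python
# Illustrative sketch of the uvicorn.run-based launch this PR consolidates on.
import uvicorn


async def app(scope, receive, send):
    """Tiny ASGI app standing in for the real Llama Stack application."""
    if scope["type"] != "http":
        return
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"ok\n"})


if __name__ == "__main__":
    # With workers > 1, uvicorn requires the app as an import string so each
    # worker process can import it; "demo_server:app" assumes this file is
    # saved as demo_server.py. Port 8321 is assumed here as the default.
    uvicorn.run("demo_server:app", host="0.0.0.0", port=8321, workers=2)
```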