feat: IPXE OS using templates and designated arguments/artifacts #434
feat: IPXE OS using templates and designated arguments/artifacts #434pbreton wants to merge 10 commits intoNVIDIA:mainfrom
Conversation
Signed-off-by: Patrice Breton <pbreton@nvidia.com> # Conflicts: # Cargo.lock
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
There was a problem hiding this comment.
Pull request overview
Adds a new “template-based iPXE OS definition” variant to the OS model and RPC API, backed by a new carbide-ipxe-renderer crate that renders iPXE scripts from embedded YAML templates (with parameter/artifact validation and hashing). This is intended to enable finer control over OS boot scripts and future artifact caching/proxying while remaining compatible with existing OS flows.
Changes:
- Extend Forge RPC/proto + RBAC with iPXE template discovery APIs (
GetIpxeTemplate,ListIpxeTemplates) and a newOperatingSystemoneof variant (ipxe_os_def). - Introduce
crates/ipxe-rendererwith embeddedtemplates.yamland a renderer implementation + unit tests. - Wire template rendering into PXE instruction generation for the new
OperatingSystemVariant::IpxeOsDefinition.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
crates/rpc/proto/forge.proto |
Adds new RPCs and proto messages/enums for template-based iPXE OS definitions and template listing. |
crates/rpc/build.rs |
Adds serde derives for newly introduced Forge proto types. |
crates/ipxe-renderer/src/lib.rs |
Implements template loading/rendering, validation, hashing, and local-URL fabrication; adds unit tests. |
crates/ipxe-renderer/templates.yaml |
Defines the embedded iPXE script templates. |
crates/ipxe-renderer/README.md |
Documents renderer usage, template system, and validation. |
crates/ipxe-renderer/Cargo.toml |
New crate manifest for the renderer. |
crates/api/src/ipxe.rs |
Adds rendering path for OperatingSystemVariant::IpxeOsDefinition in PXE instructions. |
crates/api/src/api.rs |
Implements GetIpxeTemplate / ListIpxeTemplates RPC handlers backed by the renderer. |
crates/api/src/auth/internal_rbac_rules.rs |
Grants permissions for the new template RPCs. |
crates/api/src/web/instance.rs |
Adds a match arm for the new RPC OS variant (currently returns default). |
crates/api-model/src/os.rs |
Adds the new OS variant and RPC conversion logic. |
crates/api-db/src/lib.rs |
Exposes a new ipxe_os_definition DB module (but see review comments). |
crates/api-db/src/instance.rs |
Adds match arms for the new OS variant during persistence (currently a no-op). |
crates/api/Cargo.toml / crates/api-db/Cargo.toml / Cargo.lock |
Wires in the new renderer crate dependency. |
crates/admin-cli/src/instance/cmds.rs |
Displays the new OS variant in instance output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| os_ipxe_script = ipxe.ipxe_script.clone(); | ||
| } | ||
| OperatingSystemVariant::OsImage(id) => os_image_id = Some(id), | ||
| OperatingSystemVariant::IpxeOsDefinition(_) => {} |
There was a problem hiding this comment.
The IpxeOsDefinition variant is ignored during persistence ({}), which means instances will be written with empty os_ipxe_script and NULL os_image_id. On read-back this will look like an inline iPXE OS with an empty script, and the OS-definition ID is lost. Add a DB column (e.g., os_ipxe_os_definition_id) and persist it here (and in snapshot/reads) so the new variant round-trips correctly.
| let mut ipxeos = IpxeOs { | ||
| id: "test-os".to_string(), | ||
| name: "Ubuntu 22.04".to_string(), | ||
| ipxe_template_name: "qcow-image".to_string(), | ||
| parameters: vec![ |
There was a problem hiding this comment.
The README example doesn’t match the actual IpxeOs struct in src/lib.rs (e.g., it sets id, which is commented out / not a field). Update the example to use only real fields (name, description, hash, tenant_id, ipxe_template_name, parameters, artifacts, ...), otherwise users copying it will hit compile errors.
| pub mod instance_network_config; | ||
| pub mod instance_type; | ||
| pub mod ip_allocator; | ||
| pub mod ipxe_os_definition; |
There was a problem hiding this comment.
pub mod ipxe_os_definition; is added, but there is no corresponding ipxe_os_definition.rs (or ipxe_os_definition/mod.rs) in crates/api-db/src. As-is, this will fail to compile; either add the missing module file(s) or remove the module declaration until the implementation is included.
| pub mod ipxe_os_definition; |
| for value in values { | ||
| if let Some(pos) = result.find(&placeholder) { | ||
| result.replace_range(pos..pos + placeholder.len(), value); |
There was a problem hiding this comment.
Placeholder substitution only replaces the first occurrence of each {{name}} (via find + replace_range). Several templates include the same placeholder multiple times (e.g., discovery-scout-aarch64-dpu uses {{base_url}} twice), which will leave unreplaced placeholders and cause render failures. Replace all occurrences (e.g., loop while find returns Some, or use replace when the same value should fill every occurrence).
| for value in values { | |
| if let Some(pos) = result.find(&placeholder) { | |
| result.replace_range(pos..pos + placeholder.len(), value); | |
| if values.len() == 1 { | |
| // If there is a single value, replace all occurrences of this placeholder with it | |
| let value = &values[0]; | |
| result = result.replace(&placeholder, value); | |
| } else { | |
| // If there are multiple values, replace one occurrence per value (preserve existing behavior) | |
| for value in values { | |
| if let Some(pos) = result.find(&placeholder) { | |
| result.replace_range(pos..pos + placeholder.len(), value); | |
| } |
| os_ipxe_script = ipxe.ipxe_script.clone(); | ||
| } | ||
| OperatingSystemVariant::OsImage(id) => os_image_id = Some(id), | ||
| OperatingSystemVariant::IpxeOsDefinition(_) => {} |
There was a problem hiding this comment.
The IpxeOsDefinition variant is ignored during persistence ({}), which means instances will be written with empty os_ipxe_script and NULL os_image_id. On read-back this will look like an inline iPXE OS with an empty script, and the OS-definition ID is lost. Add a DB column (e.g., os_ipxe_os_definition_id) and persist it here (and in snapshot/reads) so the new variant round-trips correctly.
| os_ipxe_script = ipxe.ipxe_script.clone(); | ||
| } | ||
| OperatingSystemVariant::OsImage(id) => os_image_id = Some(*id), | ||
| OperatingSystemVariant::IpxeOsDefinition(_) => {} |
There was a problem hiding this comment.
The IpxeOsDefinition variant is ignored during persistence ({}), which means instances will be written with empty os_ipxe_script and NULL os_image_id. On read-back this will look like an inline iPXE OS with an empty script, and the OS-definition ID is lost. Add a DB column (e.g., os_ipxe_os_definition_id) and persist it here (and in snapshot/reads) so the new variant round-trips correctly.
|
|
||
| let has_extra_params = ipxeos | ||
| .parameters | ||
| .iter() |
There was a problem hiding this comment.
has_extra_params treats any parameter not listed in required_params/reserved_params as an extra param, even when its value is empty. Since rendering already ignores empty values, this can incorrectly reject OS definitions (when the template lacks {{extra}}) due solely to empty optional inputs. Consider filtering has_extra_params to only count non-empty parameters.
| let has_extra_params = ipxeos | |
| .parameters | |
| .iter() | |
| // Only treat parameters with non-empty values as "provided" when checking for extras, | |
| // since rendering already ignores empty-valued parameters. | |
| let has_extra_params = ipxeos | |
| .parameters | |
| .iter() | |
| .filter(|p| !p.value.is_empty()) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: pbreton <pbreton@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: pbreton <pbreton@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: pbreton <pbreton@users.noreply.github.com>
Description
Add a third variant for Operating Systems using iPXE script templates and allowing fine control on arguments and artifacts.
This is to enable local caching in the future and tighter control on Operating systems for DGX OS.
Remains compatible with existing code.
Type of Change
Related Issues (Optional)
Breaking Changes: NO
Testing