Skip to content

feat: IPXE OS using templates and designated arguments/artifacts #434

Open
pbreton wants to merge 10 commits intoNVIDIA:mainfrom
pbreton:feat/ipxe-template-renderer
Open

feat: IPXE OS using templates and designated arguments/artifacts #434
pbreton wants to merge 10 commits intoNVIDIA:mainfrom
pbreton:feat/ipxe-template-renderer

Conversation

@pbreton
Copy link
Contributor

@pbreton pbreton commented Mar 3, 2026

Description

Add a third variant for Operating Systems using iPXE script templates and allowing fine control on arguments and artifacts.
This is to enable local caching in the future and tighter control on Operating systems for DGX OS.
Remains compatible with existing code.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes: NO

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Signed-off-by: Patrice Breton <pbreton@nvidia.com>

# Conflicts:
#	Cargo.lock
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Copilot AI review requested due to automatic review settings March 3, 2026 22:22
@pbreton pbreton requested a review from a team as a code owner March 3, 2026 22:22
@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pbreton pbreton changed the title feat: IPXE OS using templates and designated arguments and artifacts feat: IPXE OS using templates and designated arguments/artifacts Mar 3, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new “template-based iPXE OS definition” variant to the OS model and RPC API, backed by a new carbide-ipxe-renderer crate that renders iPXE scripts from embedded YAML templates (with parameter/artifact validation and hashing). This is intended to enable finer control over OS boot scripts and future artifact caching/proxying while remaining compatible with existing OS flows.

Changes:

  • Extend Forge RPC/proto + RBAC with iPXE template discovery APIs (GetIpxeTemplate, ListIpxeTemplates) and a new OperatingSystem oneof variant (ipxe_os_def).
  • Introduce crates/ipxe-renderer with embedded templates.yaml and a renderer implementation + unit tests.
  • Wire template rendering into PXE instruction generation for the new OperatingSystemVariant::IpxeOsDefinition.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
crates/rpc/proto/forge.proto Adds new RPCs and proto messages/enums for template-based iPXE OS definitions and template listing.
crates/rpc/build.rs Adds serde derives for newly introduced Forge proto types.
crates/ipxe-renderer/src/lib.rs Implements template loading/rendering, validation, hashing, and local-URL fabrication; adds unit tests.
crates/ipxe-renderer/templates.yaml Defines the embedded iPXE script templates.
crates/ipxe-renderer/README.md Documents renderer usage, template system, and validation.
crates/ipxe-renderer/Cargo.toml New crate manifest for the renderer.
crates/api/src/ipxe.rs Adds rendering path for OperatingSystemVariant::IpxeOsDefinition in PXE instructions.
crates/api/src/api.rs Implements GetIpxeTemplate / ListIpxeTemplates RPC handlers backed by the renderer.
crates/api/src/auth/internal_rbac_rules.rs Grants permissions for the new template RPCs.
crates/api/src/web/instance.rs Adds a match arm for the new RPC OS variant (currently returns default).
crates/api-model/src/os.rs Adds the new OS variant and RPC conversion logic.
crates/api-db/src/lib.rs Exposes a new ipxe_os_definition DB module (but see review comments).
crates/api-db/src/instance.rs Adds match arms for the new OS variant during persistence (currently a no-op).
crates/api/Cargo.toml / crates/api-db/Cargo.toml / Cargo.lock Wires in the new renderer crate dependency.
crates/admin-cli/src/instance/cmds.rs Displays the new OS variant in instance output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

os_ipxe_script = ipxe.ipxe_script.clone();
}
OperatingSystemVariant::OsImage(id) => os_image_id = Some(id),
OperatingSystemVariant::IpxeOsDefinition(_) => {}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IpxeOsDefinition variant is ignored during persistence ({}), which means instances will be written with empty os_ipxe_script and NULL os_image_id. On read-back this will look like an inline iPXE OS with an empty script, and the OS-definition ID is lost. Add a DB column (e.g., os_ipxe_os_definition_id) and persist it here (and in snapshot/reads) so the new variant round-trips correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +41
let mut ipxeos = IpxeOs {
id: "test-os".to_string(),
name: "Ubuntu 22.04".to_string(),
ipxe_template_name: "qcow-image".to_string(),
parameters: vec![
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README example doesn’t match the actual IpxeOs struct in src/lib.rs (e.g., it sets id, which is commented out / not a field). Update the example to use only real fields (name, description, hash, tenant_id, ipxe_template_name, parameters, artifacts, ...), otherwise users copying it will hit compile errors.

Copilot uses AI. Check for mistakes.
pub mod instance_network_config;
pub mod instance_type;
pub mod ip_allocator;
pub mod ipxe_os_definition;
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub mod ipxe_os_definition; is added, but there is no corresponding ipxe_os_definition.rs (or ipxe_os_definition/mod.rs) in crates/api-db/src. As-is, this will fail to compile; either add the missing module file(s) or remove the module declaration until the implementation is included.

Suggested change
pub mod ipxe_os_definition;

Copilot uses AI. Check for mistakes.
Comment on lines +272 to +274
for value in values {
if let Some(pos) = result.find(&placeholder) {
result.replace_range(pos..pos + placeholder.len(), value);
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placeholder substitution only replaces the first occurrence of each {{name}} (via find + replace_range). Several templates include the same placeholder multiple times (e.g., discovery-scout-aarch64-dpu uses {{base_url}} twice), which will leave unreplaced placeholders and cause render failures. Replace all occurrences (e.g., loop while find returns Some, or use replace when the same value should fill every occurrence).

Suggested change
for value in values {
if let Some(pos) = result.find(&placeholder) {
result.replace_range(pos..pos + placeholder.len(), value);
if values.len() == 1 {
// If there is a single value, replace all occurrences of this placeholder with it
let value = &values[0];
result = result.replace(&placeholder, value);
} else {
// If there are multiple values, replace one occurrence per value (preserve existing behavior)
for value in values {
if let Some(pos) = result.find(&placeholder) {
result.replace_range(pos..pos + placeholder.len(), value);
}

Copilot uses AI. Check for mistakes.
os_ipxe_script = ipxe.ipxe_script.clone();
}
OperatingSystemVariant::OsImage(id) => os_image_id = Some(id),
OperatingSystemVariant::IpxeOsDefinition(_) => {}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IpxeOsDefinition variant is ignored during persistence ({}), which means instances will be written with empty os_ipxe_script and NULL os_image_id. On read-back this will look like an inline iPXE OS with an empty script, and the OS-definition ID is lost. Add a DB column (e.g., os_ipxe_os_definition_id) and persist it here (and in snapshot/reads) so the new variant round-trips correctly.

Copilot uses AI. Check for mistakes.
os_ipxe_script = ipxe.ipxe_script.clone();
}
OperatingSystemVariant::OsImage(id) => os_image_id = Some(*id),
OperatingSystemVariant::IpxeOsDefinition(_) => {}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IpxeOsDefinition variant is ignored during persistence ({}), which means instances will be written with empty os_ipxe_script and NULL os_image_id. On read-back this will look like an inline iPXE OS with an empty script, and the OS-definition ID is lost. Add a DB column (e.g., os_ipxe_os_definition_id) and persist it here (and in snapshot/reads) so the new variant round-trips correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +415 to +418

let has_extra_params = ipxeos
.parameters
.iter()
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_extra_params treats any parameter not listed in required_params/reserved_params as an extra param, even when its value is empty. Since rendering already ignores empty values, this can incorrectly reject OS definitions (when the template lacks {{extra}}) due solely to empty optional inputs. Consider filtering has_extra_params to only count non-empty parameters.

Suggested change
let has_extra_params = ipxeos
.parameters
.iter()
// Only treat parameters with non-empty values as "provided" when checking for extras,
// since rendering already ignores empty-valued parameters.
let has_extra_params = ipxeos
.parameters
.iter()
.filter(|p| !p.value.is_empty())

Copilot uses AI. Check for mistakes.
pbreton and others added 3 commits March 3, 2026 16:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: pbreton <pbreton@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: pbreton <pbreton@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: pbreton <pbreton@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants