Flexibility refactor #9

Merged on Jan 13, 2025 (24 commits)
1,488 changes: 1,488 additions & 0 deletions 3tier-fabric.excalidraw

Binary file added 3tier-fabric.png
6 changes: 4 additions & 2 deletions Makefile
@@ -13,8 +13,10 @@ build: ## compile all .proto files and generate artifacts
  mkdir -p $(GENERATED_DIR)
  python3 -m grpc_tools.protoc \
  --proto_path=./protos \
- --python_out=$(GENERATED_DIR) --pyi_out=$(GENERATED_DIR) \
- et_def.proto infra.proto
+ --python_out=$(GENERATED_DIR) \
+ --pyi_out=$(GENERATED_DIR) \
+ --grpc_python_out=$(GENERATED_DIR) \
+ et_def.proto infra.proto bind.proto service.proto
  python3 -m pip uninstall -y keysight-chakra
  python3 setup.py bdist_wheel
  python3 -m pip install --no-cache .
202 changes: 195 additions & 7 deletions README.md
@@ -1,9 +1,197 @@
# Infrastructure as a Graph

Create infrastructure as a graph using messages from infra.proto. These messages let a user describe logical infrastructure as vertexes and edges, and scale it up and out without duplicating content.

Submit the graph to the infrastructure `create` API, which validates it and stores the data in an in-memory cogdb data store.

Use the `query` API to find paths from one NPU to another.

![Process](process.png)

Consider the following logical `3 tier fabric`:

![3tierfabric](3tier-fabric.png)

The following steps illustrate how to create this infrastructure as a graph, using `dgxa100` devices as hosts and `tomahawk3` (`th3`) devices as switches:

- [Create devices](#create-a-device-inventory)
- [Scale out the devices](#create-device-instances)
- [Connect the devices](#connect-device-instances)
- [Optionally extend the graph](#extending-infrastructure-as-a-graph)
- [Validate the graph](#validate-the-infrastructure)
- [Query the graph](#query-the-infrastructure)

## Create a Device Inventory

A device inventory defines a device's components, links and connections once, so that the device can then be reused in `DeviceInstance` messages to scale it out under different names.

> Note that the entire device does not need to be described in full detail. The level of device detail should be dictated by the needs of the application.

- use the `Component` message to define individual components (vertexes) that are present in a device
- use the `Component.count` field to scale up the number of components in the device
- use the `Link` message to define different links within the device
- use the `Device` message to contain `Component` and `Link` messages
- use the `Device.connections` field to connect components (vertexes) to each other with an associated link to form an edge
- the format of a `connections` string is described in the infra.proto file

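To make the scale-up idea concrete, here is a minimal Python sketch. The dataclasses only mirror the names of the `Component`, `Link` and `Device` messages; they are not the classes generated from infra.proto, and both the `<component>.<index>` vertex naming and the `component_vertexes` helper are assumptions made for illustration. It uses the component counts of the `dgxa100` device described below.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for the infra.proto Component message."""
    name: str
    count: int = 1


@dataclass
class Link:
    """Hypothetical stand-in for the infra.proto Link message."""
    name: str
    description: str = ""


@dataclass
class Device:
    """Hypothetical stand-in for the infra.proto Device message."""
    name: str
    components: list[Component] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)
    connections: list[str] = field(default_factory=list)


def component_vertexes(device: Device) -> list[str]:
    """Scale up every component by its count.

    Assumption: a component with count n becomes n vertexes named
    '<component>.<index>' with indexes 0..n-1.
    """
    return [
        f"{component.name}.{index}"
        for component in device.components
        for index in range(component.count)
    ]


# Component counts taken from the dgxa100 inventory example.
dgxa100 = Device(
    name="dgxa100",
    components=[
        Component("a100", 8),
        Component("nvsw", 6),
        Component("nic", 8),
        Component("pciesw", 4),
    ],
    links=[Link("nvlink"), Link("pcie")],
)

vertexes = component_vertexes(dgxa100)
print(len(vertexes))  # 26
print(vertexes[:3])   # ['a100.0', 'a100.1', 'a100.2']
```

The device is described once, and the `count` field alone decides how many vertexes it contributes.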
### DgxA100 device

```yaml
infrastructure:
  inventory:
    devices:
      - name: dgxa100
        components:
          - name: a100
            count: 8
            npu:
              memory:
          - name: nvsw
            count: 6
            switch:
              nvswitch:
          - name: nic
            count: 8
            nic:
              ethernet:
          - name: pciesw
            count: 4
            switch:
              pcie:
        links:
          - name: nvlink
            description: NVLink 3.0, 25GBs/50GBs bidirectional
            bandwidth:
              gBs: 50
          - name: pcie
            description: PCI Express x16 (gen 4.0)
            bandwidth:
              gBs: 24.7
        connections:
          - a100.[:].nvlink.nvsw.[:].MTM
          - a100.[0:1].pcie.pciesw.0.MTO
          - nic.[0:1].pcie.pciesw.0.MTO
```
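Each `connections` string is a shortcut for a set of edges. The exact grammar is defined in infra.proto and is not reproduced in this README, so the following Python sketch assumes the form `<component>.<index spec>.<link>.<component>.<index spec>.<mode>`, where an index spec is an integer, `[:]` (every index) or a half-open `[a:b]` range, and where `OTO`, `MTO` and `MTM` are read as one-to-one, many-to-one and many-to-many. These readings and the `resolve`/`expand` helpers are hypothetical.

```python
# Component counts of the dgxa100 device above.
COUNTS = {"a100": 8, "nvsw": 6, "nic": 8, "pciesw": 4}

CONNECTIONS = [
    "a100.[:].nvlink.nvsw.[:].MTM",
    "a100.[0:1].pcie.pciesw.0.MTO",
    "nic.[0:1].pcie.pciesw.0.MTO",
]


def resolve(spec: str, count: int) -> list[int]:
    """Turn '[:]', a half-open '[a:b]', '[n]' or a plain integer into indexes."""
    if spec.startswith("[") and spec.endswith("]"):
        start, sep, stop = spec[1:-1].partition(":")
        if not sep:  # '[n]' selects a single index
            stop = str(int(start) + 1)
        lo = int(start) if start else 0
        hi = int(stop) if stop else count
    else:
        lo = int(spec)
        hi = lo + 1
    if not 0 <= lo < hi <= count:
        raise ValueError(f"{spec!r} is out of range for {count} components")
    return list(range(lo, hi))


def expand(connection: str, counts: dict[str, int]) -> list[tuple[str, str, str]]:
    """Expand one assumed-format connection string into (source, link, target) edges."""
    src, src_spec, link, dst, dst_spec, mode = connection.split(".")
    sources = [f"{src}.{i}" for i in resolve(src_spec, counts[src])]
    targets = [f"{dst}.{i}" for i in resolve(dst_spec, counts[dst])]
    if mode == "OTO":  # one-to-one: pair the i-th source with the i-th target
        if len(sources) != len(targets):
            raise ValueError(f"{connection}: OTO needs ranges of equal size")
        pairs = list(zip(sources, targets))
    elif mode == "MTO":  # many-to-one: every source to a single target
        if len(targets) != 1:
            raise ValueError(f"{connection}: MTO needs exactly one target")
        pairs = [(s, targets[0]) for s in sources]
    elif mode == "MTM":  # many-to-many: every source to every target
        pairs = [(s, t) for s in sources for t in targets]
    else:
        raise ValueError(f"{connection}: unknown mode {mode!r}")
    return [(s, link, t) for s, t in pairs]


edges = [edge for c in CONNECTIONS for edge in expand(c, COUNTS)]
print(len(edges))  # 48 nvlink edges + 2 pcie edges = 50
```

Under the half-open reading, `[0:1]` selects only index 0, so each of the two `pcie` connections contributes a single edge.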

### Tomahawk3 device

```yaml
infrastructure:
  inventory:
    devices:
      - name: th3
        components:
          - name: asic
            count: 1
            custom:
              memory:
          - name: nic
            count: 32
            nic:
              ethernet:
        links:
          - name: mii
            custom:
            bandwidth:
        connections:
          - nic.[:].mii.asic.0
```

## Create Device Instances

Scale out the infrastructure by creating `DeviceInstance` messages and appending them to the `Infrastructure.device_instances` field.

- the `DeviceInstance.name` field is a unique key that enables reuse of device inventory under different names
- the `DeviceInstance.device` field names a device and must match a key in the `Infrastructure.inventory.devices` map

```yaml
infrastructure:
  device_instances:
    - name: host
      device: dgxa100
      count: 4
    - name: racksw
      device: th3
      count: 4
    - name: podsw
      device: th3
      count: 2
    - name: spinesw
      device: th3
      count: 2
```
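Scaling out multiplies each inventory device by its instance count. The Python sketch below assumes fully qualified vertex names of the form `<instance>.<index>.<component>.<index>`, which matches how `Infrastructure.connections` addresses endpoints (for example `host.0.nic.[0:8]`); the tuple layout and the `instance_vertexes` helper are hypothetical. With 26 components per `dgxa100` and 33 per `th3`, the example yields 368 component vertexes.

```python
# Component counts per inventory device, copied from the inventory examples.
INVENTORY = {
    "dgxa100": {"a100": 8, "nvsw": 6, "nic": 8, "pciesw": 4},
    "th3": {"asic": 1, "nic": 32},
}

# (DeviceInstance.name, DeviceInstance.device, DeviceInstance.count)
DEVICE_INSTANCES = [
    ("host", "dgxa100", 4),
    ("racksw", "th3", 4),
    ("podsw", "th3", 2),
    ("spinesw", "th3", 2),
]


def instance_vertexes(inventory, device_instances):
    """Return '<instance>.<i>.<component>.<j>' names for every scaled-out component."""
    seen = set()
    vertexes = []
    for name, device, count in device_instances:
        # DeviceInstance.name is a unique key.
        if name in seen:
            raise ValueError(f"duplicate DeviceInstance.name {name!r}")
        # DeviceInstance.device must exist in the inventory.
        if device not in inventory:
            raise ValueError(f"{name!r} references unknown device {device!r}")
        seen.add(name)
        for i in range(count):
            for component, component_count in inventory[device].items():
                vertexes.extend(
                    f"{name}.{i}.{component}.{j}" for j in range(component_count)
                )
    return vertexes


vertexes = instance_vertexes(INVENTORY, DEVICE_INSTANCES)
print(len(vertexes))  # 4*26 + 4*33 + 2*33 + 2*33 = 368
```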

## Connect Device Instances

Connect device instances using the `Infrastructure.connections` field to create the `3 tier fabric` as a graph.

```yaml
infrastructure:
  connections:
    - host.0.nic.[0:8].eth.racksw.0.nic.[0:8].OTO
    - host.1.nic.[0:8].eth.racksw.0.nic.[0:8].OTO
    - host.2.nic.[0:8].eth.racksw.0.nic.[0:8].OTO
    - host.3.nic.[0:8].eth.racksw.0.nic.[0:8].OTO
    - racksw.0.nic.[8:11].eth.podsw.0.nic.[0:8].OTO
    - racksw.1.nic.[8:11].eth.podsw.0.nic.[0:8].OTO
    - racksw.2.nic.[8:11].eth.podsw.1.nic.[0:8].OTO
    - racksw.3.nic.[8:11].eth.podsw.1.nic.[0:8].OTO
```
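An infrastructure-level connection names a device instance and index on both sides, so it has ten dot-separated fields instead of six. The Python sketch below expands only the one-to-one (`OTO`) case and assumes `[a:b]` is a half-open range, as `nic.[0:8]` covering the 8 NICs of a `dgxa100` suggests. `COMPONENT_COUNTS`, `resolve` and `expand` are hypothetical names; the authoritative grammar is in infra.proto.

```python
# nic counts reachable through each device instance (dgxa100: 8, th3: 32).
COMPONENT_COUNTS = {
    "host": {"nic": 8},
    "racksw": {"nic": 32},
    "podsw": {"nic": 32},
    "spinesw": {"nic": 32},
}


def resolve(spec: str, count: int) -> list[int]:
    """Resolve a half-open '[a:b]', '[:]' or a plain integer into indexes."""
    if spec.startswith("["):
        start, _, stop = spec.strip("[]").partition(":")
        lo, hi = int(start or 0), int(stop or count)
    else:
        lo = int(spec)
        hi = lo + 1
    if not 0 <= lo < hi <= count:
        raise ValueError(f"{spec!r} is out of range for {count} components")
    return list(range(lo, hi))


def expand(connection: str) -> list[tuple[str, str, str]]:
    """Expand '<inst>.<i>.<comp>.<range>.<link>.<inst>.<i>.<comp>.<range>.OTO'."""
    (a_inst, a_idx, a_comp, a_spec, link,
     b_inst, b_idx, b_comp, b_spec, mode) = connection.split(".")
    if mode != "OTO":
        raise NotImplementedError(f"this sketch only handles OTO, got {mode!r}")
    a_side = [f"{a_inst}.{a_idx}.{a_comp}.{i}"
              for i in resolve(a_spec, COMPONENT_COUNTS[a_inst][a_comp])]
    b_side = [f"{b_inst}.{b_idx}.{b_comp}.{i}"
              for i in resolve(b_spec, COMPONENT_COUNTS[b_inst][b_comp])]
    # One-to-one pairing only makes sense when both ranges have the same size.
    if len(a_side) != len(b_side):
        raise ValueError(f"{connection}: OTO needs ranges of equal size "
                         f"({len(a_side)} vs {len(b_side)})")
    return [(a, link, b) for a, b in zip(a_side, b_side)]


edges = expand("host.0.nic.[0:8].eth.racksw.0.nic.[0:8].OTO")
print(len(edges))  # 8
print(edges[0])    # ('host.0.nic.0', 'eth', 'racksw.0.nic.0')
```

Under this assumed reading of `OTO`, `racksw.0.nic.[8:11]` resolves to 3 indexes while `podsw.0.nic.[0:8]` resolves to 8, so `expand` raises `ValueError` for that connection; the actual behavior is whatever infra.proto and the `create` API define.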

## Extending Infrastructure as a Graph

Use the `Bindings` message in bind.proto to extend the logical infrastructure with data that is outside the scope of the graph.

This is done by binding logical endpoints to arbitrary data, such as:

### Global Metadata

```yaml
bindings:
  - targets:
      - infrastructure: Metadata
    data:
      name: DeviceTypes
      description: Key value metadata map of a user specified device type to an infrastructure inventory device name
      value:
        - "@type": type.googleapis.com/google.protobuf.Struct
        - device_types:
            - key: host
              value: dgxa100
            - key: switch
              value: th3
```
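The metadata binding lets a tool map its own vocabulary onto inventory devices. As a hedged Python sketch, assuming the `Struct` payload has already been unpacked into plain lists and dicts (the unpacking step is not shown, and `instances_of_type` is a hypothetical helper), a tool could find every device instance that is a switch:

```python
# Hypothetical: the DeviceTypes binding value after unpacking into plain Python data.
DEVICE_TYPES = [
    {"key": "host", "value": "dgxa100"},
    {"key": "switch", "value": "th3"},
]

# The device_instances from "Create Device Instances".
DEVICE_INSTANCES = [
    {"name": "host", "device": "dgxa100", "count": 4},
    {"name": "racksw", "device": "th3", "count": 4},
    {"name": "podsw", "device": "th3", "count": 2},
    {"name": "spinesw", "device": "th3", "count": 2},
]


def instances_of_type(user_type, device_types, device_instances):
    """List DeviceInstance names whose inventory device matches a user-specified type."""
    type_to_device = {entry["key"]: entry["value"] for entry in device_types}
    device = type_to_device[user_type]
    return [di["name"] for di in device_instances if di["device"] == device]


switches = instances_of_type("switch", DEVICE_TYPES, DEVICE_INSTANCES)
print(switches)  # ['racksw', 'podsw', 'spinesw']
print(sum(di["count"] for di in DEVICE_INSTANCES if di["name"] in switches))  # 8
```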

### Physical Configuration

```yaml
bindings:
  - targets:
      - device_instance: racksw
      - device_instance_index: podsw.0
      - device_instance: spinesw
    data:
      name: OpenConfigInterface
      description: Switch configuration
      value:
        - "@type": type.googleapis.com/google.protobuf.Struct
        - config:
            - type: ...
            - mtu: ...
            - loopback-mode: ...
            - enabled: ...
```
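The targets above mix whole device instances with a single indexed instance. A minimal Python sketch of how such targets might be resolved to concrete instances follows. It assumes `device_instance` selects every index of an instance and `device_instance_index` selects exactly one; `resolve_targets` is a hypothetical helper, not part of bind.proto.

```python
# DeviceInstance.count for each DeviceInstance.name, from "Create Device Instances".
DEVICE_INSTANCE_COUNTS = {"host": 4, "racksw": 4, "podsw": 2, "spinesw": 2}

TARGETS = [
    {"device_instance": "racksw"},
    {"device_instance_index": "podsw.0"},
    {"device_instance": "spinesw"},
]


def resolve_targets(targets, counts):
    """Expand binding targets into '<instance>.<index>' names (assumed semantics)."""
    resolved = []
    for target in targets:
        (kind, value), = target.items()  # each target holds exactly one key
        if kind == "device_instance":
            # Assumption: a bare device instance name selects every index.
            resolved.extend(f"{value}.{i}" for i in range(counts[value]))
        elif kind == "device_instance_index":
            name, _, index = value.rpartition(".")
            if not 0 <= int(index) < counts[name]:
                raise ValueError(f"{value} does not exist")
            resolved.append(value)
        else:
            raise ValueError(f"unsupported target kind {kind!r}")
    return resolved


print(resolve_targets(TARGETS, DEVICE_INSTANCE_COUNTS))
# ['racksw.0', 'racksw.1', 'racksw.2', 'racksw.3', 'podsw.0', 'spinesw.0', 'spinesw.1']
```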

### Application Configuration

```yaml

```

## Validate the Infrastructure

Validation expands any connection shortcuts into individual edges and checks the overall graph for referential integrity.

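A referential-integrity check can be sketched as follows. This Python example is an assumption about what validation might involve, not the `create` API's actual logic. It checks that each endpoint of an infrastructure connection names an existing device instance, an instance index within `DeviceInstance.count`, a component present in the inventory device, and a component range that fits within `Component.count`. It does not check link names or the mode suffix, and `validate` and `parse_range` are hypothetical.

```python
# Component counts per inventory device (only the components used below).
INVENTORY = {"dgxa100": {"nic": 8}, "th3": {"nic": 32}}

# DeviceInstance.name -> (DeviceInstance.device, DeviceInstance.count)
DEVICE_INSTANCES = {
    "host": ("dgxa100", 4),
    "racksw": ("th3", 4),
    "podsw": ("th3", 2),
    "spinesw": ("th3", 2),
}


def parse_range(spec: str, count: int) -> range:
    """Parse '[:]', a half-open '[a:b]' or a plain integer, without clipping."""
    if spec.startswith("["):
        start, _, stop = spec.strip("[]").partition(":")
        return range(int(start or 0), int(stop or count))
    index = int(spec)
    return range(index, index + 1)


def validate(connections, inventory, device_instances):
    """Return a list of referential-integrity errors (empty if consistent)."""
    errors = []
    for conn in connections:
        parts = conn.split(".")
        if len(parts) != 10:
            errors.append(f"{conn}: expected 10 dot-separated fields")
            continue
        # Check both endpoints: fields 0-3 and fields 5-8.
        for instance, index, component, spec in (parts[0:4], parts[5:9]):
            if instance not in device_instances:
                errors.append(f"{conn}: unknown device instance {instance!r}")
                continue
            device, count = device_instances[instance]
            if not 0 <= int(index) < count:
                errors.append(f"{conn}: {instance}.{index} does not exist")
            components = inventory[device]
            if component not in components:
                errors.append(f"{conn}: {device} has no component {component!r}")
                continue
            indexes = parse_range(spec, components[component])
            if not 0 <= indexes.start < indexes.stop <= components[component]:
                errors.append(f"{conn}: {instance}.{component}.{spec} is out of range")
    return errors


print(validate(["host.0.nic.[0:8].eth.racksw.0.nic.[0:8].OTO"], INVENTORY, DEVICE_INSTANCES))
# []
print(validate(["host.4.nic.[0:9].eth.racksw.0.nic.[0:8].OTO"], INVENTORY, DEVICE_INSTANCES))
# two errors: host.4 does not exist, and nic.[0:9] exceeds the 8 nics of a dgxa100
```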
## Query the Infrastructure

Use queries to extract paths from the graph, such as paths from one NPU to another.
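
The `query` API itself is not shown in this README. As a stand-in, the following Python sketch runs a breadth-first search over an undirected edge list to find a shortest NPU-to-NPU path inside one `dgxa100`. Treating links as bidirectional and the hand-built edge list (an assumed expansion of the `dgxa100` connections) are assumptions, and `build_graph`/`shortest_path` are hypothetical helpers.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency list from (a, link, b) edges."""
    adjacency = defaultdict(list)
    for a, _link, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)  # assumption: every link is bidirectional
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the vertexes of a shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk parent pointers back to the start, then reverse.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbor in adjacency[vertex]:
            if neighbor not in parents:
                parents[neighbor] = vertex
                queue.append(neighbor)
    return None


# Assumed expansion of the dgxa100 connections: nvlink MTM plus two pcie edges.
edges = [(f"a100.{i}", "nvlink", f"nvsw.{j}") for i in range(8) for j in range(6)]
edges += [("a100.0", "pcie", "pciesw.0"), ("nic.0", "pcie", "pciesw.0")]

graph = build_graph(edges)
print(shortest_path(graph, "a100.0", "a100.3"))  # ['a100.0', 'nvsw.0', 'a100.3']
print(shortest_path(graph, "a100.0", "nic.0"))   # ['a100.0', 'pciesw.0', 'nic.0']
```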
Binary file added dgxa100-schematic.png