A Rust tool for topology-aware node selection on HPC systems (SLURM).
Compile the project to generate the optimized binary:
cargo build --releaseThe binary is located at ./target/release/job_placer.
JobPlacer doesn't just look at a list of nodes; it evaluates them based on a Starting Node (the anchor) and how they connect through the network tree.
The anchor node is automatically selected. If you are under a SLURM allocation or if you specify an explicit nodelist, anchors will be sampled from that pool of nodes. All other nodes are then checked against this anchor to see if they meet your requirements.
Distance is simply the number of "steps" or "hops" between the Anchor and another node.
Define your topology requirements in a JSON file.
Constraints allow selecting nodes based on network distance (hops).
This is a rule that forces the search to stay within a specific "branch" or "box" in the network tree.
- The Constraint: When you request nodes at a certain distance, you can also require that they all share the same Parent.
- Flexible Levels: This parent can be a small local switch or a large group switch.
| Feature | Definition | Simple Meaning |
|---|---|---|
| Starting Node | The Anchor | The central point used to measure everyone else. |
| Distance | Number of Hops | How many steps away from the anchor is this node? |
| Shared Parent | Common Group | Do these nodes share the same "ancestor" switch? |
The following example query, starting from an anchor node, is searching for 2 nodes on the same L1 switch (distance 2: Anchor -> L1 switch -> TargetNode). The second constraint looks for an additional node ad distance 4 (Anchor -> L1 switch #1 -> L2 switch -> L1 switch #2 -> TargetNode).
{
"constraints": [
{
"type": "NodesAtDistance",
"count": 2,
"distance": 2,
"reference": "First"
},
{
"type": "NodesAtDistance",
"count": 1,
"distance": 4,
"reference": "First"
}
]
}Required Constraint Fields
type: Constraint type (see below).count: Number of nodes to select.distance: Required network distance (in hops).reference: Reference node for distance calculation (e.g., "First").
Constraint Types
NodesAtDistance: Selects nodes at the specified distance from the anchor node.NodesAtDistanceWithSharedParent: Selects nodes at the specified distance that also share the same parent at a given topology level. Additional required field:parent_level.
JobPlacer is designed to run inside an allocation. It reads the nodes SLURM gave you, finds the most efficient subset, and exports them.
#!/bin/bash
#SBATCH -A IscrC_OMG-25
#SBATCH -p boost_usr_prod
#SBATCH --time=00:01:00
#SBATCH -N 8
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=RustTopo
#SBATCH --output=topo_%j.out
# ==============================================================================
# STEP 1: Run Topology Selection (Rust Binary)
# ==============================================================================
echo "--- Searching for optimal topology within $SLURM_JOB_NUM_NODES nodes ---"
echo "Allocated nodes: $SLURM_JOB_NODELIST"
# Run the Rust tool and capture ALL output into a variable
TOPO_OUTPUT=$(./target/release/job_placer -s leonardo --topology-scontrol <(cat <<EOF
{
"constraints": [
{
"type": "NodesAtDistance",
"count": 1,
"distance": 2,
"reference": "First"
}
]
}
EOF
))
TOPO_EXIT_CODE=$?
echo "Tool exit code: $TOPO_EXIT_CODE"
echo "Tool output: $TOPO_OUTPUT"
# ==============================================================================
# STEP 2: Verify & Execute
# ==============================================================================
# Check if the command failed
if [ $TOPO_EXIT_CODE -ne 0 ]; then
echo "❌ ERROR: Topology search failed with exit code $TOPO_EXIT_CODE."
exit $TOPO_EXIT_CODE
fi
SELECTED_NODES="$TOPO_OUTPUT"
echo "✅ SUCCESS! Selected Nodes: $SELECTED_NODES"
echo "--- Launching Application on Selected Nodes ---"
# Launch the actual job ONLY on the selected nodes
# -w : Forces Slurm to run on this specific list
# -N : Must match the number of nodes you found
# --exclusive : Ensures pure dedicated access
srun -N 2 -w "$SELECTED_NODES" --exclusive \
bash -c "echo I am running on a selected node: \$(hostname)"
echo "--- Done ---"
When you check your Slurm output (cat topo_*.out), you will see:
| Result | Meaning |
|---|---|
✅ SUCCESS! ... |
The tool found nodes matching your JSON criteria. |
❌ ERROR: ... |
No subset of your allocation satisfies the distance/parent constraints. |
Tip: If it fails to find nodes, try increasing the number of nodes requested in your #SBATCH -N header to give the tool a larger search space.
TODO describe job_placer_placement_classes bin.