# Scheduling Logic and Resource Cluster Interactions

This document explains how job creation and job scaling translate into scheduling and resource changes, and how those changes interact with the Resource Cluster and the Resource Cluster Scaler.

## A) Job Creation → Scheduling → Task Execution

```mermaid
sequenceDiagram
autonumber
participant Client
participant JCM as JobClustersManagerActor
participant JC as JobClusterActor
participant JA as JobActor
participant MS as MantisScheduler
participant RCSA as ResourceClusterAwareSchedulerActor
participant RC as ResourceClusterActor
participant TE as Task Executor

Client->>JCM: SubmitJobRequest
JCM->>JC: forward(request)
JC->>JA: InitJob (initialize workers)
JA->>MS: scheduleWorkers(BatchScheduleRequest)
MS->>RCSA: BatchScheduleRequestEvent
RCSA->>RC: getTaskExecutorsFor(allocationRequests)
RC-->>RCSA: allocations (TaskExecutorID -> AllocationRequest)
RCSA->>TE: submitTask(executeStageRequestFactory.of(...))
TE-->>RCSA: Ack / Failure
RCSA->>JA: WorkerLaunched / WorkerLaunchFailed
Note over RC: Inventory & matching by ExecutorStateManagerImpl
```

Key notes:
- Job submission enters at `JobClustersManagerActor.onJobSubmit`, is forwarded to the `JobClusterActor`, and `JobActor.initialize` then constructs the workers.
- `JobActor` enqueues workers via batch scheduling to the resource-cluster-aware scheduler.
- `ResourceClusterAwareSchedulerActor` queries `ResourceClusterActor` for best-fit executors and pushes tasks to TEs via `TaskExecutorGateway`.
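
A minimal, synchronous sketch of that push-model step is below, assuming simplified stand-in types: the real flow is asynchronous Akka messaging through `TaskExecutorGateway`, and the method shapes and names here (`scheduleBatch`, `JobEventListener`, plain `String` executor IDs) are illustrative, not the actual Mantis signatures.

```java
// Illustrative sketch only: a simplified, synchronous stand-in for the flow handled by
// ResourceClusterAwareSchedulerActor. Types and method names are assumptions, not the
// real Mantis API.
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SchedulingSketch {

    record AllocationRequest(String jobId, int stageNum, int workerIndex) {}

    interface ResourceCluster {
        // Ask the resource cluster to match each allocation request to an executor.
        Optional<Map<String, AllocationRequest>> getTaskExecutorsFor(List<AllocationRequest> requests);
    }

    interface TaskExecutorGateway {
        // Push the task to the chosen executor (ack/failure is asynchronous in the real system).
        boolean submitTask(String executorId, AllocationRequest request);
    }

    interface JobEventListener {
        void onWorkerLaunched(AllocationRequest request, String executorId);
        void onWorkerLaunchFailed(AllocationRequest request, String reason);
    }

    static void scheduleBatch(List<AllocationRequest> batch,
                              ResourceCluster resourceCluster,
                              TaskExecutorGateway gateway,
                              JobEventListener jobActor) {
        // 1. Ask the resource cluster for a best-fit executor per request.
        Optional<Map<String, AllocationRequest>> allocations =
            resourceCluster.getTaskExecutorsFor(batch);

        if (allocations.isEmpty()) {
            // Not enough capacity: the real system records pending demand and retries.
            batch.forEach(r -> jobActor.onWorkerLaunchFailed(r, "no resources available"));
            return;
        }

        // 2. Push each task to its executor and report the outcome back to the job.
        allocations.get().forEach((executorId, request) -> {
            boolean acked = gateway.submitTask(executorId, request);
            if (acked) {
                jobActor.onWorkerLaunched(request, executorId);
            } else {
                jobActor.onWorkerLaunchFailed(request, "submit failed on " + executorId);
            }
        });
    }
}
```

In the real system the allocation answer and the task submissions are all actor messages, so success and failure surface as `WorkerLaunched` / `WorkerLaunchFailed` events back to the `JobActor` rather than as return values.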

## B) Job Stage Scale-Up/Down and Resource Cluster Scaler Loop

```mermaid
sequenceDiagram
autonumber
participant JA as JobActor
participant WM as WorkerManager
participant MS as MantisScheduler
participant RCSA as ResourceClusterAwareSchedulerActor
participant RC as ResourceClusterActor
participant RCS as ResourceClusterScalerActor
participant Host as ResourceClusterHostActor
participant TE as Task Executor

JA->>JA: onScaleStage(StageScalingPolicy, ScalerRule min/max)
JA->>WM: scaleStage(newNumWorkers)
alt Scale Up
WM->>MS: scheduleWorkers(BatchScheduleRequest)
MS->>RCSA: BatchScheduleRequestEvent
RCSA->>RC: getTaskExecutorsFor(...)
RC-->>RCSA: allocations or NoResourceAvailable
RCSA->>TE: submitTask(...)
Note over RC: pendingJobRequests updated when not enough TEs
else Scale Down
WM->>RC: unscheduleAndTerminateWorker(...)
end

loop Periodic Capacity Loop (every scalerPullThreshold)
RCS->>RC: GetClusterUsageRequest
RC-->>RCS: GetClusterUsageResponse (idle/total by SKU)
RCS->>RCS: apply rules (minIdleToKeep, maxIdleToKeep, cooldown)
alt ScaleUp decision
RCS->>Host: ScaleResourceRequest(desired size increase)
Host-->>RC: New TEs provisioned
TE->>RC: TaskExecutorRegistration + Heartbeats
RC->>RC: mark available (unless disabled)
else ScaleDown decision
RCS->>RC: GetClusterIdleInstancesRequest
RC-->>RCS: idle TE list
RCS->>Host: ScaleResourceRequest(desired size decrease, idleInstances)
RCS->>RC: DisableTaskExecutorsRequest(idle targets during drain)
end
end
```

Key notes:
- Job scale-up adds workers; if there are not enough TEs, the scheduler records the unmet demand (`pendingJobRequests`), so the scaler sees fewer effective idle slots and scales capacity up.
- Scale-down removes workers and may free idle TEs; the scaler can shrink capacity by selecting idle instances to drain and disabling them during deprovisioning.
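
A hedged sketch of the per-SKU decision the scaler applies on each tick is shown below, assuming the rule shape described above (`minIdleToKeep`, `maxIdleToKeep`, cooldown); the class and method names are illustrative and not taken from `ResourceClusterScalerActor` itself.

```java
// Illustrative decision rule for one SKU, evaluated once per scaler tick. Names and the
// exact arithmetic are assumptions based on the description above, not the real
// ResourceClusterScalerActor implementation.
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

final class ScalerRuleSketch {
    enum Decision { SCALE_UP, SCALE_DOWN }

    private final int minIdleToKeep;
    private final int maxIdleToKeep;
    private final Duration cooldown;
    private Instant lastScaleAction = Instant.EPOCH;

    ScalerRuleSketch(int minIdleToKeep, int maxIdleToKeep, Duration cooldown) {
        this.minIdleToKeep = minIdleToKeep;
        this.maxIdleToKeep = maxIdleToKeep;
        this.cooldown = cooldown;
    }

    // idle is the "effective idle" reported in GetClusterUsageResponse
    // (available minus pending demand).
    Optional<Decision> evaluate(int idle, Instant now) {
        if (now.isBefore(lastScaleAction.plus(cooldown))) {
            return Optional.empty(); // still cooling down from the last action
        }
        if (idle < minIdleToKeep) {
            lastScaleAction = now;
            return Optional.of(Decision.SCALE_UP);   // -> ScaleResourceRequest (increase)
        }
        if (idle > maxIdleToKeep) {
            lastScaleAction = now;
            return Optional.of(Decision.SCALE_DOWN); // -> pick idle TEs, disable, then decrease
        }
        return Optional.empty(); // within the idle band: no action
    }
}
```

Keeping a band between the two thresholds plus a cooldown is what prevents the scaler from oscillating when effective idle hovers near a single boundary.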

## Code Reference Map

- Submission & job initialization:
- `JobClustersManagerActor.onJobSubmit(...)`
- `JobActor.onJobInitialize(...)`, `initialize(...)`, `submitInitialWorkers(...)`, `queueTasks(...)`

- Scheduling (push model):
- `ResourceClusterAwareSchedulerActor.onBatchScheduleRequestEvent(...)`, `onAssignedScheduleRequestEvent(...)`
- `ResourceClusterActor` + `ExecutorStateManagerImpl.findBestFit(...)`, `findBestGroupBySizeNameMatch(...)`, `findBestGroupByFitnessCalculator(...)`

- Job scaling:
- `JobActor.onScaleStage(...)` → `WorkerManager.scaleStage(...)`

- Resource-cluster autoscaling:
- `ResourceClusterScalerActor.onTriggerClusterUsageRequest(...)`, `onGetClusterUsageResponse(...)`, `onGetClusterIdleInstancesResponse(...)`
- `ExecutorStateManagerImpl.getClusterUsage(...)` (idle = available − pending), `pendingJobRequests`
- `ResourceClusterActor.onTaskExecutorRegistration(...)` (TE joins, becomes available)

## Coupling Between Scheduling and Scaling

- When allocations fail for lack of capacity, the resource cluster records pending demand per job/group; the scaler subtracts this pending demand from the available count to compute effective idle, which triggers scale-up once idle drops below the rule's minimum (see the sketch below).
- New TEs register with the resource cluster and are immediately considered for subsequent scheduling waves.
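
The coupling can be made concrete with a small bookkeeping sketch, assuming hypothetical types: the real accounting lives in `ExecutorStateManagerImpl.getClusterUsage(...)` and `pendingJobRequests`, and the field names and arithmetic below are illustrative only.

```java
// Illustrative pending-demand bookkeeping: a failed allocation records demand, and
// cluster usage subtracts it from the available count so the scaler sees fewer idle
// slots. Class and field names are assumptions, not the real Mantis code.
import java.util.HashMap;
import java.util.Map;

final class PendingDemandSketch {
    // jobId -> number of workers that could not be placed yet
    private final Map<String, Integer> pendingJobRequests = new HashMap<>();
    private int availableExecutors;

    PendingDemandSketch(int availableExecutors) {
        this.availableExecutors = availableExecutors;
    }

    // Called when getTaskExecutorsFor(...) cannot satisfy part of a batch.
    void recordPending(String jobId, int unplacedWorkers) {
        pendingJobRequests.merge(jobId, unplacedWorkers, Integer::sum);
    }

    // Called when a new TE registers with the resource cluster.
    void onTaskExecutorRegistered() { availableExecutors++; }

    // Called once the pending workers of a job are finally placed.
    void clearPending(String jobId) { pendingJobRequests.remove(jobId); }

    // Effective idle as the scaler sees it: available minus pending demand, floored at zero.
    int effectiveIdle() {
        int pending = pendingJobRequests.values().stream().mapToInt(Integer::intValue).sum();
        return Math.max(0, availableExecutors - pending);
    }
}
```

For example, with 10 available TEs and 4 unplaced workers recorded for one job, `effectiveIdle()` reports 6, so the scaler sees less headroom than raw availability would suggest and is more likely to scale up.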