Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Document #1

Merged
merged 6 commits into from
Mar 22, 2025
Merged

Design Document #1

merged 6 commits into from
Mar 22, 2025

Conversation

arazmj
Copy link
Owner

@arazmj arazmj commented Mar 20, 2025

No description provided.

DESIGN.md Outdated

GetJobStatus(JobStatusRequest) returns (JobStatusResponse)

GetJobLogs(JobLogsRequest) returns (JobLogsResponse)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the job is still running?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will return the updated logs if the job is still running. (Similar behavior as kubectl)

DESIGN.md Outdated
* Job output is captured and stored in memory and sent to CLI clients.

### Job Execution:
* The process runs within its assigned cgroup.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the process added to the assigned cgroup?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify at what point the job PID is written to cgroup.procs? When does the child actually start executing, and when does it join the cgroup?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job is first executed, once we received the PID we add it to cgroup.procs

* The action will run if the client identity is defined in `sentry-roles.toml` file and the user has permission for the action.
* The server validates request and spawns a new process using `/bin/sh -c`.
* The process is assigned to a cgroup with defined CPU, memory, and I/O constraints.
* Job output is captured and stored in memory and sent to CLI clients.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the library informs the gRPC that new data was written by the job and needs to be forwarded to clients?
can you iterate a bit on it?

Copy link
Owner Author

@arazmj arazmj Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library has a public method StreamOutput() that takes a callback function as a parameter.

type OutputCallback func(data []byte) error
func (m *JobManager) StreamOutput(jobID int32, callback OutputCallback) error {}

The server will call this method with a function closure that calls the stream.

s.manager.StreamOutput(req.JobId, func(data []byte) error {
		return stream.Send(&pb.JobOutput{
			JobId:    req.JobId,
			Data:     data,
		})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one callback completes slowly, for example if the stream.Send() is on a slow connection?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the job started a running goroutine in the background reads the stdout/stdout and calls stream.Send() for each client them to clients.

DESIGN.md Outdated
### Job Initialization:
* User starts a job via CLI.
* The server authorizes the user against for the request against the roles defined in sentry-security.
* The client identity is the CN field of the CA
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which TLS parameters like version, cyphers... do you plan to use here?

Copy link
Owner Author

@arazmj arazmj Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use TLS1.3 with the default AEAD cipher suites and curve preferences.

var defaultCipherSuitesTLS13 = []uint16{
    TLS_AES_128_GCM_SHA256,
    TLS_AES_256_GCM_SHA384,
    TLS_CHACHA20_POLY1305_SHA256,
}
var defaultCurvePreferences = []CurveID{
    X25519,    // X25519 (ECDH/curve25519)
    CurveP256, // NIST P-256
    CurveP384, // NIST P-384
    CurveP521, // NIST P-521
}

rpc KillJob (KillJobRequest) returns (KillJobResponse) {}
rpc GetJobStatus (JobStatusRequest) returns (JobStatusResponse) {}
rpc StreamJobLogs (JobLogsRequest) returns (stream JobOutput) {}
rpc ListJobs (ListJobsRequest) returns (ListJobsResponse) {}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: remove support for listing - it is not a requirement for the challenge.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I need the job ids and the state of running jobs for testing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to get that from GetJobStatus though right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start request could also return the identifier instead of an empty response.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, removing it makes the testing more difficult while having the ListJob doesn't add so much to the code.
But I can still remove it if you have strong openion.

DESIGN.md Outdated
Comment on lines 76 to 77
bool is_running = 1;
string status = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: use an enum instead of multiple fields?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed status field. It is not needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does is_running provide enough information about a job though? Is it not running because the job ended or because a user stopped it?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you explain how many states for status do we need?

  • Stopped
  • Finished
  • Running

DESIGN.md Outdated
* Job output is captured and stored in memory and sent to CLI clients.

### Job Execution:
* The process runs within its assigned cgroup.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify at what point the job PID is written to cgroup.procs? When does the child actually start executing, and when does it join the cgroup?

* The action will run if the client identity is defined in `sentry-roles.toml` file and the user has permission for the action.
* The server validates request and spawns a new process using `/bin/sh -c`.
* The process is assigned to a cgroup with defined CPU, memory, and I/O constraints.
* Job output is captured and stored in memory and sent to CLI clients.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one callback completes slowly, for example if the stream.Send() is on a slow connection?

DESIGN.md Outdated
```

## Job IDs
Jobs are identified using UUIDs rather than process IDs or sequential counters. This choice addresses several key requirements:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but it seems like some content is missing here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

}

message StartJobRequest {
string command = 1;
Copy link

@timothyb89 timothyb89 Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this includes command + arguments, how will it be executed? Will you be parsing the string yourself? If so, can you think of any design tweaks that might remove the need to do so?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Ideally we want to be able to do something like sentry start -cmd bash -c 'i=1; while true; do echo "$i"; i=$((i+1)); sleep 1; done'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this still would involve some level of parsing on both client and server since command is just a single string value, right? Can you think of a change to the data model that might simplify this?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have some ideas. I can handle that. I will change the data type.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothyb89 Would something like this be acceptable? Note the -- after bash.

sentry start -cmd bash -- -c 'i=1; while true; do echo "$i"; i=$((i+1)); sleep 1; done'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good start, but does the proto change at all to accommodate this?

Copy link
Owner Author

@arazmj arazmj Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will look like this

message StartJobRequest {
  string command = 1;
  repeated string commandArgs = 2;
}

DESIGN.md Outdated
}

message JobOutput {
int32 job_id = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is job_id still an int32?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix

@arazmj arazmj merged commit 3a03702 into main Mar 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants