Updates Parse Runner Structure #70

gbdubs · 2023-11-20T22:12:17Z

No description provided.

bcspragu · 2023-11-21T01:41:58Z

cmd/runner/main.go

 func (h *handler) parsePortfolio(ctx context.Context, taskID task.ID, req *task.ParsePortfolioRequest) error {
 	// Load the portfolio from blob storage, place it in /mnt/raw_portfolios, where
 	// the `process_portfolios.R` script expects it to be.
-	for _, assetID := range req.AssetIDs {
-		srcURI := blob.Join(h.blob.Scheme(), h.sourcePortfolioContainer, assetID)


If we're not using the sourcePortfolioContainer any more, can you remove the flag and associated config + terraform?

But also, I like the previous structure better because it means the runner is only ever pulling from a single bucket, whereas moving it to an opaque blob means the caller can set it to whatever, which introduces a (small) new attack vector. What's the motivation for the opaque BlobURIs?

gbdubs · 2023-12-06

We could pursue the approach you describe here, but it would require fully reworking the blob storage architecture, and doing so in a way that isn't clearly better.

Why?

To access or delete a blob we need to know its full URI, including the scheme, the container, and the blob ID. Since blobs appear in more contexts than just the source portfolio, we couldn't infer that information or extract it into the code (without just duplicating the logic of what a blob is and how it is handled). Thus, we can either store the full URI as a single string in the DB, or chunk it into parts (perhaps with enum values for the container, e.g. one for the portfolio container), so that, as you suggest here, we would be more certain that we're always pulling from the same container.

That doesn't provide additional security, however; it just adds structure. That structure would make migrations harder, encourage introspection and more manual munging, and make reading, deleting, and retrieving blobs more error-prone.
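
As a rough illustration of the trade-off (not code from this PR), the sketch below keeps the URI as one opaque string but still lets the runner check which container it points at before reading. `BlobURI`, `parseBlobURI`, and `checkContainer` are hypothetical names, and the `scheme://container/id` layout is an assumption based on the parts listed above, not the project's actual blob package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// BlobURI is a hypothetical structured view of an opaque blob URI.
type BlobURI struct {
	Scheme    string
	Container string
	ID        string
}

// parseBlobURI splits an assumed "scheme://container/id" URI into its parts.
func parseBlobURI(raw string) (BlobURI, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return BlobURI{}, fmt.Errorf("parsing blob URI: %w", err)
	}
	id := strings.TrimPrefix(u.Path, "/")
	if u.Scheme == "" || u.Host == "" || id == "" {
		return BlobURI{}, fmt.Errorf("blob URI %q is missing a scheme, container, or blob ID", raw)
	}
	return BlobURI{Scheme: u.Scheme, Container: u.Host, ID: id}, nil
}

// checkContainer rejects URIs that point outside the expected scheme and
// container, so a caller cannot redirect the reader to an arbitrary bucket.
func checkContainer(raw, wantScheme, wantContainer string) error {
	b, err := parseBlobURI(raw)
	if err != nil {
		return err
	}
	if b.Scheme != wantScheme || b.Container != wantContainer {
		return fmt.Errorf("blob URI %q is outside %s://%s", raw, wantScheme, wantContainer)
	}
	return nil
}

func main() {
	// Example values only; the real scheme and container names are not in this thread.
	for _, uri := range []string{"gs://portfolios/abc123", "gs://elsewhere/abc123"} {
		if err := checkContainer(uri, "gs", "portfolios"); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", uri)
	}
}
```

A check like this would live at the point of use rather than in the DB schema, which is why it is compatible with storing the full string.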

bcspragu · 2023-11-21T01:44:04Z

cmd/runner/main.go

-				AssetIDs: req.AssetIDs,
-				Outputs:  out,
+				TaskID:  taskID,
+				Request: req,


Why duplicate the whole request in the response? Wouldn't it be better to just have a standard API interface between srv + runner like anything else, where we input the stuff we need and output the stuff we need?

gbdubs · 2023-12-06

Round-tripped data in an async request paradigm has huge benefits:

In this case, it removes the requirement that we write down details of the task in our database, since the info is available when the task completes. This removes the need for (a) a DB write and (b) error handling in case the DB state mismatches the async task state (as it could after a transaction failure, for example).

It doesn't require the async processing to know what is going to be important on the return trip (this turned out to be hugely important for a variety of things we built into async CM processing).

No code changes are needed to round-trip more data (though you still need a binary deploy).

With payloads as small as they are right now, there isn't a real cost (if that changes in the future, I agree we can reevaluate).

In short, this is the easier-to-maintain, more future-proof way of doing it. As you say, premature optimization is at the heart of many issues.
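
To make the round-trip idea concrete, here is a minimal sketch of a response that carries the original request back to the caller. The `TaskID` and `Request` fields mirror the diff above; everything else (the `BlobURIs` and `Outputs` fields, `runParse`, `handleCompletion`) is a hypothetical stand-in, not the actual `task` package.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real task package types.
type TaskID string

type ParsePortfolioRequest struct {
	BlobURIs []string // assumed field name
}

// ParsePortfolioResponse echoes the full request alongside the results.
type ParsePortfolioResponse struct {
	TaskID  TaskID
	Request *ParsePortfolioRequest
	Outputs []string // assumed field; one entry per parsed input
}

// runParse stands in for the runner: it does the work and returns the
// request it was given, unchanged, next to its outputs.
func runParse(id TaskID, req *ParsePortfolioRequest) ParsePortfolioResponse {
	outs := make([]string, 0, len(req.BlobURIs))
	for _, uri := range req.BlobURIs {
		outs = append(outs, uri+".parsed")
	}
	return ParsePortfolioResponse{TaskID: id, Request: req, Outputs: outs}
}

// handleCompletion stands in for the server-side completion handler. It
// reads the inputs from the response itself, so no DB lookup keyed on the
// task ID is needed to learn what was originally requested.
func handleCompletion(resp ParsePortfolioResponse) string {
	return fmt.Sprintf("task %s parsed %d of %d inputs",
		resp.TaskID, len(resp.Outputs), len(resp.Request.BlobURIs))
}

func main() {
	req := &ParsePortfolioRequest{BlobURIs: []string{"gs://portfolios/abc123"}}
	fmt.Println(handleCompletion(runParse("task-1", req)))
}
```

Adding a field to the request type makes it available on completion without touching the response type, which is the "no code changes" point above.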

cmd/runner/main.go

cmd/runner/taskrunner/taskrunner.go

Updates Parse Runner Structure

029c37d

gbdubs requested a review from bcspragu November 20, 2023 22:12

bcspragu approved these changes Nov 21, 2023

gbdubs added 2 commits December 6, 2023 08:36

Merge branch 'main' into grady/runner

f0ac865

Addresses Review Comments

5ba1882

gbdubs enabled auto-merge (squash) December 6, 2023 15:49

gbdubs merged commit 9e88ed4 into main Dec 6, 2023
2 checks passed

gbdubs commented Nov 20, 2023

bcspragu Nov 21, 2023

gbdubs Dec 6, 2023

bcspragu Nov 21, 2023

gbdubs Dec 6, 2023