Implement file processor#690
Conversation
jem-davies
left a comment
There was a problem hiding this comment.
I see that you have altered it to draft after I started to take a look...
But as well as the comments - what about thinking about the shutdown procedure - i.e. reading from a big file and Bento is shutdown - will that happen gracefully -
What about the atomicity of the operations too?
|
@jem-davies should the read method use a scanner like the file input? I seems that scanner impl handles the prob with large files. Reading in chunks etc. |
Yeah that's what I was thinking 👍 |
2627bd5 to
53f70d5
Compare
|
For gracefull shutdown during large file reads i changed the read method to use a scanner. |
53f70d5 to
6ec3a54
Compare
| Description("The file operation to perform."). | ||
| Examples("read", "write", "delete", "move", "rename", "stat"), |
There was a problem hiding this comment.
| Description("The file operation to perform."). | |
| Examples("read", "write", "delete", "move", "rename", "stat"), | |
| Description("The file operation to perform."), |
nit: remove these examples - the options are already displayed on the docs page.
| Description("The file operation to perform."). | ||
| Examples("read", "write", "delete", "move", "rename", "stat"), | ||
| service.NewInterpolatedStringField(fileProcessorFieldPath). | ||
| Description("The source file path. Supports interpolation for dynamic paths."). |
There was a problem hiding this comment.
| Description("The source file path. Supports interpolation for dynamic paths."). | |
| Description("The source file path."). |
It is already mentioned automatically for NewInterpolatedStringFields.
| "/tmp/${! json(\"document.id\") }.txt", | ||
| ).LintRule(`if this == "" { [ "'path' must be set to a non-empty string" ] }`), | ||
| service.NewInterpolatedStringField(fileProcessorFieldDest). | ||
| Description("The destination path for 'move' and 'rename' operations. Supports interpolation."). |
There was a problem hiding this comment.
| Description("The destination path for 'move' and 'rename' operations. Supports interpolation."). | |
| Description("The destination path for 'move' and 'rename' operations."). |
| fileProcessorFieldDest = "dest" | ||
| fileProcessorFieldScanner = "scanner" | ||
| ) | ||
|
|
There was a problem hiding this comment.
nit: Add const for the operations options - and then reference them later
| fileProcessorFieldPath = "path" | ||
| fileProcessorFieldDest = "dest" |
There was a problem hiding this comment.
| fileProcessorFieldPath = "path" | |
| fileProcessorFieldDest = "dest" | |
| fileProcessorFieldPath = "source_path" | |
| fileProcessorFieldDest = "destination_path" |
nit
| if err := ackFn(ctx, nil); err != nil { | ||
| p.log.Warnf("Failed to acknowledge scanner batch: %v", err) | ||
| } |
There was a problem hiding this comment.
| if err := ackFn(ctx, nil); err != nil { | |
| p.log.Warnf("Failed to acknowledge scanner batch: %v", err) | |
| } |
Replace with the defer file.Close()
There was a problem hiding this comment.
will all scanners move on to NextBatch if we fail to ack the one we are processing?
| } | ||
|
|
||
| func (p *fileProcessor) processWrite(_ context.Context, msg *service.Message) (service.MessageBatch, error) { | ||
| path, err := p.conf.Path.TryString(msg) |
There was a problem hiding this comment.
Just noting again regarding path / dest or source_path / destination_path - here we are writing to path and not dest - which is fine. But worth documenting this in the operations section of the docs imo.
| } | ||
|
|
||
| // Use atomic write pattern: write to temp file, then rename | ||
| tempFile := path + ".tmp" |
There was a problem hiding this comment.
Would add some sort of random prefix to ".tmp" - so to make sure we don't get a collision.
|
|
||
| // atomicCopyAndDelete performs an atomic copy from src to dest and then deletes src | ||
| // This ensures that either the operation completes fully or leaves the source intact | ||
| func (p *fileProcessor) atomicCopyAndDelete(_ context.Context, srcPath, destPath string, msg *service.Message) (service.MessageBatch, error) { |
There was a problem hiding this comment.
We have context in the func signature but we aren't going to use it, and it's an unexported func so unclear why.
| } | ||
| defer srcFile.Close() | ||
|
|
||
| content, err := io.ReadAll(srcFile) |
There was a problem hiding this comment.
This is potentially going to lead to OOM issues - thinking that this could be re-worked using a streaming approach to writing to the tempFile - and then we could also observe context cancellation.
resolves #649