2gb upload limit fix #352
base: dev
Conversation
demiankatz left a comment
Thanks, @Jason-Benson, great work -- this looks like a solid solution to the problem. See below for the usual nitpicks and suggestions, many of which are just about possibly removing log messages that are no longer needed. :-)
}

async addMasterMetadataDatastream(): Promise<void> {
    console.log(`Adding master metadata datastream to ${this.pid}`);
Do we still need this console.log?
await fedoraObject.addMasterMetadataDatastream(contentFile.name);
fs.truncateSync(contentFile.name, 0);
fs.rmSync(contentFile.name);
console.log("FedoraObject.build: Done");
How about this one?
    params.logMessage ?? "Adding datastream " + id + " to " + this.pid + " with " + data.length + " bytes",
);
await this.fedora.addDatastream(this.pid, id, params, data, expectedStatus);
console.log(`Added datastream ${id} to ${this.pid}`);
Is it safe to remove all the console.log messages from this file as well? If you want to keep any of these messages in the long term, it would be better in this file to call this.log() which will send the message to the job processing log instead of to the console.
    return this.fedora.getDatastreamAsBuffer(this.pid, datastream);
}

async getDatastreamToTempFile(datastream: string, treatMissingAsEmpty = false): Promise<string> {
Any particular reason the method is called getDatastreamToTempFile here but downloadDatastreamToTempFile in the Fedora class? I'd be inclined to use the "download" name in both places for consistency, though you can persuade me otherwise if you kept them separate for a specific reason. :-)
return http(method, url, data, options).catch((err) => {
    console.error(`Request failed for ${method.toUpperCase()} ${url}:`, err);
    console.error("Full error:", JSON.stringify(err, Object.getOwnPropertyNames(err), 2));
Is JSON.stringify needed here? I thought that console.error should be able to handle a raw exception object appropriately... though maybe this makes it less verbose?
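Purely for illustration (not code from this PR), console.error will render a raw Error with its name, message, and stack on its own:

    // Minimal sketch: console.error formats an Error object directly, stack included.
    try {
        throw new Error("request failed");
    } catch (err) {
        console.error("Full error:", err); // readable output without JSON.stringify
    }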
} else {
    // For string/Buffer payloads, compute digests here.
    if (typeof data === "string" || Buffer.isBuffer(data)) {
        const md5 = crypto.createHash("md5").update(data).digest("hex");
We seem to have lost the sha512 hash here as well.
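For illustration, a minimal sketch of computing both digests side by side for a string/Buffer payload (mirroring the md5 line above; not the PR's actual code):

    import * as crypto from "crypto";

    // Sketch only: compute md5 and sha512 for an in-memory payload.
    function computeDigests(data: string | Buffer): { md5: string; sha512: string } {
        const md5 = crypto.createHash("md5").update(data).digest("hex");
        const sha512 = crypto.createHash("sha512").update(data).digest("hex");
        return { md5, sha512 };
    }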
    username: this.config.fedoraUsername,
    password: this.config.fedoraPassword,
};
const options = Object.assign({}, auth);
We use Object.assign in other parts of the code to combine multiple objects together, but since there is only one object here (auth), I think this call is unnecessary. You could just assign the values directly to an options variable and skip the assign step.
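A rough before/after sketch of what I mean, using a placeholder config object since the real one isn't shown here:

    // Placeholder standing in for this.config:
    const config = { fedoraUsername: "user", fedoraPassword: "pass" };

    // Current approach: Object.assign over a single source object adds nothing.
    const auth = { username: config.fedoraUsername, password: config.fedoraPassword };
    const optionsViaAssign = Object.assign({}, auth);

    // Suggested simplification: build the options object directly.
    const options = { username: config.fedoraUsername, password: config.fedoraPassword };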
 */
async downloadDatastreamToTempFile(pid: string, datastream: string, treatMissingAsEmpty = false): Promise<string> {
    const path = pid + "/" + datastream;
    const urlPath = path[0] == "/" ? path.slice(1) : path;
I don't think this slash-trimming step is necessary here. Looks like you've based this code on the generic _request method, but since a pid will never start with a slash, I don't think you need to mirror all the logic of the more generic method.
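In other words, something as simple as this sketch should be enough:

    // Sketch: a pid never begins with "/", so the joined path can be used as-is,
    // without the slash-trimming guard borrowed from the generic _request method.
    function datastreamPath(pid: string, datastream: string): string {
        return pid + "/" + datastream;
    }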
    return result;
}

extractTextFromFile(filename: string): string {
This logic seems to be copied from the extractText method above -- so I'd suggest calling it from extractText at the appropriate point to reduce redundancy.
I also wonder if it would be better NOT to have this method truncate and delete the file -- that could be an unexpected side effect if this is used in a different context. Maybe that logic should happen outside of this method.
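To illustrate the shape I have in mind (purely a sketch -- the real extractText signature and body aren't shown in this diff, so the function names and stand-in extraction logic below are assumptions):

    import * as fs from "fs";

    // Hypothetical sketch: extractTextFromFile holds the shared logic and has no
    // cleanup side effects; the caller decides when to remove the temp file.
    function extractTextFromFile(filename: string): string {
        return fs.readFileSync(filename, "utf8"); // stand-in for the real extraction logic
    }

    function extractTextAndCleanUp(filename: string): string {
        try {
            return extractTextFromFile(filename);
        } finally {
            fs.truncateSync(filename, 0); // cleanup happens outside the shared method
            fs.rmSync(filename);
        }
    }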
| "mysql2": "^3.16", | ||
| "n3": "^2", | ||
| "nanoid": "^5.1.6", | ||
| "n3": "^1", |
Not sure why these dependencies have been downgraded here; I suspect this may be some kind of merge issue and not intended as part of the PR.
To get around Node's 2 GB limit on file handling, I reworked the ingest and metadata handling to stream the files in chunks.
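Roughly, the idea is to pipe a read stream into the request body instead of buffering the whole file; the sketch below is illustrative only (the function name, URL handling, and use of Node's built-in http module are simplifications, not the exact code in this PR):

    import * as fs from "fs";
    import * as http from "http";
    import { pipeline } from "stream/promises";

    // Sketch: upload a large file by streaming it in chunks, so the whole file is
    // never held in a single Buffer (Node buffers cap out around 2GB).
    async function uploadLargeFile(filePath: string, url: string): Promise<void> {
        const target = new URL(url);
        const size = fs.statSync(filePath).size;
        await new Promise<void>((resolve, reject) => {
            const req = http.request(
                {
                    hostname: target.hostname,
                    port: target.port ? Number(target.port) : 80,
                    path: target.pathname,
                    method: "PUT",
                    headers: { "Content-Length": size },
                },
                (res) => {
                    res.resume(); // drain the response body
                    if ((res.statusCode ?? 500) < 300) {
                        resolve();
                    } else {
                        reject(new Error(`HTTP ${res.statusCode}`));
                    }
                },
            );
            req.on("error", reject);
            // The key change: stream the file instead of reading it all into memory.
            pipeline(fs.createReadStream(filePath), req).catch(reject);
        });
    }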