disable s3 tagging JVM option #10029

Merged
merged 10 commits into from
Apr 17, 2024
1 change: 1 addition & 0 deletions doc/release-notes/10022_upload_redirect_without_tagging.md
@@ -0,0 +1 @@
If your S3 store does not support tagging and returns an error when redirecting uploads, you can disable tagging by setting the ``dataverse.files.<id>.disable-tagging`` JVM option to ``true``. Disabling tagging makes it harder to identify abandoned files (created when the user does not complete the upload operation) with an external script, but they can still be removed using the [Cleanup Storage of a Dataset](https://guides.dataverse.org/en/5.13/api/native-api.html#cleanup-storage-of-a-dataset) API endpoint.
8 changes: 8 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -792,6 +792,13 @@ Larger installations may want to increase the number of open S3 connections allo

``./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"``

By default, when redirecting an upload to S3 storage, Dataverse places a ``temp`` tag on the file being uploaded. The tag makes cleanup easier if the file is not added to the dataset after upload (e.g., if the user cancels the operation).
If your S3 store does not support tagging and returns an error when redirecting uploads, you can disable that tag by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For example:

``./asadmin create-jvm-options "-Ddataverse.files.<id>.disable-tagging=true"``

Disabling the ``temp`` tag makes it harder to identify abandoned files that are not used by your Dataverse instance (i.e., a delete script cannot search for the ``temp`` tag). These files should still be removed to avoid wasting storage space. To clean up these files and any other leftover files, regardless of whether the ``temp`` tag is applied, you can use the `Cleanup Storage of a Dataset <https://guides.dataverse.org/en/5.13/api/native-api.html#cleanup-storage-of-a-dataset>`_ API endpoint.

To configure Dataverse to use a custom S3 service instead of Amazon S3, add the options
for the custom URL and region as documented below. Check above whether your desired combination has
already been tested and which other options were set for a successful integration.
@@ -825,6 +832,7 @@ List of S3 Storage Options
dataverse.files.<id>.payload-signing         ``true``/``false``  Enable payload signing. Optional                                                     ``false``
dataverse.files.<id>.chunked-encoding        ``true``/``false``  Disable chunked encoding. Optional                                                   ``true``
dataverse.files.<id>.connection-pool-size    <?>                 The maximum number of open connections to the S3 server                              ``256``
dataverse.files.<id>.disable-tagging         ``true``/``false``  Do not place the ``temp`` tag when redirecting the upload to the S3 server           ``false``
===========================================  ==================  ===================================================================================  =============

.. table::
@@ -39,6 +39,7 @@
import edu.harvard.iq.dataverse.Dataverse;
import edu.harvard.iq.dataverse.DvObject;
import edu.harvard.iq.dataverse.datavariable.DataVariable;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.util.FileUtil;
import opennlp.tools.util.StringUtil;

@@ -985,7 +986,10 @@ private String generateTemporaryS3UploadUrl(String key, Date expiration) throws
GeneratePresignedUrlRequest generatePresignedUrlRequest =
new GeneratePresignedUrlRequest(bucketName, key).withMethod(HttpMethod.PUT).withExpiration(expiration);
//Require user to add this header to indicate a temporary file
- generatePresignedUrlRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+ final boolean taggingDisabled = JvmSettings.DISABLE_S3_TAGGING.lookupOptional(Boolean.class, this.driverId).orElse(false);
+ if (!taggingDisabled) {
+ generatePresignedUrlRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+ }

URL presignedUrl;
try {
@@ -1034,7 +1038,10 @@ public JsonObjectBuilder generateTemporaryS3UploadUrls(String globalId, String s
} else {
JsonObjectBuilder urls = Json.createObjectBuilder();
InitiateMultipartUploadRequest initiationRequest = new InitiateMultipartUploadRequest(bucketName, key);
- initiationRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+ final boolean taggingDisabled = JvmSettings.DISABLE_S3_TAGGING.lookupOptional(Boolean.class, this.driverId).orElse(false);
+ if (!taggingDisabled) {
+ initiationRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+ }
InitiateMultipartUploadResult initiationResponse = s3.initiateMultipartUpload(initiationRequest);
String uploadId = initiationResponse.getUploadId();
for (int i = 1; i <= (fileSize / minPartSize) + (fileSize % minPartSize > 0 ? 1 : 0); i++) {
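For readers without the Dataverse and AWS SDK sources at hand, the decision made in both upload paths above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code in this PR: `TaggingSketch`, `taggingDisabled` and `uploadHeaders` are hypothetical names, the option is read with `System.getProperty` rather than through `JvmSettings`, and a plain map stands in for the request's custom headers. It also assumes that `Headers.S3_TAGGING` resolves to the `x-amz-tagging` header name that fileupload.js uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the tagging decision; not a Dataverse class. */
final class TaggingSketch {
    static final String TAGGING_HEADER = "x-amz-tagging";
    static final String TEMP_TAG = "dv-state=temp";

    /**
     * Reads -Ddataverse.files.<driverId>.disable-tagging as a system property.
     * Assumption: the real code resolves the option through JvmSettings instead.
     * An unset option means tagging stays enabled, matching orElse(false) in the PR.
     */
    static boolean taggingDisabled(String driverId) {
        String name = "dataverse.files." + driverId + ".disable-tagging";
        return Boolean.parseBoolean(System.getProperty(name, "false"));
    }

    /** Returns the custom headers an upload request would carry for this driver. */
    static Map<String, String> uploadHeaders(String driverId) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (!taggingDisabled(driverId)) {
            // Only stores that support tagging get the temp tag.
            headers.put(TAGGING_HEADER, TEMP_TAG);
        }
        return headers;
    }

    public static void main(String[] args) {
        // "s3" is a made-up driver id used only for this demonstration.
        System.out.println("default:  " + uploadHeaders("s3"));
        System.setProperty("dataverse.files.s3.disable-tagging", "true");
        System.out.println("disabled: " + uploadHeaders("s3"));
    }
}
```

Because the option is scoped by driver id, disabling tagging for one store in this sketch leaves the header in place for every other store.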
@@ -50,6 +50,10 @@ public enum JvmSettings {
UPLOADS_DIRECTORY(SCOPE_FILES, "uploads"),
DOCROOT_DIRECTORY(SCOPE_FILES, "docroot"),
GUESTBOOK_AT_REQUEST(SCOPE_FILES, "guestbook-at-request"),

//STORAGE DRIVER SETTINGS
SCOPE_DRIVER(SCOPE_FILES),
DISABLE_S3_TAGGING(SCOPE_DRIVER, "disable-tagging"),

// SOLR INDEX SETTINGS
SCOPE_SOLR(PREFIX, "solr"),
70 changes: 37 additions & 33 deletions src/main/webapp/resources/js/fileupload.js
@@ -192,41 +192,45 @@ var fileUpload = class fileUploadClass {
progBar.html('');
progBar.append($('<progress/>').attr('class', 'ui-progressbar ui-widget ui-widget-content ui-corner-all'));
if(this.urls.hasOwnProperty("url")) {
$.ajax({
url: this.urls.url,
headers: { "x-amz-tagging": "dv-state=temp" },
type: 'PUT',
data: this.file,
context:this,
cache: false,
processData: false,
success: function() {
//ToDo - cancelling abandons the file. It is marked as temp so can be cleaned up later, but would be good to remove now (requires either sending a presigned delete URL or adding a callback to delete only a temp file
if(!cancelled) {
this.reportUpload();
}
},
error: function(jqXHR, textStatus, errorThrown) {
console.log('Failure: ' + jqXHR.status);
console.log('Failure: ' + errorThrown);
uploadFailure(jqXHR, thisFile);
},
xhr: function() {
var myXhr = $.ajaxSettings.xhr();
if (myXhr.upload) {
myXhr.upload.addEventListener('progress', function(e) {
if (e.lengthComputable) {
var doublelength = 2 * e.total;
progBar.children('progress').attr({
value: e.loaded,
max: doublelength
});
}
});
const url = this.urls.url;
const request = {
url: url,
type: 'PUT',
data: this.file,
context:this,
cache: false,
processData: false,
success: function() {
//ToDo - cancelling abandons the file. It is marked as temp so can be cleaned up later, but would be good to remove now (requires either sending a presigned delete URL or adding a callback to delete only a temp file
if(!cancelled) {
this.reportUpload();
}
},
error: function(jqXHR, textStatus, errorThrown) {
console.log('Failure: ' + jqXHR.status);
console.log('Failure: ' + errorThrown);
uploadFailure(jqXHR, thisFile);
},
xhr: function() {
var myXhr = $.ajaxSettings.xhr();
if (myXhr.upload) {
myXhr.upload.addEventListener('progress', function(e) {
if (e.lengthComputable) {
var doublelength = 2 * e.total;
progBar.children('progress').attr({
value: e.loaded,
max: doublelength
});
}
});
}
return myXhr;
}
return myXhr;
};
if (url.includes("x-amz-tagging")) {
request.headers = { "x-amz-tagging": "dv-state=temp" };
}
});
$.ajax(request);
} else {
var loaded=[];
this.etags=[];
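The client decides whether to send the tagging header by looking for `x-amz-tagging` in the presigned URL. A stricter variant of that check, written in Java to match the server code, parses the `X-Amz-SignedHeaders` query parameter instead of using a substring match. It is only a sketch: `SignedHeaderCheck` and `signsTaggingHeader` are hypothetical names, the URL in `main` is made up, and the sketch assumes a Signature Version 4 presigned URL that lists its signed headers in that parameter.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper; not part of Dataverse or the AWS SDK. */
final class SignedHeaderCheck {

    /** Returns true if the presigned URL lists x-amz-tagging among its signed headers. */
    static boolean signsTaggingHeader(String presignedUrl) {
        String query = URI.create(presignedUrl).getRawQuery();
        if (query == null) {
            return false; // no query string, so nothing was presigned
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // parameter without a value
            }
            String name = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            if (!name.equalsIgnoreCase("X-Amz-SignedHeaders")) {
                continue;
            }
            // Signed header names are separated by ';' (sent as %3B in the URL).
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            return Arrays.asList(value.toLowerCase(Locale.ROOT).split(";")).contains("x-amz-tagging");
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up URL for illustration only.
        String url = "https://bucket.example.org/key?X-Amz-Algorithm=AWS4-HMAC-SHA256"
                + "&X-Amz-SignedHeaders=host%3Bx-amz-tagging&X-Amz-Signature=abc";
        System.out.println(signsTaggingHeader(url));
    }
}
```

The substring test used in fileupload.js is simpler and is what this PR ships. The parsed form only avoids a false positive when an object key happens to contain the text `x-amz-tagging`.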