Commit 4678330

Merge pull request #10029 from ErykKul/10022_upload_redirect_without_tagging
disable s3 tagging JVM option
2 parents d9a7922 + e473d53 commit 4678330

9 files changed, +103 -37 lines changed

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+If your S3 store does not support tagging and gives an error if you configure direct uploads, you can disable the tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For more details, see https://dataverse-guide--10029.org.readthedocs.build/en/10029/developers/big-data-support.html#s3-tags as well as #10022 and #10029.
+
+## New config options
+
+- dataverse.files.<id>.disable-tagging

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 1 addition & 1 deletion
@@ -2013,7 +2013,7 @@ The fully expanded example above (without environment variables) looks like this
 
 .. _cleanup-storage-api:
 
-Cleanup storage of a Dataset
+Cleanup Storage of a Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This is an experimental feature and should be tested on your system before using it in production.

doc/sphinx-guides/source/developers/big-data-support.rst

Lines changed: 6 additions & 1 deletion
@@ -81,7 +81,12 @@ with the contents of the file cors.json as follows:
 
 Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above.
 
-Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving or canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in the Dataverse installation. Note that not all S3 implementations support Tags: Minio does not. WIth such stores, direct upload works, but Tags are not used.
+.. _s3-tags-and-direct-upload:
+
+S3 Tags and Direct Upload
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving nor canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and new files are added in the Dataverse installation. Note that not all S3 implementations support tags. Minio, for example, does not. With such stores, direct upload may not work and you might need to disable tagging. For details, see :ref:`s3-tagging` in the Installation Guide.
 
 Trusted Remote Storage with the ``remote`` Store Type
 -----------------------------------------------------

doc/sphinx-guides/source/developers/s3-direct-upload-api.rst

Lines changed: 6 additions & 0 deletions
@@ -79,6 +79,12 @@ In the single part case, only one call to the supplied URL is required:
 
   curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename> "<supplied url>"
 
+Or, if you have disabled S3 tagging (see :ref:`s3-tagging`), you should omit the header like this:
+
+.. code-block:: bash
+
+  curl -i -X PUT -T <filename> "<supplied url>"
+
 Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.
 
 In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
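The two single-part curl calls above differ only in the ``x-amz-tagging`` header. The following is a minimal Java sketch of the same choice, using the JDK's ``java.net.http`` API instead of curl. The class name ``DirectUploadRequestSketch``, the method ``buildPut``, and the ``taggingEnabled`` flag are illustrative assumptions and not part of Dataverse; the sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of Dataverse): builds the single-part PUT
// request for a presigned upload URL, with or without the tagging header.
final class DirectUploadRequestSketch {

    static HttpRequest buildPut(String presignedUrl, byte[] body, boolean taggingEnabled) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(presignedUrl))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body));
        if (taggingEnabled) {
            // Same header that curl sends with -H 'x-amz-tagging:dv-state=temp'.
            builder.header("x-amz-tagging", "dv-state=temp");
        }
        // When tagging is disabled on the store, the header is simply left out.
        return builder.build();
    }
}
```

Sending the returned request with ``HttpClient.send`` would correspond to running the curl command; whether the header must be present depends on how the store is configured.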

doc/sphinx-guides/source/installation/config.rst

Lines changed: 20 additions & 0 deletions
@@ -1189,12 +1189,31 @@ Larger installations may want to increase the number of open S3 connections allo
 
 ``./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"``
 
+.. _s3-tagging:
+
+S3 Tagging
+##########
+
+By default, when direct upload to an S3 store is configured, Dataverse will place a ``temp`` tag on the file being uploaded for easier cleanup in case the file is not added to the dataset after upload (e.g., if the user cancels the operation). (See :ref:`s3-tags-and-direct-upload`.)
+If your S3 store does not support tagging and gives an error when direct upload is configured, you can disable the tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For example:
+
+``./asadmin create-jvm-options "-Ddataverse.files.<id>.disable-tagging=true"``
+
+Disabling the ``temp`` tag makes it harder to identify abandoned files that are not used by your Dataverse instance (i.e. one cannot search for the ``temp`` tag in a delete script). These should still be removed to avoid wasting storage space. To clean up these files and any other leftover files, regardless of whether the ``temp`` tag is applied, you can use the :ref:`cleanup-storage-api` API endpoint.
+
+Note that if you disable tagging, you should omit the ``x-amz-tagging:dv-state=temp`` header when using the :doc:`/developers/s3-direct-upload-api`, as noted in that section.
+
+Finalizing S3 Configuration
+###########################
+
 In case you would like to configure Dataverse to use a custom S3 service instead of Amazon S3 services, please
 add the options for the custom URL and region as documented below. Please read above if your desired combination has
 been tested already and what other options have been set for a successful integration.
 
 Lastly, go ahead and restart your Payara server. With Dataverse deployed and the site online, you should be able to upload datasets and data files and see the corresponding files in your S3 bucket. Within a bucket, the folder structure emulates that found in local file storage.
 
+.. _list-of-s3-storage-options:
+
 List of S3 Storage Options
 ##########################
 

@@ -1222,6 +1241,7 @@ List of S3 Storage Options
 dataverse.files.<id>.payload-signing         ``true``/``false``  Enable payload signing. Optional                                                     ``false``
 dataverse.files.<id>.chunked-encoding        ``true``/``false``  Disable chunked encoding. Optional                                                   ``true``
 dataverse.files.<id>.connection-pool-size    <?>                 The maximum number of open connections to the S3 server                              ``256``
+dataverse.files.<id>.disable-tagging         ``true``/``false``  Do not place the ``temp`` tag when redirecting the upload to the S3 server.          ``false``
 ===========================================  ==================  ===================================================================================  =============
 
 .. table::
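As a rough illustration of how a per-store flag such as ``dataverse.files.<id>.disable-tagging`` can be resolved with a default of ``false``, here is a sketch that reads from a plain ``java.util.Properties`` object. This is a simplified stand-in: Dataverse itself resolves the option through ``JvmSettings`` (see the ``S3AccessIO.java`` changes in this commit), and the class and method names below are hypothetical.

```java
import java.util.Properties;

// Hypothetical stand-in for the real lookup (Dataverse uses JvmSettings):
// resolves "dataverse.files.<id>.disable-tagging" for one store id.
final class DisableTaggingLookupSketch {

    // Builds the option name for a given store id, e.g. "s3" or "minio1".
    static String propertyName(String storeId) {
        return "dataverse.files." + storeId + ".disable-tagging";
    }

    // Returns true only if the option is explicitly set to "true";
    // an unset option falls back to false, i.e. tagging stays enabled.
    static boolean isTaggingDisabled(Properties props, String storeId) {
        return Boolean.parseBoolean(props.getProperty(propertyName(storeId), "false"));
    }
}
```

Passing ``System.getProperties()`` as ``props`` would pick up a value supplied with ``-Ddataverse.files.<id>.disable-tagging=true``; other configuration sources that Dataverse may consult are not modelled here.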

src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java

Lines changed: 9 additions & 2 deletions
@@ -40,6 +40,7 @@
 import edu.harvard.iq.dataverse.Dataverse;
 import edu.harvard.iq.dataverse.DvObject;
 import edu.harvard.iq.dataverse.datavariable.DataVariable;
+import edu.harvard.iq.dataverse.settings.JvmSettings;
 import edu.harvard.iq.dataverse.util.FileUtil;
 import opennlp.tools.util.StringUtil;
 
@@ -991,7 +992,10 @@ private String generateTemporaryS3UploadUrl(String key, Date expiration) throws
         GeneratePresignedUrlRequest generatePresignedUrlRequest =
                 new GeneratePresignedUrlRequest(bucketName, key).withMethod(HttpMethod.PUT).withExpiration(expiration);
         //Require user to add this header to indicate a temporary file
-        generatePresignedUrlRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+        final boolean taggingDisabled = JvmSettings.DISABLE_S3_TAGGING.lookupOptional(Boolean.class, this.driverId).orElse(false);
+        if (!taggingDisabled) {
+            generatePresignedUrlRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+        }
 
         URL presignedUrl;
         try {
@@ -1040,7 +1044,10 @@ public JsonObjectBuilder generateTemporaryS3UploadUrls(String globalId, String s
         } else {
             JsonObjectBuilder urls = Json.createObjectBuilder();
             InitiateMultipartUploadRequest initiationRequest = new InitiateMultipartUploadRequest(bucketName, key);
-            initiationRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+            final boolean taggingDisabled = JvmSettings.DISABLE_S3_TAGGING.lookupOptional(Boolean.class, this.driverId).orElse(false);
+            if (!taggingDisabled) {
+                initiationRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp");
+            }
             InitiateMultipartUploadResult initiationResponse = s3.initiateMultipartUpload(initiationRequest);
             String uploadId = initiationResponse.getUploadId();
             for (int i = 1; i <= (fileSize / minPartSize) + (fileSize % minPartSize > 0 ? 1 : 0); i++) {
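The loop bound in the last context line above is a ceiling division: it yields one presigned URL per full part plus one more for any remainder. The sketch below isolates that arithmetic so it can be checked by hand. The class name ``PartCountSketch`` and the method ``partCount`` are illustrative, and the sketch assumes ``minPartSize`` is positive.

```java
// Hypothetical helper isolating the loop bound used when generating
// multipart upload URLs: full parts, plus one part for any remainder.
final class PartCountSketch {

    // Assumes minPartSize > 0; a zero part size would divide by zero.
    static long partCount(long fileSize, long minPartSize) {
        return (fileSize / minPartSize) + (fileSize % minPartSize > 0 ? 1 : 0);
    }
}
```

For example, with a part size of 5, a 10-byte file needs 2 parts and an 11-byte file needs 3, the last one carrying the single remaining byte.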

src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java

Lines changed: 4 additions & 0 deletions
@@ -51,6 +51,10 @@ public enum JvmSettings {
     DOCROOT_DIRECTORY(SCOPE_FILES, "docroot"),
     GUESTBOOK_AT_REQUEST(SCOPE_FILES, "guestbook-at-request"),
     GLOBUS_CACHE_MAXAGE(SCOPE_FILES, "globus-cache-maxage"),
+
+    //STORAGE DRIVER SETTINGS
+    SCOPE_DRIVER(SCOPE_FILES),
+    DISABLE_S3_TAGGING(SCOPE_DRIVER, "disable-tagging"),
 
     // SOLR INDEX SETTINGS
     SCOPE_SOLR(PREFIX, "solr"),

src/main/webapp/resources/js/fileupload.js

Lines changed: 37 additions & 33 deletions
@@ -192,41 +192,45 @@ var fileUpload = class fileUploadClass {
         progBar.html('');
         progBar.append($('<progress/>').attr('class', 'ui-progressbar ui-widget ui-widget-content ui-corner-all'));
         if(this.urls.hasOwnProperty("url")) {
-            $.ajax({
-                url: this.urls.url,
-                headers: { "x-amz-tagging": "dv-state=temp" },
-                type: 'PUT',
-                data: this.file,
-                context:this,
-                cache: false,
-                processData: false,
-                success: function() {
-                    //ToDo - cancelling abandons the file. It is marked as temp so can be cleaned up later, but would be good to remove now (requires either sending a presigned delete URL or adding a callback to delete only a temp file
-                    if(!cancelled) {
-                        this.reportUpload();
-                    }
-                },
-                error: function(jqXHR, textStatus, errorThrown) {
-                    console.log('Failure: ' + jqXHR.status);
-                    console.log('Failure: ' + errorThrown);
-                    uploadFailure(jqXHR, thisFile);
-                },
-                xhr: function() {
-                    var myXhr = $.ajaxSettings.xhr();
-                    if (myXhr.upload) {
-                        myXhr.upload.addEventListener('progress', function(e) {
-                            if (e.lengthComputable) {
-                                var doublelength = 2 * e.total;
-                                progBar.children('progress').attr({
-                                    value: e.loaded,
-                                    max: doublelength
-                                });
-                            }
-                        });
+            const url = this.urls.url;
+            const request = {
+                url: url,
+                type: 'PUT',
+                data: this.file,
+                context:this,
+                cache: false,
+                processData: false,
+                success: function() {
+                    //ToDo - cancelling abandons the file. It is marked as temp so can be cleaned up later, but would be good to remove now (requires either sending a presigned delete URL or adding a callback to delete only a temp file
+                    if(!cancelled) {
+                        this.reportUpload();
+                    }
+                },
+                error: function(jqXHR, textStatus, errorThrown) {
+                    console.log('Failure: ' + jqXHR.status);
+                    console.log('Failure: ' + errorThrown);
+                    uploadFailure(jqXHR, thisFile);
+                },
+                xhr: function() {
+                    var myXhr = $.ajaxSettings.xhr();
+                    if (myXhr.upload) {
+                        myXhr.upload.addEventListener('progress', function(e) {
+                            if (e.lengthComputable) {
+                                var doublelength = 2 * e.total;
+                                progBar.children('progress').attr({
+                                    value: e.loaded,
+                                    max: doublelength
+                                });
+                            }
+                        });
+                    }
+                    return myXhr;
                     }
-                    return myXhr;
+            };
+            if (url.includes("x-amz-tagging")) {
+                request.headers = { "x-amz-tagging": "dv-state=temp" };
                 }
-            });
+            $.ajax(request);
         } else {
             var loaded=[];
             this.etags=[];
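The new client-side check above decides whether to send the tagging header by looking for ``x-amz-tagging`` anywhere in the presigned URL. The Java sketch below shows that substring test next to a stricter variant that inspects only the ``X-Amz-SignedHeaders`` query parameter. It assumes a Signature Version 4 style presigned URL, in which signed header names are listed in that parameter separated by semicolons; the class and method names are hypothetical and this is not how ``fileupload.js`` is implemented.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: decides whether a presigned URL expects the
// x-amz-tagging header. Assumes a SigV4-style URL that carries an
// X-Amz-SignedHeaders query parameter.
final class SignedHeaderCheckSketch {

    // Equivalent of the substring test used in fileupload.js.
    static boolean containsHeaderName(String url) {
        return url.contains("x-amz-tagging");
    }

    // Stricter variant: only looks at the X-Amz-SignedHeaders parameter.
    static boolean signsTaggingHeader(String url) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return false;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            if (!name.equalsIgnoreCase("X-Amz-SignedHeaders")) {
                continue;
            }
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            // Signed header names are separated by ';' once decoded.
            return Arrays.asList(value.toLowerCase(Locale.ROOT).split(";")).contains("x-amz-tagging");
        }
        return false;
    }
}
```

The two checks differ only in edge cases, for example when the object key itself happens to contain the text ``x-amz-tagging``; for typical storage identifiers the substring test used in the commit gives the same answer.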

src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java

Lines changed: 15 additions & 0 deletions
@@ -2512,6 +2512,21 @@ static Response getUploadUrls(String idOrPersistentIdOfDataset, long sizeInBytes
         return requestSpecification.get("/api/datasets/" + idInPath + "/uploadurls?size=" + sizeInBytes + optionalQueryParam);
     }
 
+    /**
+     * If you set dataverse.files.localstack1.disable-tagging=true you will see
+     * an error like below.
+     *
+     * To avoid it, don't send the x-amz-tagging header.
+     */
+    /*
+    <Error>
+      <Code>AccessDenied</Code>
+      <Message>There were headers present in the request which were not signed</Message>
+      <RequestId>25ff2bb0-13c7-420e-8ae6-3d92677e4bd9</RequestId>
+      <HostId>9Gjjt1m+cjU4OPvX9O9/8RuvnG41MRb/18Oux2o5H5MY7ISNTlXN+Dz9IG62/ILVxhAGI0qyPfg=</HostId>
+      <HeadersNotSigned>x-amz-tagging</HeadersNotSigned>
+    </Error>
+    */
     static Response uploadFileDirect(String url, InputStream inputStream) {
         return given()
                 .header("x-amz-tagging", "dv-state=temp")
