-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18073. Upgrades Head, Copy, Put & Get operations. #5
Conversation
@@ -173,6 +192,8 @@ protected String getBucket() { | |||
* if the encryption secrets contain the information/settings for this. | |||
* @return an optional set of KMS Key settings | |||
*/ | |||
// TODO: This method can be removed during getObject work, as the key now comes directly from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and generateSEECustomerKey()
will be removed as part of MPU work.
maybeSetHeader(headers, XA_STORAGE_CLASS, | ||
md.getReplicationStatus()); | ||
md.storageClassAsString()); | ||
// TODO: check this, looks wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
} | ||
// TODO: Commenting out temporarily, due to the TM not returning copyObjectResult | ||
// in the response. | ||
// String newRevisionId = policy.getRevisionId(copyObjectResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes TestStreamChangeTracker
to fail.
+ keyToQualifiedPath(key)) | ||
.initCause(e); | ||
} | ||
// } catch (InterruptedException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really not sure what to do here and for copy, any help would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume there will be some kind of CompletetionException thrown with cause of Interrupted or something? is this the right direction?
@@ -64,6 +64,9 @@ public ITestXAttrCost() { | |||
@Test | |||
public void testXAttrRoot() throws Throwable { | |||
describe("Test xattr on root"); | |||
// TODO: Previously a call to getObjectMetadata for a base path, ie with an empty key would | |||
// return some metadata. (bucket region, content type). headObject() fails without a key, check | |||
// how this can be fixed. | |||
Path root = new Path("/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that with the V1 Client, when you did s3V1.getObjectMetadata("my-bucket", "")
, so the key was empty as you were probing the root path, this worked ok and returned the bucket region and content-type. With V2, when you do s3v2.headObject(HeadObjectRequest.builder().bucket("my-bucket").build())
, you get an error as the key must be specified. Instead we’ll need to call headBucket() here, but this has other implications. We will need some special code to handle this. Do we need to support XAttr operations on root dirs?
The other case is in testDelegatedFileSystem, where getObjectMetadata is used as a probe to check endpoint correctness. Can this simply be replaced by headBucket()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like we'll have to add checks for if FS root and switch between headObject and headBucket. Appears painful at first but maybe it's better to be more explicit about the operations we're making rather than rely on this behaviour of SDK V1.
Key change: `getObject` now returns a `ResponseInputStream<GetObjectResponse>` rather than a `S3Object`. This makes it simpler to handle the input stream lifetime in various classes such as `S3AInputStream`, `S3ARemoteObject`, or `SDKStreamDrainer`.
570c5a7
to
6a03f91
Compare
More test failures:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a very quick review of the code, skimming over most of it. I wanted to provide early, general feedback rather than wait ages for in-depth feedback.
There's a few comments on the code itself, but core feedback:
- Whenever we are adding methods like "doSomething" (V1) and "doSomethingV2", can we instead use overloading? With this, I hope to reduce the diff for reviewers.
- General thought - if we are changing public methods in classes like S3AFileSystem, can we think about limiting the visibility/scoping? Maybe not public/private but at least the visibility annotations used elsewhere. We are making a breaking change to the interface, so I would argue we are within our right to break it more by moving it out of S3AFileSystem. S3AFileSystem is way too big and this is a chance to clean up the interface a bit. (Something to discuss with community rather than address here perhaps)
- This PR is huge - it's difficult to review it. How can we make these smaller? Can we try and get some milestones which let us get working code into the feature branch, and review operations one at a time? Maybe we do some setup PRs too like adding error handling or auditor methods for both V1 and V2 and then we just focus on operations in each PR.
Let's discuss - I think there's great work in here, it's just difficult to review it in its current size.
throw (SdkBaseException) caught; | ||
} else { | ||
throw (AwsServiceException) caught; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we cast this in the first place? Can we not just do throw caught;
? (not sure)
if (progress != null) { | ||
progress.progress(); | ||
} | ||
public void transferInitiated(TransferListener.Context.TransferInitiated context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: too many spaces before method name
|
||
@Override | ||
public void bytesTransferred(TransferListener.Context.BytesTransferred context) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop empty line
if(progress != null) { | ||
progress.progress(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing space on if statement
long delta = upload.getProgress().getBytesTransferred() - | ||
lastBytesTransferred; | ||
public long uploadCompleted(ObjectTransfer upload) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop empty line
// TODO: This method will be replace isUnkownBucket() during error translation work. | ||
/** | ||
* Does this exception indicate that the AWS Bucket was unknown. | ||
* @param e exception. | ||
* @return true if the status code and error code mean that the | ||
* remote bucket is unknown. | ||
*/ | ||
public static boolean isUnknownBucketV2(AwsServiceException e) { | ||
return e.statusCode() == SC_404 | ||
&& AwsErrorCodes.E_NO_SUCH_BUCKET.equals(e.awsErrorDetails().errorCode()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of methods like these where we have the original and then v2 version.
Can we actually use overloading here instead? i.e. this method would be boolean isUnknownBucket(AwsServiceException e)
.
If we can do that, we avoid a lot of "if instance of this, cast and use v1. else cast and use v2.". We'd just let the methods be polymorphic and reduce the size of the diff.
We can add "todo" to the method above to say "we will remove after". I thought about @deprecated
but I imagine that'll just make Yetus really unhappy.
@@ -159,7 +168,7 @@ private <T extends AmazonWebServiceRequest> T prepareRequest(T t) { | |||
*/ | |||
// TODO: Currently this is a NOOP, will be completed separately as part of auditor work. | |||
@Retries.OnceRaw | |||
private <T extends AwsRequest> T prepareV2Request(T t) { | |||
private <T extends AwsRequest.Builder> T prepareV2Request(T t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use overloading here? (i genuinely have no idea if we can do it with generics. maybe?)
i.e. <T extends AwsRequest.Builder> T prepareRequest(T t)
again, ideally to avoid needing switches and renames in the code base.
HeadObjectRequest.Builder headObjectRequestBuilder = | ||
HeadObjectRequest.builder().bucket(getBucket()).key(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer like
HeadObjectRequest.Builder headObjectRequestBuilder = HeadObjectRequest.builder()
.bucket(getBucket())
.key(key);
// TODO: Temporary change as auditor still expects V1 request, will be updated during auditor work. | ||
protected GetObjectMetadataRequest head() { | ||
return manager.beforeExecution( | ||
requestFactory.newGetObjectMetadataRequest("/")); | ||
// return manager.beforeExecution( | ||
// requestFactory.newGetObjectMetadataRequest("/")); | ||
return manager.beforeExecution(new GetObjectMetadataRequest("test", "/")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add no-op implementations to auditor with todo and accept failing test? and so we can avoid recreating v1 requests here.
@@ -64,6 +64,9 @@ public ITestXAttrCost() { | |||
@Test | |||
public void testXAttrRoot() throws Throwable { | |||
describe("Test xattr on root"); | |||
// TODO: Previously a call to getObjectMetadata for a base path, ie with an empty key would | |||
// return some metadata. (bucket region, content type). headObject() fails without a key, check | |||
// how this can be fixed. | |||
Path root = new Path("/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like we'll have to add checks for if FS root and switch between headObject and headBucket. Appears painful at first but maybe it's better to be more explicit about the operations we're making rather than rely on this behaviour of SDK V1.
*/ | ||
private static URI buildURI(String host, int port) { | ||
try { | ||
return new URIBuilder().setHost(host).setPort(port).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be updated to set scheme!
Description of PR
This PR updates client config, and the list, put, copy, head and get object operations.
When reviewing, it's probably easier to separate by commits.
Still TODO
There are some issues that I'm still looking into:
Transfer Manager Issues:
AWSClienConfig
,org.apache.http.client.utils.URIBuilder;
is no longer available. And get aclass file for org.apache.commons.logging.Log not found
inS3ADelegationTokens
.copyObjectResult
which has the version ID, which causes an issue in the ChangeTracker.Have raised these with the AWS SDK team.
Previously a call to getObjectMetadata for a base path, ie with an empty key would return some metadata. (bucket region, content type). headObject() fails without a key. I'm not sure what to do here, will comment on the code too.
How was this patch tested?
Tested in eu-west-1 by running
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify
The following tests are failing:
CopyObjectResult