Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18073. Upgrades Head, Copy, Put & Get operations. #5

Closed
wants to merge 3 commits into from

Conversation

ahmarsuhail
Copy link
Owner

@ahmarsuhail ahmarsuhail commented Oct 6, 2022

Description of PR

This PR updates client config, and the list, put, copy, head and get object operations.

When reviewing, it's probably easier to separate by commits.

Still TODO

There are some issues that I'm still looking into:

  • Transfer Manager Issues:

    • Something goes wrong when you add in the Transfer Manager dependency. In AWSClienConfig, org.apache.http.client.utils.URIBuilder; is no longer available. And get a class file for org.apache.commons.logging.Log not found in S3ADelegationTokens.
    • When I try to use progress listener with the TM, I get a log4j error.
    • The TM does not return copyObjectResult which has the version ID, which causes an issue in the ChangeTracker.
    • Copy with SSE-C does not work with the TM.

    Have raised these with the AWS SDK team.

  • Previously a call to getObjectMetadata for a base path, ie with an empty key would return some metadata. (bucket region, content type). headObject() fails without a key. I'm not sure what to do here, will comment on the code too.

How was this patch tested?

Tested in eu-west-1 by running mvn -Dparallel-tests -DtestsThreadCount=16 clean verify

The following tests are failing:

Test Failing Reason
ITestS3AAWSCredentialsProvider.testBadCredentialsConstructor Fails because SDKV2 throws SdkException, will work once errors are handled and translated properly.
ITestAuditAccessChecks.testDirAccessDenied Fails because list requests are not wired up to auditor yet
ITestAuditAccessChecks.testFileAccessDenied Fails because list requests are not wired up to auditor yet
ITestAuditAccessChecks.testMissingPathAccessFNFE Fails because list requests are not wired up to auditor yet
ITestAuditManager.testRequestHandlerBinding Fails because list requests are not wired up to auditor yet
ITestAWSStatisticCollection.testCommonCrawlStatistics Fails as not tied up to auditor yet, so does not update stats when getObjectMetadata is called
ITestAWSStatisticCollection.testLandsatStatistics Fails as not tied up to auditor yet, so does not update stats when getObjectMetadata is called
ITestMarkerTool.testRunWrongBucket Throws NoSuchBucketException, which is not yet translated
ITestS3AConfiguration.testAutomaticProxyPortSelection Fails because SDKV2 throws SdkException, will work once errors are handled and translated properly.
ITestS3AConfiguration.testProxyConnection Fails because SDKV2 throws SdkException, will work once errors are handled and translated properly.
ITestS3AConfiguration.shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty Fails due to errors not being translated It throws Software.S3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. will be fixed during error translation
ITestS3AStorageClass.testCreateAndCopyObjectWithStorageClassGlacier Copy works ok, but fails because it expects AccessDeniedException when it tries to get the object from glaciers, and errors are not translated yet. Will be updated during error translation work.
ITestS3ATemporaryCredentials.testInvalidSTSBinding, testSTS Fails because it throws a 400, which is translated to AWSBadRequestException. Will be fixed during error translation.
ITestSessionDelegationInFileystem.testDelegatedFileSystem, ITestXAttrCost.testXAttrRoot Fails because tries to do headObject() with empty key which now fails, need to look at how to fix. Note: ITestSessionDelegationInFileystem.testDelegatedFileSystem does not fail as i've temporarily removed the headobject call from it
ITestS3AEncryptionSSEC Fails due to SSE-C not working with the TM
TestStreamChangeTracker Fails due having to comment out some code as TM response does not yet have CopyObjectResult

@@ -173,6 +192,8 @@ protected String getBucket() {
* if the encryption secrets contain the information/settings for this.
* @return an optional set of KMS Key settings
*/
// TODO: This method can be removed during getObject work, as the key now comes directly from
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and generateSEECustomerKey() will be removed as part of MPU work.

maybeSetHeader(headers, XA_STORAGE_CLASS,
md.getReplicationStatus());
md.storageClassAsString());
// TODO: check this, looks wrong.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

}
// TODO: Commenting out temporarily, due to the TM not returning copyObjectResult
// in the response.
// String newRevisionId = policy.getRevisionId(copyObjectResponse);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes TestStreamChangeTracker to fail.

+ keyToQualifiedPath(key))
.initCause(e);
}
// } catch (InterruptedException e) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really not sure what to do here and for copy, any help would be appreciated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there will be some kind of CompletetionException thrown with cause of Interrupted or something? is this the right direction?

@@ -64,6 +64,9 @@ public ITestXAttrCost() {
@Test
public void testXAttrRoot() throws Throwable {
describe("Test xattr on root");
// TODO: Previously a call to getObjectMetadata for a base path, ie with an empty key would
// return some metadata. (bucket region, content type). headObject() fails without a key, check
// how this can be fixed.
Path root = new Path("/");
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that with the V1 Client, when you did s3V1.getObjectMetadata("my-bucket", "") , so the key was empty as you were probing the root path, this worked ok and returned the bucket region and content-type. With V2, when you do s3v2.headObject(HeadObjectRequest.builder().bucket("my-bucket").build()), you get an error as the key must be specified. Instead we’ll need to call headBucket() here, but this has other implications. We will need some special code to handle this. Do we need to support XAttr operations on root dirs?

The other case is in testDelegatedFileSystem, where getObjectMetadata is used as a probe to check endpoint correctness. Can this simply be replaced by headBucket()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like we'll have to add checks for if FS root and switch between headObject and headBucket. Appears painful at first but maybe it's better to be more explicit about the operations we're making rather than rely on this behaviour of SDK V1.

Key change: `getObject` now returns a `ResponseInputStream<GetObjectResponse>` rather than a `S3Object`. This makes it simpler to handle the input stream lifetime in various classes such as `S3AInputStream`, `S3ARemoteObject`, or `SDKStreamDrainer`.
@ahmarsuhail ahmarsuhail force-pushed the HADOOP-18073-sdk-upgrade-head-get branch from 570c5a7 to 6a03f91 Compare October 7, 2022 20:05
@passaro
Copy link
Collaborator

passaro commented Oct 10, 2022

More test failures:

  • ITestS3AFileContextStatistics.testStatistics - Mismatch in stats, likely same cause as ITestAuditAccessChecks.

  • ITestS3AOpenCost.testOpenFileLongerLength - Introduced with the getObject change, but just another issue with exception translation, to be fixed later.

Copy link
Collaborator

@dannycjones dannycjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a very quick review of the code, skimming over most of it. I wanted to provide early, general feedback rather than wait ages for in-depth feedback.

There's a few comments on the code itself, but core feedback:

  • Whenever we are adding methods like "doSomething" (V1) and "doSomethingV2", can we instead use overloading? With this, I hope to reduce the diff for reviewers.
  • General thought - if we are changing public methods in classes like S3AFileSystem, can we think about limiting the visibility/scoping? Maybe not public/private but at least the visibility annotations used elsewhere. We are making a breaking change to the interface, so I would argue we are within our right to break it more by moving it out of S3AFileSystem. S3AFileSystem is way too big and this is a chance to clean up the interface a bit. (Something to discuss with community rather than address here perhaps)
  • This PR is huge - it's difficult to review it. How can we make these smaller? Can we try and get some milestones which let us get working code into the feature branch, and review operations one at a time? Maybe we do some setup PRs too like adding error handling or auditor methods for both V1 and V2 and then we just focus on operations in each PR.

Let's discuss - I think there's great work in here, it's just difficult to review it in its current size.

throw (SdkBaseException) caught;
} else {
throw (AwsServiceException) caught;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we cast this in the first place? Can we not just do throw caught;? (not sure)

if (progress != null) {
progress.progress();
}
public void transferInitiated(TransferListener.Context.TransferInitiated context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: too many spaces before method name


@Override
public void bytesTransferred(TransferListener.Context.BytesTransferred context) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop empty line

Comment on lines +66 to 68
if(progress != null) {
progress.progress();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing space on if statement

long delta = upload.getProgress().getBytesTransferred() -
lastBytesTransferred;
public long uploadCompleted(ObjectTransfer upload) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop empty line

Comment on lines +58 to +68
// TODO: This method will be replace isUnkownBucket() during error translation work.
/**
* Does this exception indicate that the AWS Bucket was unknown.
* @param e exception.
* @return true if the status code and error code mean that the
* remote bucket is unknown.
*/
public static boolean isUnknownBucketV2(AwsServiceException e) {
return e.statusCode() == SC_404
&& AwsErrorCodes.E_NO_SUCH_BUCKET.equals(e.awsErrorDetails().errorCode());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of methods like these where we have the original and then v2 version.

Can we actually use overloading here instead? i.e. this method would be boolean isUnknownBucket(AwsServiceException e).

If we can do that, we avoid a lot of "if instance of this, cast and use v1. else cast and use v2.". We'd just let the methods be polymorphic and reduce the size of the diff.

We can add "todo" to the method above to say "we will remove after". I thought about @deprecated but I imagine that'll just make Yetus really unhappy.

@@ -159,7 +168,7 @@ private <T extends AmazonWebServiceRequest> T prepareRequest(T t) {
*/
// TODO: Currently this is a NOOP, will be completed separately as part of auditor work.
@Retries.OnceRaw
private <T extends AwsRequest> T prepareV2Request(T t) {
private <T extends AwsRequest.Builder> T prepareV2Request(T t) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use overloading here? (i genuinely have no idea if we can do it with generics. maybe?)

i.e. <T extends AwsRequest.Builder> T prepareRequest(T t)

again, ideally to avoid needing switches and renames in the code base.

Comment on lines +484 to +485
HeadObjectRequest.Builder headObjectRequestBuilder =
HeadObjectRequest.builder().bucket(getBucket()).key(key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer like

HeadObjectRequest.Builder headObjectRequestBuilder = HeadObjectRequest.builder()
    .bucket(getBucket())
    .key(key);

Comment on lines +136 to +140
// TODO: Temporary change as auditor still expects V1 request, will be updated during auditor work.
protected GetObjectMetadataRequest head() {
return manager.beforeExecution(
requestFactory.newGetObjectMetadataRequest("/"));
// return manager.beforeExecution(
// requestFactory.newGetObjectMetadataRequest("/"));
return manager.beforeExecution(new GetObjectMetadataRequest("test", "/"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add no-op implementations to auditor with todo and accept failing test? and so we can avoid recreating v1 requests here.

@@ -64,6 +64,9 @@ public ITestXAttrCost() {
@Test
public void testXAttrRoot() throws Throwable {
describe("Test xattr on root");
// TODO: Previously a call to getObjectMetadata for a base path, ie with an empty key would
// return some metadata. (bucket region, content type). headObject() fails without a key, check
// how this can be fixed.
Path root = new Path("/");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like we'll have to add checks for if FS root and switch between headObject and headBucket. Appears painful at first but maybe it's better to be more explicit about the operations we're making rather than rely on this behaviour of SDK V1.

*/
private static URI buildURI(String host, int port) {
try {
return new URIBuilder().setHost(host).setPort(port).build();
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be updated to set scheme!

@ahmarsuhail ahmarsuhail closed this Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants