Fixes #19619 #19620

Open: wants to merge 2 commits into base: main

@trina242 (Contributor) commented Jan 31, 2025

Describe your changes:

Fixes #19619

s3fs.S3FileSystem has been replaced with pyarrow.fs.S3FileSystem in ParquetDataFrameReader, because:

  • it accepts role_arn as an argument, which s3fs does not support (this fixes the bug: assumeRoleArn was not passed to the filesystem, so it couldn't access the files),
  • it is more compatible with ParquetDataFrameReader, which already uses the pyarrow package,
  • s3fs is then an unnecessary addition to the project dependencies.
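
For reviewers, a minimal sketch of how the role could be forwarded to pyarrow. This is not the code in the PR: `build_s3_kwargs` and `make_filesystem` are hypothetical helper names, and the argument set is just the subset of `pyarrow.fs.S3FileSystem` keyword arguments relevant here. Which credential combinations pyarrow accepts together is not covered by this sketch.

```python
from typing import Any, Dict, Optional


def build_s3_kwargs(
    region: Optional[str] = None,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    session_token: Optional[str] = None,
    role_arn: Optional[str] = None,
    session_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the settings that were actually configured.

    Hypothetical helper: the keys match keyword arguments of
    pyarrow.fs.S3FileSystem, so the result can be splatted into it.
    """
    candidates = {
        "region": region,
        "access_key": access_key,
        "secret_key": secret_key,
        "session_token": session_token,
        "role_arn": role_arn,          # the value of assumeRoleArn would go here
        "session_name": session_name,
    }
    # Drop unset values so pyarrow falls back to its own defaults.
    return {name: value for name, value in candidates.items() if value is not None}


def make_filesystem(**settings: Any):
    """Create the filesystem; pyarrow is imported lazily on purpose."""
    from pyarrow.fs import S3FileSystem

    return S3FileSystem(**build_s3_kwargs(**settings))
```

Called as `make_filesystem(region="eu-west-1", role_arn="arn:aws:iam::123456789012:role/example")`, with a placeholder region and ARN, this would hand the role to pyarrow instead of dropping it.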

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>

Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@ulixius9 (Member) left a comment

@trina242 thanks for the contribution 🙇

just one question...

client_kwargs["secret_key"] = self.config_source.securityConfig.awsSecretAccessKey.get_secret_value()
s3_fs = S3FileSystem(**client_kwargs)

bucket_uri = f"{bucket_name}/{key}"

Member
does this uri not require a s3:// prefix?

Contributor Author
No, actually it throws this error when you prefix it with s3://
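
To illustrate the point: when a filesystem object is passed explicitly, pyarrow expects a plain `bucket/key` path rather than a URI. A minimal sketch, assuming that behaviour; `to_pyarrow_s3_path` and `read_parquet` are hypothetical helpers, not functions from the PR.

```python
def to_pyarrow_s3_path(bucket_name: str, key: str) -> str:
    """Build the scheme-less 'bucket/key' path used with an explicit filesystem."""
    scheme = "s3://"
    # Tolerate a bucket name that was given as a URI.
    if bucket_name.startswith(scheme):
        bucket_name = bucket_name[len(scheme):]
    # Avoid doubled or leading slashes when joining the two parts.
    return f"{bucket_name.strip('/')}/{key.lstrip('/')}"


def read_parquet(s3_fs, bucket_name: str, key: str):
    """Read one Parquet object through an existing pyarrow S3FileSystem."""
    import pyarrow.parquet as pq  # imported lazily; only needed for the read

    return pq.read_table(to_pyarrow_s3_path(bucket_name, key), filesystem=s3_fs)
```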

github-actions bot (Contributor) commented Feb 3, 2025

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Labels
  • safe to test: Add this label to run secure Github workflows on PRs
  • To release: Will cherry-pick this PR into the release branch
Development

Successfully merging this pull request may close these issues.

S3 ingestion fails for parquet files when assumeRoleArn is used
2 participants