Get delta table files #3595

Open
keenanwells-tatari opened this issue Aug 22, 2024 · 2 comments

Comments

@keenanwells-tatari

I checked the answer here: #623

but I'm wondering whether inputFiles is a reliable enough method of getting all the files? https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.inputFiles.html

The docs are a bit vague on how reliable this is. We are hosting our tables on S3.
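
For reference, a minimal sketch of the approach being asked about. The Spark docs describe `DataFrame.inputFiles()` as a best-effort snapshot, so this shows only how the call is used, not that the result is complete. The helper names and the S3 path are hypothetical, and the `delta` format is assumed to be configured on the session.

```python
def list_input_files(df):
    """Return the sorted, de-duplicated file paths reported by df.inputFiles().

    Per the Spark docs this is a best-effort snapshot, so treat the result
    as a hint rather than a guaranteed complete listing.
    """
    return sorted(set(df.inputFiles()))


def example(spark, table_path="s3://my-bucket/path/to/table"):
    # Hypothetical path; assumes `spark` is a SparkSession with Delta Lake set up.
    df = spark.read.format("delta").load(table_path)
    for path in list_input_files(df):
        print(path)
```

`example` is not called here; it only illustrates how the helper might be wired to a real session.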

@keenanwells-tatari

Came across this answer regarding reliability with Delta: https://stackoverflow.com/a/77107953

@keenanwells-tatari

I think this is much more useful for what I need: https://docs.databricks.com/en/ingestion/file-metadata-column.html#metadata-examples
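
A rough sketch of what that could look like, assuming the runtime exposes the hidden `_metadata` column for the table being read, as the linked Databricks page describes for file-based sources. Whether it is available for a given Delta Lake version outside Databricks is not confirmed here. The helper name is hypothetical.

```python
def distinct_source_files(df):
    """Collect the distinct source file paths recorded in _metadata.file_path.

    Assumes `df` exposes the hidden `_metadata` struct column; if it does not,
    Spark raises an analysis error when the column is selected.
    """
    rows = df.select("_metadata.file_path").distinct().collect()
    # Each collected row has a single field: the file path.
    return sorted(row[0] for row in rows)
```

Unlike `inputFiles`, this derives the list from the rows actually read, so it triggers a scan of the data and reflects any filters already applied to `df`.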
