Don't extract dataset archives twice #488

Open
dkrako opened this issue Sep 1, 2023 · 0 comments · May be fixed by #786
Labels
enhancement New feature or request

dkrako commented Sep 1, 2023

Description of the problem

Currently, calling something like this at the top of your notebook

import pymovements as pm

dataset = pm.Dataset('JuDo1000', path='data/JuDo1000')
dataset.download()

will extract the archives again every time the notebook is run.

It was originally implemented this way because it was the simplest approach: always re-extracting avoids having to check that all data was extracted correctly.

Of course calling

dataset = pm.Dataset('JuDo1000', path='data/JuDo1000')
dataset.download(extract=False)

would avoid this, but that requires changing the notebook.

Description of a solution

It would be much nicer to have some kind of detection in place to check if the dataset archives were already extracted.

For that, we would need to check at least that all files were extracted.
I personally would feel safer if we also checked that the contents were extracted correctly, because the extraction could have been aborted during the last file. In that case we would have no chance of getting a warning without checking the checksums or contents of each file.
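
To make the idea more concrete, here is a minimal sketch of such a check. It is not the pymovements implementation and assumes the archive is a zip file; the helper names `file_crc32`, `is_extracted` and `extract_if_needed` are hypothetical. It relies on the fact that every zip member records its uncompressed size and CRC-32, so the extracted files can be verified without extracting again.

```python
import zipfile
import zlib
from pathlib import Path


def file_crc32(path: Path, chunk_size: int = 1 << 20) -> int:
    """Compute the CRC-32 of a file on disk, reading it in chunks."""
    crc = 0
    with path.open('rb') as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def is_extracted(archive_path: Path, target_dir: Path, verify_contents: bool = True) -> bool:
    """Hypothetical helper: check whether all archive members exist in target_dir.

    Each file member must exist with the size recorded in the archive.
    If verify_contents is True, its CRC-32 must match as well, which
    catches a file that was only partially written or later modified.
    """
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            extracted = target_dir / info.filename
            if not extracted.is_file():
                return False
            if extracted.stat().st_size != info.file_size:
                return False
            if verify_contents and file_crc32(extracted) != info.CRC:
                return False
    return True


def extract_if_needed(archive_path: Path, target_dir: Path) -> bool:
    """Hypothetical helper: extract only if the check fails.

    Returns True if the archive was extracted, False if it was skipped.
    """
    if is_extracted(archive_path, target_dir):
        return False
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return True
```

The size comparison is cheap and already catches an extraction that was aborted in the middle of a file. The CRC-32 comparison reads every extracted file once, which is slower but still avoids decompressing and rewriting the data. Other archive formats such as tar do not store per-member checksums, so they would need a different approach, for example a manifest of checksums written after a successful extraction.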

@dkrako dkrako added the enhancement New feature or request label Sep 1, 2023
@SiQube SiQube linked a pull request Aug 24, 2024 that will close this issue