Add support for reopening files closed by cache eviction #8

Open
estebanag opened this issue Sep 13, 2019 · 4 comments

Comments

@estebanag

estebanag commented Sep 13, 2019

Hi @Exteris,

Thanks for the work on this project. I've forked your repo (see here) and am experimenting with a way to automatically reopen h5py.File objects that were closed by cache eviction when an h5py.Dataset is requested. I've also added a lock to synchronize access to the cache, since unsynchronized access may cause problems when using dask. The current code is just a quick test (it works), but would you be interested in having this functionality in a PR? Any ideas or preferences are welcome.
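
To make the idea concrete, here is a minimal sketch of a lock-protected cache that closes handles on eviction and reopens them on the next request. It is not the code in the fork: `LockedFileCache`, `opener`, and `maxsize` are hypothetical names, and `FakeFile` is a stand-in so the example runs without HDF5 files. In practice the opener could be something like `lambda path: h5py.File(path, "r")`.

```python
import threading
from collections import OrderedDict


class LockedFileCache:
    """LRU cache of open file handles, guarded by a lock (hypothetical helper)."""

    def __init__(self, opener, maxsize=2):
        self._opener = opener        # callable: path -> open handle
        self._maxsize = maxsize
        self._files = OrderedDict()  # path -> handle, least recently used first
        self._lock = threading.Lock()

    def get(self, path):
        # The lock keeps lookup, insertion, and eviction atomic across threads.
        with self._lock:
            handle = self._files.get(path)
            if handle is None:
                # Not cached (never opened, or evicted earlier): open it again.
                handle = self._opener(path)
                self._files[path] = handle
                while len(self._files) > self._maxsize:
                    _, evicted = self._files.popitem(last=False)
                    evicted.close()
            else:
                self._files.move_to_end(path)  # mark as most recently used
            return handle


class FakeFile:
    """Stand-in for h5py.File so the sketch runs without real files."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


cache = LockedFileCache(FakeFile, maxsize=2)
a = cache.get("a.h5")
b = cache.get("b.h5")
c = cache.get("c.h5")   # evicts and closes "a.h5", the least recently used
a2 = cache.get("a.h5")  # reopened as a new handle; this evicts "b.h5"
```

Note that the old handle `a` stays closed after eviction; callers only get a usable handle by going through `cache.get` again.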

@mrocklin, any feedback from you is also more than welcome!

@DaanVanVugt
Owner

Hello @estebanag,

Yes, definitely. The lock is a good addition too; things could indeed go wrong with many threads.

Regarding the automatic reopening, I'm not sure whether an h5py.Dataset stays valid after its file has been closed and reopened. I think the safe way is to always create a new h5py.File, which may then be taken from the cache if already open.

You could of course wrap the Dataset to do this in the background.
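
A rough sketch of such a wrapper, under the assumption that the cache exposes some function that returns an open file for a path: the wrapper stores only the file path and the dataset name, and looks the dataset up again on every access. `ReopeningDataset` and `get_file` are hypothetical names, and the stand-in `get_file` below returns a plain dict instead of an h5py.File so the example runs on its own.

```python
class ReopeningDataset:
    """Proxy that re-resolves a dataset through the file cache on each access."""

    def __init__(self, get_file, path, name):
        self._get_file = get_file  # callable: path -> open, mapping-like file
        self._path = path
        self._name = name

    def _dataset(self):
        # Always go through the cache, so a file closed by eviction is
        # reopened instead of reusing a possibly stale dataset object.
        return self._get_file(self._path)[self._name]

    def __getitem__(self, key):
        return self._dataset()[key]

    def __getattr__(self, attr):
        # Only called for attributes not found normally. Refuse private
        # names to avoid infinite recursion if __init__ has not run.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._dataset(), attr)


opens = []


def get_file(path):
    """Stand-in for a cache lookup; records each call and returns a dict."""
    opens.append(path)
    return {"temperature": [10, 20, 30]}


ds = ReopeningDataset(get_file, "a.h5", "temperature")
first_two = ds[0:2]
```

The cost of this design is one cache lookup per access, which is cheap when the file is still open and pays for a reopen only after an eviction.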

@estebanag
Author

Great. Regarding recreating the h5py.File and wrapping the Dataset, that's exactly the way I got it to work in the current version (you can see it here).

@avh

avh commented Jun 24, 2021

There is a bug on line 149 of __init__.py.
The value stored in the cache is self, not self.hsh.
The comparison therefore always fails, so a closed file is never evicted from the cache.
Could we just do "del cache[self.hsh]" instead?
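
To illustrate, here is a reduced sketch of the pattern. It is not the actual code from the repository: `CachedFile`, `close_buggy`, and `close_fixed` are made up, and only `cache` and `hsh` follow the names above.

```python
cache = {}


class CachedFile:
    """Reduced stand-in for the cached file wrapper (hypothetical)."""

    def __init__(self, hsh):
        self.hsh = hsh
        self.closed = False
        cache[hsh] = self  # the cached value is the object, not the hash

    def close_buggy(self):
        self.closed = True
        # Compares the cached object against the hash key: never equal,
        # so the entry is never removed.
        if cache.get(self.hsh) == self.hsh:
            del cache[self.hsh]

    def close_fixed(self):
        self.closed = True
        # Remove by key; pop with a default tolerates a missing entry,
        # where a bare "del cache[self.hsh]" would raise KeyError.
        cache.pop(self.hsh, None)


f = CachedFile("abc")
f.close_buggy()
still_cached = "abc" in cache   # the closed file is still in the cache
f.close_fixed()
removed = "abc" not in cache
```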

@avh

avh commented Jun 24, 2021

Actually, closing a file that is shared is a bad idea. Maybe this bug was a good thing. Perhaps the cache should use weak references and only close the file when the last reference is removed.
