pyfive doesn't play nicely with s3fs #60

Open
bnlawrence opened this issue Mar 7, 2024 · 2 comments

@bnlawrence

This is probably a problem with s3fs, but it is worth noting here for now. In practice, we cannot read contiguous data from a file opened through s3fs. Here is some toy code that shows the problem:

import s3fs
import pyfive as p5
from pathlib import Path
import numpy as np

S3_URL = "redacted"
S3_BUCKET = 'redacted'

def local(fname,v):

    mypath = Path(__file__).parent
    f = p5.File(mypath/fname)
    d = f[v]
    dd = d[:]
    return dd

def s3(fname, v):
        
    fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': S3_URL})
    uri = S3_BUCKET + '/' + fname

    with fs.open(uri,'rb') as s3file2:
        f2 = p5.File(s3file2)
        d = f2[v]
        dd = d[:]
        return dd

if __name__=="__main__":
    fn, v = 'common_cl_a.nc', 'cl'
    d1 = local(fn,v)
    d2 = s3(fn,v)
    np.testing.assert_array_equal(d1,d2)

Running this gives:

Traceback (most recent call last):
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 32, in <module>
    d2 = s3(fn,v)
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 25, in s3
    dd = d[:]
  File "/Users/bnl28/Repositories/pyfive/pyfive/high_level.py", line 279, in __getitem__
    data = self._dataobjects.get_data(args)
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 630, in get_data
    return self._get_contiguous_data(self.property_offset)[args]
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 671, in _get_contiguous_data
    return np.memmap(self.fh, dtype=self.dtype, mode='c',
  File "/Users/bnl28/mambaforge/envs/fesom/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
io.UnsupportedOperation: fileno

This appears to arise because the object returned by s3fs.S3FileSystem.open does not fully meet the requirement to be "file-like": it does not support fileno(), which np.memmap needs in order to call mmap.mmap. To be fair, s3fs does not claim feature completeness.
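For illustration, here is a minimal sketch of one way to read contiguous data without relying on fileno(). The helper name read_contiguous and its parameters are hypothetical; this is not pyfive's actual code. It assumes only that the file object supports seek() and read(), and it uses io.BytesIO as a stand-in for an s3fs file, since BytesIO also has no OS-level file descriptor.

```python
import io

import numpy as np


def read_contiguous(fh, offset, shape, dtype):
    """Read a contiguous array from a seekable binary file-like object.

    Hypothetical helper for illustration only. It uses np.memmap when the
    object has a real file descriptor and falls back to seek + read otherwise.
    """
    dtype = np.dtype(dtype)
    count = int(np.prod(shape))
    try:
        fh.fileno()
    except (AttributeError, OSError):
        # io.UnsupportedOperation is a subclass of OSError, so objects that
        # refuse fileno() (such as io.BytesIO) end up on this branch.
        fh.seek(offset)
        buf = fh.read(count * dtype.itemsize)
        # np.frombuffer over bytes yields a read-only array; call .copy()
        # on the result if a writable array is needed.
        return np.frombuffer(buf, dtype=dtype, count=count).reshape(shape)
    # A real descriptor is available, so memory-mapping works as before.
    return np.memmap(fh, dtype=dtype, mode='c', offset=offset, shape=shape)


# Stand-in for a remote file: 8 bytes of padding followed by six int32 values.
payload = np.arange(6, dtype="<i4")
fh = io.BytesIO(b"\x00" * 8 + payload.tobytes())
result = read_contiguous(fh, offset=8, shape=(2, 3), dtype="<i4")
```

The trade-off is that the fallback branch reads the whole dataset into memory instead of mapping it lazily, which may matter for large variables on object storage.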

@bnlawrence

To be fair, that error message is from my branch with the chunking read pull request. Here is the same thing on master:

Traceback (most recent call last):
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 32, in <module>
    d2 = s3(fn,v)
  File "/Users/bnl28/Repositories/pyfive/bnl/mytest_s3_etc.py", line 25, in s3
    dd = d[:]
  File "/Users/bnl28/Repositories/pyfive/pyfive/high_level.py", line 279, in __getitem__
    data = self._dataobjects.get_data()[args]
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 420, in get_data
    return self._get_contiguous_data(property_offset)
  File "/Users/bnl28/Repositories/pyfive/pyfive/dataobjects.py", line 454, in _get_contiguous_data
    return np.memmap(self.fh, dtype=self.dtype, mode='c',
  File "/Users/bnl28/mambaforge/envs/fesom/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
io.UnsupportedOperation: fileno

@bnlawrence

bnlawrence commented Mar 8, 2024

I have a solution in a branch on my fork. I won't open a pull request here until we've worked through the issue6 pull request, since it builds on that.
