Subclassed ValueError with BlockSizeError in http.py #646

Merged
merged 3 commits into from
Jun 3, 2021

Conversation

cisaacstern
Contributor

This is the most minimal implementation I could find for the item discussed in pangeo-forge/pangeo-forge-recipes#142 (comment).

This code, adapted from pangeo_forge_recipes/storage.py, reproduces the ValueError in question:

Minimal reproducible example
from contextlib import contextmanager
from typing import Any, Iterator

import fsspec

OpenFileType = Any

@contextmanager
def _fsspec_safe_open(fname: str, **kwargs) -> Iterator[OpenFileType]:
    # workaround for inconsistent behavior of fsspec.open
    # https://github.com/intake/filesystem_spec/issues/579
    with fsspec.open(fname, **kwargs) as fp:
        with fp as fp2:
            yield fp2

base = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/'
fname = base + 'smap/L3/JPL/V5.0/8day_running/2015/120/SMAP_L3_SSS_20150504_8DAYS_V5.0.nc'

# open_kwargs = {'block_size': 0}

input_opener = _fsspec_safe_open(fname, mode="rb")  # , **open_kwargs)

BLOCK_SIZE = 10_000_000

with input_opener as source:
    data = source.read(BLOCK_SIZE)

With fsspec=2021.5.0 installed, here's the traceback:

Traceback: before
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-5247671e2992> in <module>
     24 
     25 with input_opener as source:
---> 26     data = source.read(BLOCK_SIZE)

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
    482         else:
    483             length = min(self.size - self.loc, length)
--> 484         return super().read(length)
    485 
    486     async def async_fetch_all(self):

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
   1447             # don't even bother calling fetch
   1448             return b""
-> 1449         out = self.cache._fetch(self.loc, self.loc + length)
   1450         self.loc += len(out)
   1451         return out

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
    374         ):
    375             # First read, or extending both before and after
--> 376             self.cache = self.fetcher(start, bend)
    377             self.start = start
    378         elif start < self.start:

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
     70     def wrapper(*args, **kwargs):
     71         self = obj or args[0]
---> 72         return sync(self.loop, func, *args, **kwargs)
     73 
     74     return wrapper

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
     51     event.wait(timeout)
     52     if isinstance(result[0], BaseException):
---> 53         raise result[0]
     54     return result[0]
     55 

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/asyn.py in _runner(event, coro, result, timeout)
     18         coro = asyncio.wait_for(coro, timeout=timeout)
     19     try:
---> 20         result[0] = await coro
     21     except Exception as ex:
     22         result[0] = ex

~/.pyenv/versions/anaconda3-2019.10/envs/pangeo-forge3.8/lib/python3.8/site-packages/fsspec/implementations/http.py in async_fetch_range(self, start, end)
    544                         cl += len(chunk)
    545                         if cl > end - start:
--> 546                             raise ValueError(
    547                                 "Got more bytes so far (>%i) than requested (%i)"
    548                                 % (cl, end - start)

ValueError: Got more bytes so far (>15260565) than requested (15242880)
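
The check that raises at the bottom of this traceback compares a running byte count against the length of the requested range. The following is a minimal sketch of that logic, not the fsspec source: `collect_range` and the plain iterable of chunks are hypothetical stand-ins for the streamed HTTP response, and the thread does not state why the server sent more data than requested.

```python
from typing import Iterable


def collect_range(chunks: Iterable[bytes], start: int, end: int) -> bytes:
    """Join streamed chunks, failing if more than end - start bytes arrive.

    Hypothetical helper that mirrors the check visible in the traceback.
    """
    out = []
    received = 0  # running count of bytes seen so far ("cl" in the traceback)
    for chunk in chunks:
        out.append(chunk)
        received += len(chunk)
        if received > end - start:
            # Same message format as the error shown in the traceback.
            raise ValueError(
                "Got more bytes so far (>%i) than requested (%i)"
                % (received, end - start)
            )
    return b"".join(out)
```

With chunks that fit the requested range the bytes are returned unchanged; a single extra byte is enough to trigger the error.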

And with the current PR branch installed:

Traceback: after
---------------------------------------------------------------------------
BlockSizeError                            Traceback (most recent call last)
<ipython-input-1-5247671e2992> in <module>
     24 
     25 with input_opener as source:
---> 26     data = source.read(BLOCK_SIZE)

~/Dropbox/pangeo/filesystem_spec/fsspec/implementations/http.py in read(self, length)
    489         else:
    490             length = min(self.size - self.loc, length)
--> 491         return super().read(length)
    492 
    493     async def async_fetch_all(self):

~/Dropbox/pangeo/filesystem_spec/fsspec/spec.py in read(self, length)
   1447             # don't even bother calling fetch
   1448             return b""
-> 1449         out = self.cache._fetch(self.loc, self.loc + length)
   1450         self.loc += len(out)
   1451         return out

~/Dropbox/pangeo/filesystem_spec/fsspec/caching.py in _fetch(self, start, end)
    374         ):
    375             # First read, or extending both before and after
--> 376             self.cache = self.fetcher(start, bend)
    377             self.start = start
    378         elif start < self.start:

~/Dropbox/pangeo/filesystem_spec/fsspec/asyn.py in wrapper(*args, **kwargs)
     79     def wrapper(*args, **kwargs):
     80         self = obj or args[0]
---> 81         return sync(self.loop, func, *args, **kwargs)
     82 
     83     return wrapper

~/Dropbox/pangeo/filesystem_spec/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
     60         raise FSTimeoutError
     61     if isinstance(result[0], BaseException):
---> 62         raise result[0]
     63     return result[0]
     64 

~/Dropbox/pangeo/filesystem_spec/fsspec/asyn.py in _runner(event, coro, result, timeout)
     20         coro = asyncio.wait_for(coro, timeout=timeout)
     21     try:
---> 22         result[0] = await coro
     23     except Exception as ex:
     24         result[0] = ex

~/Dropbox/pangeo/filesystem_spec/fsspec/implementations/http.py in async_fetch_range(self, start, end)
    552                         cl += len(chunk)
    553                         if cl > end - start:
--> 554                             raise BlockSizeError(
    555                                 "Got more bytes so far (>%i) than requested (%i)"
    556                                 % (cl, end - start)

BlockSizeError: Got more bytes so far (>15252381) than requested (15242880)
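
Because the new exception subclasses ValueError, existing `except ValueError` handlers keep working, while callers can now target this failure specifically. The sketch below is self-contained and makes some assumptions: the class is redefined locally for illustration rather than imported from fsspec, and `read_block` and `describe` are hypothetical helpers.

```python
class BlockSizeError(ValueError):
    """Local stand-in for the exception added in this PR (illustration only)."""


def read_block(nbytes_sent: int, nbytes_requested: int) -> bytes:
    # Hypothetical reader that fails the same way the HTTP range fetch does.
    if nbytes_sent > nbytes_requested:
        raise BlockSizeError(
            "Got more bytes so far (>%i) than requested (%i)"
            % (nbytes_sent, nbytes_requested)
        )
    return b"\x00" * nbytes_sent


def describe(nbytes_sent: int, nbytes_requested: int) -> str:
    try:
        read_block(nbytes_sent, nbytes_requested)
        return "ok"
    except BlockSizeError:
        # New, specific handler: only the block-size mismatch lands here.
        return "block size mismatch"
    except ValueError:
        # Any other ValueError would still be handled separately.
        return "other value error"


# Code written before the change, which catches ValueError, still works.
try:
    read_block(5, 4)
except ValueError as exc:
    legacy_caught = type(exc).__name__
```

A caller could use the specific handler to retry with different open arguments, such as the `block_size` setting commented out in the minimal example; whether that resolves a given case is not established in this thread.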

@cisaacstern
Contributor Author

@martindurant, I note that a number of checks failed.

From reviewing the logs, my impression (though I may be mistaken) is that these errors are not specific to this PR. Is there anything I can do to help the process along?

@martindurant
Member

The failures are due to a release of libarchive; please merge from master.

@cisaacstern
Contributor Author

👍 should be good to go

@martindurant
Member

Will merge on green

@martindurant
Member

(fails lint by black)

@cisaacstern
Contributor Author

> (fails lint by black)

Pushed a new commit, which should resolve this.

@martindurant martindurant merged commit 3911896 into fsspec:master Jun 3, 2021