Issues with running algorithms through latest close #2024

Closed
ChrisPappalardo opened this issue Nov 26, 2017 · 7 comments

@ChrisPappalardo
Contributor

Two more issues for you guys:

  • running seemingly any algorithm through the latest available data date (11/24 as of 11/25) results in a KeyError
  • KeyError traceback does not identify where in the algorithm the error occurs

Again, I'm doing all of this inside docker containers which I build and run with:

$ docker build -t quantopian/zipline https://github.com/quantopian/zipline.git
$ docker run --rm -ti quantopian/zipline:latest /bin/bash

You can reproduce the issues as follows:

$ zipline ingest -b quantopian-quandl
$ zipline run -b quantopian-quandl -f test.py --start=2017-1-1 --end=2017-12-31

Where test.py is a do-nothing simple algorithm:

def initialize(context):
    pass

def handle_data(context, data):
    pass

When I run that code inside a zipline container I get:

[2017-11-25 23:38:22.281060] INFO: Loader: Cache at /root/.zipline/data/SPY_benchmark.csv does not have data from 1990-01-02 00:00:00+00:00 to 2017-11-21 00:00:00+00:00.

[2017-11-25 23:38:22.281483] INFO: Loader: Downloading benchmark data for 'SPY' from 1989-12-29 00:00:00+00:00 to 2017-11-21 00:00:00+00:00
[2017-11-25 23:38:22.645542] WARNING: Loader: Still don't have expected benchmark data for 'SPY' from 1989-12-29 00:00:00+00:00 to 2017-11-21 00:00:00+00:00 after redownload!
[2017-11-25 23:38:22.645942] INFO: Loader: Cache at /root/.zipline/data/treasury_curves.csv does not have data from 1990-01-02 00:00:00+00:00 to 2017-11-21 00:00:00+00:00.

[2017-11-25 23:38:22.646074] INFO: Loader: Downloading treasury data for 'SPY' from 1990-01-02 00:00:00+00:00 to 2017-11-21 00:00:00+00:00
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1395, in _has_valid_type
    error()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2017-11-22 00:00:00+00:00] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/zipline", line 11, in <module>
    load_entry_point('zipline', 'console_scripts', 'zipline')()
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/zipline/zipline/__main__.py", line 101, in _
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/zipline/zipline/__main__.py", line 255, in run
    environ=os.environ,
  File "/zipline/zipline/utils/run_algo.py", line 185, in _run
    overwrite_sim_params=False,
  File "/zipline/zipline/algorithm.py", line 720, in run
    for perf in self.get_generator():
  File "/zipline/zipline/gens/tradesimulation.py", line 225, in transform
    handle_benchmark(normalize_date(dt))
  File "/zipline/zipline/gens/tradesimulation.py", line 181, in handle_benchmark
    benchmark_source.get_value(date)
  File "/zipline/zipline/sources/benchmark_source.py", line 75, in get_value
    return self._precalculated_series.loc[dt]
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1466, in _getitem_axis
    self._has_valid_type(key, axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1403, in _has_valid_type
    error()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2017-11-22 00:00:00+00:00] is not in the [index]'

The traceback in the first exception doesn't give any clues as to where this is happening in the algorithm or in the zipline code:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1395, in _has_valid_type
    error()
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2017-11-22 00:00:00+00:00] is not in the [index]'

Am I missing something?
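
For what it's worth, the short first traceback is the original exception, and the "During handling of the above exception" line means the second one was raised from inside an except block. The second traceback is the one that carries the full call stack (here it ends in benchmark_source.py, get_value). A minimal sketch of that chaining behavior in plain Python, where the function names are hypothetical stand-ins and not the actual pandas or zipline code:

```python
import traceback


def inner():
    # Stand-in for the first failing check.
    raise KeyError("label not in index")


def has_valid_type():
    try:
        inner()
    except KeyError:
        # Raising inside an except block links the first error to the new
        # one via __context__, which produces the "During handling of the
        # above exception" banner when the traceback is printed.
        raise KeyError("label not in index (raised again)")


def get_value():
    # Stand-in for the caller whose frame we want to see in the traceback.
    return has_valid_type()


try:
    get_value()
except KeyError as exc:
    # Frames recorded on the second (outer) exception: includes the caller.
    frames = [f.name for f in traceback.extract_tb(exc.__traceback__)]
    # Frames recorded on the first exception: only from the point where it
    # was caught down to where it was raised.
    context_frames = [
        f.name for f in traceback.extract_tb(exc.__context__.__traceback__)
    ]
    print("second traceback frames:", frames)
    print("first traceback frames:", context_frames)
```

So when two tracebacks are printed like this, the last one is usually the place to look for the calling code.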

@freddiev4
Contributor

freddiev4 commented Nov 26, 2017

Hey @ChrisPappalardo thanks for opening this. Hmm...looks like this is a benchmark issue where we have missing data from the API endpoint we're hitting

@ChrisPappalardo
Contributor Author

I thought that too. I verified that the SPY_benchmark.csv file in the container had data through 11/24. I noticed that the treasury_curves.csv file did not (latest was 11/21). I manually added rows for 11/22 and 11/24 to that file but I still get the same KeyError after re-running.
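
A rough way to check this kind of coverage without eyeballing the files is to compare the sessions you plan to run against the index of the cached series. This is only a sketch with made-up values: benchmark_returns and missing_sessions are hypothetical names, and the series is built in memory instead of being read from the cached CSV files.

```python
import pandas as pd

# Hypothetical stand-in for cached benchmark data that stops at 2017-11-21.
index = pd.date_range("2017-11-17", "2017-11-21", freq="B", tz="UTC")
benchmark_returns = pd.Series(0.0, index=index)

# Sessions the backtest would ask for (business days only, no holiday logic).
requested = pd.date_range("2017-11-20", "2017-11-22", freq="B", tz="UTC")


def missing_sessions(series, sessions):
    """Return the requested sessions that have no entry in the series index."""
    return sessions.difference(series.index)


missing = missing_sessions(benchmark_returns, requested)
print("missing sessions:", list(missing))

# A label-based lookup on a missing session fails with KeyError, which is
# the same kind of error shown in the traceback above.
try:
    benchmark_returns.loc[requested[-1]]
except KeyError as exc:
    print("lookup failed:", exc)
```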

@amarin15

Unless you have cached benchmark data here:

if data is not None:

when you try to load the benchmark data from the Google API:

data = pd_reader.DataReader(

you get an error, because the API has been offline since Oct 1 (see pydata/pandas-datareader@08a700e).

@amarin15

Actually, the API still works, it's just that the URL has changed. I've managed to make the DataReader part of the code work by doing a pip install -e . on the pandas-datareader source code, instead of using the latest PyPI 0.5.0 release. pydata/pandas-datareader#404 fixes this issue (pydata/pandas-datareader@7d8803d more specifically) and someone already asked for a new release in pydata/pandas-datareader#420, so feel free to upvote it if you're also interested.

@ChrisPappalardo
Contributor Author

@alexukf Thanks for the reply. Your fix seems to work for the API issue, but doesn't fix the latest close issue. I believe it's due to the Fed H15 report being lagged a day (see here).

@freddiev4
Contributor

Closing this as this should be fixed in the latest release of zipline. Feel free to update to 1.2.0 with either:

pip install -U zipline

or

conda update zipline -c quantopian

If you're still experiencing issues, please reopen this or open a new issue 🙂

@mihirkel

The issue is still there as of Dec 2019. I can't recall exactly, but someone mentioned on a Google group that IEX no longer provides data for free and that you need to sign up for a cloud account. Is this the reason why this is failing again now?
