KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]' #1957

Closed
kerwinxu opened this issue Sep 20, 2017 · 11 comments

@kerwinxu

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

  • Operating System: Windows 10
  • Python Version: Python 3.5
  • Python Bitness: $ python -c 'import math, sys;print(int(math.log(sys.maxsize + 1, 2) + 1))'
  • How did you install Zipline: conda
  • Python packages: $ pip freeze or $ conda list

Now that you know a little about me, let me tell you about the issue I am
having:

Description of Issue

  • What did you expect to happen?
  • What happened instead?

Here is how you can reproduce this issue on your machine:

Reproduction Steps

1. Install Zipline: conda install -n python35 -c Quantopian zipline
2. zipline ingest
3. zipline run -f dual_moving_average.py --start 2011-1-1 --end 2012-1-1 -o dma.pickle
4. The run fails with this error:
[2017-09-20 02:40:15.276265] WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-09-20 02:19:52.057758+00:00.
Traceback (most recent call last):
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1395, in _has_valid_type
error()
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1390, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "d:\Anaconda3\envs\python35\Scripts\zipline-script.py", line 11, in <module>
load_entry_point('zipline==1.1.1', 'console_scripts', 'zipline')()
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 697, in main
rv = self.invoke(ctx)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 1066, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 535, in invoke
return callback(*args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\__main__.py", line 97, in _
return f(*args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\__main__.py", line 240, in run
environ=os.environ,
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\utils\run_algo.py", line 179, in _run
overwrite_sim_params=False,
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\algorithm.py", line 709, in run
for perf in self.get_generator():
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\gens\tradesimulation.py", line 230, in transform
handle_benchmark(normalize_date(dt))
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\gens\tradesimulation.py", line 190, in handle_benchmark
benchmark_source.get_value(date)
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\sources\benchmark_source.py", line 75, in get_value
return self._precalculated_series.loc[dt]
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1296, in __getitem__
return self._getitem_axis(key, axis=0)
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1466, in _getitem_axis
self._has_valid_type(key, axis)
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1403, in _has_valid_type
error()
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1390, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'
...

What steps have you taken to resolve this already?

...

Anything else?

...

Sincerely,
$ whoami

@freddiev4
Contributor

freddiev4 commented Sep 20, 2017

Hi @kerwinxu, I'm currently looking at #1953, #1950, #1949, and #1947, and they all appear to be the same problem: Google no longer gives us enough benchmark data. Before, we could get up to 4000 days of data for SPY, but now we seem to get only about 251 days, most likely because of changes in the Google Finance API.
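
The explanation above comes down to date coverage: if the benchmark series only reaches back about a year, a lookup for any earlier date finds nothing in the index. Here is a minimal sketch of that check, not Zipline's own code; the function name `missing_coverage` and the sample dates are made up for illustration.

```python
from datetime import date


def missing_coverage(benchmark_dates, backtest_start, backtest_end):
    """Describe the gap between a benchmark series and a backtest window.

    Returns None when the window is fully inside the benchmark range.
    `benchmark_dates` is any iterable of datetime.date values (a stand-in
    for the benchmark index in this sketch).
    """
    dates = sorted(benchmark_dates)
    if not dates:
        return "benchmark series is empty"
    first, last = dates[0], dates[-1]
    if backtest_start < first:
        return "benchmark starts {} but backtest starts {}".format(
            first, backtest_start)
    if backtest_end > last:
        return "benchmark ends {} but backtest ends {}".format(
            last, backtest_end)
    return None


# Hypothetical data: roughly one year of benchmark history versus the
# 2011 backtest window from the reproduction steps.
recent_only = [date(2016, 9, 20), date(2017, 9, 19)]
print(missing_coverage(recent_only, date(2011, 1, 1), date(2012, 1, 1)))
```

A check like this, run before the simulation starts, would report the gap as a readable message instead of a KeyError from deep inside pandas.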

@freddiev4
Contributor

Until it is fixed, one idea is to copy https://github.com/quantopian/zipline/blob/master/zipline/resources/market_data/SPY_benchmark.csv to your ~/.zipline/data/ directory and then try running again.
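
For anyone who wants to script that copy, a small sketch follows. It assumes you have already saved the CSV somewhere locally, and that Zipline keeps its data under ~/.zipline unless the ZIPLINE_ROOT environment variable says otherwise (treat that as an assumption to verify for your version). The helper name `install_benchmark_csv` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_benchmark_csv(downloaded_csv, zipline_root=None):
    """Copy a locally saved benchmark CSV into <zipline_root>/data/.

    Returns the path of the installed file.
    """
    if zipline_root is None:
        # Assumption: ~/.zipline is the default root, and ZIPLINE_ROOT
        # overrides it when set.
        zipline_root = os.environ.get(
            "ZIPLINE_ROOT", str(Path.home() / ".zipline"))
    data_dir = Path(zipline_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / "SPY_benchmark.csv"
    shutil.copyfile(str(downloaded_csv), str(target))
    return target
```

Calling `install_benchmark_csv("SPY_benchmark.csv")` from the directory where you saved the file would place it in the default location. As later comments in this thread point out, the copy alone may not be enough if the file is not up to date.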

@QuantGuy01

Thanks for the pointers to the benchmarks. I found the code that does this, and it looks like Google (not Yahoo) is returning just the last year's worth of data, no matter what dates you pass it. I see other people have since commented on the same thing.

The latest pandas_datareader version has the same behavior. I modified the benchmarks.py code to use Yahoo and print the data to STDOUT, then fetched the data as a one-off and saved it into SPY_benchmark.csv.

I tried leaving Yahoo in there permanently, but it comes back with errors, and I think that has something to do with Yahoo rate limiting connections. So doing a one-off grab, saving it into the CSV, and then changing the code back to Google worked for me.

Thanks for the help everyone.

@ezfine

ezfine commented Sep 20, 2017

As I mentioned in #1950, copying in a prepared SPY_benchmark.csv that is not up to date does not work, because zipline compares the latest date in the file and then downloads from Google again.

I think the better workaround for now is to use Yahoo data with a Yahoo fix patch for pandas DataReader. Here is the reference; see the comment by @edmunch. It works for me.
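
The date comparison described above can be checked by hand before running a backtest. The sketch below is an illustration, not Zipline's loader: it assumes the cached CSV has one row per day with an ISO date (YYYY-MM-DD) at the start of the first column, and the names `last_cached_date` and `cache_covers` are invented here.

```python
import csv
from datetime import date, datetime


def last_cached_date(csv_path):
    """Return the most recent date in the first column, or None."""
    latest = None
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue
            try:
                # Assumption: the first 10 characters are YYYY-MM-DD.
                day = datetime.strptime(row[0][:10], "%Y-%m-%d").date()
            except ValueError:
                continue  # skip a header line or a malformed row
            if latest is None or day > latest:
                latest = day
    return latest


def cache_covers(csv_path, last_needed):
    """True if the cached file reaches at least `last_needed`."""
    latest = last_cached_date(csv_path)
    return latest is not None and latest >= last_needed
```

If `cache_covers(path, date(2017, 9, 19))` is false for the last day you need, the file counts as stale, which matches the re-download behavior described in this comment.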

@kerwinxu
Author

This patch seems OK.

edmunch commented on 30 Jun • edited
Solution for me using YAHOO... quick and dirty

install pandas_datareader
install fix_yahoo_finance from here: https://pypi.python.org/pypi/fix-yahoo-finance

patch Benchmarks.py with:

import pandas as pd

from six.moves.urllib_parse import urlencode

import pandas_datareader as pdr  # NEW
import fix_yahoo_finance as yf  # NEW
yf.pdr_override()  # NEW

def get_benchmark_returns(symbol, start_date, end_date):
    print('NEW')
    df = pdr.data.get_data_yahoo(symbol, start=start_date, end=end_date)
    df.to_csv('{}_D1.csv'.format(symbol))
    return pd.read_csv(
        '{}_D1.csv'.format(symbol),
        parse_dates=['Date'],
        index_col='Date',
        usecols=["Adj Close", "Date"],
        squeeze=True,  # squeeze tells pandas to make this a Series
                       # instead of a 1-column DataFrame
    ).sort_index().tz_localize('UTC').pct_change(1).iloc[1:]
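
For readers unsure what the last step of that chain produces: `pct_change(1)` turns adjusted closes into day-over-day fractional returns, and `iloc[1:]` drops the first row, which has no previous day to compare against. A plain-Python sketch of the same arithmetic, with made-up prices and a hypothetical function name:

```python
def simple_returns(prices):
    """Day-over-day fractional change of a price list.

    The first price has no predecessor, so the result is one element
    shorter than the input.
    """
    return [cur / prev - 1.0 for prev, cur in zip(prices, prices[1:])]


# Hypothetical adjusted closes for three consecutive days.
print(simple_returns([100.0, 110.0, 99.0]))
```

Here the first return is about +10% (100 to 110) and the second about -10% (110 to 99).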

@zxweed

zxweed commented Sep 29, 2017

No, even if I put the correct SPY_benchmark.csv in place, the call to TradingAlgorithm overwrites it with the wrong version! Please reopen the issue...

@ezfine

ezfine commented Sep 29, 2017

@zxweed
Please see my comment above. Just correcting the SPY_benchmark.csv file doesn't work; you need to patch the Yahoo download module of pandas DataReader.

@zxweed

zxweed commented Sep 29, 2017

@ezfine I have not used the Yahoo download because Yahoo shut it down a couple of months ago. I used Quandl as the source instead.

@ezfine

ezfine commented Sep 29, 2017

Yes, Yahoo changed its API several months ago, and that's why we need a patch for pandas DataReader. I didn't try Quandl data with zipline because it doesn't provide adjusted close data.

@JoaoAparicio
Contributor

JoaoAparicio commented Oct 2, 2017

@ezfine @zxweed The original issue at the top of this thread (to be clear, the one with the warning message "WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-09-20 02:19:52.057758+00:00.") has nothing to do with recent changes to the Google API. data/loader.py hard-codes a one-hour cooldown between downloads.

@kerwinxu Please read the previous paragraph. Perhaps an optional flag to force downloads despite the cooldown would be a good idea? Would you like me to open a PR for this?
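
To make the cooldown and the proposed override concrete, here is a rough sketch. It is not the code in data/loader.py; the function `should_download` and its `force` parameter are hypothetical, and only the one-hour window comes from this comment.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def should_download(last_download_time, now, cooldown=ONE_HOUR, force=False):
    """Decide whether a fresh benchmark download is allowed.

    `force` models the optional flag suggested above: when set, the
    cooldown is ignored. With no previous download, always allow one.
    """
    if force or last_download_time is None:
        return True
    return (now - last_download_time) > cooldown


# Timestamps taken from the warning in the original report (truncated
# to whole seconds): about 20 minutes apart, so inside the cooldown.
last = datetime(2017, 9, 20, 2, 19, 52)
now = datetime(2017, 9, 20, 2, 40, 15)
print(should_download(last, now))
print(should_download(last, now, force=True))
```

With a flag like this, a user who knows the cached file is incomplete could retry immediately instead of waiting out the hour.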

@freddiev4
Contributor

This happens because Google now limits users to about 251 days' worth of data per request, so you can't run backtests longer than a year. A fix is currently being worked on.

There are duplicates of this issue, so I'm going to direct everyone to #1965. I'll comment there when there is a fix on master.


6 participants