Benchmark downloading is broken #1965

dmichalowicz · 2017-09-27T13:16:08Z

Fix benchmark downloading from Google with pandas-datareader. This issue was originally brought up here.

We now get benchmark data from Google instead of Yahoo, as seen here.

However, it appears that as of only a week or two ago, Google changed the URL from which they serve their financial data, causing pandas-datareader to break. This is also preventing us from rebuilding the test_examples data. (For more info, see the original post above.)

cyniphile · 2017-09-27T15:16:50Z

Had this issue too. In the meantime I just changed back to Yahoo in get_benchmark_returns, which seems to work.

MBattagl · 2017-09-27T20:34:53Z

Same issue here #1953

freddiev4 · 2017-09-28T17:33:03Z

x-ref pydata/pandas-datareader#395

yiorgosn · 2017-10-04T07:25:58Z

Quick solution: use a manually downloaded local copy of SPY (Yahoo lets you download the entire history manually). I modified benchmarks.py to look for a local CSV copy instead. I attach the modified benchmarks.py file; it should replace the existing one (so make a copy of the original before you overwrite it). The benchmarks.py file is usually found in %USERPROFILE%\Anaconda3\envs\py34\Lib\site-packages\zipline\data. If you didn't create a dedicated environment for it, don't specify py34 after envs.

Also make sure that your local directory is reflected in this line in the code:
new_dir = 'c:/Downloaded_csv'

benchmarks.txt
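
For readers who cannot open the attachment, the general idea can be sketched roughly as follows. This is not the attached file: the function name load_benchmark_returns, the Date and Close column names (assumed to match Yahoo's CSV export), and the sample prices are all illustrative assumptions.

```python
import io

import pandas as pd


def load_benchmark_returns(csv_source):
    """Read a daily price CSV and return close-to-close returns.

    Hypothetical helper: assumes the file has 'Date' and 'Close' columns.
    """
    prices = pd.read_csv(csv_source, parse_dates=["Date"], index_col="Date")
    closes = prices["Close"].sort_index()
    # The first row has no previous close, so its return is NaN and is dropped.
    return closes.pct_change().dropna()


# Made-up prices, used only to show the shape of the result.
sample_csv = io.StringIO(
    "Date,Close\n"
    "2018-01-02,100.0\n"
    "2018-01-03,110.0\n"
    "2018-01-04,99.0\n"
)
benchmark_returns = load_benchmark_returns(sample_csv)
```

In practice csv_source would be a path under the directory set in new_dir, instead of the in-memory sample used here.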

tanaytrivedi · 2017-10-23T21:42:11Z

Hi,
Is there an official solution out there for running backtests without the system breaking every time because of this benchmark issue? @yiorgosn's solution doesn't work for me; I think you have to do more than just replace the file. I get the exact same failure even with his file. Of course, I have changed the file directory to make sure it looks in the right place for the CSV.

Is there a way I can run the backtest without doing a benchmark until it is fixed? Without, that is, ripping up the code and removing any mention of benchmarks.
Thanks

brian-from-quantrocket · 2017-10-23T22:10:59Z

You can try setting the benchmark to an asset that's already in your bundle. For example if running the example algos with AAPL, tell Zipline to use AAPL as your benchmark.

from zipline.api import symbol, set_benchmark

def initialize(context):
    set_benchmark(symbol("AAPL"))

My experience has been that Zipline still downloads the SPY data (limited to a year) but at least refrains from using it in the backtest, and thus the backtest doesn't fail.

edwardlun · 2017-10-30T16:29:07Z

I have the same problem as @tanaytrivedi. I tried @yiorgosn's solution but it still doesn't work. Are there any extra steps needed in addition to replacing benchmarks.py? Thanks a lot.

Steven-Sakurai · 2017-11-01T13:13:06Z

Thanks! @yiorgosn
I simply replaced the file and the benchmark is working fine now.

alexkojin · 2017-12-19T11:06:57Z

Google has changed the URL for its finance data. Instead of http://www.google.com/ you need to use https://finance.google.com/. Open the source code of the pandas-datareader package and change the URLs.

scotthuang1989 · 2018-01-11T06:44:36Z

I have a similar issue when I try to run the example buyapple.py.
The error message is:

pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?q=SPY&output=csv&startdate=Dec+29%2C+1989&enddate=Jan+09%2C+2018

When I try to access the URL in a web browser, Google gives the following message:

... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.

It seems Google has some anti-crawler measure to prevent fetching the data automatically.

Does anyone have a similar issue?

alexkojin · 2018-01-11T06:51:45Z

@scotthuang1989 as I wrote above, you need to change the URL to https://finance.google.com/finance/historical?q=SPY&output=csv&startdate=Dec+29%2C+1989&enddate=Jan+09%2C+2018

scotthuang1989 · 2018-01-11T06:57:24Z

@alexkojin, I put this URL into Chrome and got a 404 error.

BTW, do you mean I need to change the pandas-datareader source code and reinstall it to override the official release?

alexkojin · 2018-01-11T07:03:37Z

@scotthuang1989 Sorry, the URL is fixed now.
Yes, you can just change the source code. Or you can fork pandas-datareader, apply the fix, and install pandas-datareader from your fork.

scotthuang1989 · 2018-01-11T07:44:21Z

@alexkojin, after digging into it a little, I went with @yiorgosn's solution: I changed benchmarks.py to read data from another source.
I think the next release will fix this issue, because the master branch already has a modified benchmarks.py.

beevor · 2018-01-31T11:43:07Z

@scotthuang1989, the fix proposed by @alexkojin is rather simple. Edit ~/anaconda3/envs/zipline/lib/python3.5/site-packages/pandas_datareader/google/daily.py. Change the url from 'http://www.google.com/finance/historical' to 'https://finance.google.com/finance/historical' and you should be good. Works on zipline=1.1.1-np1111py35, pandas_datareader=0.5.0 and pandas=0.18.1. Or, fork pandas-datareader.
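
The URL change described above amounts to swapping the scheme and host while leaving the path and query untouched. A minimal sketch of that rewrite, using only the standard library: the helper name to_new_google_finance_url is hypothetical and is not part of pandas-datareader. Other comments in this thread report that the new URL can still be rejected as automated traffic, so this only illustrates the string change itself.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.google.com"
NEW_HOST = "finance.google.com"


def to_new_google_finance_url(url):
    """Rewrite an old Google Finance URL to the host named in this thread.

    Hypothetical helper: URLs that do not match the old pattern are
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != OLD_HOST or not parts.path.startswith("/finance/"):
        return url
    # Keep path, query string and fragment exactly as they were.
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, parts.fragment))


old_url = (
    "http://www.google.com/finance/historical"
    "?q=SPY&output=csv&startdate=Dec+29%2C+1989&enddate=Jan+09%2C+2018"
)
new_url = to_new_google_finance_url(old_url)
```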

dannypurcell · 2018-02-17T17:31:38Z

Why are we doing this in the first place when the benchmark symbol should be the quandl wiki bundle?

seanfuture · 2018-03-17T07:30:13Z

Thank you @yiorgosn. For me, the Mac OS X path was /usr/local/lib/python3.4/site-packages/zipline/data and the URL used to download all historical SPY data was https://finance.yahoo.com/quote/SPY/history?period1=728283600&period2=1521259200&interval=1d&filter=history&frequency=1d and, once the data was downloaded and your updated benchmarks.txt code was put in place, everything worked fine. Much appreciated. Aggravating when open source software doesn't work out of the box.

niklasamslgruber · 2018-03-17T13:43:16Z

Is there a solution yet? I tried changing the Google URL, but I get a "Max retries exceeded with url" error when running the program.

freddiev4 · 2018-03-17T22:30:54Z

@niklas-amslgruber there's a fix on master that uses IEX. You should be able to run a backtest up to 5 years from the current date using the zipline master branch, which you can install using:

git clone git@github.com:quantopian/zipline.git
pip install zipline/

or fork it and then do the same steps above, replacing quantopian with your-github-username.

Hoping to do a release of zipline in the next week or two as well so people can just pip install without cloning.

Also doing work here #2107 for a more permanent fix, but haven't had the chance to finish it.
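
A rough way to check whether a backtest start date falls inside that window before running anything. The helper name and the exact cutoff (the same calendar day five years earlier) are assumptions made for illustration; the comment above only says "up to 5 years from the current date".

```python
from datetime import date


def within_benchmark_window(start, today, years=5):
    """Return True if `start` is no earlier than `years` years before `today`.

    Hypothetical helper; the real cutoff used by the data provider may differ.
    """
    try:
        earliest = today.replace(year=today.year - years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        earliest = today.replace(year=today.year - years, day=28)
    return start >= earliest


today = date(2018, 3, 17)  # date of the comment above
recent_ok = within_benchmark_window(date(2014, 1, 1), today)
too_old = within_benchmark_window(date(2012, 1, 1), today)
```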

niklasamslgruber · 2018-03-19T16:22:54Z

pip install doesn't work for me (I don't have the right to read from the remote repository). I can only install via conda, where the latest version on GitHub master is not available.

freddiev4 · 2018-03-19T16:35:45Z

Hi @niklas-amslgruber, you should be able to fork zipline and then run pip install zipline/.

The latest master is also available via conda by running:

conda install -c quantopian/label/ci -c quantopian zipline

niklasamslgruber · 2018-03-19T16:51:34Z

I always get this error message when installing with pip:

Command "/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-build-kzyizl42/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-wm71_e5s-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-build-kzyizl42/pandas/

freddiev4 · 2018-03-19T16:55:07Z

@niklas-amslgruber the reason is that we only build Zipline packages for Py27 and Py35 (you can see the badge in the README).

For conda, you can create a new conda env for Python 3.5 using

conda create -n py35 python=3.5

Then run

conda install -c quantopian/label/ci -c quantopian zipline

Or create a Python 3.5 virtualenv and then run pip install zipline/.
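
A quick way to confirm that the active interpreter is one of the two versions mentioned above before attempting an install. This is only a sketch: the function name and the SUPPORTED set are assumptions drawn from this comment, not something Zipline ships.

```python
import sys

# Versions for which packages were being built, per the comment above.
SUPPORTED = {(2, 7), (3, 5)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is in SUPPORTED.

    Hypothetical helper; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED


py35_ok = is_supported_python((3, 5, 4))
py36_ok = is_supported_python((3, 6, 0))
```

Calling is_supported_python() with no argument checks the interpreter that is currently running.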

niklasamslgruber · 2018-03-24T16:57:02Z

This error still exists even though I followed your instructions and installed it on Python 3.5 with Anaconda.

pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?output=csv&q=SPY&enddate=Mar+21%2C+2018&startdate=Dec+29%2C+1989

Sentdex · 2018-03-25T15:27:20Z

Not only does this problem still exist: even after fixing the URL to be finance.google.com, you still get an error that you're sending automated requests. We can overcome this, but the Google Finance API is just plain unstable anyway. Better off using Quandl, or a custom bundle symbol.

What I am failing to understand is why we're downloading a benchmark from any website when we have a bundle. set_benchmark doesn't seem to care at all, which is very strange. We should be able to use a benchmark symbol from our custom set.

xrvo · 2018-03-25T18:40:54Z

Changing the benchmark data source to morningstar worked for me.

To do this, in [your_env]/lib/python3.5/site-packages/zipline/data/benchmarks.py, make the 2 changes marked by # NEW:

    data = pd_reader.DataReader(
        symbol,
        'morningstar',  # NEW
        first_date,
        last_date
    )

    data = data.reset_index(0, drop=True)  # NEW
    data = data['Close']
However, I agree with @Sentdex: fetching the benchmark data from the local bundle would be an improvement -- both in speed and stability.

Edit: Morningstar data was new for pandas-datareader v0.6.0, so a version upgrade may be necessary.
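
To see what the reset_index(0, drop=True) line is doing, here is a small sketch on synthetic data. It assumes the Morningstar reader returned a frame indexed by (Symbol, Date), which is why the first index level has to be dropped; the index names and the prices below are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the reader's output: a two-level (Symbol, Date) index.
index = pd.MultiIndex.from_product(
    [["SPY"], pd.to_datetime(["2018-03-21", "2018-03-22", "2018-03-23"])],
    names=["Symbol", "Date"],
)
data = pd.DataFrame({"Close": [100.0, 101.0, 102.0]}, index=index)

# Drop index level 0 (the symbol) so only the dates remain as the index.
data = data.reset_index(0, drop=True)
closes = data["Close"]
```

After this step closes is a plain date-indexed Series, which is the shape the rest of the snippet above works with.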

kelvinho8 · 2018-04-02T15:27:52Z

@xrvo Hi, Matt. I made the 2 changes in benchmarks.py but it's not working. The error message is as follows:

======================================
File "C:\Users\Kelvin\AppData\Local\conda\conda\envs\py35\lib\site-packages\pandas_datareader\data.py", line 175, in DataReader
raise NotImplementedError(msg)

NotImplementedError: data_source='morningstar' is not implemented

=================================
Am I missing something that needs to be changed too?

Thanks.
Kelvin

xrvo · 2018-04-02T15:51:33Z

@kelvinho8: It looks like the Morningstar data connector is a fairly recent addition to pandas-datareader. It was added in v0.6.0.

You should be able to resolve this error by upgrading your pandas-datareader to v0.6.0, which is currently the latest release.
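
A small sketch of checking a version string against that minimum. The helper names are hypothetical; the 0.6.0 threshold is the one stated in this comment.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_morningstar_reader(version_text):
    """Hypothetical check: True for pandas-datareader 0.6.0 or newer."""
    return parse_version(version_text) >= (0, 6, 0)


old_release = has_morningstar_reader("0.5.0")
new_release = has_morningstar_reader("0.6.0")
```

With the package installed, the string to check would typically come from pandas_datareader.__version__.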

phlsmk · 2018-04-04T11:26:32Z

Thanks @xrvo, the change from your comment above worked for me too with pandas-datareader v0.6.0.

blackcabbage1023 · 2018-04-05T02:57:47Z

I tried upgrading to pandas-datareader v0.6.0, but it still does not work.
I ran into this problem:

File "/Applications/anaconda3/envs/introduction_programming/lib/python3.5/site-packages/pandas_datareader/compat/__init__.py", line 8, in <module>
import pandas.io.common as com

AttributeError: module 'pandas.io' has no attribute 'common'

blackcabbage1023 · 2018-04-05T03:09:04Z

Update: I then tried changing pandas.io to pandas_datareader, as mentioned at the link below.
https://github.com/pydata/pandas-datareader

But it still does not work.

xrvo · 2018-04-05T03:29:04Z

@blackcabbage1023 it seems like this error would only happen if there's either a problem with your environment or you have a really old version of pandas. Your pandas version should be v0.18.1 as per the zipline requirements.
If this doesn't work, I suggest you try creating a new virtual environment from scratch.

freddiev4 · 2018-04-05T05:18:36Z

We recently released Zipline 1.2.0 on PyPI, as well as conda packages for Linux and Windows (macOS soon); please try installing the latest release.

You can see the release notes here. Feel free to update to 1.2.0 with either:

pip install -U zipline

or

conda update zipline -c quantopian

niklasamslgruber · 2018-04-27T15:35:25Z

When will you release the macOS version? @freddiev4

freddiev4 · 2018-04-27T15:53:50Z

@niklay14 I currently don't have a timeline in mind, as exams are coming up. I believe that if you have conda you can also just pip install zipline.

niklasamslgruber · 2018-04-27T16:05:28Z

I get this error message while installing zipline with pip: @freddiev4

Command "/Users/niklasamslgruber/anaconda3/envs/py35/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-req-build-qb36zhft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-record-fcez5zho/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-req-build-qb36zhft/

dmichalowicz added Benchmark Data Bundle labels Sep 27, 2017

dmichalowicz mentioned this issue Sep 27, 2017

ENH: Change default commission to .001 #1946

Merged

JoaoAparicio mentioned this issue Oct 4, 2017

ENH: --force-redownload optional parameter. #1973

Closed

Peque mentioned this issue Oct 5, 2017

ENH: Fix Pandas 0.19.2 compatibility issues #1975

Closed

freddiev4 mentioned this issue Nov 30, 2017

ENH: Use IEX Trading data instead of pandas-datareader #2031

Merged

alexkojin mentioned this issue Dec 19, 2017

Unable to run zipline example with fresh install #2002

Closed

freddiev4 added the Close on Next Release label Dec 27, 2017

freddiev4 mentioned this issue Jan 2, 2018

RemoteDataError #2070

Closed

freddiev4 mentioned this issue Jan 23, 2018

use data from local csv file #2088

Closed

freddiev4 closed this as completed Apr 9, 2018

freddiev4 mentioned this issue Apr 27, 2018

Unable to ingest default data bundle #2156

Closed
