Benchmark downloading is broken #1965

Closed
dmichalowicz opened this issue Sep 27, 2017 · 36 comments

Comments

@dmichalowicz
Contributor

Fix benchmark downloading from Google with pandas-datareader. This issue was originally brought up here.

We now get benchmark data from Google instead of Yahoo, as seen here.

However, it appears that a week or two ago Google changed the URL from which it serves its financial data, which broke pandas-datareader. This also prevents us from rebuilding the test_examples data. (For more info, see the original post above.)

@cyniphile
Contributor

Had this issue too. In the meantime I just changed back to Yahoo in get_benchmark_returns, which seems to work.

@MBattagl

Same issue here #1953

@freddiev4
Contributor

x-ref pydata/pandas-datareader#395

@yiorgosn

yiorgosn commented Oct 4, 2017

Quick solution: use a manually downloaded local copy of SPY (Yahoo lets you download the entire history manually). I modified benchmarks.py to look for a local CSV copy instead. I attach the modified benchmarks.py file; it should replace the existing one (so make a copy of the original before you overwrite it). The benchmarks.py file is usually found in %USERPROFILE%\Anaconda3\envs\py34\Lib\site-packages\zipline\data. If you didn't create a dedicated environment for it, don't specify py34 after envs.

Also make sure that your local directory is reflected in this line in the code:
new_dir = 'c:/Downloaded_csv'

benchmarks.txt
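
For readers who cannot open the attachment, the general idea can be sketched roughly as follows. This is not the attached file. The helper name `load_local_benchmark_returns` is hypothetical, and the column names `Date` and `Adj Close` are assumed to match the headers in Yahoo's CSV export. It only illustrates turning a local price file into daily returns using the standard library.

```python
import csv
import io


def load_local_benchmark_returns(csv_file, price_column="Adj Close"):
    """Read a Yahoo-style CSV and return [(date, daily_return), ...].

    Hypothetical helper for illustration; it is not part of zipline.
    """
    prices = []
    for row in csv.DictReader(csv_file):
        try:
            prices.append((row["Date"], float(row[price_column])))
        except (TypeError, ValueError):
            # Skip rows whose price is missing or non-numeric (e.g. "null").
            continue
    prices.sort(key=lambda item: item[0])  # ISO dates sort chronologically
    return [
        (date, price / previous_price - 1.0)
        for (_, previous_price), (date, price) in zip(prices, prices[1:])
    ]


# Tiny in-memory stand-in for the downloaded file (made-up prices).
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2017-09-25,100,100,100,100,100.0,1\n"
    "2017-09-26,110,110,110,110,110.0,1\n"
    "2017-09-27,null,null,null,null,null,0\n"
    "2017-09-28,99,99,99,99,99.0,1\n"
)
sample_returns = load_local_benchmark_returns(sample)
```

With a real download you would pass an opened file from the directory named in `new_dir` instead of the in-memory sample. Zipline itself works with pandas objects, so this sketch only shows the arithmetic, not the exact return type benchmarks.py needs.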

@tanaytrivedi

Hi,
Is there an official solution out there for running backtests without the system breaking every time because of this benchmark issue? @yiorgosn's solution doesn't work for me; I think you have to do more than just replace the file. I get the exact same failure even with his file. Of course, I have changed the directory in the file to make sure it looks in the right place for the CSV.

Is there a way I can run the backtest without doing a benchmark until it is fixed? Without, that is, ripping up the code and removing any mention of benchmarks.
Thanks

@brian-from-quantrocket

brian-from-quantrocket commented Oct 23, 2017

You can try setting the benchmark to an asset that's already in your bundle. For example, if you are running the example algos with AAPL, tell Zipline to use AAPL as your benchmark.

from zipline.api import symbol, set_benchmark

def initialize(context):
    set_benchmark(symbol("AAPL"))

My experience has been that Zipline still downloads the SPY data (limited to a year) but at least refrains from using it in the backtest, and thus the backtest doesn't fail.

@edwardlun

I have the same problem as @tanaytrivedi. I tried @yiorgosn's solution but it still doesn't work. Are there any extra steps needed in addition to replacing benchmarks.py? Thanks a lot.

@Steven-Sakurai

Thanks! @yiorgosn
I simply replaced the file and the benchmark is working fine now.

@alexkojin

alexkojin commented Dec 19, 2017

Google has changed the URL for its finance data. Instead of http://www.google.com/ you need to use https://finance.google.com/. Open the source code of the pandas-datareader package and change the URLs.

@scotthuang1989

I have a similar issue when I try to run the example buyapple.py.
The error message is:

pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?q=SPY&output=csv&startdate=Dec+29%2C+1989&enddate=Jan+09%2C+2018

I tried to access the URL in a web browser, and Google gives the following message:

... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.

It seems Google has some anti-crawler measure to prevent fetching the data automatically.

Does anyone have a similar issue?

@alexkojin

alexkojin commented Jan 11, 2018

@scotthuang1989

@alexkojin, I put this URL into Chrome and got a 404 error.

BTW, do you mean I need to change the pandas source code and reinstall it to override the official release?

@alexkojin

@scotthuang1989 Sorry, the URL is fixed now.
Yes, you can just change the source code. Or you can fork pandas-datareader, apply the fix, and install pandas-datareader from your fork.

@scotthuang1989

@alexkojin, after digging into it a little, I went with @yiorgosn's solution: I changed benchmarks.py to read data from another source.
And I think the next release will fix this issue, because the master branch already has a modified benchmarks.py.

@beevor

beevor commented Jan 31, 2018

@scotthuang1989, the fix proposed by @alexkojin is rather simple. Edit ~/anaconda3/envs/zipline/lib/python3.5/site-packages/pandas_datareader/google/daily.py. Change the url from 'http://www.google.com/finance/historical' to 'https://finance.google.com/finance/historical' and you should be good. Works on zipline=1.1.1-np1111py35, pandas_datareader=0.5.0 and pandas=0.18.1. Or, fork pandas-datareader.
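
To make the host swap concrete, here is a minimal sketch of the rewrite described above. It is not pandas-datareader code, and the function name `rewrite_google_finance_url` is made up for illustration. It only shows which parts of the URL change (scheme and host) and which stay the same (path and query string). It does not guarantee that Google will serve the request.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.google.com"
NEW_HOST = "finance.google.com"


def rewrite_google_finance_url(url):
    """Point an old-style Google Finance URL at the new host over HTTPS.

    Hypothetical helper for illustration; URLs on any other host or
    outside /finance/ are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != OLD_HOST or not parts.path.startswith("/finance/"):
        return url
    return urlunsplit(
        ("https", NEW_HOST, parts.path, parts.query, parts.fragment)
    )


old_url = "http://www.google.com/finance/historical?q=SPY&output=csv"
new_url = rewrite_google_finance_url(old_url)
print(new_url)
```

In practice the edit in daily.py is a one-line change to the URL string; the helper above just spells out what that change amounts to.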

@dannypurcell

Why are we doing this in the first place when the benchmark symbol should be the quandl wiki bundle?

@seanfuture

Thank you @yiorgosn .. For myself, the Mac OS X path was /usr/local/lib/python3.4/site-packages/zipline/data and the URL used to download all historical SPY data was https://finance.yahoo.com/quote/SPY/history?period1=728283600&period2=1521259200&interval=1d&filter=history&frequency=1d .. Once downloaded and your updated benchmarks.txt code was put in place, worked fine. Much appreciated. Aggravating when open source software doesn't work out of the box.

@niklasamslgruber

Is there a solution yet? I tried changing the Google URL, but I get a "max retries exceeded with url" error when running the program.

@freddiev4
Contributor

freddiev4 commented Mar 17, 2018

@niklas-amslgruber there's a fix on master that uses IEX. You should be able to run a backtest up to 5 years from the current date using the zipline master branch, which you can install using:

git clone git@github.com:quantopian/zipline.git
pip install zipline/

or fork it and then do the same steps above, replacing quantopian with your-github-username.

Hoping to do a release of zipline in the next week or two as well so people can just pip install without cloning.

Also doing work here #2107 for a more permanent fix, but haven't had the chance to finish it.

@niklasamslgruber

pip install doesn't work for me (I don't have the right to read from the remote repository). I can only install via conda, where the latest version from GitHub master is not available.

@freddiev4
Contributor

Hi @niklas-amslgruber you should be able to fork zipline and then run pip install/

The latest master is also available via conda by running:

conda install -c quantopian/label/ci -c quantopian zipline

@niklasamslgruber

I always get this error message (installing with pip)

Command "/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-build-kzyizl42/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-wm71_e5s-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-build-kzyizl42/pandas/

@freddiev4
Contributor

freddiev4 commented Mar 19, 2018

@niklas-amslgruber the reason is that we only build Zipline packages for Py27 and Py35 (you can see the badge in the README).

For conda, you can create a new conda env for Python 3.5 using

conda create -n py35 python=3.5

Then run

conda install -c quantopian/label/ci -c quantopian zipline

Or create a Python 3.5 virtualenv and then run pip install zipline/.

@niklasamslgruber

This error still exists even though I followed your instructions and installed it on Python 3.5 with Anaconda.

pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?output=csv&q=SPY&enddate=Mar+21%2C+2018&startdate=Dec+29%2C+1989

@Sentdex

Sentdex commented Mar 25, 2018

Not only does this problem still exist, even after fixing the url to be finance.google.com, you still get an error that you're sending automated requests. We can overcome this, but the google finance api is just plain unstable anyway. Better off using quandl.......or custom bundle symbol.

What I am failing to understand is why we're downloading a benchmark from any website when we have a bundle? set_benchmark doesn't seem to care at all, which is very strange. Should be able to use benchmarking symbol from our custom set.

@xrvo

xrvo commented Mar 25, 2018

Changing the benchmark data source to morningstar worked for me.

To do this, in [your_env]/lib/python3.5/site-packages/zipline/data/benchmarks.py make the 2 changes marked by # NEW

    data = pd_reader.DataReader(
        symbol,
        'morningstar',  # NEW
        first_date,
        last_date
    )

    data = data.reset_index(0, drop=True)  # NEW
    data = data['Close']

However, I agree with @Sentdex: fetching the benchmark data from the local bundle would be an improvement -- both in speed and stability.

Edit: Morningstar data was new for pandas-datareader v0.6.0, so a version upgrade may be necessary.

@kelvinho8

@xrvo Hi, Matt. I made the 2 changes in benchmarks.py but it's not working. The error message is as follows:

======================================
File "C:\Users\Kelvin\AppData\Local\conda\conda\envs\py35\lib\site-packages\pandas_datareader\data.py", line 175, in DataReader
raise NotImplementedError(msg)

NotImplementedError: data_source='morningstar' is not implemented

=================================
Am I missing something that needs to be changed too?

Thanks.
Kelvin

@xrvo

xrvo commented Apr 2, 2018

@kelvinho8: It looks like the Morningstar data connector is a fairly recent addition to pandas-datareader. It was added in v0.6.0.

You should be able to resolve this error by upgrading your pandas-datareader to v0.6.0, which is currently the latest release.
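
A quick way to check whether an installed version is new enough is sketched below. The helper names `version_tuple` and `supports_morningstar` are invented for this example, and the only fact taken from the thread is the v0.6.0 threshold; the parsing is deliberately simple and ignores local suffixes such as `+21.gabc`.

```python
import re

MORNINGSTAR_MIN_VERSION = (0, 6, 0)  # threshold quoted in this thread


def version_tuple(version):
    """Turn a string such as '0.6.0' into (0, 6, 0).

    Hypothetical helper: keeps the first three numeric parts and drops
    anything after a '+' (local build suffixes).
    """
    public_part = version.split("+")[0]
    return tuple(int(n) for n in re.findall(r"\d+", public_part)[:3])


def supports_morningstar(installed_version):
    """Return True if the version is at least 0.6.0."""
    return version_tuple(installed_version) >= MORNINGSTAR_MIN_VERSION


print(supports_morningstar("0.5.0"))  # version mentioned earlier in the thread
print(supports_morningstar("0.6.0"))
```

In a real environment you would pass `pandas_datareader.__version__` to `supports_morningstar` after importing the package.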

@phlsmk

phlsmk commented Apr 4, 2018

thanks @xrvo #1965 (comment) worked for me too with pandas-datareader v0.6.0.

@blackcabbage1023

I tried upgrading to pandas-datareader v0.6.0, but it still does not work.
I ran into this problem:

File "/Applications/anaconda3/envs/introduction_programming/lib/python3.5/site-packages/pandas_datareader/compat/__init__.py", line 8, in <module>
import pandas.io.common as com

AttributeError: module 'pandas.io' has no attribute 'common'

@blackcabbage1023

Update: I then tried to change pandas.io into pandas_datareader as mentioned below.
https://github.com/pydata/pandas-datareader

But still it does not work.

@xrvo

xrvo commented Apr 5, 2018

@blackcabbage1023 it seems like this error would only happen if there's either a problem with your environment or you have a really old version of pandas. Your pandas version should be v0.18.1 as per the zipline requirements.
If this doesn't work, I suggest you try creating a new virtual environment from scratch.

@freddiev4
Contributor

freddiev4 commented Apr 5, 2018

We recently released Zipline 1.2.0 on PyPI, as well as conda packages for Linux and Windows (macOS soon); please try installing the latest release.

You can see the release notes here. Feel free to update to 1.2.0 with either:

pip install -U zipline

or

conda update zipline -c quantopian

@niklasamslgruber

When will you release the macOS version? @freddiev4

@freddiev4
Contributor

@niklay14 I currently don't have a timeline in mind, as exams are coming up. I believe that if you have conda you can also just pip install zipline.

@niklasamslgruber

I get this error message while installing zipline with pip: @freddiev4

Command "/Users/niklasamslgruber/anaconda3/envs/py35/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-req-build-qb36zhft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-record-fcez5zho/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1n/525648kx213bn0q2bhn3tjrh0000gn/T/pip-req-build-qb36zhft/
