Commit

Sorr task522 follow up for pb api (#574)
* Google Oauth setting

* added empty phantom api file

* fixed linter error

* SorrTask522_Follow_up_for_PB_API

* SorrTask522_Follow_up_for_PB_API_refined

* SorrTask522_Follow_up_for_PB_API_service_account

* SorrTask522_Follow_up_for_PB_API_service_account

* SorrTask522_Follow_up_for_PB_API_service_account

* Editing

---------

Co-authored-by: yiyunlei <im.yiyun.lei@gmail.com>
Co-authored-by: GP Saggese <saggese@gmail.com>
3 people authored Nov 2, 2023
1 parent 1d28663 commit 99ee4d6
Showing 9 changed files with 801 additions and 757 deletions.
8 changes: 7 additions & 1 deletion devops/env/default.env
@@ -1,4 +1,10 @@
# TODO(gp): Not sure about how useful is this file.
# This file defines the default type of some env variables.
# These variables will later be changed by Docker command builder methods, but
# need to be set prior to the command, otherwise the command will fail due to
# an invalid YAML detected.
# E.g.: If `PORT` is deleted, it will be initialized as a string, causing
# failures in `i docker_bash`
# See https://stackoverflow.com/questions/64499521/docker-compose-services-vulcain-ports-contains-an-invalid-type-it-should-be-a
PORT=9999
#AWS_ACCESS_KEY_ID=''
#AWS_DEFAULT_REGION=''
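
The comment above says that `PORT` must hold a value that Docker Compose can read as a number. A minimal sketch of a pre-flight check follows, assuming the file uses plain `KEY=VALUE` lines; `parse_env_text` and `check_port` are hypothetical helpers for illustration, not functions from this repository.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and `#` comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so that PORT='9999' is also accepted.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_port(values: Dict[str, str]) -> int:
    """Return PORT as an int, or raise if it is missing or not numeric."""
    if "PORT" not in values:
        raise KeyError("PORT is not set")
    port = values["PORT"]
    if not port.isdigit():
        raise ValueError(f"PORT must be an integer, got {port!r}")
    return int(port)


# Hypothetical file content mirroring the lines shown above.
env_text = """
PORT=9999
#AWS_ACCESS_KEY_ID=''
"""
print(check_port(parse_env_text(env_text)))
```

Running a check like this before building the Docker command would surface a missing or malformed `PORT` earlier than the YAML validation error does.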
74 changes: 56 additions & 18 deletions docs/Gsheet_into_pandas.md
@@ -9,22 +9,66 @@

# Connecting Google Sheets to Pandas

- In order to load a google sheet into a pandas dataframe (or the other way
around), one can use a library called `gspread-pandas`.
- In order to load a Google sheet into a Pandas dataframe (or the other way
around), you can use a library called `gspread-pandas`.
- Documentation for the package is
[here](https://gspread-pandas.readthedocs.io/en/latest/index.html)


## Installing gspread-pandas

- The library should be automatically installed in your conda env
- The detailed instructions on how to install the library are located here:
[Installation/Usage](https://gspread-pandas.readthedocs.io/en/latest/getting_started.html#installation-usage).
- The library should be automatically installed in the Dev container
- If not, you can install it in the notebook with:
```
notebook> !pip install gspread-pandas
```
- Or in the Docker container with:
```
docker> sudo /bin/bash -c "(source /venv/bin/activate; pip install gspread)"
```

- To check that the library is installed
- In a notebook
```
notebook> import gspread; print(gspread.__version__)
```
- In the dev container
```
docker> python -c "import gspread; print(gspread.__version__)"
5.10.0
```
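
The version checks above can also be done without importing the packages. The sketch below uses the standard-library `importlib.metadata`; `installed_version` is a hypothetical helper name, and the distribution names are taken from the `pip install` commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report on both packages mentioned in this section.
for name in ("gspread", "gspread-pandas"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

Note that this looks up the distribution name used with `pip` (`gspread-pandas`), which differs from the import name (`gspread_pandas`).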

## Configuring gspread-pandas

- Client credentials need to be generated by each user independently.
- The instructions are provided
[here](https://gspread-pandas.readthedocs.io/en/latest/getting_started.html#client-credentials).
- You need a service account key that has access to the Google Drive space you
want to modify
- Normally the default key, `helpers/.google_credentials/service.json`, works.
- If you need to modify a Google Drive space that the default service account
cannot access, follow the instructions
[here](https://gspread-pandas.readthedocs.io/en/latest/getting_started.html#client-credentials)
to get your own `your_service.json`, store it, and use it as the service
account key path in `hgoogle_file_api.py`.

- The process is not complicated but it's not obvious since you need to click
around in the GUI
- The credentials file is a JSON downloaded from Google.
- `gspread-pandas` leverages `gspread`

- Following the process in https://docs.gspread.org/en/latest/oauth2.html
- Create a project using a name like "gp_gspread"
- Search for "Drive API" and click on Enable API
- Search for "Sheets API" and click on Enable API
- On top click on "+ Create Credentials" and select OAuth client ID
- Then you are going to get a pop up with "OAuth client created"
- Click "Download JSON" at the bottom
- The file downloaded is like "client_secret_42164...-00pdvmfnf3lrda....apps.googleusercontent.com"
- Move the file to `helpers/.google_credentials/client_secrets.json` (Overwrite
the existing placeholder file).
```
> mv ~/Downloads/client_secret_421642061916-00pdvmfnf3lrdasoh2ccsnqb5akr4v9f.apps.googleusercontent.com.json ~/src/sorrentum1/helpers/.google_credentials/client_secrets.json
> chmod 600 ~/src/sorrentum1/helpers/.google_credentials/client_secrets.json
```

- Some gotchas:
- Make sure to act only under your `...` account.
@@ -36,10 +80,12 @@
- When you are given a choice of time periods to create something for, choose
the longest one.
- When you are given a choice between `OAuth client ID` and `Service account`,
choose `OAuth client ID`.
choose `Service account`.
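
Since an OAuth client ID file and a service account key are both JSON downloads, it is easy to store the wrong one. The following is a minimal sketch of a sanity check, assuming the key follows the usual layout of a Google service account key (a top-level `type` of `service_account` plus fields such as `client_email`); `key_file_problems` and the `REQUIRED_FIELDS` list are illustrative assumptions, not part of `hgoogle_file_api.py`.

```python
import json
from pathlib import Path
from typing import List

# Assumed minimal set of fields in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_file_problems(path: str) -> List[str]:
    """Return a list of problems found in a key file (empty if none)."""
    key_path = Path(path).expanduser()
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in data]
    # OAuth client ID downloads lack this marker.
    if data.get("type") != "service_account":
        problems.append("'type' is not 'service_account'")
    return problems


# Default key path mentioned in this document.
print(key_file_problems("helpers/.google_credentials/service.json"))
```

An empty list means the file at least looks like a service account key; it does not prove that the account has access to a given Drive space.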

## Using `gspread` on the server
- TODO(gp): Check if this flow works
- To use the library on the server, the downloaded JSON with the credentials
needs to be stored on your laptop
needs to be stored on the server
```bash
> export SRC_FILE="$HOME/Downloads/client_secret_4642711342-ib06g3lbv6pa4n622qusqrjk8j58o8k6.apps.googleusercontent.com.json"
> export DST_DIR="$HOME/.config/gspread_pandas"
@@ -60,14 +106,6 @@
- The notebook with the usage example is located at
`amp/core/notebooks/gsheet_into_pandas_example.ipynb`.

- The first time the library is used, it will ask the user for an authorization
code.

![image](https://user-images.githubusercontent.com/22771988/78498562-4e695580-774b-11ea-9f4e-08a413567e24.png)

- After the authorization code is provided for the first time, it won't be asked
again.

- The official use documentation is provided
[here](https://gspread-pandas.readthedocs.io/en/latest/using.html).
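
As a rough illustration of the flow, the sketch below reads one worksheet into a Pandas dataframe. It calls `gspread` directly (which `gspread-pandas` builds on) rather than the `gspread-pandas` wrappers, and it is not the code from the example notebook; `read_records`, `load_dataframe`, and the spreadsheet and tab names are hypothetical.

```python
from typing import Any, Dict, List


def read_records(
    client: Any, spreadsheet_title: str, tab_name: str
) -> List[Dict[str, Any]]:
    """Fetch all rows of one worksheet as a list of dicts keyed by header."""
    spreadsheet = client.open(spreadsheet_title)
    worksheet = spreadsheet.worksheet(tab_name)
    return worksheet.get_all_records()


def load_dataframe(key_path: str, spreadsheet_title: str, tab_name: str):
    """Load one worksheet into a dataframe using a service account key."""
    # Imported lazily so that the module loads even without these packages.
    import gspread
    import pandas as pd

    client = gspread.service_account(filename=key_path)
    return pd.DataFrame(read_records(client, spreadsheet_title, tab_name))


# Example call (requires network access and a key the sheet is shared with):
# df = load_dataframe(
#     "helpers/.google_credentials/service.json", "my_spreadsheet", "Sheet1"
# )
```

Keeping the sheet access in `read_records` separate from the credential setup makes it possible to exercise the logic with a stand-in client, without touching Google's APIs.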
