Simplify specification of FROM table in query_to_pandas_safe #14

@dansbecker

As I understand it, query_to_pandas_safe requires the table in the FROM clause to be specified in a somewhat cumbersome way, as shown in the following query:

SELECT license, COUNT(1) num_repos 
FROM `bigquery-public-data.github_repos.licenses` 
GROUP BY license 

The bq_helper object used to run this query already knows that the query is being run against bigquery-public-data.github_repos. That prefix could be added programmatically, so the user could run the query as:

SELECT license, COUNT(1) num_repos 
FROM licenses 
GROUP BY license 

This query looks much nicer. I see two approaches to implement this change and maintain backwards compatibility:

  1. Use a regex or Python string functions to determine whether the helper needs to add `self.active_project + '.' + self.dataset_name + ...` to the table name
  2. Add an optional argument simplified_table_name which determines whether to do the string manipulation described above.
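
The two proposals could be combined roughly as in the sketch below. This is only an illustration, not code from bq_helper: the function name `qualify_table_names` and its `project` and `dataset` parameters are hypothetical stand-ins for a helper method that would read `self.active_project` and `self.dataset_name`. It assumes table names appear directly after FROM or JOIN, and it treats any name that already contains a dot as fully qualified. It does not try to handle CTE names, `UNNEST(...)`, comma joins, or keywords inside string literals.

```python
import re

# Matches the identifier that follows FROM or JOIN, with or without backticks.
_TABLE_REF = re.compile(r"\b(FROM|JOIN)\s+(`[^`]+`|[\w.\-]+)", re.IGNORECASE)


def qualify_table_names(query, project, dataset, simplified_table_name=True):
    """Prefix bare table names after FROM/JOIN with project and dataset.

    Hypothetical helper for illustration; not part of bq_helper.
    """
    if not simplified_table_name:
        # Proposal 2: the caller opts out and the query is left as written.
        return query

    def _qualify(match):
        keyword, table = match.group(1), match.group(2)
        name = table.strip("`")
        if "." in name:
            # Already qualified (backwards compatible); leave untouched.
            return match.group(0)
        return "{} `{}.{}.{}`".format(keyword, project, dataset, name)

    # Proposal 1: a regex decides which table names need the prefix.
    return _TABLE_REF.sub(_qualify, query)


short_query = "SELECT license, COUNT(1) num_repos\nFROM licenses\nGROUP BY license"
full_query = qualify_table_names(short_query, "bigquery-public-data", "github_repos")
```

Because names that already contain a dot pass through unchanged, existing queries written in the long form would keep working under this approach.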

The value of this change may depend on the design of the upcoming BQ integration.
Will Kaggle users continue using bq_helper?
Will the integration do this in any way?
etc.

Maybe @harrisse @mrisdal will have insight on it. If this is going to be an ongoing issue, I can send a PR for one of the two proposals above.
