Description
As I understand it, `query_to_pandas_safe` requires the table in the FROM clause to be fully qualified, which is somewhat cumbersome, as in the following query:
SELECT license, COUNT(1) num_repos
FROM `bigquery-public-data.github_repos.licenses`
GROUP BY license
The bq_helper object used to run this query already knows that it is querying bigquery-public-data.github_repos. That prefix could be added programmatically, so the user could write the query as:
SELECT license, COUNT(1) num_repos
FROM licenses
GROUP BY license
This query looks much nicer. I see two approaches to implement this change and maintain backwards compatibility:
- Use a regex or Python string functions to determine whether the helper needs to add `self.active_project + '.' + self.dataset_name + ...` to the table name
- Add an optional argument `simplified_table_name` which determines whether to do the string manipulation described above.
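To make the first approach concrete, here is a rough sketch of what the string manipulation might look like. `qualify_table_names` is a hypothetical standalone function, not part of bq_helper; the `project` and `dataset` arguments stand in for `self.active_project` and `self.dataset_name`. It only rewrites bare table names after FROM or JOIN and leaves already-qualified names alone.

```python
import re

# Hypothetical helper, not part of bq_helper.
# Matches FROM or JOIN followed by a bare identifier, i.e. one that is not
# backtick-quoted and is not followed by '.', '`', '(' or '-'.
_BARE_TABLE = re.compile(
    r"\b(FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)\b(?!\s*[.`(-])",
    re.IGNORECASE,
)


def qualify_table_names(query, project, dataset):
    """Prefix bare table names in `query` with `project.dataset`."""

    def _expand(match):
        keyword, table = match.group(1), match.group(2)
        return f"{keyword} `{project}.{dataset}.{table}`"

    return _BARE_TABLE.sub(_expand, query)


short_query = (
    "SELECT license, COUNT(1) num_repos\n"
    "FROM licenses\n"
    "GROUP BY license"
)
full_query = qualify_table_names(
    short_query, "bigquery-public-data", "github_repos"
)
print(full_query)
```

A regex like this is fragile: it would also rewrite CTE names, column names in `EXTRACT(... FROM ...)`, and anything that looks like `FROM name` inside a string literal. That is one argument for keeping the behavior behind the opt-in argument from the second approach.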
The value of this change may depend on the design of the upcoming BQ integration.
Will Kaggle users continue using bq_helper?
Will the integration do this in any way?
etc.
Maybe @harrisse @mrisdal will have insight on it. If this is going to be an ongoing issue, I can send a PR for one of the two proposals above.