-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance standard templates #75
base: master
Are you sure you want to change the base?
Conversation
Enhance templates to convert bigint to timestamp Also add split-by logic to sqoop statement
Hey @aaverbec looks like this has some conflicts that need to be resolved before the tests can be run |
I tried resolving the conflicts, but it wouldn't let me commit the changes back. So I just copied my resolved code back to the branch. On the remaining conflicts, they should take my changes. I'm not sure how to make github accept my resolved conflicts. (I fix all of the conflicts and then click the merge option and it just spins for an hour+ saying it is trying to merge) |
If you'd prefer, I can re-fork my repository from the current repo, and then re-make my changes, and re-submit a new PR. |
@@ -16,8 +16,7 @@ set -e | |||
# Check parquet table | |||
AVRO=$({{ conf.impala_cmd }} avro-table-rowcount.sql -B 2> /dev/null) | |||
PARQUET=$({{ conf.impala_cmd }} report-table-rowcount.sql -B 2> /dev/null) | |||
SOURCE=$(cat sourceCount.txt) | |||
|
|||
SOURCE=$({{ conf.source_database.cmd }} source-table-rowcount.sql -s -r -N -B 2> /dev/null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is stderr piped to /dev/null
@@ -14,6 +14,6 @@ | |||
|
|||
-- Query Parquet table in Impala | |||
USE {{ conf.staging_database.name }}; | |||
INVALIDATE METADATA {{ table.destination.name }}_parquet; | |||
SELECT COUNT(*) FROM {{ table.destination.name }}_parquet; | |||
INVALIDATE METADATA {{ table.destination.name }}{% if conf.user_defined is defined and conf.user_defined.parquet_suffix is defined %}{{ conf.user_defined.parquet_suffix }}{% endif %}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of how this user_defined
block is used and what it's for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This raises a flag because it will break existing scripts that are expecting the _parquet
suffix
{% for column in table.columns %} | ||
`{{ column.name.replace('/','_') }}` {{ map_datatypes(column).avro }} COMMENT "{{ column.comment }}" | ||
`{{ cleanse_column(column.name) }}` {{ map_datatypes_v2(column).avro }} COMMENT "{{ column.comment }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good use of cleanse_column
hadoop fs -mkdir -p {{ conf.raw_database.path }}/{{ table.destination.name }}_avro/.meta/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure -p
is the safe thing to do here. If the user has misconfigured something the application will happily keep running. What was your thought process for adding this? Can you manually create the directories first?
USE `{{ conf.staging_database.name }}`; | ||
CREATE EXTERNAL TABLE IF NOT EXISTS `{{ table.destination.name }}` ( | ||
{%- for column in table.columns %} | ||
`{{ cleanse_column(column.name) }}` {{ map_datatypes_v2(column, 'parquet') }} COMMENT "{{ column.comment }}" {%- if not loop.last -%}, {% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed the decimal logic, does map_datatypes_v2 handle the decimal type?
I've never tried the github tools for merging so I don't know about that. I'd recommend merging on the command line or re-forking like you said. |
No description provided.