Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance standard templates #75

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Enhance standard templates #75

wants to merge 11 commits into from

Conversation

aaverbec
Copy link
Contributor

No description provided.

@afoerster
Copy link
Contributor

Hey @aaverbec looks like this has some conflicts that need to be resolved before the tests can be run

@aaverbec
Copy link
Contributor Author

I tried resolving the conflicts, but it wouldn't let me commit the changes back. So I just copied my resolved code back to the branch. On the remaining conflicts, they should take my changes. I'm not sure how to make github accept my resolved conflicts. (I fix all of the conflicts and then click the merge option and it just spins for an hour+ saying it is trying to merge)

@aaverbec
Copy link
Contributor Author

If you'd prefer, I can re-fork my repository from the current repo, and then re-make my changes, and re-submit a new PR.

@@ -16,8 +16,7 @@ set -e
# Check parquet table
AVRO=$({{ conf.impala_cmd }} avro-table-rowcount.sql -B 2> /dev/null)
PARQUET=$({{ conf.impala_cmd }} report-table-rowcount.sql -B 2> /dev/null)
SOURCE=$(cat sourceCount.txt)

SOURCE=$({{ conf.source_database.cmd }} source-table-rowcount.sql -s -r -N -B 2> /dev/null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is stderr piped to /dev/null

@@ -14,6 +14,6 @@

-- Query Parquet table in Impala
USE {{ conf.staging_database.name }};
INVALIDATE METADATA {{ table.destination.name }}_parquet;
SELECT COUNT(*) FROM {{ table.destination.name }}_parquet;
INVALIDATE METADATA {{ table.destination.name }}{% if conf.user_defined is defined and conf.user_defined.parquet_suffix is defined %}{{ conf.user_defined.parquet_suffix }}{% endif %};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of how this user_defined block is used and what it's for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raises a flag because it will break existing scripts that are expecting the _parquet suffix

{% for column in table.columns %}
`{{ column.name.replace('/','_') }}` {{ map_datatypes(column).avro }} COMMENT "{{ column.comment }}"
`{{ cleanse_column(column.name) }}` {{ map_datatypes_v2(column).avro }} COMMENT "{{ column.comment }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of cleanse_column

hadoop fs -mkdir -p {{ conf.raw_database.path }}/{{ table.destination.name }}_avro/.meta/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure -p is the safe thing to do here. If the user has misconfigured something the application will happily keep running. What was your thought process for adding this? Can you manually create the directories first?

USE `{{ conf.staging_database.name }}`;
CREATE EXTERNAL TABLE IF NOT EXISTS `{{ table.destination.name }}` (
{%- for column in table.columns %}
`{{ cleanse_column(column.name) }}` {{ map_datatypes_v2(column, 'parquet') }} COMMENT "{{ column.comment }}" {%- if not loop.last -%}, {% endif %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the decimal logic, does map_datatypes_v2 handle the decimal type?

@afoerster
Copy link
Contributor

I've never tried the github tools for merging so I don't know about that. I'd recommend merging on the command line or re-forking like you said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants