Provide AWS Glue as an option #267
Comments
Thanks for providing feedback! Can you give us more details on what you would like to see in this construct? Think about your user experience and how this construct can help you as a data engineer (with your preferences).
A few ideas.
I am in the process of building my own solutions to the above, as I hadn't heard of data-solutions-framework-on-aws before. I've looked at If there is alignment, I'll be happy to help add my planned features to this project.
Bouncing off your ideas...
Our jobs are structured as follows:
For our deployment, I use two config files. The first is a CSV file that contains the JobName, Classification (default/custom), Category (Ingestion, etc.), and ConnectionName (since our jobs run in a private network). The CDK app loops through this CSV file to deploy the Glue jobs. The second config file manages the custom jobs, that is, the jobs whose Classification is set to custom in the CSV file.
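To illustrate the CSV-driven approach described above, here is a minimal sketch in Python. The column names come from the comment; the `GlueJobSpec` class, the helper functions, and the sample rows are hypothetical and only show how the file could be parsed before a CDK app creates one Glue job per row. The CDK construct call itself is left as a comment because the actual deployment code is not shown in this thread.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GlueJobSpec:
    """One row of the (hypothetical) job inventory CSV."""
    job_name: str
    classification: str  # "default" or "custom"
    category: str        # e.g. "Ingestion"
    connection_name: str


def load_job_specs(csv_text: str) -> List[GlueJobSpec]:
    """Parse the CSV text into job specs, normalizing whitespace and case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        GlueJobSpec(
            job_name=row["JobName"].strip(),
            classification=row["Classification"].strip().lower(),
            category=row["Category"].strip(),
            connection_name=row["ConnectionName"].strip(),
        )
        for row in reader
    ]


def split_by_classification(
    specs: List[GlueJobSpec],
) -> Tuple[List[GlueJobSpec], List[GlueJobSpec]]:
    """Separate default jobs from custom jobs (the latter need the second config file)."""
    default_jobs = [s for s in specs if s.classification == "default"]
    custom_jobs = [s for s in specs if s.classification == "custom"]
    return default_jobs, custom_jobs


# Sample data, invented for illustration only.
SAMPLE_CSV = """JobName,Classification,Category,ConnectionName
ingest_orders,default,Ingestion,private-vpc-conn
transform_orders,custom,Transformation,private-vpc-conn
"""

specs = load_job_specs(SAMPLE_CSV)
default_jobs, custom_jobs = split_by_classification(specs)

for spec in specs:
    # In a CDK stack, this is where a Glue job construct would be created per row.
    print(f"would deploy {spec.job_name} ({spec.classification}, {spec.category})")
```

Keeping the parsing separate from the construct creation means the inventory can be validated in a plain unit test without synthesizing a stack.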
One more point to consider for the feature: provide a way to run unit tests by inferring the arguments from the job construct and running the job against the Glue runtime Docker container.
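As a rough sketch of that idea, the snippet below turns a job's default arguments into a `docker run` command for a local Glue container. Everything here is an assumption for illustration: the helper names, the image tag `amazon/aws-glue-libs:glue_libs_4.0.0_image_01`, the container workspace path, and the sample arguments are not taken from this thread, and the command is only built and printed, not executed.

```python
import shlex
from typing import Dict, List

# Assumed image tag and container workspace path; adjust to the Glue version in use.
GLUE_IMAGE = "amazon/aws-glue-libs:glue_libs_4.0.0_image_01"
CONTAINER_WORKSPACE = "/home/glue_user/workspace"


def job_args_to_cli(default_arguments: Dict[str, str]) -> List[str]:
    """Flatten Glue-style arguments ({"--key": "value"}) into a CLI list, sorted by key."""
    cli: List[str] = []
    for key in sorted(default_arguments):
        cli.extend([key, default_arguments[key]])
    return cli


def build_docker_command(
    local_workspace: str,
    script_path: str,
    default_arguments: Dict[str, str],
) -> List[str]:
    """Build (but do not run) a docker command that submits the script in the container."""
    return [
        "docker", "run", "--rm",
        "-v", f"{local_workspace}:{CONTAINER_WORKSPACE}",
        GLUE_IMAGE,
        "spark-submit", f"{CONTAINER_WORKSPACE}/{script_path}",
        *job_args_to_cli(default_arguments),
    ]


# Hypothetical arguments, as they might be read from a job definition.
command = build_docker_command(
    local_workspace="/tmp/glue-project",
    script_path="src/ingest_orders.py",
    default_arguments={"--JOB_NAME": "ingest_orders", "--source_path": "s3://example-bucket/in/"},
)
print(shlex.join(command))
```

Returning the command as a list rather than a string keeps it safe to pass to `subprocess.run` without shell quoting issues, should a test harness want to execute it.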
@klescosia Do I understand correctly that you have implemented a config-file-based approach on top of CDK and Glue to create Glue jobs in a simpler way than the CDK L1 construct?
@dashmug I see your tool as an equivalent of the EMR toolkit but for Glue: a packaged solution based on this blog post. Am I correct? What I am thinking of now is to provide as part of DSF:
Yes, that is correct. We have many Glue jobs, each with different functionality and configuration. So I'm looping through the CSV file then executing
There is already an alpha L2 construct for Glue, so we will wait to see its final form before we work on this. In the meantime, we will deliver a construct to package dependencies for Glue jobs, similar to the one we offer for the EMR Spark runtime constructs.
Provide AWS Glue as a processing layer