This demo shows how to use the AWS Glue streaming feature to manage continuous ingestion pipelines and process data on the fly. Glue streaming jobs extend AWS Glue jobs, which are based on Apache Spark, to run continuously and consume data from streaming platforms such as Amazon Kinesis Data Streams and Apache Kafka (including the fully managed Amazon MSK).
Glue provisions, manages, and scales the infrastructure needed to ingest data into data lakes on Amazon S3 or data warehouses such as Amazon Redshift, to store streaming data in a DynamoDB table for quick lookups, or to index it in Elasticsearch to search for specific operation patterns.
Glue Streaming uses Spark Structured Streaming to implement data transformations such as aggregating, partitioning, and formatting, as well as joining with other data sets to enrich or cleanse the data for easier analysis.
For more details, see the Adding Streaming ETL Jobs in AWS Glue guide.
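As an illustration of the flow described above, the following is a minimal sketch of a Glue streaming job that reads JSON records from a Kinesis data stream and writes a per-device average to S3 for each micro-batch. It is not taken from the demo itself: the column names `device_id` and `temperature`, the 60-second window, and the function names are assumptions made for the example, and the job is expected to run inside the Glue runtime, where a `GlueContext` and PySpark are available.

```python
def kinesis_source_options(stream_arn, starting_position="TRIM_HORIZON"):
    """Connection options for reading JSON records from a Kinesis data stream."""
    return {
        "streamARN": stream_arn,
        "startingPosition": starting_position,
        "inferSchema": "true",
        "classification": "json",
    }


def aggregate_batch(batch_df):
    """Average temperature per device for one micro-batch.

    The column names are hypothetical; adjust them to the real payload.
    """
    # Imported lazily because PySpark is only needed inside the Glue runtime.
    from pyspark.sql import functions as F

    return batch_df.groupBy("device_id").agg(
        F.avg("temperature").alias("avg_temperature")
    )


def run(glue_context, stream_arn, output_path, checkpoint_path):
    """Wire the streaming source, the transformation, and the S3 sink together."""
    frame = glue_context.create_data_frame.from_options(
        connection_type="kinesis",
        connection_options=kinesis_source_options(stream_arn),
    )

    def process_batch(batch_df, batch_id):
        # Skip empty micro-batches so no empty Parquet files are written.
        if batch_df.count() > 0:
            aggregate_batch(batch_df).write.mode("append").parquet(output_path)

    glue_context.forEachBatch(
        frame=frame,
        batch_function=process_batch,
        options={
            "windowSize": "60 seconds",
            "checkpointLocation": checkpoint_path,
        },
    )
```

In a real job script, `run` would be called with the `GlueContext` created at the top of the script, the ARN of the stream, and two S3 paths (one for output, one for checkpoints).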
IoT-Kinesis-GlueStreaming-Demo
This demo shows how to use Kinesis Data Analytics to manage continuous ingestion pipelines and process data on the fly. Kinesis Data Analytics runs continuously and consumes data from streaming platforms such as Amazon Kinesis Data Streams and Apache Kafka (including the fully managed Amazon MSK).
IoT-Kinesis-KinesisDataAnlytics-Demo
IoT-Kafka-KinesisDataAnlytics-Demo
This demo shows how to use Glue to ingest data from an RDS database.
Architecture
Ingest from MySQL 5.7 via a Glue connector
Ingest from MySQL 8.0 via a Glue connector
Connect to an RDS instance with SSL connections enabled
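For the SSL case, a Glue JDBC connection can be asked to enforce SSL through the `JDBC_ENFORCE_SSL` connection property. The sketch below builds the `ConnectionInput` structure accepted by the Glue `create_connection` API and is only an illustration: the function names and every argument value are placeholders, and in practice the password would more safely come from AWS Secrets Manager than from plain text.

```python
def mysql_connection_input(name, host, database, username, password,
                           subnet_id, security_group_id, availability_zone,
                           port=3306, enforce_ssl=True):
    """Build the ConnectionInput for a Glue JDBC connection to MySQL on RDS."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": f"jdbc:mysql://{host}:{port}/{database}",
            "USERNAME": username,
            "PASSWORD": password,
            # Glue fails the connection if SSL cannot be negotiated.
            "JDBC_ENFORCE_SSL": "true" if enforce_ssl else "false",
        },
        # Glue needs network placement to reach an RDS instance inside a VPC.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
            "AvailabilityZone": availability_zone,
        },
    }


def create_connection(connection_input):
    """Register the connection in the Glue Data Catalog (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

Keeping the dictionary builder separate from the API call makes it possible to inspect or unit-test the connection definition without touching AWS.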
Python-Send-Data-Firehose Demo
Build a business intelligence capability for streaming IoT device data using AWS IoT Core, Amazon Kinesis Data Firehose, Amazon S3, Amazon Athena, and Amazon QuickSight.
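To give a sense of the Python side of this demo, here is a minimal sketch of sending simulated device readings to a Firehose delivery stream with boto3. It is an assumption-laden illustration rather than the demo's actual script: the field names (`device_id`, `temperature`, `timestamp`) and the helper names are made up for the example. Each record is newline-delimited JSON so that the objects Firehose writes to S3 can be queried line by line in Athena.

```python
import json
import random
import time


def fake_reading(device_id):
    """Generate one simulated sensor reading (hypothetical field names)."""
    return {
        "device_id": device_id,
        "temperature": round(random.uniform(15.0, 35.0), 2),
        "timestamp": int(time.time()),
    }


def encode_record(reading):
    """Encode a reading as one newline-terminated JSON record for Firehose."""
    return {"Data": (json.dumps(reading) + "\n").encode("utf-8")}


def send_batch(stream_name, readings):
    """Send up to 500 readings in one call and return the number that failed."""
    import boto3  # imported lazily; requires AWS credentials at call time

    client = boto3.client("firehose")
    response = client.put_record_batch(
        DeliveryStreamName=stream_name,
        Records=[encode_record(r) for r in readings],
    )
    return response["FailedPutCount"]
```

A caller might run `send_batch("my-delivery-stream", [fake_reading("device-1") for _ in range(10)])`, where the stream name is a placeholder, and retry when the returned failure count is greater than zero.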