# Data Ingestion Pipelines

This project demonstrates the implementation of two different data ingestion approaches: batch data migration using Apache Sqoop and real-time data ingestion using Apache Flume and Apache Kafka.

## Task 1: Batch Data Migration with Apache Sqoop

This task focuses on migrating historical data from a local relational database to Hadoop's HDFS using Apache Sqoop. Periodic incremental imports then keep the HDFS copy in sync as new data is added to the source.

### Technologies Used

- Relational database (MariaDB)
- Apache Hadoop (HDFS)
- Apache Sqoop

### Steps

- Set up the relational database and create a sample table with historical data (a sample schema is sketched after this list).
- Configure Apache Hadoop and HDFS on your system.
- Install and configure Apache Sqoop to connect to the relational database and HDFS.
- Perform the initial full data import from the database to HDFS using Sqoop (example command after this list).
- Implement a script or job to perform periodic incremental imports, capturing new rows added to the database (see the saved-job sketch below).
- Verify the data in HDFS and compare it with the source database (verification commands below).
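
The Sqoop examples below assume a hypothetical source table; here is a minimal MariaDB schema of that shape. All database, table, and column names (`sales_db`, `transactions`, `id`, ...) are placeholders, not part of the project.

```sql
-- Hypothetical sample table with some historical rows (all names are placeholders)
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

CREATE TABLE transactions (
    id INT AUTO_INCREMENT PRIMARY KEY,   -- monotonically increasing key, reused below for incremental imports
    customer_name VARCHAR(100),
    amount DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO transactions (customer_name, amount) VALUES
    ('Alice', 120.50),
    ('Bob', 75.00);
```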
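
A sketch of the initial full import, assuming the placeholder `sales_db.transactions` table above, a local MariaDB instance reachable through the MySQL JDBC driver (the connector jar must be on Sqoop's classpath), and an HDFS target directory of your choosing:

```bash
# One-off full import of the transactions table into HDFS (URL, user, and paths are placeholders)
sqoop import \
  --connect jdbc:mysql://localhost:3306/sales_db \
  --username sqoop_user \
  -P \
  --table transactions \
  --target-dir /user/hadoop/transactions \
  --num-mappers 1
```

`-P` prompts for the database password interactively; a single mapper keeps the demo simple and avoids the need for a split column.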
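
One way to handle the periodic incremental imports (a judgment call, not prescribed by the project) is a saved Sqoop job that appends rows whose `id` is greater than the last value imported; Sqoop records the updated `--last-value` in its metastore after each run, so the job can simply be re-executed from cron or another scheduler. The job name, credentials file, and paths below are placeholders.

```bash
# Create a reusable incremental-import job (run once)
sqoop job --create transactions_incremental -- import \
  --connect jdbc:mysql://localhost:3306/sales_db \
  --username sqoop_user \
  --password-file file:///home/hadoop/.db_password \
  --table transactions \
  --target-dir /user/hadoop/transactions \
  --incremental append \
  --check-column id \
  --last-value 0 \
  --num-mappers 1

# Run it periodically, e.g. from a cron entry
sqoop job --exec transactions_incremental
```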
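
A rough consistency check, using the same placeholder paths and table: compare what landed in HDFS with the row count in the source database.

```bash
# Inspect the imported files and count the records written by Sqoop
hdfs dfs -ls /user/hadoop/transactions
hdfs dfs -cat /user/hadoop/transactions/part-m-* | wc -l

# Compare with the row count in the source table
mysql -u sqoop_user -p -e "SELECT COUNT(*) FROM sales_db.transactions;"
```

Line counting is only a sanity check; for stricter validation, spot-check individual records or checksum selected columns on both sides.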

### Outcomes

- Migrated historical data from the relational database to HDFS using Apache Sqoop.
- Implemented periodic incremental data imports to keep the HDFS data up-to-date.
- Validated the data integrity between the source database and the HDFS destination.

## Task 2: Real-Time Data Ingestion with Apache Flume and Apache Kafka

This task focuses on setting up a real-time data ingestion pipeline using Apache Flume and Apache Kafka. Log data is collected from a local directory and streamed to a Kafka topic for real-time processing.

### Technologies Used

- Apache Hadoop (HDFS)
- Apache Flume
- Apache Kafka
- Python

### Steps

- Set up Apache Hadoop and HDFS on your system.
- Write a Python script to generate log file data (example script after this list).
- Install and configure Apache Flume to collect log data from a local directory.
- Set up Apache Kafka and create a Kafka topic to receive the log data (topic creation command below).
- Configure Flume to use Kafka as the destination for the collected log data (see the sample agent configuration after this list).
- Test the real-time data ingestion pipeline by generating sample log data in the local directory and verifying that it arrives in the Kafka topic (console consumer command below).
- Explore options for consuming the data from the Kafka topic for real-time processing or further downstream analysis (a minimal Python consumer is sketched below).
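
A minimal log generator for the scripting step above; the directories, file naming, and log format are placeholders. Files are written to a staging directory and then moved into the watched directory, assuming the spooling-directory source used in the agent configuration sketched further down, which expects files to be complete once they appear.

```python
import os
import random
import time
from datetime import datetime

# Placeholder directories; SPOOL_DIR must match spoolDir in the Flume agent config.
STAGING_DIR = "/tmp/flume-staging"
SPOOL_DIR = "/tmp/flume-spool"
LEVELS = ["INFO", "WARN", "ERROR"]

os.makedirs(STAGING_DIR, exist_ok=True)
os.makedirs(SPOOL_DIR, exist_ok=True)

while True:
    filename = datetime.now().strftime("app-%Y%m%d-%H%M%S.log")
    staged = os.path.join(STAGING_DIR, filename)

    # Write a small batch of fake log lines to the staging area
    with open(staged, "w") as f:
        for _ in range(10):
            level = random.choice(LEVELS)
            f.write(f"{datetime.now().isoformat()} {level} sample event {random.randint(1, 1000)}\n")

    # Move the finished file into the directory Flume is watching
    os.rename(staged, os.path.join(SPOOL_DIR, filename))
    time.sleep(5)
```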
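
Creating the topic, assuming a single-broker Kafka installation on `localhost:9092` and a placeholder topic name of `app-logs` (Kafka 2.2+ syntax; older releases use `--zookeeper` instead of `--bootstrap-server`):

```bash
# Create a single-partition topic for the log events (topic name is a placeholder)
kafka-topics.sh --create \
  --topic app-logs \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1
```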
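
A sketch of a Flume agent configuration tying the pieces together: a spooling-directory source watching the directory the generator moves files into, a memory channel, and a Kafka sink publishing to the `app-logs` topic. Agent and component names are arbitrary, and the Kafka sink properties assume Flume 1.7 or later.

```properties
# flume-kafka.conf -- single agent: spooldir source -> memory channel -> Kafka sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Watch the directory the log generator moves completed files into
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /tmp/flume-spool
a1.sources.r1.channels = c1

# Buffer events in memory (fine for a demo; a file channel is more durable)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Publish each log line as a record on the Kafka topic created earlier
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = app-logs
a1.sinks.k1.channel = c1
```

The agent can then be started with something like `flume-ng agent --name a1 --conf-file flume-kafka.conf --conf $FLUME_HOME/conf`.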
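
To test the pipeline end to end, run the generator script, start the Flume agent, and watch the topic with Kafka's console consumer (same placeholder broker and topic as above):

```bash
# Print everything published to the topic so far, then keep following it
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic app-logs \
  --from-beginning
```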
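
For downstream processing, one option is a small Python consumer. This sketch assumes the third-party `kafka-python` package (`pip install kafka-python`) and the same placeholder broker and topic; any other Kafka client (Spark Structured Streaming, Kafka Streams, etc.) would work equally well.

```python
from kafka import KafkaConsumer  # third-party package: kafka-python

# Subscribe to the log topic and process each event as it arrives
consumer = KafkaConsumer(
    "app-logs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",              # start from the oldest available record
    value_deserializer=lambda v: v.decode("utf-8"),
)

for message in consumer:
    # Placeholder "processing": flag ERROR events; replace with real downstream logic
    if "ERROR" in message.value:
        print(f"error event at offset {message.offset}: {message.value}")
```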

### Outcomes

- Set up a real-time data ingestion pipeline using Apache Flume and Apache Kafka.
- Collected log data from a local directory and streamed it to a Kafka topic in real-time.
- Demonstrated the ability to consume the data from the Kafka topic for real-time processing or further analysis.

## Conclusion

This project showcases two distinct data ingestion approaches: batch data migration using Apache Sqoop and real-time data ingestion using Apache Flume and Apache Kafka. Completing both tasks provides hands-on experience in setting up and managing data ingestion pipelines, which are essential for building robust and scalable data processing systems.