This is a simple pipeline that reads forex data from a CSV file and streams it to a Kafka broker running on an EC2 instance. From Kafka, the data is written to an S3 bucket, where it is stored. A Glue crawler crawls the data in the bucket and creates a table in the Glue Data Catalog, and Athena is used to query the data.
I would have used an API to get live data, but I didn't want to pay for one, so I used a CSV file instead.
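For reference, here is a minimal sketch of the producer side in Python, assuming kafka-python and pandas are installed; the file name forex_data.csv is a stand-in for whatever CSV you use, while the topic test1 and the <EC2_ip_address> placeholder match the commands below:

import json
import pandas as pd
from kafka import KafkaProducer

# Update the IP address each time the EC2 instance restarts (see the notes below)
producer = KafkaProducer(
    bootstrap_servers="<EC2_ip_address>:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send each CSV row to the topic as a JSON message
df = pd.read_csv("forex_data.csv")
for record in df.to_dict(orient="records"):
    producer.send("test1", value=record)

producer.flush()  # block until all messages are delivered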
Note:
- Each time you stop and start the EC2 instance, its public IP address changes, so you will need to update the IP address in the code
- You will have to edit the broker config:
sudo nano config/server.properties
and set advertised.listeners to the EC2 instance's public IP address, e.g. advertised.listeners=PLAINTEXT://<EC2_ip_address>:9092
1. Install Kafka on the EC2 instance (make sure the security group allows inbound traffic on the ports you use, e.g. 9092 for Kafka and 2181 for Zookeeper)
2. Open a new terminal to run the Zookeeper server
cd kafka_2.12-3.5.1
bin/zookeeper-server-start.sh config/zookeeper.properties
3. Open a new terminal to run the Kafka server
4. Allocate memory to the Kafka server
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
5. Start the Kafka server
cd kafka_2.12-3.5.1
bin/kafka-server-start.sh config/server.properties
6. Create a topic in another terminal
bin/kafka-topics.sh --create --topic test1 --bootstrap-server <EC2_ip_address>:9092 --replication-factor 1 --partitions 1
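Alternatively, if you would rather create the topic from Python than from the console script, kafka-python's admin client can do the same thing (pip install kafka-python):

from kafka.admin import KafkaAdminClient, NewTopic

# Creates test1 with the same settings as the console command above
admin = KafkaAdminClient(bootstrap_servers="<EC2_ip_address>:9092")
admin.create_topics([NewTopic(name="test1", num_partitions=1, replication_factor=1)])
admin.close()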
7. Start a producer
bin/kafka-console-producer.sh --topic test1 --bootstrap-server <EC2_ip_address>:9092
8. In a new terminal, start a consumer
bin/kafka-console-consumer.sh --topic test1 --bootstrap-server <EC2_ip_address>:9092
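On the S3 side of the pipeline, a consumer writes each message into the bucket, where the Glue crawler picks it up for Athena. A minimal sketch, assuming kafka-python and s3fs are installed and using a hypothetical bucket name forex-kafka-bucket (swap in your own):

import json
from kafka import KafkaConsumer
from s3fs import S3FileSystem

consumer = KafkaConsumer(
    "test1",
    bootstrap_servers="<EC2_ip_address>:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

s3 = S3FileSystem()  # picks up AWS credentials from the environment

# Write each message as its own JSON object; the crawler infers the table schema from these
for count, message in enumerate(consumer):
    with s3.open(f"s3://forex-kafka-bucket/forex_data_{count}.json", "w") as f:
        json.dump(message.value, f)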