This project streams and ingests the Twitter feed using Apache Flume. The tweets are stored in a Hive data lake in Avro format. The data can be cleansed with tools such as OpenRefine or Pig, and the cleansed data can then be used for visualization.
- `twitter.conf` stores all the configuration required for ingesting tweets.
- `TwitterDataAvroSchema.avsc` contains the Avro schema.
- `avrodataread.q` creates a staging table using the Avro SerDe.
- `create_tweets_avro_table.q` creates a processing table with a well-defined DDL.
To run this software you need the following:
- Linux
- Hadoop 2.0
- Hive 2.0
- Flume
- Twitter Developer App Credentials
- Get credentials for developing Twitter apps.
- Write a `twitter.conf` file and replace the placeholder variables with the secret keys issued by Twitter, as sketched below.
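  A minimal sketch of what `twitter.conf` might contain, assuming Flume's bundled `org.apache.flume.source.twitter.TwitterSource`, a memory channel, and an HDFS sink; the channel and sink names are illustrative and the key values are placeholders:

  ```properties
  TwitterAgent.sources  = Twitter
  TwitterAgent.channels = MemChannel
  TwitterAgent.sinks    = HDFS

  # Twitter source: emits tweets as Avro events
  TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
  TwitterAgent.sources.Twitter.channels = MemChannel
  TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
  TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
  TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
  TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET

  # HDFS sink: lands the Avro files under /user/flume/tweets
  TwitterAgent.sinks.HDFS.type = hdfs
  TwitterAgent.sinks.HDFS.channel = MemChannel
  TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets
  TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
  TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
  TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

  # In-memory channel between the source and the sink
  TwitterAgent.channels.MemChannel.type = memory
  TwitterAgent.channels.MemChannel.capacity = 10000
  TwitterAgent.channels.MemChannel.transactionCapacity = 1000
  ```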
- Start the Flume agent, pointing it at `twitter.conf`:

  ```sh
  flume-ng agent -n TwitterAgent -f $FLUME_CONF_DIR/twitter.conf
  ```
- Get the schema from the Avro log file. Avro container files embed the writer schema as JSON in the file header, so `head` is enough to reveal it:

  ```sh
  hdfs dfs -cat /user/flume/tweets/FlumeData.* | head
  ```
- Copy the schema and save it in a file called `TwitterDataAvroSchema.avsc`.
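  The exact fields depend on what the source emits; trimmed to a few representative fields, the saved schema might look like this (the field names here are illustrative):

  ```json
  {
    "type": "record",
    "name": "Doc",
    "fields": [
      {"name": "id",               "type": "string"},
      {"name": "user_screen_name", "type": "string"},
      {"name": "created_at",       "type": "string"},
      {"name": "text",             "type": "string"},
      {"name": "retweet_count",    "type": "long"}
    ]
  }
  ```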
- Edit the file for readability.
- Write an HQL file called `avrodataread.q` that creates the `tweets` table using the AvroSerDe, pointing `TBLPROPERTIES` at the Avro schema file.
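  A sketch of the kind of DDL this file might contain, assuming the `.avsc` file has been uploaded to HDFS (the `avro.schema.url` path is an assumption; adjust it to wherever you put the schema):

  ```sql
  -- External staging table over the raw Flume output.
  -- Columns are derived from the Avro schema rather than declared here.
  CREATE EXTERNAL TABLE tweets
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/user/flume/tweets'
  TBLPROPERTIES ('avro.schema.url'='hdfs:///user/flume/TwitterDataAvroSchema.avsc');
  ```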
- Execute the file in the terminal:

  ```sh
  hive -f "FlumeHiveTwitterApp/Hive scripts/avrodataread.q"
  ```
- To create a table for processing and visualization, execute the file named `create_tweets_avro_table.q`:

  ```sh
  hive -f "FlumeHiveTwitterApp/Hive scripts/create_tweets_avro_table.q"
  ```
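  A sketch of what this script might contain, assuming it declares the columns explicitly and repopulates from the staging table (the table and column names are illustrative):

  ```sql
  -- Processing table with a well-defined DDL (STORED AS AVRO needs Hive 0.14+).
  CREATE TABLE tweets_processed (
    id               STRING,
    user_screen_name STRING,
    created_at       STRING,
    text             STRING,
    retweet_count    BIGINT
  )
  STORED AS AVRO;

  -- Populate it from the staging table created above.
  INSERT OVERWRITE TABLE tweets_processed
  SELECT id, user_screen_name, created_at, text, retweet_count
  FROM tweets;
  ```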
- Cleanse the data using tools like Pig or OpenRefine; a minimal Pig sketch follows.
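  For example, a minimal Pig pass, assuming a Pig version with built-in `AvroStorage` (the output path and field names are illustrative):

  ```pig
  -- Load the raw Avro tweets; the schema is embedded in the files.
  tweets = LOAD '/user/flume/tweets' USING AvroStorage();

  -- Drop empty tweets and keep only the fields of interest.
  clean = FILTER tweets BY text IS NOT NULL;
  slim  = FOREACH clean GENERATE id, user_screen_name, created_at, text;

  STORE slim INTO '/user/flume/tweets_clean' USING AvroStorage();
  ```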
- Visualize the data in a dashboard using tools like Tableau or d3.js.