Cloud Project: Compare performance of Hadoop vs Spark

REST API to run SQL queries on large data sets using Hadoop and Spark.

Requirements:

Python 3
Hadoop
Hadoop Streamer Path
HDFS CSV Files Path

Installation:

Create a virtualenv named Cloud and activate it. If you do not have virtualenv installed, install it first. (Installation: Windows, Linux & macOS)
In the project home folder, install the necessary packages using the command pip install -r requirements.txt.
Create config.py in the cloudproject folder.

Add the following code to config.py:

HADOOP_STREAMER_PATH = "<Path of the Hadoop Streamer JAR file>"
HDFS_CSV_PATH = "<Path of the CSV Files in HDFS>"

Execute the following command to apply migrations: python manage.py migrate.
To run the server on localhost with the default port, execute the command python manage.py runserver.

Usage:

This REST API has no authentication. To access it, send a POST request using any request manager, such as Postman, with data containing the key query whose value is the SQL query.
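The request described above can also be sent from a script. The following is a minimal sketch using only the Python standard library. The URL assumes the default development server address and a hypothetical root endpoint, and the body is assumed to be form-encoded; check the project's URL configuration and adjust both as needed. The function names are illustrative and not part of the project.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: default address of the development server. The endpoint path
# is hypothetical and depends on the project's URL configuration.
API_URL = "http://127.0.0.1:8000/"


def build_query_request(sql, url=API_URL):
    """Build a POST request whose body carries the SQL under the key 'query'."""
    # Assumption: the API reads form-encoded data, as a Postman form would send.
    body = urlencode({"query": sql}).encode("utf-8")
    return Request(url, data=body, method="POST")


def send_query(sql, url=API_URL):
    """Send the query to a running server and return the raw response text."""
    with urlopen(build_query_request(sql, url)) as response:
        return response.read().decode("utf-8")
```

With the server running, calling send_query with a SQL string returns the response body as text. Building the request separately from sending it makes it possible to inspect the method and payload without a live server.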

The output of the query contains two keys: mapreduce and spark. Each key contains the time it took to execute the query on the specified data and the result of the query.
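Once the output has been parsed into a dictionary, the two timings can be compared directly. The sketch below assumes that each of the mapreduce and spark entries stores its execution time under a key named time; that inner key name is a guess, so inspect a real response and pass the actual name through the time_key parameter.

```python
def compare_timings(output, time_key="time"):
    """Return (faster_engine, speed_up) from the API's two-key output.

    Assumption: output["mapreduce"] and output["spark"] are dictionaries
    that hold the execution time under time_key (hypothetical key name).
    """
    mapreduce_time = float(output["mapreduce"][time_key])
    spark_time = float(output["spark"][time_key])

    # Ties are reported as "mapreduce" with a speed-up of 1.0.
    faster = "spark" if spark_time < mapreduce_time else "mapreduce"
    slow, fast = max(mapreduce_time, spark_time), min(mapreduce_time, spark_time)

    # Guard against a zero timing to avoid division by zero.
    speed_up = slow / fast if fast > 0 else float("inf")
    return faster, speed_up
```

For example, an output whose mapreduce time is 12.0 and whose spark time is 3.0 (placeholder values, not measurements) yields spark as the faster engine with a speed-up factor of 4.0.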

You can learn more about APIs and how to use them here.


vishnuys/cloudcomparer
