- Python 3
- Hadoop
- Hadoop Streamer Path
- HDFS CSV Files Path
- Create a virtualenv `Cloud` and activate it. If you do not have virtualenv installed, install it. (Installation: Windows, Linux & macOS)
- In the project home folder, install the necessary packages using the command `pip install -r requirements`.
- Create `config.py` in the `cloudproject` folder.
- Add the following code to `config.py`, one assignment per line: `HADOOP_STREAMER_PATH = "<Path of the Hadoop Streamer JAR file>"` and `HDFS_CSV_PATH = "<Path of the CSV Files in HDFS>"`.
- Execute the following command to apply migrations: `python manage.py migrate`.
- To run the server on localhost with the default port, execute the command `python manage.py runserver`.
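
A misconfigured streamer path is easy to miss until the first query fails, so a quick check before starting the server may help. The sketch below is not part of the project: `check_streamer_path` is a hypothetical helper, and it only inspects the local JAR path (the HDFS path cannot be verified this way).

```python
import os


def check_streamer_path(path):
    """Return a list of problems found with the streamer JAR path (empty if none)."""
    problems = []
    if "<" in path or ">" in path:
        # The template placeholder from config.py was left in place.
        problems.append("placeholder text was not replaced")
    elif not path.endswith(".jar"):
        problems.append("path does not end in .jar")
    elif not os.path.isfile(path):
        problems.append("no file exists at this path")
    return problems
```

Assuming `config.py` is importable from where the check runs, it could be called as `check_streamer_path(config.HADOOP_STREAMER_PATH)`; an empty list means no problem was detected.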
This REST API has no authentication. To access it, send a POST request from any HTTP client, such as Postman, with the key `query` in the request data and the SQL query as its value.
The response contains two keys: `mapreduce` and `spark`. Each key holds the time taken to execute the query on the specified data and the result of the query.
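
The same request can be sent from a script instead of Postman. The following is a minimal sketch using only the Python standard library, under several assumptions: the server runs at Django's default development address `http://127.0.0.1:8000/`, the endpoint lives at the root path (the actual route is not stated here), the view accepts form-encoded data, and the response body is JSON. The names `API_URL`, `build_request`, and `run_query` are illustrative and do not come from the project.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default `runserver` address; the URL path is hypothetical.
API_URL = "http://127.0.0.1:8000/"


def build_request(sql, url=API_URL):
    """Build a POST request carrying the SQL text under the `query` key."""
    # Assumption: the view reads form-encoded data; adjust if it expects JSON.
    body = urllib.parse.urlencode({"query": sql}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def run_query(sql, url=API_URL):
    """Send the query and return the decoded response as a dict."""
    # Assumption: the server replies with a JSON document.
    with urllib.request.urlopen(build_request(sql, url)) as response:
        return json.load(response)
```

With the server running, `run_query("<your SQL query>")` should return a dict whose `mapreduce` and `spark` entries can be compared side by side. The structure inside each entry is not specified here, so inspect it before relying on particular field names.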
You can learn more about APIs and how to use them here.