Skip to content

Hive HTTP Proxy exposes a single POST endpoint to run Hive SQL DDL/DML.

License

Notifications You must be signed in to change notification settings

SESARLab/hive-http-proxy

Repository files navigation

Hive HTTP Proxy

Hive HTTP Proxy CI Docker Image Version (latest semver)

Hive HTTP Proxy exposes a single POST endpoint to run Hive SQL DDL/DML statements.
This application was primarily designed to run JMeter HTTP load testing against Apache Hive, in order to compare the results with Apache Druid, in the context of my experimental thesis on MIND Foods HUB Data Lake.

Abstract

Sadly, Apache Hive doesn't expose a set of REST API to interact with, similarly to other more recent platforms (like Apache Druid, or Presto).
This means that a client is forced to use Hive JDBC drivers or Hive Thrift drivers to perform TCP connections.
At the time of this writing, the only available REST API for Hive is WebHCat, a web application layer on top of HCatalog, which is a table and storage management layer for Hadoop.
While WebHCat is installed with Hive, starting with Hive release 0.11.0, its capabilities are quite limited;
a client can run a Hive query or set of commands via the http://hostname/templeton/v1/hive endpoint, but the response contains only the ID of a job that runs on background on Hadoop, and an optional callback property, that define a URL to be called upon job completion.
This makes it very hard to test Apache Hive with JMeter unless to configure the latter to connect to Hive via JDBC (a not so viable option due to the configuration difficulty).
To solve this problem I decided to write a simple HTTP layer on top of Javascript Hive driver to run Hive SQL statements via HTTP.

Usage

Install application dependencies and, assuming that you have an Apache Hive instance running on your local host, run the following command:

$ HIVE_HOST=localhost HIVE_PORT=10000 HIVE_USERNAME=user HIVE_PASSWORD=password npm run start

With:

  • HIVE_HOST is the hostname of the Apache Hive server (default to localhost)
  • HIVE_PORT is the network port of the Apache Hive server (defaults to 10000)
  • HIVE_USERNAME is the username used to connect (defaults to admin)
  • HIVE_PASSWORD is the related password

Hive HTTP Proxy runs on port 10001.

A statement can be executed by doing a POST HTTP call with a JSON palyoad, containing the required statement property:

$ curl -X POST -d '{ "statement": "SELECT 1" }' http://localhost:10001

Please note that Hive HTTP Proxy doesn't do any assumption on the statement content, nor does any validation on it; it simply checks for a valid JSON and executes the statement against the configured server.

Run in a Docker container

Hive HTTP Proxy can run as a dockerized application.

To run the container, just type:

$ docker run -e HIVE_HOST=localhost -e HIVE_PORT=10000 -e HIVE_USERNAME=user -e HIVE_PASSWORD=password -p 10001:10001 gabrieledarrigo/hive-http-proxy