RImpala is an R package that helps you connect to Cloudera Impala and execute distributed queries from R. Impala supports JDBC, and RImpala uses this interface to establish the connection between R and Impala.
To use this package you must also have access to a Hadoop cluster running Cloudera Impala with at least one populated table defined in the Hive Metastore.
- Clone the repository
- The client machine needs the Impala JDBC driver jars, shipped as a zip file in the repository, to connect to the Impala servers.
- Extract the contents of the zip file to a location of your choosing.
For example:
- On Linux, you might extract this to a location such as /opt/jars/.
- On Windows, you might extract this to a folder such as C:\Program Files\impala-jars.
- You will pass this location to rimpala.init() when you load the package.
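The extraction step can also be done from R. The sketch below uses `utils::unzip()` and then lists the jars that were extracted. `extract_impala_jdbc` is a hypothetical helper (not part of RImpala), and the zip path in the example is a placeholder, because the actual file name depends on your checkout.

```r
# Hypothetical helper (not part of RImpala): extract the Impala JDBC zip
# into a directory of your choosing and return the paths of the jars found.
extract_impala_jdbc <- function(zipfile, exdir) {
  if (!file.exists(zipfile)) {
    stop("JDBC zip file not found: ", zipfile)
  }
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  # utils::unzip() extracts every entry of the archive into `exdir`.
  utils::unzip(zipfile, exdir = exdir)
  jars <- list.files(exdir, pattern = "\\.jar$", recursive = TRUE,
                     full.names = TRUE)
  if (length(jars) == 0) {
    stop("no .jar files found under ", exdir)
  }
  jars
}

# Example on Linux; the zip name is a placeholder for the file in the repository:
# jars <- extract_impala_jdbc("path/to/impala-jdbc.zip", "/opt/jars")
# unique(dirname(jars))  # folder(s) that directly contain the jars
```

If the archive unpacks into a subfolder, `unique(dirname(jars))` shows which folder actually holds the jars; that is presumably the path to give to rimpala.init(), although this depends on how the package locates them.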
- Extract the package installer by decompressing RImpala_0.1.6.tar.gz, which is in the install directory:
tar -xvf install/RImpala_0.1.6.tar.gz
- Then install the package using the following command:
R CMD INSTALL ./RImpala
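On machines without a `tar` command (for example, some Windows setups), the package can be installed from inside R instead. This sketch assumes the working directory is the root of the cloned repository. It relies on `install.packages()` with `repos = NULL, type = "source"`, which installs directly from a local source tarball, so the separate extraction step is not needed. `install_rimpala` is a hypothetical wrapper.

```r
# Hypothetical wrapper: install RImpala from the tarball shipped in the
# repository. Assumes the working directory is the repository root.
install_rimpala <- function(tarball = file.path("install", "RImpala_0.1.6.tar.gz")) {
  if (!file.exists(tarball)) {
    stop("package tarball not found: ", tarball)
  }
  # repos = NULL + type = "source": build and install from the local file.
  utils::install.packages(tarball, repos = NULL, type = "source")
}

# install_rimpala()
```

Because `repos = NULL` turns off dependency resolution, rJava has to be installed beforehand.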
- Find the IP address (or hostname) of the machine running the Impala service, and the port it listens on.
- Find the location where you unzipped the JDBC jars in the section above.
- Launch R
- Load the package, point it at the JDBC jars, and run a query:
library("RImpala")
rimpala.init(libs = "/path/to/JDBC/jars/")
result <- rimpala.query("your query")
By default, rimpala.init() searches "/usr/lib/impala" for the JDBC jars.
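The IP address and port from the steps above are used when opening the connection. Below is a sketch of a complete session that first checks the jar directory, then initializes, connects, runs one query, and always closes the connection. It assumes the package provides `rimpala.connect()` and `rimpala.close()` (check `?rimpala.connect` in your installed version), and that the Impala daemon uses its default JDBC port 21050. `query_impala`, the host name, and the table name are hypothetical.

```r
# Hypothetical wrapper around one RImpala session.
# Assumptions: rimpala.connect(host, port) and rimpala.close() exist in the
# installed RImpala version; 21050 is Impala's default JDBC port.
query_impala <- function(sql, host, port = "21050",
                         libs = "/usr/lib/impala") {
  # Fail early with a clear message if the JDBC jars are not where expected.
  jars <- list.files(libs, pattern = "\\.jar$")
  if (length(jars) == 0) {
    stop("no JDBC jars found in ", libs)
  }
  if (!requireNamespace("RImpala", quietly = TRUE)) {
    stop("the RImpala package is not installed")
  }
  RImpala::rimpala.init(libs = libs)
  RImpala::rimpala.connect(host, port)
  # Close the connection even if the query fails.
  on.exit(RImpala::rimpala.close(), add = TRUE)
  RImpala::rimpala.query(sql)
}

# result <- query_impala("SELECT COUNT(*) FROM your_table",
#                        host = "impala-host.example.com",
#                        libs = "/opt/jars/")
```

Checking for jars before calling rimpala.init() makes a wrong `libs` path fail with an explicit message, rather than surfacing later as a JDBC driver error.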
Here are links to more information on Cloudera Impala:
Requirements:
- Java (>= 1.5)
- R (>= 2.7.0)
- rJava (>= 0.5-0)
- Impala JDBC driver jars
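The first three requirements can be checked from an R session. This is a sketch built on `getRversion()`, `utils::packageVersion()` and rJava's `.jinit()`/`.jcall()`. It reads the JVM's `java.version` property and keeps only its leading numeric part, because strings such as "1.8.0_292" are not valid R version numbers. `check_rimpala_requirements` is a hypothetical helper, and it does not check for the JDBC jars.

```r
# Hypothetical helper: report whether R, rJava and Java meet the minimum
# versions listed above. Java is NA when rJava is missing or the JVM fails.
check_rimpala_requirements <- function() {
  r_ok <- getRversion() >= "2.7.0"
  rjava_ok <- requireNamespace("rJava", quietly = TRUE) &&
    utils::packageVersion("rJava") >= "0.5.0"

  java_ok <- NA
  if (rjava_ok) {
    java_ok <- tryCatch({
      # Start (or reuse) the JVM and read its version, e.g. "1.8.0_292".
      rJava::.jinit()
      v <- rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
      # Keep only the leading dotted number, e.g. "1.8.0" or "17.0.2".
      numeric_version(sub("^([0-9]+(\\.[0-9]+)*).*$", "\\1", v)) >= "1.5"
    }, error = function(e) NA)
  }

  c(R = r_ok, rJava = rjava_ok, Java = java_ok)
}

# check_rimpala_requirements()
```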