Experimental set-up to add dependencies onto spark-custom Docker images.
Builds for both Debian and Alpine.
This adds the following:
- AWS Hadoop SDK JAR
  - Appends spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem into spark-defaults.conf
- Google Cloud Storage SDK JAR
- MariaDB JDBC Connector JAR
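
The append to spark-defaults.conf described above could look roughly like the following. This is a minimal sketch and not necessarily how the build scripts here do it: the helper name append_s3a_impl and the conf path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append the S3A filesystem setting to the given
# spark-defaults.conf, unless that exact line is already present.
append_s3a_impl() {
    conf_file="$1"
    line='spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem'

    # Create the file if it does not exist yet.
    touch "$conf_file"

    # If the file is non-empty and lacks a trailing newline, add one so the
    # new setting does not get glued onto the previous line.
    if [ -s "$conf_file" ] && [ -n "$(tail -c 1 "$conf_file")" ]; then
        printf '\n' >> "$conf_file"
    fi

    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -Fxq "$line" "$conf_file"; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}

# Example usage (the path is an assumption; adjust to the image's layout):
# append_s3a_impl "${SPARK_HOME}/conf/spark-defaults.conf"
```

Checking for the line first keeps the step idempotent, so re-running it does not duplicate the setting.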
Additionally, all Alpine builds have gcompat and libc6-compat installed to prevent glibc shared library related issues.
The version of the AWS Java SDK depends on the Hadoop version. An example of how to derive this version number for Hadoop 3.1.0 is here:
https://github.com/apache/hadoop/blob/release-3.1.0-RC0/hadoop-project/pom.xml#L137
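
The lookup can also be scripted. The sketch below assumes the version is stored in a Maven property named aws-java-sdk.version inside hadoop-project/pom.xml; verify this against the linked line before relying on it. The helper name aws_sdk_version_from_pom is hypothetical, and the raw.githubusercontent.com URL in the usage comment is an assumed equivalent of the link above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the aws-java-sdk.version property
# found in a local copy of Hadoop's hadoop-project/pom.xml.
# Assumption: the property name is aws-java-sdk.version.
aws_sdk_version_from_pom() {
    pom_file="$1"
    # Print only the text between the opening and closing property tags,
    # and keep just the first match.
    sed -n 's|.*<aws-java-sdk\.version>\(.*\)</aws-java-sdk\.version>.*|\1|p' "$pom_file" | head -n 1
}

# Example usage (requires network access; the URL form is an assumption):
# curl -fsSL -o pom.xml \
#   https://raw.githubusercontent.com/apache/hadoop/release-3.1.0-RC0/hadoop-project/pom.xml
# aws_sdk_version_from_pom pom.xml
```

For a different Hadoop release, swap the tag in the URL for the matching release tag.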
For Linux users, you can download Tera CLI v0.4 at https://github.com/guangie88/tera-cli/releases and place it in PATH.
Otherwise, you will need cargo, which can be installed via rustup. Once cargo is installed, simply run:
cargo install tera-cli --version=^0.4.0
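
The two installation routes can be combined into one check. This is a rough sketch only: it assumes the installed binary is named tera, and the function name ensure_tera is hypothetical. The version requirement is quoted because ^ is treated specially by some shells.

```shell
#!/bin/sh
# Hypothetical helper: use an existing Tera CLI if one is on PATH,
# otherwise fall back to installing it with cargo.
# Assumption: the Tera CLI executable is named "tera".
ensure_tera() {
    if command -v tera >/dev/null 2>&1; then
        echo "tera already available: $(command -v tera)"
    elif command -v cargo >/dev/null 2>&1; then
        # Quote the requirement so the shell passes ^ through unchanged.
        cargo install tera-cli --version='^0.4.0'
    else
        echo "neither tera nor cargo found; install cargo via rustup first" >&2
        return 1
    fi
}

# Example usage:
# ensure_tera
```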