Kustomize manifests and supporting scripts for running TrinoDb and a Hive3 Metastore in Kubernetes using S3 object storage and MySQL (or Postgres).
Author: Timothy C. Arland
Email: tcarland@gmail.com
- Kubernetes >= 1.23 - Suggested version: 1.25+
- Kustomize >= v5 - Suggested version: v5.4.2
The project depends on a number of environment variables for deploying the necessary configuration via the setup script. The S3 credentials are the primary required variables; the others fall back to default values when not provided. The following table lists the variables used by the setup script.
Environment Variable | Description | Default Setting |
---|---|---|
S3_ENDPOINT | The S3 endpoint url | http(s)://minio.minio.svc |
S3_ACCESS_KEY | The S3 access key | |
S3_SECRET_KEY | The S3 secret key | |
---------------- | ------------------------- | ------------------- |
TRINO_NAMESPACE | Namespace for deploying the components | trino |
TRINO_DBUSER | Name of the hive backend db user | root |
TRINO_DBPASSWORD | Password for the backend root user | randomized-password |
---------------- | ------------------------- | ------------------- |
TRINO_USER | Name of the admin Trino user | trino |
TRINO_PASSWORD | Password for the trino admin user | trinoadmin |
TRINO_DOMAINNAME | TLS Endpoint used in ingress manifests | -- |
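As a minimal sketch of how an environment file might populate these variables, the snippet below exports placeholder S3 values and applies the table defaults to anything left unset. All values are placeholders, and check_required is a hypothetical helper for illustration, not part of the setup script.

```sh
# Hypothetical contents of an environment file; all values are placeholders.
export S3_ENDPOINT="https://minio.minio.svc"
export S3_ACCESS_KEY="example-access-key"
export S3_SECRET_KEY="example-secret-key"

# Optional variables keep any existing value, otherwise they fall back to
# the defaults listed in the table.
export TRINO_NAMESPACE="${TRINO_NAMESPACE:-trino}"
export TRINO_DBUSER="${TRINO_DBUSER:-root}"
export TRINO_USER="${TRINO_USER:-trino}"
export TRINO_PASSWORD="${TRINO_PASSWORD:-trinoadmin}"

# check_required is a hypothetical helper (not part of the setup script).
# It returns non-zero if any named variable is unset or empty.
check_required() {
    cr_missing=0
    for cr_name in "$@"; do
        eval "cr_value=\${$cr_name:-}"
        if [ -z "$cr_value" ]; then
            echo "missing required variable: $cr_name" >&2
            cr_missing=1
        fi
    done
    return "$cr_missing"
}

check_required S3_ENDPOINT S3_ACCESS_KEY S3_SECRET_KEY && echo "S3 settings present"
```

The default password for TRINO_PASSWORD is publicly documented, so overriding it before any shared deployment is advisable.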
The environment path is supported by the setup script for adding additional catalog configs and support files such as Kerberos keytabs for traditional Hadoop-Hive integration, Trino rules customization, and the password database, if applicable. Each environment is contained in its own subdirectory, which makes it easy to apply an overlay technique and obtain the assets from a secrets manager. The env path is excluded from git to avoid committing any such secrets to the repository.
mkdir env/envname
cp env/env.template env/envname/name.env
mkdir env/envname/auth
mkdir env/envname/files
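The same layout could be scaffolded with a small function. This is only a sketch: new_env is a hypothetical helper, and naming the env file after the environment is an assumption. The demonstration runs in a temporary directory so the repository is left untouched.

```sh
# new_env is a hypothetical helper, not shipped with the project.
# Usage: new_env <envname> [env-root]
new_env() {
    ne_name="$1"
    ne_root="${2:-env}"
    mkdir -p "$ne_root/$ne_name/auth" "$ne_root/$ne_name/files" || return 1
    # Copy the template only when it exists and would not overwrite a file.
    if [ -f "$ne_root/env.template" ] && [ ! -e "$ne_root/$ne_name/$ne_name.env" ]; then
        cp "$ne_root/env.template" "$ne_root/$ne_name/$ne_name.env"
    fi
}

# Demonstration in a throwaway directory.
demo_root="$(mktemp -d)"
: > "$demo_root/env.template"
new_env dev "$demo_root"
ls "$demo_root/dev"
```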
The metastore image is based off of Hive version 3.1.3 and can be
built using the provided hive-metastore/resources/Containerfile.
$ cd hive-metastore/resources && docker build -f Containerfile -t project/hive:3.1.3 .
Ensure all variables above are defined and exported to the environment. Passing an argument to the script will show the configuration only and can be used to verify the settings.
./bin/trino-k8s-setup.sh -e
Run the setup script to configure the various config templates.
source env/envname/name.env
./bin/trino-k8s-setup.sh <envname>
Using Postgres for the metastore_db follows a slightly different path than MySQL. Rather than using the Hive schematool to initialize the db, a custom Postgres container image is built in order to inject admin RBAC and the metastore DDL. Refer to the README.md for details on building the image. The hive-init-schema.yaml can still be used when adjusted for Postgres, but the Postgres image would still need roles applied.
MySQL used to be the default for the TDH platform, but Postgres has since become the preferred backend. With a few changes to the configs, the deployment can easily switch to MySQL Server. Enable the hive-init-schema.yaml in the hive-metastore kustomization.yaml and deploy via Kustomize.
kustomize build mysql-server/ | kubectl apply -f -
The same MySQL image can be used as a client.
docker run -it --rm mysql mysql -hsome.mysql.host -usome-mysql-user -p
We deploy the metastore in the same manner, using Kustomize.
kustomize build hive-metastore/ | kubectl apply -f -
Verify the parameter substitution is correct in trino/base/trino-configmap.yaml as generated by the trino-k8s-setup.sh script.
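One way to spot-check the substitution is to search the generated file for any remaining ${NAME} tokens. The sketch below uses a hypothetical helper, find_placeholders, and demonstrates it on a temporary file rather than the real configmap. Tokens such as ${USER} in the LDAP patterns are expected to remain, so a match is a prompt to look, not necessarily an error.

```sh
# find_placeholders is a hypothetical helper, not part of trino-k8s-setup.sh.
# It prints line:token for each remaining ${NAME} token and returns
# non-zero when none are found.
find_placeholders() {
    grep -n -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1"
}

# Demonstration on a throwaway file.
demo_file="$(mktemp)"
printf '%s\n' 'http-server.process-forwarded=true' 'ldap.url=${LDAP_SERVER}' > "$demo_file"
find_placeholders "$demo_file" || echo "no placeholders left"
```

Against the real output this would be run as find_placeholders trino/base/trino-configmap.yaml.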
Load the Trino manifests.
kustomize build trino/ | kubectl apply -f -
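The three deployment steps above can be sketched as one loop. deploy_component is a hypothetical wrapper, and the DRY_RUN switch is an assumption added so the commands can be reviewed before anything is applied. The order shown is the MySQL path; a Postgres deployment would substitute its own directory, which is not named here.

```sh
# deploy_component is a hypothetical wrapper around the commands shown above.
# With DRY_RUN=1 it only prints the command it would run.
deploy_component() {
    dc_dir="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "kustomize build $dc_dir/ | kubectl apply -f -"
        return 0
    fi
    kustomize build "$dc_dir/" | kubectl apply -f -
}

# Order from the text: database first, then the metastore, then Trino.
DRY_RUN=1
for component in mysql-server hive-metastore trino; do
    deploy_component "$component"
done
```

Setting DRY_RUN=0 would run the real commands and requires kustomize and kubectl on the PATH.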
Trino will create mutual TLS connections internally between the coordinator and the workers, and uses a randomized pre-shared key to authenticate workers.
By virtue of running in K8s, Trino makes it easier to enable TLS without having to
configure keys, certificates, and trust across containers, and supports using an
ingress gateway to terminate TLS. This setup requires configuring Trino to use
forwarded headers to validate that HTTPS was used and terminated by the
controller. The setting is http-server.process-forwarded=true.
Ingress resources are provided for exposing TLS using either Istio or nginx as the ingress gateway. Refer to the README in the corresponding trino/resources directory.
The secrets needed for the components are written to **/base/secrets.env for kustomize
to consume on build and should be cleaned up after deployment by running make clean.
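To confirm that no secrets files remain after cleanup, a search along the following lines may help. leftover_secrets is a hypothetical helper, and the demonstration uses a temporary tree rather than the repository.

```sh
# leftover_secrets is a hypothetical helper; make clean remains the
# documented way to remove these files.
leftover_secrets() {
    find "${1:-.}" -type f -path '*/base/secrets.env'
}

# Demonstration in a throwaway tree.
demo_tree="$(mktemp -d)"
mkdir -p "$demo_tree/trino/base"
: > "$demo_tree/trino/base/secrets.env"
leftover_secrets "$demo_tree"
```

Run from the repository root with no argument, empty output means nothing was left behind.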
The Trino CLI can be acquired from the Trino project documentation site.
trino-cli --server 172.17.0.210:8080 --user trino --password --catalog hive --schema default
The JDBC Driver can be acquired from the Maven Central Repository. The current deployment has been tested with trino-468.
In addition to changing the password-authenticator.properties with the appropriate LDAP settings, the truststore file must be added as a kustomize secret and the coordinator deployment must mount the truststore at the path defined below.
export LDAP_SERVER="ldaps://ldap-host.domain.com:636"
export LDAP_USER_BIND_PATTERN="\${USER}@ad.domain.com"
export LDAP_BIND_DN="ldapadmin@ad.domain.com"
export LDAP_BIND_PW="password"
export LDAP_USER_BASE_DN="ou=MyOrg,dc=ad,dc=domain,dc=com"
export LDAP_GROUP_AUTH="(&(objectClass=person)(sAMAccountName=\${USER})(memberOf=CN=TRINO_USERS_GROUPNAME,OU=DataOrgGroups,OU=DataOrg,DC=ad,DC=domain,DC=com))"
export LDAP_TRUSTSTORE_PASSWORD="changeit"
# adjust trino-configmap.yaml.template accordingly
#ldap.url=ldap://ldap-host.domain.com:389
#ldap.allow-insecure=true
ldap.url=ldaps://ldap-host.domain.com:636
ldap.user-bind-pattern=${LDAP_USER_BIND_PATTERN}
ldap.bind-dn=${LDAP_BIND_DN}
ldap.bind-password=${LDAP_BIND_PW}
ldap.user-base-dn=${LDAP_USER_BASE_DN}
ldap.group-auth-pattern=${LDAP_GROUP_AUTH}
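Two details of the LDAP variables are easy to get wrong: the escaped ${USER} token and the nesting of the group filter. The sketch below shows that the backslash leaves a literal ${USER} in the value for Trino to replace with the login name, and adds a rough parenthesis count for the filter. parens_balanced is a hypothetical helper and the group and domain names are placeholders.

```sh
# The backslash keeps the shell from expanding ${USER}; the literal token is
# left in place for Trino to replace with the login name.
LDAP_USER_BIND_PATTERN="\${USER}@ad.domain.com"
printf '%s\n' "$LDAP_USER_BIND_PATTERN"

# parens_balanced is a hypothetical helper: it only compares the number of
# opening and closing parentheses, so it catches a dropped parenthesis but
# does not validate the filter syntax.
parens_balanced() {
    pb_open=$(( $(printf '%s' "$1" | tr -cd '(' | wc -c) ))
    pb_close=$(( $(printf '%s' "$1" | tr -cd ')' | wc -c) ))
    [ "$pb_open" -eq "$pb_close" ]
}

# Example filter with placeholder names.
LDAP_GROUP_AUTH="(&(objectClass=person)(sAMAccountName=\${USER})(memberOf=CN=example-group,DC=example,DC=com))"
if parens_balanced "$LDAP_GROUP_AUTH"; then
    echo "filter parentheses are balanced"
else
    echo "filter parentheses are NOT balanced" >&2
fi
```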
For self-signed certificates, one can set a truststore just for LDAP in the authenticator properties.
ldap.ssl.truststore.path=/etc/trino/truststore.jks
ldap.ssl.truststore.password=${LDAP_TRUSTSTORE_PASSWORD}
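A truststore like the one referenced above could be built from a CA certificate with the JDK keytool. This is a sketch only: make_truststore is a hypothetical helper, and the alias and file names are placeholders.

```sh
# make_truststore is a hypothetical helper; the alias and file names are
# placeholders. It imports one PEM CA certificate into a JKS truststore.
# Usage: make_truststore <ca-cert.pem> <truststore.jks> <password>
make_truststore() {
    keytool -importcert -noprompt \
        -alias ldap-ca \
        -file "$1" \
        -keystore "$2" \
        -storetype JKS \
        -storepass "$3"
}

# Example invocation (requires a JDK on the PATH):
#   make_truststore ca.crt truststore.jks "$LDAP_TRUSTSTORE_PASSWORD"
```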
Alternatively, it may be better to mount the truststore to the various deployments directly as the default java cacerts file. This is useful if, for example, the underlying S3 endpoint is secured with a private CA TLS certificate. Typically this involves mounting a JKS truststore to the hive-metastore and both the trino-coordinator and all workers.
Add the truststore secret to each kustomization.yaml
secretGenerator:
- name: hive-metastore-secrets
  envs:
  - secrets.env
- name: truststore
  files:
  - truststore.jks
And add the mounts to the deployments. This is a partial patch demonstrating the volume mount for hive.
spec:
  template:
    spec:
      containers:
      - name: hive-metastore
        volumeMounts:
        - name: truststore-vol
          mountPath: /opt/java/openjdk/lib/security/cacerts
          subPath: truststore.jks
      volumes:
      - name: truststore-vol
        secret:
          secretName: truststore
For Trino, the same applies to both the deployment manifest and the statefulset. Note that the Java path should be verified against the Trino image in use, since it changes with the bundled JDK version.
spec:
  template:
    spec:
      containers:
      - name: trino
        volumeMounts:
        - name: truststore-vol
          mountPath: /usr/lib/jvm/temurin/jdk-23.0.1+11/lib/security/cacerts
          subPath: truststore.jks
      volumes:
      - name: truststore-vol
        secret:
          secretName: truststore
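Both mount paths follow the same pattern: the default truststore lives at lib/security/cacerts under the Java home directory on JDK 9 and later. The sketch below derives the path from a Java home value. cacerts_path is a hypothetical helper, and the two directories are simply the ones used in the patches above.

```sh
# cacerts_path is a hypothetical helper: given a JAVA_HOME value it prints
# the default truststore location used by JDK 9 and later.
cacerts_path() {
    printf '%s/lib/security/cacerts\n' "${1%/}"
}

cacerts_path /opt/java/openjdk
cacerts_path "/usr/lib/jvm/temurin/jdk-23.0.1+11"
```

Assuming the image sets JAVA_HOME, its value could be read from a running pod with kubectl exec and printenv JAVA_HOME, then passed to the helper to obtain the mountPath.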