t-data-h/trino-on-k8s

Trino and Hive on Kubernetes

Kustomize manifests and supporting scripts for running Trino and a Hive 3 Metastore on Kubernetes, using S3 object storage and MySQL (or Postgres) as the metastore database.

Author: Timothy C. Arland
Email: tcarland@gmail.com


Prerequisites:

  • Kubernetes >= 1.23 - Suggested version: 1.25+
  • Kustomize >= v5 - Suggested version: v5.4.2

Configuring the Environment

The setup script reads a number of environment variables to generate the deployment configuration. The S3 credentials are required; the other variables fall back to default values when unset. The following table lists the variables used by the setup script.

| Environment Variable | Description                            | Default Setting           |
| -------------------- | -------------------------------------- | ------------------------- |
| S3_ENDPOINT          | The S3 endpoint URL                    | http(s)://minio.minio.svc |
| S3_ACCESS_KEY        | The S3 access key                      |                           |
| S3_SECRET_KEY        | The S3 secret key                      |                           |
| ----------------     | -------------------------              | -------------------       |
| TRINO_NAMESPACE      | Namespace for deploying the components | trino                     |
| TRINO_DBUSER         | Name of the Hive backend db user       | root                      |
| TRINO_DBPASSWORD     | Password for the backend root user     | randomized-password       |
| ----------------     | -------------------------              | -------------------       |
| TRINO_USER           | Name of the admin Trino user           | trino                     |
| TRINO_PASSWORD       | Password for the Trino admin user      | trinoadmin                |
| TRINO_DOMAINNAME     | TLS endpoint used in ingress manifests | --                        |
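
As a minimal sketch of how these variables could be validated before running the setup script, the function below fails when either S3 credential is unset and fills in a few of the defaults from the table. The name `check_trino_env` is hypothetical and not part of the repository, and the actual setup script may handle defaults differently.

```shell
# Hypothetical pre-flight check; not part of the repository.
check_trino_env() {
    missing=""
    # The S3 credentials have no defaults in the table, so treat them as required.
    for name in S3_ACCESS_KEY S3_SECRET_KEY; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing required variables:$missing" >&2
        return 1
    fi
    # Apply the table defaults when a variable is unset or empty.
    : "${TRINO_NAMESPACE:=trino}"
    : "${TRINO_DBUSER:=root}"
    : "${TRINO_USER:=trino}"
    export TRINO_NAMESPACE TRINO_DBUSER TRINO_USER
}
```

Calling `check_trino_env` after sourcing the environment file would then stop early on a missing credential instead of generating incomplete manifests.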

The setup script supports a per-environment path for additional catalog configs and supporting files, such as Kerberos keytabs for traditional Hadoop-Hive integration, Trino rules customization, and the password database, if applicable. Each environment lives in its own subdirectory, which makes it easy to overlay assets obtained from a secrets manager. The env path is excluded from git to avoid committing any such secrets to the repository.

mkdir env/envname
cp env/env.template env/envname/name.env
mkdir env/envname/auth
mkdir env/envname/files

Building the Hive Metastore Image

The metastore image is based off of Hive version 3.1.3 and can be
built using the provided hive-metastore/resources/Containerfile.

$ cd hive-metastore/resources && docker build -f Containerfile -t project/hive:3.1.3 .

Setup / Configure the Working Directory

Ensure all variables above are defined and exported to the environment. Passing an argument to the script will show the configuration only and can be used to verify the settings.

./bin/trino-k8s-setup.sh -e

Run the setup script to configure the various config templates.

source env/envname/name.env
./bin/trino-k8s-setup.sh <envname>

Deploy the Postgresql Server

Using Postgres for the metastore_db follows a slightly different path than MySQL. Rather than using the Hive schematool to initialize the database, a custom Postgres container image is built to inject the admin RBAC and the metastore DDL. Refer to the README.md for details on building the image. The hive-init-schema.yaml can still be used when adjusted for Postgres, but the Postgres image would still need the roles applied.

Deploy the MySQL Server

MySQL used to be the default for the TDH platform, but Postgres has since become the preferred choice. With a few changes to the configs, the deployment can switch to MySQL Server: enable the hive-init-schema.yaml in the hive-metastore kustomization.yaml and deploy via Kustomize.

kustomize build mysql-server/ | kubectl apply -f -

The same MySQL image can be used as a client.

docker run -it --rm mysql mysql -hsome.mysql.host -usome-mysql-user -p

Deploy the Hive Metastore

We deploy the metastore in the same manner, using Kustomize.

kustomize build hive-metastore/ | kubectl apply -f -
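
After applying the manifests, it can help to wait for the rollout to finish before moving on. The sketch below wraps `kubectl rollout status`. The helper name `wait_for_rollout`, the `KUBECTL` override, and the resource name `deployment/hive-metastore` are assumptions for illustration, so check the generated manifests for the real workload name.

```shell
# Hypothetical helper: block until a workload finishes rolling out.
# KUBECTL can be overridden (e.g. KUBECTL=echo for a dry run); defaults to kubectl.
wait_for_rollout() {
    resource="$1"    # e.g. deployment/hive-metastore (assumed name)
    namespace="${2:-${TRINO_NAMESPACE:-trino}}"
    "${KUBECTL:-kubectl}" --namespace "$namespace" \
        rollout status "$resource" --timeout=300s
}
```

For example, `wait_for_rollout deployment/hive-metastore` would return non-zero if the metastore is not ready within five minutes.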

TrinoDb

Verify the parameter substitution is correct in trino/base/trino-configmap.yaml as generated by the trino-k8s-setup.sh script.
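
One way to spot-check the substitution is to search the generated file for leftover `${NAME}`-style placeholders. This is a rough sketch that assumes the templates use that placeholder syntax; the function name `unresolved_placeholders` is hypothetical.

```shell
# Hypothetical check: print lines that still contain ${NAME}-style placeholders.
# Lines containing ${USER} are skipped, because the LDAP patterns keep that
# token on purpose; a second placeholder on such a line would be missed.
unresolved_placeholders() {
    grep -n -E '\$\{[A-Z0-9_]+\}' "$1" | grep -v -F '${USER}' || true
}
```

Running `unresolved_placeholders trino/base/trino-configmap.yaml` should print nothing when every variable was substituted.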

Load the Trino manifests.

kustomize build trino/ | kubectl apply -f -

Trino creates mutual TLS connections internally between the coordinator and the workers, and uses a randomized pre-shared key to authenticate workers.

Running in K8s makes it easier to enable TLS for Trino without configuring keys, certificates, and trust across containers, since an ingress gateway can terminate TLS. This setup requires configuring Trino to use forwarded headers to validate that HTTPS was used and terminated by the controller. The setting is http-server.process-forwarded=true.

Ingress resources are provided for exposing TLS using either Istio or nginx as the ingress gateway. Refer to the Readme in the corresponding trino/resources directory.

Cleanup

The secrets needed for the components are written to **/base/secrets.env for kustomize to consume on build and should be cleaned up after deployment by running make clean.
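
To confirm that no generated secrets remain after cleanup, a small check such as the following could be used. It is only a sketch: the function name `leftover_secrets` is hypothetical, and it relies solely on the `**/base/secrets.env` location described above.

```shell
# Hypothetical check: list generated secrets.env files left under a directory.
leftover_secrets() {
    find "${1:-.}" -type f -path '*/base/secrets.env' | sort
}
```

An empty result from `leftover_secrets .` after running make clean indicates the files were removed.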

Trino CLI

The Trino CLI can be downloaded from the Trino project website.

trino-cli --server 172.17.0.210:8080 --user trino --password --catalog hive --schema default
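
For scripted use, the CLI also accepts a statement through `--execute`, and it reads the password from the `TRINO_PASSWORD` environment variable when that is exported, so `--password` does not prompt. The wrapper below is a sketch under those assumptions; the function name `trino_query`, the `TRINO_CLI` override, and the example host are hypothetical.

```shell
# Hypothetical wrapper around the Trino CLI for one-off statements.
# TRINO_CLI can be overridden; it defaults to trino-cli on the PATH.
trino_query() {
    server="$1"    # e.g. https://trino.example.com (placeholder host)
    sql="$2"
    "${TRINO_CLI:-trino-cli}" --server "$server" \
        --user "${TRINO_USER:-trino}" --password \
        --catalog hive --schema default \
        --execute "$sql"
}
```

With `TRINO_PASSWORD` exported, `trino_query https://trino.example.com 'SHOW SCHEMAS'` would run without interaction.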

Trino JDBC

The JDBC Driver can be acquired from the Maven Central Repository. The current deployment has been tested with trino-468.

LDAP

In addition to updating password-authenticator.properties with the appropriate LDAP settings, the truststore file must be added as a Kustomize secret, and the coordinator deployment must mount the truststore at the path defined below.

export LDAP_SERVER="ldaps://ldap-host.domain.com:636"
export LDAP_USER_BIND_PATTERN="\${USER}@ad.domain.com"
export LDAP_BIND_DN="ldapadmin@ad.domain.com"
export LDAP_BIND_PW="password"
export LDAP_USER_BASE_DN="ou=MyOrg,dc=ad,dc=domain,dc=com"
export LDAP_GROUP_AUTH="(&(objectClass=person)(sAMAccountName=\${USER})(memberOf=CN=TRINO_USERS_GROUPNAME,OU=DataOrgGroups,OU=DataOrg,DC=ad,DC=domain,DC=com))"
export LDAP_TRUSTSTORE_PASSWORD="changeit"

# adjust trino-configmap.yaml.template accordingly
#ldap.url=ldap://ldap-host.domain.com:389
#ldap.allow-insecure=true
ldap.url=ldaps://ldap-host.domain.com:636
ldap.user-bind-pattern=${LDAP_USER_BIND_PATTERN}
ldap.bind-dn=${LDAP_BIND_DN}
ldap.bind-password=${LDAP_BIND_PW}
ldap.user-base-dn=${LDAP_USER_BASE_DN}
ldap.group-auth-pattern=${LDAP_GROUP_AUTH}
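
Trino replaces the `${USER}` token in the bind pattern and the group filter with the login name at authentication time. To preview what a pattern looks like for a given user, a sketch like the one below can help; the function name `expand_user_pattern` is hypothetical and the substitution is done here with sed purely for illustration.

```shell
# Hypothetical preview of a pattern after ${USER} is replaced by a login name.
# Assumes the user name contains no '/', '&' or backslash (sed metacharacters).
expand_user_pattern() {
    printf '%s\n' "$1" | sed 's/[$]{USER}/'"$2"'/g'
}
```

For instance, `expand_user_pattern '${USER}@ad.domain.com' alice` shows the bind name that would be sent to the directory, which makes unbalanced parentheses in a group filter easier to spot.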

Private CA signed TLS Certificates

For self-signed certificates, one can set a truststore just for LDAP in the authenticator properties.

ldap.ssl.truststore.path=/etc/trino/truststore.jks
ldap.ssl.truststore.password=${LDAP_TRUSTSTORE_PASSWORD}

Alternatively, it may be better to mount the truststore to the various deployments directly as the default java cacerts file. This is useful if, for example, the underlying S3 endpoint is secured with a private CA TLS certificate. Typically this involves mounting a JKS truststore to the hive-metastore and both the trino-coordinator and all workers.
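
A JKS truststore holding the private CA can be produced with the JDK's keytool. The following is a sketch only: the function name `add_ca_to_truststore`, the alias `private-ca`, and the `KEYTOOL` override are assumptions, and the password falls back to `changeit` when `LDAP_TRUSTSTORE_PASSWORD` is unset.

```shell
# Hypothetical helper: import a private CA certificate into a JKS truststore.
# KEYTOOL can be overridden; it defaults to the JDK's keytool.
add_ca_to_truststore() {
    ca_cert="$1"       # certificate file of the private CA
    truststore="$2"    # e.g. truststore.jks
    "${KEYTOOL:-keytool}" -importcert -noprompt -trustcacerts \
        -alias private-ca -file "$ca_cert" \
        -keystore "$truststore" -storetype JKS \
        -storepass "${LDAP_TRUSTSTORE_PASSWORD:-changeit}"
}
```

Because the mounted file replaces the JVM's default cacerts, any public CAs the services still rely on would need to be present in the same truststore.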

Add the truststore secret to each kustomization.yaml

secretGenerator:
- name: hive-metastore-secrets
  envs:
  - secrets.env
- name: truststore
  files:
  - truststore.jks

And add the mounts to the deployments. This is a partial patch demonstrating the volume mount for hive.

  spec:
    template:
      spec:
        containers:
        - name: hive-metastore
          volumeMounts:
          - name: truststore-vol
            mountPath: /opt/java/openjdk/lib/security/cacerts
            subPath: truststore.jks
        volumes:
          - name: truststore-vol
            secret:
              secretName: truststore

For Trino, the same applies to both the deployment manifest and the statefulset. Note that the Java path should be verified against the Trino image.

  spec:
    template:
      spec:
        containers:
        - name: trino
          volumeMounts:
          - name: truststore-vol
            mountPath: /usr/lib/jvm/temurin/jdk-23.0.1+11/lib/security/cacerts
            subPath: truststore.jks
        volumes:
          - name: truststore-vol
            secret:
              secretName: truststore
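
Since the Java path differs between images and versions, the mount path can be derived from the resolved location of the java binary rather than copied by hand. The helper below is a sketch; the name `cacerts_for_java` is hypothetical, and it assumes the usual JDK layout where the binary lives under `bin/java`.

```shell
# Hypothetical helper: derive the cacerts location from a resolved java binary path.
cacerts_for_java() {
    java_bin="$1"    # e.g. the output of: readlink -f "$(command -v java)"
    printf '%s/lib/security/cacerts\n' "${java_bin%/bin/java}"
}
```

Inside a running container, `cacerts_for_java "$(readlink -f "$(command -v java)")"` would print the path to use as the mountPath, assuming readlink is available in the image.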