Hive metastore k8s support (#1)
* Add hive metastore chart and documentation

* Add github workflow

* Rename folder and fix certificate

* build base hive metastore image

* Fix windows format

* Fix pipelines

* Change image default tag

* Rename sources doc folder

* Fix tag default value in doc
Aakcht authored Jan 23, 2025
1 parent 9290559 commit c245cfc
Showing 50 changed files with 5,803 additions and 2 deletions.
38 changes: 38 additions & 0 deletions .github/workflows/clean.yml
@@ -0,0 +1,38 @@
name: Branch Deleted
on: delete

env:
  TAG_NAME: ${{ github.event.ref }}

jobs:
  delete:
    strategy:
      fail-fast: false
      matrix:
        component:
          - name: qubership-hive-metastore-transfer
          - name: qubership-hive-metastore
    if: github.event.ref_type == 'branch'
    runs-on: ubuntu-latest
    steps:
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${GITHUB_ACTOR}
          password: ${{secrets.GITHUB_TOKEN}}
      - name: Prepare Tag
        run: echo "TAG_NAME=$(echo ${TAG_NAME} | sed 's@refs/heads/@@;s@/@_@g')" >> $GITHUB_ENV
      - name: Get package IDs for delete
        id: get-ids-for-delete
        uses: Netcracker/get-package-ids@v0.0.1
        with:
          component-name: ${{ matrix.component.name }}
          component-tag: ${{ env.TAG_NAME }}
          access-token: ${{ secrets.GITHUB_TOKEN }}
      - uses: actions/delete-package-versions@v5
        with:
          package-name: ${{ matrix.component.name }}
          package-type: 'container'
          package-version-ids: ${{ steps.get-ids-for-delete.outputs.ids-for-delete }}
        if: ${{ steps.get-ids-for-delete.outputs.ids-for-delete != '' }}
63 changes: 63 additions & 0 deletions .github/workflows/push.yml
@@ -0,0 +1,63 @@
name: Build Artifacts
on:
  release:
    types: [created]
  push:
    branches:
      - '**'
env:
  TAG_NAME: ${{ github.event.release.tag_name || github.ref }}

jobs:
  multiplatform_build:
    strategy:
      fail-fast: false
      matrix:
        component:
          - name: qubership-hive-metastore-transfer
            file: docker-transfer/Dockerfile
            context: ""
          - name: qubership-hive-metastore
            file: docker/Dockerfile
            context: ""
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${GITHUB_ACTOR}
          password: ${{secrets.GITHUB_TOKEN}}
      - name: Prepare Tag
        run: echo "TAG_NAME=$(echo ${TAG_NAME} | sed 's@refs/tags/@@;s@refs/heads/@@;s@/@_@g')" >> $GITHUB_ENV
      - name: Get package IDs for delete
        id: get-ids-for-delete
        uses: Netcracker/get-package-ids@v0.0.1
        with:
          component-name: ${{ matrix.component.name }}
          component-tag: ${{ env.TAG_NAME }}
          access-token: ${{ secrets.GH_ACCESS_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          no-cache: true
          context: ${{ matrix.component.context }}
          file: ${{ matrix.component.file }}
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ghcr.io/netcracker/${{ matrix.component.name }}:${{ env.TAG_NAME }}
          provenance: false
          build-args: |
            GH_ACCESS_TOKEN=${{ secrets.GH_ACCESS_TOKEN }}
      - uses: actions/delete-package-versions@v5
        with:
          package-name: ${{ matrix.component.name }}
          package-type: 'container'
          package-version-ids: ${{ steps.get-ids-for-delete.outputs.ids-for-delete }}
        if: ${{ steps.get-ids-for-delete.outputs.ids-for-delete != '' }}
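
The `Prepare Tag` step and the `tags:` input in this workflow turn a Git ref into an image tag. A minimal Python sketch of the same normalization follows, useful for checking locally which tag a given branch or release would produce. The function names `normalize_tag` and `image_ref` are hypothetical and are not part of the workflow.

```python
def normalize_tag(ref: str) -> str:
    """Mirror the sed expression s@refs/tags/@@;s@refs/heads/@@;s@/@_@g."""
    # The first two sed commands have no `g` flag, so each removes only the first match.
    ref = ref.replace("refs/tags/", "", 1)
    ref = ref.replace("refs/heads/", "", 1)
    # The last command replaces every remaining slash with an underscore.
    return ref.replace("/", "_")


def image_ref(component: str, ref: str) -> str:
    """Build the image reference the same way the `tags:` input does."""
    return f"ghcr.io/netcracker/{component}:{normalize_tag(ref)}"


if __name__ == "__main__":
    print(image_ref("qubership-hive-metastore", "refs/heads/feature/k8s-support"))
```

Slashes are replaced because a container image tag cannot contain `/`, so a branch such as `feature/k8s-support` is published under the tag `feature_k8s-support`.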
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
*.iml
.idea
73 changes: 73 additions & 0 deletions CODE-OF-CONDUCT.md
@@ -0,0 +1,73 @@
# Code of Conduct

This repository is governed by the following code of conduct guidelines.

We hold collaboration, trust, respect and transparency as core values for our community.
Our community welcomes participants from all over the world with different experiences,
opinions and ideas to share.

We have adopted this code of conduct and require all contributors to agree to it in order to build a healthy,
safe and productive community for all.

This guideline aims to support a community where all people feel safe to participate,
introduce new ideas and inspire others, regardless of:

* Age
* Gender
* Gender identity or expression
* Family status
* Marital status
* Ability
* Ethnicity
* Race
* Sex characteristics
* Sexual identity and orientation
* Education
* Native language
* Background
* Caste
* Religion
* Geographic location
* Socioeconomic status
* Personal appearance
* Any other dimension of diversity

## Our Standards

We welcome the following behavior:

* Be respectful of different ideas, opinions and points of view
* Be constructive and professional
* Use inclusive language
* Be collaborative and show empathy
* Focus on the best results for the community

The following behavior is unacceptable:

* Violence, threats of violence, or inciting others to commit self-harm
* Personal attacks, trolling, intentionally spreading misinformation, insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Derogatory language
* Encouraging unacceptable behavior
* Other conduct which could reasonably be considered inappropriate in a professional community

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of the Code of Conduct
and are expected to take appropriate actions in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned
to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors
that they deem inappropriate, threatening, offensive, or harmful.

## Reporting

If you believe you’re experiencing unacceptable behavior that will not be tolerated as outlined above,
please report to `opensourcegroup@netcracker.com`. All complaints will be reviewed and investigated and will result in a response
that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality
with regard to the reporter of an incident.

Please also report if you observe a potentially dangerous situation, someone in distress, or violations of these guidelines,
even if the situation is not happening to you.
12 changes: 12 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,12 @@
# Contribution Guide

We'd love to accept patches and contributions to this project.
Please follow these guidelines to make the contribution process easy and effective for everyone involved.

## Contributor License Agreement

You must sign the [Contributor License Agreement](https://pages.netcracker.com/cla-main.html) in order to contribute.

## Code of Conduct

Please make sure to read and follow the [Code of Conduct](CODE-OF-CONDUCT.md).
3 changes: 2 additions & 1 deletion LICENSE
@@ -1,3 +1,4 @@

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -198,4 +199,4 @@
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
limitations under the License.
56 changes: 55 additions & 1 deletion README.md
@@ -1 +1,55 @@
# qubership-hive-metastore
## Table of Contents

* [Overview](#overview)
* [Architecture](docs/public/architecture.md)
* [Installation Guide](docs/public/installation.md)

## Overview

Qubership Hive-Metastore is a comprehensive solution for deploying [Hive-Metastore](https://hive.apache.org/) in Kubernetes.

In Kubernetes, Qubership Hive-Metastore uses MinIO S3 storage for data and PostgreSQL for metadata.

The table below shows services in K8s that can potentially replace services in a Hadoop cluster.

**Note:** [Apache Hive](https://github.com/apache/hive) consists of two large parts, hive-metastore and hiveServer2. This project contains only hive-metastore and does not include the hiveServer2 libraries. The hive-metastore libraries can be found at https://repo1.maven.org/maven2/org/apache/hive/hive-standalone-metastore-server .

## Repository structure

* `chart` - Helm charts for Qubership Hive-Metastore.
* `docker` - files for building the Qubership Hive-Metastore Docker image.
* `docs` - Qubership Hive-Metastore documentation.

### How to debug and troubleshoot

After deployment to K8s, a `log4j2-properties` ConfigMap is created. Edit it to change the Hive logging level.

#### Connecting to hive-metastore

You can connect to a deployed Qubership Hive-Metastore using [Trino](https://github.com/trinodb/trino) or [Spark](https://github.com/apache/spark). An example PySpark configuration is shown below:

```python
from pyspark.sql import SparkSession
...
spark = SparkSession \
    .builder.master("local") \
    .appName("MyApp.com") \
    .config("spark.sql.warehouse.dir", "s3a://hive/warehouse") \
    .config("spark.sql.hive.metastore.version", "3.1.3") \
    .config("spark.sql.hive.metastore.jars", "maven") \
    .config("spark.hadoop.hive.metastore.uris", "thrift://hive_address") \
    .config("spark.hadoop.hive.metastore.schema.verification", "false") \
    .config("spark.hadoop.hive.metastore.schema.verification.record.version", "false") \
    .config("spark.hadoop.hive.metastore.use.SSL", "false") \
    .config('spark.hadoop.fs.s3.buckets.create.enabled', 'true') \
    .config('spark.hadoop.fs.s3a.endpoint', 'https://s3.endpoint.address.com') \
    .config('spark.hadoop.fs.s3a.access.key', 's3accesskey') \
    .config('spark.hadoop.fs.s3a.secret.key', 's3secretkey') \
    .config('spark.hadoop.fs.s3a.connection.ssl.enabled', 'false') \
    .config('spark.hadoop.fs.s3a.impl', 'org.apache.hadoop.fs.s3a.S3AFileSystem') \
    .config('spark.hadoop.fs.s3a.path.style.access', 'true') \
    .config('spark.driver.extraJavaOptions', '-Dcom.amazonaws.sdk.disableCertChecking') \
    .config('spark.executor.extraJavaOptions', '-Dcom.amazonaws.sdk.disableCertChecking') \
    .enableHiveSupport() \
    .getOrCreate()
```
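
The same settings can also be kept in a plain dictionary and applied in a loop, which may make it easier to swap endpoints between environments. The sketch below is one possible arrangement and is not part of the project: `metastore_conf` and `build_session` are hypothetical helper names, and only a subset of the options from the example above is included.

```python
def metastore_conf(metastore_uri, s3_endpoint, access_key, secret_key,
                   warehouse_dir="s3a://hive/warehouse"):
    """Return Spark options for a Hive Metastore backed by S3-compatible storage."""
    return {
        "spark.sql.warehouse.dir": warehouse_dir,
        "spark.sql.hive.metastore.version": "3.1.3",
        "spark.sql.hive.metastore.jars": "maven",
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def build_session(conf, app_name="MyApp.com"):
    """Create a Hive-enabled SparkSession from a dict of options (requires pyspark)."""
    # Imported lazily so that metastore_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local").appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()


# Usage (needs pyspark and a reachable metastore; addresses are placeholders):
#   conf = metastore_conf("thrift://hive_address", "https://s3.endpoint.address.com",
#                         "s3accesskey", "s3secretkey")
#   spark = build_session(conf)
#   spark.sql("SHOW DATABASES").show()  # quick check that the metastore responds
```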
15 changes: 15 additions & 0 deletions SECURITY.md
@@ -0,0 +1,15 @@
# Security Reporting Process

Please report any security issue to `opensourcegroup@netcracker.com`, where the issue will be triaged appropriately.

If you know of a publicly disclosed security vulnerability, please IMMEDIATELY email `opensourcegroup@netcracker.com`
to inform the team about the vulnerability, so we may start the patch, release, and communication process.

# Security Release Process

If the vulnerability is found in the latest stable release, it will be fixed in a patch version for that release.
For example, if an issue is found in the 2.5.0 release, version 2.5.1 with a fix will be released.
By default, older versions will not have security releases.

If the issue doesn't affect any existing public releases, the fix for medium and high priority issues is made
in the main branch before a new version is released. For low priority issues, the fix can be planned for future releases.
37 changes: 37 additions & 0 deletions chart/helm/hivemetastore/.helmignore
@@ -0,0 +1,37 @@
# Copyright 2024-2025 NetCracker Technology Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
38 changes: 38 additions & 0 deletions chart/helm/hivemetastore/Chart.yaml
@@ -0,0 +1,38 @@
# Copyright 2024-2025 NetCracker Technology Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v2
name: hive-metastore
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"