Merge pull request opendatahub-io#47 from spolti/sync

Sync
spolti · Nov 23, 2023 · c3d4fb8
2 parents 21a8cb1 + f7abb3c
Showing 24 changed files with 705 additions and 99 deletions.
diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
@@ -0,0 +1,87 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL"
+
+on:
+  push:
+    branches: ["main"]
+  pull_request:
+    # The branches below must be a subset of the branches above
+    branches: ["main"]
+  schedule:
+    - cron: '45 8 * * *'
+
+jobs:
+  analyze:
+    name: Analyze
+    # Runner size impacts CodeQL analysis time. To learn more, please see:
+    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners
+    # Consider using larger runners for possible analysis time improvements.
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+    timeout-minutes: ${{ (matrix.language == 'swift' && 120) || 360 }}
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: ["java-kotlin", "python"]
+        # CodeQL supports [ 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift' ]
+        # Use only 'java-kotlin' to analyze code written in Java, Kotlin or both
+        # Use only 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
+        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    - name: Set up Java 17
+      uses: actions/setup-java@v3
+      with:
+        java-version: '17'
+        distribution: 'temurin'
+
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v2
+      with:
+        languages: ${{ matrix.language }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+
+        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+        # queries: security-extended,security-and-quality
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v2
+
+    # ℹ️ Command-line programs to run using the OS shell.
+    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+
+    #   If the Autobuild fails above, remove it and uncomment the following three lines.
+    #   Modify them (or add more) to build your code if your project requires it; please refer to the EXAMPLE below for guidance.
+
+    # - run: |
+    #     echo "Run, Build Application using script"
+    #     ./location_of_script_within_repo/buildscript.sh
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v2
+      with:
+        category: "/language:${{matrix.language}}"
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -3,6 +3,10 @@
 We'd love to accept your patches and contributions to this project. There are
 just a few small guidelines you need to follow.
 
+## Developer guide
+
+Check out the [developer guide](developer-guide.md) to learn about development practices for the project.
+
 ## Code reviews
 
 All submissions, including submissions by project members, require review. We

diff --git a/README.md b/README.md
@@ -1,50 +1,17 @@
+[![Build](https://github.com/kserve/modelmesh/actions/workflows/build.yml/badge.svg?branch=main)](https://github.com/kserve/modelmesh/actions/workflows/build.yml)
+
 # ModelMesh
 
 The ModelMesh framework is a mature, general-purpose model serving management/routing layer designed for high-scale, high-density and frequently-changing model use cases. It works with existing or custom-built model servers and acts as a distributed LRU cache for serving runtime models.
 
-See these [these charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf) for more information on supported features and design details.
-
 For full Kubernetes-based deployment and management of ModelMesh clusters and models, see the [ModelMesh Serving](https://github.com/kserve/modelmesh-serving) repo. This includes a separate controller and provides K8s custom resource based management of ServingRuntimes and InferenceServices along with common, abstracted handling of model repository storage and ready-to-use integrations with some existing OSS model servers.
 
-### Quick-Start
-
-1. Wrap your model-loading and invocation logic in this [model-runtime.proto](./src/main/proto/current/model-runtime.proto) gRPC service interface
-    - `runtimeStatus()` - called only during startup to obtain some basic configuration parameters from the runtime, such as version, capacity, model-loading timeout
-    - `loadModel()` - load the specified model into memory from backing storage, returning when complete
-    - `modelSize()` - determine size (mem usage) of previously-loaded model. If very fast, can be omitted and provided instead in the response from `loadModel`
-    - `unloadModel()` - unload previously loaded model, returning when complete
-    - Use a separate, arbitrary gRPC service interface for model inferencing requests. It can have any number of methods and they are assumed to be idempotent. See [predictor.proto](src/test/proto/predictor.proto) for a very simple example.
-    - The methods of your custom applier interface will be called only for already fully-loaded models.
-2. Build a grpc server docker container which exposes these interfaces on localhost port 8085 or via a mounted unix domain socket
-3. Extend the [Kustomize-based Kubernetes manifests](config) to use your docker image, and with appropriate mem and cpu resource allocations for your container
-4. Deploy to a Kubernetes cluster as a regular Service, which will expose [this grpc service interface](./src/main/proto/current/model-mesh.proto) via kube-dns (you do not implement this yourself), consume using grpc client of your choice from your upstream service components
-    - `registerModel()` and `unregisterModel()` for registering/removing models managed by the cluster
-    - Any custom inferencing interface methods to make a runtime invocation of previously-registered model, making sure to set a `mm-model-id` or `mm-vmodel-id` metadata header (or `-bin` suffix equivalents for UTF-8 ids)
-
-### Deployment and Upgrades
-
-Prerequisites:
-
--   An etcd cluster (shared or otherwise)
--   A Kubernetes namespace with the etcd cluster connection details configured as a secret key in [this json format](https://github.com/IBM/etcd-java/blob/master/etcd-json-schema.md)
-    -   Note that if provided, the `root_prefix` attribute _is_ used as a key prefix for all of the framework's use of etcd
-
-From an operational standpoint, ModelMesh behaves just like any other homogeneous clustered microservice. This means it can be deployed, scaled, migrated and upgraded as a regular Kubernetes deployment without any special coordination needed, and without any impact to live service usage.
-
-In particular the procedure for live upgrading either the framework container or service runtime container is the same: change the image version in the deployment config yaml and then update it `kubectl apply -f model-mesh-deploy.yaml`
+For more information on supported features and design details, see [these charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf).
 
-### Build
+## Get Started
 
-Sample build:
+To learn more about and get started with the ModelMesh framework, check out [the documentation](/docs).
 
-```bash
-GIT_COMMIT=$(git rev-parse HEAD)
-BUILD_ID=$(date '+%Y%m%d')-$(git rev-parse HEAD | cut -c -5)
-IMAGE_TAG_VERSION="dev"
-IMAGE_TAG=${IMAGE_TAG_VERSION}-$(git branch --show-current)_${BUILD_ID}
+## Developer guide
 
-docker build -t modelmesh:${IMAGE_TAG} \
-    --build-arg imageVersion=${IMAGE_TAG} \
-    --build-arg buildId=${BUILD_ID} \
-    --build-arg commitSha=${GIT_COMMIT} .
-```
+Use the [developer guide](developer-guide.md) to learn about development practices for the project.
diff --git a/developer-guide.md b/developer-guide.md
@@ -0,0 +1,220 @@
+# Developer Guide
+
+## Prerequisites
+
+You need [Java](https://openjdk.org/) and [Maven](https://maven.apache.org/guides/getting-started/maven-in-five-minutes.html#running-maven-tools)
+to build ModelMesh from source and [`etcd`](https://etcd.io/) to run the unit tests.
+To build your custom `modelmesh` container image and deploy it to a ModelMesh Serving installation on a Kubernetes cluster,
+you need the [`docker`](https://docs.docker.com/engine/reference/commandline/cli/) and
+[`kubectl`](https://kubectl.docs.kubernetes.io/references/kubectl/) CLIs. 
+On `macOS` you can install the required CLIs with [Homebrew](https://brew.sh/):
+
+- Java: `brew install java`
+- Maven: `brew install maven`
+- Etcd: `brew install etcd`
+- Docker: `brew install docker`
+- Kubectl: `brew install kubectl`
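
To confirm that the CLIs listed above are actually on the `PATH`, a small check like the following may help. This is only a sketch: `missing_tools` is a hypothetical helper name and is not part of the project.

```shell
# Hypothetical helper (not part of ModelMesh): print every tool name given as
# an argument that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names taken from the list above; Maven's executable is `mvn`.
missing_tools java mvn etcd docker kubectl
```

If the last command prints nothing, all five executables were found.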
+
+## Generating sources
+
+The gRPC stubs like the `ModelMeshGrpc` class have to be generated by the gRPC proto compiler from
+the `.proto` source files under `src/main/proto`.
+The generated sources should be created in the target directory `target/generated-sources/protobuf/grpc-java`.
+
+To generate the sources, run either of the following commands:
+
+```shell
+mvn package -DskipTests
+mvn install -DskipTests
+```
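
One way to check whether the generated sources are present is sketched below. It assumes the output directory named above and that the generated stubs are `.java` files; `has_generated_sources` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of ModelMesh): succeed if the directory exists
# and contains at least one .java file.
has_generated_sources() {
  dir="${1:-target/generated-sources/protobuf/grpc-java}"
  [ -d "$dir" ] && [ -n "$(find "$dir" -name '*.java' -print | head -n 1)" ]
}

# Run from the repository root.
if has_generated_sources; then
  echo "generated gRPC sources found"
else
  echo "no generated sources yet; run: mvn package -DskipTests"
fi
```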
+
+## Project setup using an IDE
+
+If you are using an IDE like [IntelliJ IDEA](https://www.jetbrains.com/idea/) or [Eclipse](https://eclipseide.org/)
+to help with your code development, you should set up source and target folders so that the IDE's compiler can find all
+the source code, including the generated sources (after running `mvn install -DskipTests`).
+
+For IntelliJ this can be done by going to **File > Project Structure ... > Modules**:
+
+- **Source Folders**
+    - src/main/java
+    - src/main/proto
+    - target/generated-sources/protobuf/grpc-java (generated)
+    - target/generated-sources/protobuf/java (generated)
+- **Test Source Folders**
+    - src/test/java
+    - target/generated-test-sources/protobuf/grpc-java (generated)
+    - target/generated-test-sources/protobuf/java (generated)
+- **Resource Folders**
+    - src/main/resources
+- **Test Resource Folders**
+    - src/test/resources
+- **Excluded Folders**
+    - target
+
+You may also want to increase your Java heap size to at least 1.5 GB.
+
+## Testing code changes
+
+**Note:** Before running the test cases, make sure you have `etcd` installed (see [Prerequisites](#prerequisites)):
+
+```Bash
+$ etcd --version
+
+etcd Version: 3.5.5
+Git SHA: 19002cfc6
+Go Version: go1.19.1
+Go OS/Arch: darwin/amd64
+```
+
+You can run all test suites at once, using the `-q` flag to reduce output noise:
+
+```Bash
+mvn test -q
+```
+
+Or you can run individual test cases:
+
+```Bash
+mvn test -Dtest=ModelMeshErrorPropagationTest
+mvn test -Dtest=SidecarModelMeshTest,ModelMeshFailureExpiryTest
+```
+
+It can be handy to use `grep` to reduce output noise:
+
+```Bash
+mvn test -Dtest=SidecarModelMeshTest,ModelMeshFailureExpiryTest | \
+  grep -E " Running |\[ERROR\]|Failures|SUCCESS|Skipp|Total time|Finished"
+
+[INFO] Running com.ibm.watson.modelmesh.ModelMeshFailureExpiryTest
+[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.257 s - in com.ibm.watson.modelmesh.ModelMeshFailureExpiryTest
+[INFO] Running com.ibm.watson.modelmesh.SidecarModelMeshTest
+[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.302 s - in com.ibm.watson.modelmesh.SidecarModelMeshTest
+[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
+[INFO] BUILD SUCCESS
+[INFO] Total time:  39.916 s
+[INFO] Finished at: 2022-11-01T14:33:33-07:00
+```
+
+## Building the container image
+
+After testing your code changes locally, it's time to build a new `modelmesh` container image. Set the `DOCKER_USER`
+environment variable to your Docker Hub user ID and change the `IMAGE_TAG` to something meaningful.
+
+```bash
+export DOCKER_USER="<your-docker-userid>"
+export IMAGE_NAME="${DOCKER_USER}/modelmesh"
+export IMAGE_TAG="dev"
+export GIT_COMMIT=$(git rev-parse HEAD)
+export BUILD_ID=$(date '+%Y%m%d')-$(git rev-parse HEAD | cut -c -5)
+
+docker build -t ${IMAGE_NAME}:${IMAGE_TAG} \
+    --build-arg imageVersion=${IMAGE_TAG} \
+    --build-arg buildId=${BUILD_ID} \
+    --build-arg commitSha=${GIT_COMMIT} .
+
+docker push ${IMAGE_NAME}:${IMAGE_TAG}
+```
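
For a more meaningful `IMAGE_TAG`, one option is to combine a version label, the branch name, and the build ID, similar to the build snippet this commit removes from the README. The sketch below assumes that scheme; `make_image_tag` is a hypothetical helper, and replacing `/` in branch names is an added assumption, since Docker tags cannot contain slashes.

```shell
# Hypothetical helper (not part of ModelMesh): compose an image tag from a
# version label, a branch name, a date stamp (YYYYMMDD) and a commit SHA.
make_image_tag() {
  version="$1"; branch="$2"; date_stamp="$3"; commit="$4"
  # Docker tags may not contain '/', so replace it in names like feature/foo.
  safe_branch=$(printf '%s' "$branch" | tr '/' '-')
  # Keep the first five characters of the SHA, as BUILD_ID does above.
  short_sha=$(printf '%s' "$commit" | cut -c -5)
  printf '%s-%s_%s-%s\n' "$version" "$safe_branch" "$date_stamp" "$short_sha"
}

# Usage inside a git checkout:
# export IMAGE_TAG=$(make_image_tag dev "$(git branch --show-current)" \
#   "$(date '+%Y%m%d')" "$(git rev-parse HEAD)")
```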
+
+## Updating the ModelMesh Serving deployment
+
+In order to test the code changes in an existing [ModelMesh Serving](https://github.com/kserve/modelmesh-serving) deployment,
+the newly built container image needs to be added to the `model-serving-config` ConfigMap.
+
+First, check if your ModelMesh Serving deployment already has an existing `model-serving-config` ConfigMap:
+
+```Shell
+kubectl get configmap
+
+NAME                            DATA   AGE
+kube-root-ca.crt                1      4d2h
+model-serving-config            1      4m14s
+model-serving-config-defaults   1      4d2h
+tc-config                       2      4d2h
+```
+
+If the ConfigMap list contains `model-serving-config`, save the contents of your existing configuration
+in a local temp file:
+
+```Bash
+mkdir -p temp
+kubectl get configmap model-serving-config -o yaml > temp/model-serving-config.yaml
+```
+
+Then add the `modelMeshImage` property to the `config.yaml` string property:
+```YAML
+    modelMeshImage:
+      name: <your-docker-userid>/modelmesh
+      tag: dev
+```
+
+Replace the `<your-docker-userid>` placeholder with your Docker username/login.
+
+The complete ConfigMap YAML file might look like this:
+
+```YAML
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: model-serving-config
+  namespace: modelmesh-serving
+data:
+  config.yaml: |
+    podsPerRuntime: 1
+    restProxy:
+      enabled: true
+    scaleToZero:
+      enabled: false
+      gracePeriodSeconds: 5
+    modelMeshImage:
+      name: <your-docker-userid>/modelmesh
+      tag: dev
+```
+
+Apply the ConfigMap to your cluster:
+
+```Bash
+kubectl apply -f temp/model-serving-config.yaml
+```
+
+If you are comfortable using vi, you can forgo creating a temp file and edit the ConfigMap directly in the terminal:
+
+```Shell
+kubectl edit configmap model-serving-config
+```
+
+If you did not already have a `model-serving-config` ConfigMap on your cluster, you can create one like this:
+
+```shell
+# export DOCKER_USER="<your-docker-userid>"
+# export IMAGE_NAME="${DOCKER_USER}/modelmesh"
+# export IMAGE_TAG="dev"
+
+kubectl apply -f - <<EOF
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: model-serving-config
+data:
+  config.yaml: |
+    modelMeshImage:
+      name: ${IMAGE_NAME}
+      tag: ${IMAGE_TAG}
+EOF
+```
+
+The `modelmesh-controller` watches the ConfigMap and responds to updates by automatically restarting the serving runtime
+pods using the newly built `modelmesh` container image.
+
+You can check which container images are used by running the following command:
+
+```Shell
+kubectl get pods -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' | sort | column -ts $'\t' | sed 's/, *$//g'
+
+etcd-78ff7867d5-45svw                            quay.io/coreos/etcd:v3.5.4
+minio-6ddbfc9665-gtf7x                           kserve/modelmesh-minio-examples:latest
+modelmesh-controller-64f5c8d6d6-k6rzc            kserve/modelmesh-controller:latest
+modelmesh-serving-mlserver-1.x-84884c6849-s8dw6  kserve/rest-proxy:latest, seldonio/mlserver:1.3.2, kserve/modelmesh-runtime-adapter:latest, kserve/modelmesh:dev
+modelmesh-serving-mlserver-1.x-84884c6849-xpdw4  kserve/rest-proxy:latest, seldonio/mlserver:1.3.2, kserve/modelmesh-runtime-adapter:latest, kserve/modelmesh:dev
+```
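
To list only the pods that already run the newly built image, the raw tab-separated output of the `jsonpath` query (before the `column` formatting) can be filtered. The following is a sketch; `pods_using_image` is a hypothetical helper, and it assumes each line has the form `pod-name<TAB>image, image, ...` as the query above produces.

```shell
# Hypothetical helper (not part of ModelMesh): read "pod-name<TAB>image, image, ..."
# lines on stdin and print the names of the pods that run the given image.
pods_using_image() {
  awk -F '\t' -v image="$1" '
    {
      n = split($2, images, ", *")
      for (i = 1; i <= n; i++) {
        if (images[i] == image) { print $1; break }
      }
    }'
}

# Usage against a cluster (same jsonpath query as above, without the formatting):
# kubectl get pods -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' \
#   | pods_using_image "${IMAGE_NAME}:${IMAGE_TAG}"
```

The match is exact on the full `name:tag` string, so pods still running an older tag are not listed.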