
Commit

Merge pull request #10 from RedHatQuickCourses/updates_beta
Updates beta
kknoxrht authored Sep 5, 2024
2 parents 5e41b65 + ebd0fff commit c65403b
Showing 24 changed files with 1,940 additions and 139 deletions.
Binary file added .DS_Store
638 changes: 638 additions & 0 deletions downloads/fraud_detection.yaml

Large diffs are not rendered by default.

255 changes: 255 additions & 0 deletions downloads/get_data_train_upload.yaml
@@ -0,0 +1,255 @@
# PIPELINE DEFINITION
# Name: 7-get-data-train-upload
components:
comp-get-data:
executorLabel: exec-get-data
outputDefinitions:
artifacts:
data_output_path:
artifactType:
schemaTitle: system.Artifact
schemaVersion: 0.0.1
comp-train-model:
executorLabel: exec-train-model
inputDefinitions:
artifacts:
data_input_path:
artifactType:
schemaTitle: system.Artifact
schemaVersion: 0.0.1
outputDefinitions:
artifacts:
model_output_path:
artifactType:
schemaTitle: system.Artifact
schemaVersion: 0.0.1
comp-upload-model:
executorLabel: exec-upload-model
inputDefinitions:
artifacts:
input_model_path:
artifactType:
schemaTitle: system.Artifact
schemaVersion: 0.0.1
deploymentSpec:
executors:
exec-get-data:
container:
args:
- --executor_input
- '{{$}}'
- --function_to_execute
- get_data
command:
- sh
- -c
- "\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\
\ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
\ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.5.0'\
\ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
$0\" \"$@\"\n"
- sh
- -ec
- 'program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
'
- "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
\ *\n\ndef get_data(data_output_path: OutputPath()):\n import urllib.request\n\
\ print(\"starting download...\")\n url = \"https://raw.githubusercontent.com/rh-aiservices-bu/fraud-detection/main/data/card_transdata.csv\"\
\n urllib.request.urlretrieve(url, data_output_path)\n print(\"done\"\
)\n\n"
image: quay.io/modh/runtime-images:runtime-cuda-tensorflow-ubi9-python-3.9-2023b-20240301
exec-train-model:
container:
args:
- --executor_input
- '{{$}}'
- --function_to_execute
- train_model
command:
- sh
- -c
- "\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\
\ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
\ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.5.0'\
\ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' &&\
\ python3 -m pip install --quiet --no-warn-script-location 'tf2onnx' 'seaborn'\
\ && \"$0\" \"$@\"\n"
- sh
- -ec
- 'program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
'
- "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
\ *\n\ndef train_model(data_input_path: InputPath(), model_output_path:\
\ OutputPath()):\n import numpy as np\n import pandas as pd\n from\
\ keras.models import Sequential\n from keras.layers import Dense, Dropout,\
\ BatchNormalization, Activation\n from sklearn.model_selection import\
\ train_test_split\n from sklearn.preprocessing import StandardScaler\n\
\ from sklearn.utils import class_weight\n import tf2onnx\n import\
\ onnx\n import pickle\n from pathlib import Path\n\n # Load the\
\ CSV data which we will use to train the model.\n # It contains the\
\ following fields:\n # distancefromhome - The distance from home where\
\ the transaction happened.\n # distancefromlast_transaction - The\
\ distance from last transaction happened.\n # ratiotomedianpurchaseprice\
\ - Ratio of purchased price compared to median purchase price.\n # \
\ repeat_retailer - If it's from a retailer that already has been purchased\
\ from before.\n # used_chip - If the (credit card) chip was used.\n\
\ # usedpinnumber - If the PIN number was used.\n # online_order\
\ - If it was an online order.\n # fraud - If the transaction is fraudulent.\n\
\ Data = pd.read_csv(data_input_path)\n\n # Set the input (X) and\
\ output (Y) data.\n # The only output data we have is if it's fraudulent\
\ or not, and all other fields go as inputs to the model.\n\n X = Data.drop(columns\
\ = ['repeat_retailer','distance_from_home', 'fraud'])\n y = Data['fraud']\n\
\n # Split the data into training and testing sets so we have something\
\ to test the trained model with.\n\n # X_train, X_test, y_train, y_test\
\ = train_test_split(X,y, test_size = 0.2, stratify = y)\n X_train, X_test,\
\ y_train, y_test = train_test_split(X,y, test_size = 0.2, shuffle = False)\n\
\n X_train, X_val, y_train, y_val = train_test_split(X_train,y_train,\
\ test_size = 0.2, stratify = y_train)\n\n # Scale the data to remove\
\ mean and have unit variance. This means that the data will be between\
\ -1 and 1, which makes it a lot easier for the model to learn than random\
\ potentially large values.\n # It is important to only fit the scaler\
\ to the training data, otherwise you are leaking information about the\
\ global distribution of variables (which is influenced by the test set)\
\ into the training set.\n\n scaler = StandardScaler()\n\n X_train\
\ = scaler.fit_transform(X_train.values)\n\n Path(\"artifact\").mkdir(parents=True,\
\ exist_ok=True)\n with open(\"artifact/test_data.pkl\", \"wb\") as handle:\n\
\ pickle.dump((X_test, y_test), handle)\n with open(\"artifact/scaler.pkl\"\
, \"wb\") as handle:\n pickle.dump(scaler, handle)\n\n # Since\
\ the dataset is unbalanced (it has many more non-fraud transactions than\
\ fraudulent ones), we set a class weight to weight the few fraudulent transactions\
\ higher than the many non-fraud transactions.\n\n class_weights = class_weight.compute_class_weight('balanced',classes\
\ = np.unique(y_train),y = y_train)\n class_weights = {i : class_weights[i]\
\ for i in range(len(class_weights))}\n\n\n # Build the model, the model\
\ we build here is a simple fully connected deep neural network, containing\
\ 3 hidden layers and one output layer.\n\n model = Sequential()\n \
\ model.add(Dense(32, activation = 'relu', input_dim = len(X.columns)))\n\
\ model.add(Dropout(0.2))\n model.add(Dense(32))\n model.add(BatchNormalization())\n\
\ model.add(Activation('relu'))\n model.add(Dropout(0.2))\n model.add(Dense(32))\n\
\ model.add(BatchNormalization())\n model.add(Activation('relu'))\n\
\ model.add(Dropout(0.2))\n model.add(Dense(1, activation = 'sigmoid'))\n\
\ model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])\n\
\ model.summary()\n\n\n # Train the model and get performance\n\n\
\ epochs = 2\n history = model.fit(X_train, y_train, epochs=epochs,\
\ \\\n validation_data=(scaler.transform(X_val.values),y_val),\
\ \\\n verbose = True, class_weight = class_weights)\n\
\n # Save the model as ONNX for easy use of ModelMesh\n\n model_proto,\
\ _ = tf2onnx.convert.from_keras(model)\n print(model_output_path)\n\
\ onnx.save(model_proto, model_output_path)\n\n"
image: quay.io/modh/runtime-images:runtime-cuda-tensorflow-ubi9-python-3.9-2023b-20240301
exec-upload-model:
container:
args:
- --executor_input
- '{{$}}'
- --function_to_execute
- upload_model
command:
- sh
- -c
- "\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\
\ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
\ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.5.0'\
\ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' &&\
\ python3 -m pip install --quiet --no-warn-script-location 'boto3' 'botocore'\
\ && \"$0\" \"$@\"\n"
- sh
- -ec
- 'program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
'
- "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
\ *\n\ndef upload_model(input_model_path: InputPath()):\n import os\n\
\ import boto3\n import botocore\n\n aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')\n\
\ aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')\n \
\ endpoint_url = os.environ.get('AWS_S3_ENDPOINT')\n region_name =\
\ os.environ.get('AWS_DEFAULT_REGION')\n bucket_name = os.environ.get('AWS_S3_BUCKET')\n\
\n s3_key = os.environ.get(\"S3_KEY\")\n\n session = boto3.session.Session(aws_access_key_id=aws_access_key_id,\n\
\ aws_secret_access_key=aws_secret_access_key)\n\
\n s3_resource = session.resource(\n 's3',\n config=botocore.client.Config(signature_version='s3v4'),\n\
\ endpoint_url=endpoint_url,\n region_name=region_name)\n\n\
\ bucket = s3_resource.Bucket(bucket_name)\n\n print(f\"Uploading\
\ {s3_key}\")\n bucket.upload_file(input_model_path, s3_key)\n\n"
env:
- name: S3_KEY
value: models/fraud/1/model.onnx
image: quay.io/modh/runtime-images:runtime-cuda-tensorflow-ubi9-python-3.9-2023b-20240301
pipelineInfo:
name: 7-get-data-train-upload
root:
dag:
tasks:
get-data:
cachingOptions:
enableCache: true
componentRef:
name: comp-get-data
taskInfo:
name: get-data
train-model:
cachingOptions:
enableCache: true
componentRef:
name: comp-train-model
dependentTasks:
- get-data
inputs:
artifacts:
data_input_path:
taskOutputArtifact:
outputArtifactKey: data_output_path
producerTask: get-data
taskInfo:
name: train-model
upload-model:
cachingOptions:
enableCache: true
componentRef:
name: comp-upload-model
dependentTasks:
- train-model
inputs:
artifacts:
input_model_path:
taskOutputArtifact:
outputArtifactKey: model_output_path
producerTask: train-model
taskInfo:
name: upload-model
schemaVersion: 2.1.0
sdkVersion: kfp-2.5.0
---
platforms:
kubernetes:
deploymentSpec:
executors:
exec-upload-model:
secretAsEnv:
- keyToEnv:
- envVar: AWS_ACCESS_KEY_ID
secretKey: AWS_ACCESS_KEY_ID
- envVar: AWS_SECRET_ACCESS_KEY
secretKey: AWS_SECRET_ACCESS_KEY
- envVar: AWS_DEFAULT_REGION
secretKey: AWS_DEFAULT_REGION
- envVar: AWS_S3_BUCKET
secretKey: AWS_S3_BUCKET
- envVar: AWS_S3_ENDPOINT
secretKey: AWS_S3_ENDPOINT
secretName: aws-connection-my-storage
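
The YAML above is compiler output rather than hand-written source. For reference, here is a minimal sketch, not part of this commit, of how a pipeline with the same three steps (fetch data, train, upload) might be authored with the KFP v2 SDK and compiled into a similar file. The function and variable names are illustrative, the training step is abbreviated to a placeholder copy, and the `kubernetes` helpers assume the separate `kfp-kubernetes` extension package is installed.

[source,python]
----
# Hypothetical authoring sketch (not part of this commit) for a pipeline shaped
# like the compiled YAML above, following kfp==2.5.0 conventions. The
# `kubernetes` helpers assume the kfp-kubernetes extension package; component
# bodies other than get_data are abbreviated.
from kfp import compiler, dsl, kubernetes

RUNTIME_IMAGE = (
    "quay.io/modh/runtime-images:"
    "runtime-cuda-tensorflow-ubi9-python-3.9-2023b-20240301"
)


@dsl.component(base_image=RUNTIME_IMAGE)
def get_data(data_output_path: dsl.OutputPath()):
    import urllib.request
    url = ("https://raw.githubusercontent.com/rh-aiservices-bu/"
           "fraud-detection/main/data/card_transdata.csv")
    urllib.request.urlretrieve(url, data_output_path)


@dsl.component(base_image=RUNTIME_IMAGE, packages_to_install=["tf2onnx", "seaborn"])
def train_model(data_input_path: dsl.InputPath(), model_output_path: dsl.OutputPath()):
    # Placeholder body: the full Keras training and ONNX export logic is
    # embedded in the exec-train-model executor of the YAML above.
    import shutil
    shutil.copy(data_input_path, model_output_path)


@dsl.component(base_image=RUNTIME_IMAGE, packages_to_install=["boto3", "botocore"])
def upload_model(input_model_path: dsl.InputPath()):
    import os
    import boto3
    session = boto3.session.Session(
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    s3 = session.resource(
        "s3",
        endpoint_url=os.environ["AWS_S3_ENDPOINT"],
        region_name=os.environ["AWS_DEFAULT_REGION"],
    )
    s3.Bucket(os.environ["AWS_S3_BUCKET"]).upload_file(
        input_model_path, os.environ["S3_KEY"]
    )


@dsl.pipeline(name="7-get-data-train-upload")
def fraud_training_pipeline():
    get_data_task = get_data()
    train_task = train_model(
        data_input_path=get_data_task.outputs["data_output_path"]
    )
    upload_task = upload_model(
        input_model_path=train_task.outputs["model_output_path"]
    )
    upload_task.set_env_variable(name="S3_KEY", value="models/fraud/1/model.onnx")
    kubernetes.use_secret_as_env(
        upload_task,
        secret_name="aws-connection-my-storage",
        secret_key_to_env={
            key: key
            for key in (
                "AWS_ACCESS_KEY_ID",
                "AWS_SECRET_ACCESS_KEY",
                "AWS_DEFAULT_REGION",
                "AWS_S3_BUCKET",
                "AWS_S3_ENDPOINT",
            )
        },
    )


if __name__ == "__main__":
    compiler.Compiler().compile(fraud_training_pipeline, "get_data_train_upload.yaml")
----

Compiling a sketch like this would produce both the pipeline definition and a Kubernetes platform section comparable to the two YAML documents shown above.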
6 changes: 3 additions & 3 deletions modules/LABENV/pages/index.adoc
@@ -31,17 +31,17 @@ Red Hat OpenShift Operators automate the creation, configuration, and management

Included in Red Hat OpenShift is the Embedded OperatorHub, a registry of certified Operators from software vendors and open source projects. Within the Embedded OperatorHub you can browse and install a library of Operators that have been verified to work with Red Hat OpenShift and that have been packaged for easy lifecycle management.

== Lab: Installation of Red Hat OpenShift AI
== Lab Exercise: Installation of Red Hat OpenShift AI

This section will discuss the process for installing the dependent operators using the OpenShift Web Console.
This section describes the process for installing the dependent operators using the OpenShift Web Console (~15 minutes).

IMPORTANT: The installation requires a user with the _cluster-admin_ role.

This exercise uses the Red Hat Demo Platform, specifically the OpenShift Container Cluster Platform Resource. If you haven't already, launch the lab environment before continuing.

. Log in to Red Hat OpenShift as a user that has the _cluster-admin_ role assigned.

. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators.
. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. For this lab you can skip the installation of the optional operators.

[*] You do not have to wait for the previous Operator to finish installing before starting the next. The optional operators can be skipped in this lab because no accelerator is required.
// Should this be a note?
4 changes: 2 additions & 2 deletions modules/LABENV/pages/minio-install.adoc
@@ -200,7 +200,7 @@ image::minio_setup.gif[width=600]

From the OCP Dashboard:

. Select Networking / Routes from the navigation menu.

. This will display two routes, one for the UI and another for the API. (If the routes are not visible, make sure you have selected the project that matches the data science project you created earlier.)

@@ -223,7 +223,7 @@ Once logged into the MinIO Console:

.. *pipelines*

.. *storage*
.. *my-storage*

.. *models* (optional)

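
For readers who prefer to script the bucket creation step above instead of clicking through the MinIO Console, the following is a small, hypothetical boto3 sketch. The endpoint URL and credentials are placeholders for the values exposed by your own MinIO API route and secret; only the bucket names come from the list above.

[source,python]
----
# Hypothetical helper (not part of this commit): create the lab buckets against
# a MinIO endpoint with boto3. Endpoint and credentials are placeholders.
import boto3

s3 = boto3.resource(
    "s3",
    endpoint_url="https://minio-api-<your-project>.apps.example.com",  # MinIO API route (placeholder)
    aws_access_key_id="minio",          # placeholder access key
    aws_secret_access_key="minio123",   # placeholder secret key
)

for name in ("pipelines", "my-storage", "models"):
    s3.create_bucket(Bucket=name)  # raises an error if the bucket already exists
    print(f"created bucket: {name}")
----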
2 changes: 1 addition & 1 deletion modules/ROOT/pages/index.adoc
@@ -5,7 +5,7 @@

Data science pipelines can be a game-changer for AI model development. By breaking down complex tasks into smaller, manageable steps, we can optimize each part of the process and ensure that our models are properly trained and validated. Additionally, pipelines can help us maintain consistent results by versioning inputs and outputs, allowing us to track changes and identify potential issues.

This course is tailored for infrastructure solution architects and engineers who are tasked with deploying and managing data science pipelines on the OpenShift AI platform. By the end of this course, learners will have a solid understanding of how to deploy and support data scientists who will use RHOAI to design, build, and maintain efficient and effective data science pipelines in an OpenShift AI environment.
This course is tailored for infrastructure solution architects and engineers who are tasked with deploying and managing data science pipelines on the OpenShift AI platform. By the end of this course, learners will have a solid understanding of how to deploy resources and support data scientists who will use RHOAI to design, build, and maintain efficient and effective data science pipelines in an OpenShift AI environment.

Let's explore how pipelines can help us optimize training tasks, manage caching steps, and create more maintainable and reusable workloads.

4 changes: 2 additions & 2 deletions modules/chapter1/pages/dsp-concepts.adoc
@@ -46,11 +46,11 @@ Data science pipelines may consist of several key activities that are performed

A single pipeline may train multiple models, run complex hyperparameter searches, and more. Data scientists can use a well-crafted pipeline to quickly iterate on a model, adjust how data is transformed, test different algorithms, and so on. While the steps described above form a common pattern for model training, different use cases and projects may have vastly different requirements, and the tools and frameworks selected for creating a data science pipeline should enable a flexible design.

=== Technical Knowledge
=== RHOAI Data Science Pipeline Engine

OpenShift AI uses Kubeflow Pipelines with Argo Workflows as the execution engine. Kubeflow Pipelines provides a rich set of tools for managing ML workloads, while Argo Workflows offers powerful automation capabilities. Together, they enable robust, scalable, and manageable pipelines for AI model development and serving.

Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts.
Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. _These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts._

Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple RUN command, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully.

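
To make the ordering and control-flow points in this section concrete, here is a minimal, hypothetical KFP v2 sketch. The component names are illustrative, the bodies are trivial, and the conditional-branching construct is only noted in a comment.

[source,python]
----
# Minimal sketch (assumption, not from this commit): task ordering in a KFP v2 pipeline.
from kfp import dsl


@dsl.component
def ingest() -> str:
    return "ok"


@dsl.component
def preprocess(status: str):
    print(f"preprocessing after ingest returned: {status}")


@dsl.component
def train():
    print("training")


@dsl.pipeline(name="ordering-example")
def ordering_pipeline():
    ingest_task = ingest()
    # Passing an output creates an implicit dependency between tasks.
    prep_task = preprocess(status=ingest_task.output)
    # .after() creates an explicit ordering dependency without passing data.
    train().after(prep_task)
    # For conditional control flow, KFP provides constructs such as dsl.Condition.
----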