
Commit

Merge pull request #11 from RedHatQuickCourses/kfp_section
Kfp section
kknoxrht authored Sep 6, 2024
2 parents c65403b + d60b340 commit affa341
Showing 8 changed files with 115 additions and 193 deletions.
17 changes: 16 additions & 1 deletion modules/appendix/pages/appendix.adoc
@@ -1,3 +1,18 @@
= Appendix A

Content for Appendix A...
Additional training examples to understand the resources available in RHOAI.

== Fraud Detection workshop with Red Hat OpenShift AI

Thanks to the Red Hat Developer team for an excellent workshop. Parts of this workshop were used in the Elyra section of this course. If you would like the full experience, you can use the RHOAI environment from this course to complete the workshop.

https://rh-aiservices-bu.github.io/fraud-detection/fraud-detection-workshop/index.html[In this workshop, window=_blank], you learn how to incorporate data science and artificial intelligence and machine learning (AI/ML) into an OpenShift development workflow.

You will use an example fraud detection model to complete the following tasks:

. Explore a pre-trained fraud detection model by using a Jupyter notebook.

. Deploy the model by using OpenShift AI model serving.

. Refine and train the model by using automated pipelines.

4 changes: 2 additions & 2 deletions modules/chapter1/pages/dsp-concepts.adoc
@@ -17,7 +17,7 @@ image::pipeline_dag_overview.gif[width=600]

. *Task* - is a self-contained pipeline component that represents an execution stage in the pipeline.

. *Artifact* - Steps have the ability to create artifacts, which are objects that can be persisted after the execution of the step completes. Other steps may use those artifacts as inputs and some artifacts may be useful references after a pipeline run has completed. Artifacts automatically stored by Data Science Pipelines in S3 compatible storage.
. *Artifact* - _Tasks_ have the ability to create artifacts, which are objects that can be persisted after the execution of the task completes. Other tasks may use those artifacts as inputs, and some artifacts may be useful references after a pipeline run has completed. Artifacts are automatically stored by Data Science Pipelines in S3-compatible storage (see the sketch after this list).

. *Experiment* - is a logical grouping of runs for the purpose of comparing different pipelines.
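To make the _task_ and _artifact_ terms concrete, here is a minimal sketch using the Kubeflow Pipelines (kfp) v2 SDK that OpenShift AI pipelines build on; the component names, base image, and values are illustrative assumptions, not course code.

[source,python]
----
from kfp import dsl


@dsl.component(base_image="python:3.11")
def train_model(model: dsl.Output[dsl.Model]):
    # A task that produces a model artifact; whatever is written to
    # model.path is persisted by Data Science Pipelines in
    # S3-compatible storage.
    with open(model.path, "w") as f:
        f.write("serialized-model-placeholder")


@dsl.component(base_image="python:3.11")
def evaluate_model(model: dsl.Input[dsl.Model]) -> float:
    # A task that consumes the artifact produced by train_model and
    # returns a metric as an output parameter.
    with open(model.path) as f:
        _ = f.read()
    return 0.95
----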

@@ -52,7 +52,7 @@ OpenShift AI uses Kubeflow pipelines with Argo workflows as the engine. Kubeflow

Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. _These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts._

Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple RUN command, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully.
Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple `Run command`, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully.
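As a sketch of ordering and control flow (the component and pipeline names below are assumptions, not part of the course), a kfp pipeline chains tasks by passing outputs and can gate a task behind a condition:

[source,python]
----
from kfp import dsl


@dsl.component(base_image="python:3.11")
def preprocess() -> str:
    return "clean-data"


@dsl.component(base_image="python:3.11")
def train(data: str) -> float:
    print(f"training on {data}")
    return 0.97  # placeholder accuracy


@dsl.component(base_image="python:3.11")
def deploy(accuracy: float):
    print(f"deploying model with accuracy {accuracy}")


@dsl.pipeline(name="example-control-flow")
def training_pipeline():
    prep_task = preprocess()
    # train runs after preprocess because it consumes its output
    train_task = train(data=prep_task.output)
    # control flow: deploy only runs when the reported accuracy exceeds 0.9
    with dsl.Condition(train_task.output > 0.9):
        deploy(accuracy=train_task.output)
----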

In summary, data science pipelines are an essential tool for automating and managing the ML lifecycle, enabling data scientists to create end-to-end workflows, reduce human error, and ensure consistent, high-quality results.

15 changes: 9 additions & 6 deletions modules/chapter3/pages/elyra-pipelines.adoc
@@ -14,7 +14,7 @@ image::pipeline_runtime_config.gif[width=600]

If the pipeline server is deployed after the workbench is created, the runtime configuration will not appear in JupyterLab, and there are two options to make it available:

. Make a change to the workbench by adding an environment variable
. Make a change to the workbench, such as adding a placeholder environment variable (recommended)

. Manually create the runtime configuration

@@ -37,6 +37,7 @@ Refer to the https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html
In order to create Elyra pipelines with the visual pipeline editor:

* Launch JupyterLab with the Elyra extension installed.
** Select a workbench/notebook image that has Elyra installed.
* Create a new pipeline by clicking on the Elyra `Pipeline Editor` icon.
* Add each node to the pipeline by dragging and dropping notebooks or scripts from the file browser onto the pipeline editor canvas.
* Connect the nodes to define the flow of execution.
@@ -92,14 +93,14 @@ image::elyra_pipeline_nodes.gif[width=600]
+
//image::pipeline-3.png[]
+
You should now see the two nodes connected through a solid line. We have now defined a simple pipeline with two tasks, which are executed sequentially, first experiment_train to product a model articfact, then save-model to move the model to workbench S3 storage.
You should now see the two nodes connected through a solid line. We have now defined a simple pipeline with two tasks, which are executed sequentially: first `experiment_train` to produce a model artifact, then `save-model` to move the model to workbench S3 storage.
+
[NOTE]
====
By visually defining pipeline tasks and connections, we can define _graphs_ spanning many nodes and interconnections. Elyra and Data Science Pipelines support the creation and execution of arbitrary _directed acyclic graphs_ (DAGs), i.e. graphs with a sequential order of nodes and without loops.
====

We have now created the final graph representation of the fraud detection pipeline using the two of five available notebooks. With this we have fully defined the full pipeline code and its order of execution.
We have now created the final graph representation of the fraud detection pipeline using two of the five available notebooks. With this, we have fully defined the pipeline code and its order of execution.


==== Configuring the pipeline
@@ -122,7 +123,9 @@ image::experiment_node_config.gif[width=600]
+
NOTE: Do not select any of the nodes in the canvas when you open the panel. You will see the `PIPELINE PROPERTIES` tab only when none of the nodes are selected. Click anywhere on the canvas and then open the panel.

. Next we will configure the data connection to the `my-storage` bucket as a Kubernetes secret. By default these secrets are created in the environment variable of pipeline properties, but need to be located in the Kubernetes secrets to be used in the pipeline. Copy entries from the environment variables section; add these in the kubernetes secrets for save_model (node2) task in Elyra.
. Next we will configure the data connection to the `my-storage` bucket as a Kubernetes secret.
.. By default, these secrets are created in the environment variables section of the pipeline properties.
.. They must instead be added to the Kubernetes secrets section of the pipeline properties to be used in the pipeline.

. In the `PIPELINE PROPERTIES` section, click `Add` beneath the `Kubernetes Secrets` section and add the following five entries:
+
@@ -134,7 +137,7 @@ NOTE: Do not select any of the nodes in the canvas when you open the panel. You
* `AWS_DEFAULT_REGION`
--
+
Each parameter will include the following options:
Each Kubernetes Secret parameter will include the following options:
+
--
* `Environment Variable`: *the parameter name*
@@ -146,7 +149,7 @@ image::save_model_storage.gif[width=600]
+
[NOTE]
====
The AWS default region is another parameter in the data connection, which is used for AWS S3-based connections.
The AWS default region is another parameter in the data connection, which is used for AWS S3-based connections. In practice, if this field is missing, the pipeline fails to connect regardless of the storage system used.
====

. Next we will configure the data to be passed between the nodes. Click on the `experiment_train` node. If you're still in the configuration menu, you should now see the `NODE PROPERTIES` tab. If not, right-click on the node and select `Open Properties`.
Binary file added modules/chapter4/images/create_pipeline_run.gif
Binary file added modules/chapter4/images/import_pipeline_yaml.gif
Binary file added modules/chapter4/images/pipeline_artifacts.gif
2 changes: 1 addition & 1 deletion modules/chapter4/pages/index.adoc
@@ -8,4 +8,4 @@ The second mechanism, and the one discussed here is based on the *Kubeflow Pipel

While the Elyra extension offers an easy-to-use visual editor to compose pipelines and is generally used for simple workflows, the Kubeflow Pipelines SDK (*kfp*) offers a flexible Python Domain Specific Language (DSL) API to create pipelines from Python code. This approach gives you flexibility in composing complex workflows and has the added benefit of all the Python tooling, frameworks, and developer experience that come with writing Python code.

OpenShift AI uses the *_Argo Wotkflows_* runtime to execute pipelines, which is why your Kubeflow pipeline containing Python code needs to be compiled into a yaml definition before it can be submitted to the runtime. Steps in the pipeline are executed as ephemeral pods (one per step).
OpenShift AI uses the *_Argo Workflows_* runtime to execute pipelines, which is why your Kubeflow pipeline containing Python code needs to be compiled into a compatible YAML definition before it can be submitted to the runtime. Tasks in the pipeline are executed as ephemeral pods (one per task).
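As an example of how this works in practice, the following is a minimal, hedged sketch of a kfp v2 pipeline and its compilation step; the component, pipeline name, and output path are illustrative assumptions rather than course material.

[source,python]
----
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def say_hello() -> str:
    # A trivial task; each task runs in its own ephemeral pod.
    return "hello"


@dsl.pipeline(name="hello-pipeline")
def hello_pipeline():
    say_hello()


if __name__ == "__main__":
    # Compile the Python pipeline definition into the YAML file
    # that is submitted to the Argo Workflows-based runtime.
    compiler.Compiler().compile(
        pipeline_func=hello_pipeline,
        package_path="hello_pipeline.yaml",
    )
----

The resulting `hello_pipeline.yaml` is the file you then import into OpenShift AI to create and run the pipeline.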
