
Commit

Merge pull request #11 from RedHatQuickCourses/kfp_section
Kfp section
kknoxrht authored Sep 6, 2024
2 parents c65403b + d60b340 commit affa341
Showing 8 changed files with 115 additions and 193 deletions.
17 changes: 16 additions & 1 deletion modules/appendix/pages/appendix.adoc
@@ -1,3 +1,18 @@
= Appendix A

Content for Appendix A...
Additional training examples to understand the resources available in RHOAI.

== Fraud Detection workshop with Red Hat OpenShift AI

Thanks to the Red Hat Developer team for an excellent workshop. Parts of this workshop were used in the Elyra section of this course. If you would like the full experience, you can use the RHOAI environment from this course to complete the workshop.

https://rh-aiservices-bu.github.io/fraud-detection/fraud-detection-workshop/index.html[In this workshop, window=_blank], you learn how to incorporate data science and artificial intelligence and machine learning (AI/ML) into an OpenShift development workflow.

You will use an example fraud detection model to complete the following tasks:

. Explore a pre-trained fraud detection model by using a Jupyter notebook.

. Deploy the model by using OpenShift AI model serving.

. Refine and train the model by using automated pipelines.

4 changes: 2 additions & 2 deletions modules/chapter1/pages/dsp-concepts.adoc
@@ -17,7 +17,7 @@ image::pipeline_dag_overview.gif[width=600]

. *Task* - is a self-contained pipeline component that represents an execution stage in the pipeline.

. *Artifact* - Steps have the ability to create artifacts, which are objects that can be persisted after the execution of the step completes. Other steps may use those artifacts as inputs and some artifacts may be useful references after a pipeline run has completed. Artifacts automatically stored by Data Science Pipelines in S3 compatible storage.
. *Artifact* - _Tasks_ have the ability to create artifacts, which are objects that can be persisted after the execution of the task completes. Other tasks may use those artifacts as inputs, and some artifacts may be useful references after a pipeline run has completed. Artifacts are automatically stored by Data Science Pipelines in S3-compatible storage (see the sketch after this list).

. *Experiment* - is a logical grouping of runs for the purpose of comparing different pipelines.
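To make the _task_ and _artifact_ terms concrete, here is a minimal sketch using the Kubeflow Pipelines (kfp) v2 SDK that OpenShift AI pipelines build on; the component names, base image, and values are illustrative assumptions, not course code.

[source,python]
----
from kfp import dsl


@dsl.component(base_image="python:3.11")
def train_model(model: dsl.Output[dsl.Model]):
    # A task that produces a model artifact; whatever is written to
    # model.path is persisted by Data Science Pipelines in
    # S3-compatible storage.
    with open(model.path, "w") as f:
        f.write("serialized-model-placeholder")


@dsl.component(base_image="python:3.11")
def evaluate_model(model: dsl.Input[dsl.Model]) -> float:
    # A task that consumes the artifact produced by train_model and
    # returns a metric as an output parameter.
    with open(model.path) as f:
        _ = f.read()
    return 0.95
----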

@@ -52,7 +52,7 @@ OpenShift AI uses Kubeflow pipelines with Argo workflows as the engine. Kubeflow

Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. _These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts._

Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple RUN command, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully.
Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple `Run command`, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully.
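As a sketch of ordering and control flow (the component and pipeline names below are assumptions, not part of the course), a kfp pipeline chains tasks by passing outputs and can gate a task behind a condition:

[source,python]
----
from kfp import dsl


@dsl.component(base_image="python:3.11")
def preprocess() -> str:
    return "clean-data"


@dsl.component(base_image="python:3.11")
def train(data: str) -> float:
    print(f"training on {data}")
    return 0.97  # placeholder accuracy


@dsl.component(base_image="python:3.11")
def deploy(accuracy: float):
    print(f"deploying model with accuracy {accuracy}")


@dsl.pipeline(name="example-control-flow")
def training_pipeline():
    prep_task = preprocess()
    # train runs after preprocess because it consumes its output
    train_task = train(data=prep_task.output)
    # control flow: deploy only runs when the reported accuracy exceeds 0.9
    with dsl.Condition(train_task.output > 0.9):
        deploy(accuracy=train_task.output)
----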

In summary, data science pipelines are an essential tool for automating and managing the ML lifecycle, enabling data scientists to create end-to-end workflows, reduce human error, and ensure consistent, high-quality results.

15 changes: 9 additions & 6 deletions modules/chapter3/pages/elyra-pipelines.adoc
@@ -14,7 +14,7 @@ image::pipeline_runtime_config.gif[width=600]

If the pipeline server is deployed after the workbench is created, the runtime configuration will not appear in JupyterLab, and there are two options to make it available:

. Make a change to the workbench by adding an environment variable
. Make a change to the workbench, such as adding a placeholder environment variable (recommended)

. Manually create the runtime configuration

@@ -37,6 +37,7 @@ Refer to the https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html
In order to create Elyra pipelines with the visual pipeline editor:

* Launch JupyterLab with the Elyra extension installed.
** Select a workbench/notebook image that has Elyra installed.
* Create a new pipeline by clicking on the Elyra `Pipeline Editor` icon.
* Add each node to the pipeline by dragging and dropping notebooks or scripts from the file browser onto the pipeline editor canvas.
* Connect the nodes to define the flow of execution.
@@ -92,14 +93,14 @@ image::elyra_pipeline_nodes.gif[width=600]
+
//image::pipeline-3.png[]
+
You should now see the two nodes connected through a solid line. We have now defined a simple pipeline with two tasks, which are executed sequentially, first experiment_train to product a model articfact, then save-model to move the model to workbench S3 storage.
You should now see the two nodes connected through a solid line. We have now defined a simple pipeline with two tasks, which are executed sequentially: first `experiment_train` to produce a model artifact, then `save-model` to move the model to workbench S3 storage.
+
[NOTE]
====
By visually defining pipeline tasks and connections, we can define _graphs_ spanning many nodes and interconnections. Elyra and Data Science Pipelines support the creation and execution of arbitrary _directed acyclic graphs_ (DAGs), i.e. graphs with a sequential order of nodes and without loops.
====

We have now created the final graph representation of the fraud detection pipeline using the two of five available notebooks. With this we have fully defined the full pipeline code and its order of execution.
We have now created the final graph representation of the fraud detection pipeline using two of the five available notebooks. With this, we have fully defined the pipeline code and its order of execution.


==== Configuring the pipeline
@@ -122,7 +123,9 @@ image::experiment_node_config.gif[width=600]
+
NOTE: Do not select any of the nodes in the canvas when you open the panel. You will see the `PIPELINE PROPERTIES` tab only when none of the nodes are selected. Click anywhere on the canvas and then open the panel.

. Next we will configure the data connection to the `my-storage` bucket as a Kubernetes secret. By default these secrets are created in the environment variable of pipeline properties, but need to be located in the Kubernetes secrets to be used in the pipeline. Copy entries from the environment variables section; add these in the kubernetes secrets for save_model (node2) task in Elyra.
. Next we will configure the data connection to the `my-storage` bucket as a Kubernetes secret.
.. By default, these secrets are created in the environment variables section of the pipeline properties.
.. They must instead be added to the Kubernetes secrets section of the pipeline properties to be used in the pipeline.

. In the `PIPELINE PROPERTIES` section, click `Add` beneath the `Kubernetes Secrets` section and add the following five entries:
+
@@ -134,7 +137,7 @@ NOTE: Do not select any of the nodes in the canvas when you open the panel. You
* `AWS_DEFAULT_REGION`
--
+
Each parameter will include the following options:
Each Kubernetes Secret parameter will include the following options:
+
--
* `Environment Variable`: *the parameter name*
@@ -146,7 +149,7 @@ image::save_model_storage.gif[width=600]
+
[NOTE]
====
The AWS default region is another parameter in the data connection, which is used for AWS S3-based connections.
The AWS default region is another parameter in the data connection, which is used for AWS S3-based connections. In practice, if this field is missing, the pipeline fails to connect regardless of the storage system used.
====

. Next we will configure the data to be passed between the nodes. Click on the `experiment_train` node. If you're still in the configuration menu, you should now see the `NODE PROPERTIES` tab. If not, right-click on the node and select `Open Properties`.
Binary file added modules/chapter4/images/create_pipeline_run.gif
Binary file added modules/chapter4/images/import_pipeline_yaml.gif
Binary file added modules/chapter4/images/pipeline_artifacts.gif
2 changes: 1 addition & 1 deletion modules/chapter4/pages/index.adoc
@@ -8,4 +8,4 @@ The second mechanism, and the one discussed here is based on the *Kubeflow Pipel

While the Elyra extension offers an easy-to-use visual editor to compose pipelines and is generally used for simple workflows, the Kubeflow Pipelines SDK (*kfp*) offers a flexible Python Domain Specific Language (DSL) API to create pipelines from Python code. This approach gives you flexibility in composing complex workflows and has the added benefit of all the Python tooling, frameworks, and developer experience that come with writing Python code.

OpenShift AI uses the *_Argo Wotkflows_* runtime to execute pipelines, which is why your Kubeflow pipeline containing Python code needs to be compiled into a yaml definition before it can be submitted to the runtime. Steps in the pipeline are executed as ephemeral pods (one per step).
OpenShift AI uses the *_Argo Workflows_* runtime to execute pipelines, which is why your Kubeflow pipeline containing Python code needs to be compiled into a compatible YAML definition before it can be submitted to the runtime. Tasks in the pipeline are executed as ephemeral pods (one per task).
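As an example of how this works in practice, the following is a minimal, hedged sketch of a kfp v2 pipeline and its compilation step; the component, pipeline name, and output path are illustrative assumptions rather than course material.

[source,python]
----
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def say_hello() -> str:
    # A trivial task; each task runs in its own ephemeral pod.
    return "hello"


@dsl.pipeline(name="hello-pipeline")
def hello_pipeline():
    say_hello()


if __name__ == "__main__":
    # Compile the Python pipeline definition into the YAML file
    # that is submitted to the Argo Workflows-based runtime.
    compiler.Compiler().compile(
        pipeline_func=hello_pipeline,
        package_path="hello_pipeline.yaml",
    )
----

The resulting `hello_pipeline.yaml` is the file you then import into OpenShift AI to create and run the pipeline.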
