Commit

spelling and grammar edits
kknoxrht committed Sep 8, 2024
1 parent 33b5e1d commit d88c328
Showing 8 changed files with 26 additions and 109 deletions.
72 changes: 0 additions & 72 deletions modules/LABENV/pages/index.adoc
@@ -191,77 +191,5 @@ Navigate to & select the Data Science Projects section.
. Select Create.






Once complete, you should be on the landing page of the "fraud-detection" Data Science Project section of the OpenShift AI Console / Dashboard.



//image::create_workbench.png[width=640]

// . Select the WorkBench button, then click create workbench

// .. Name: `fraud-detection`

// .. Notebook Image: `standard data science`

// .. Leave the remaining options default.

// .. Optionally, scroll to the bottom, check the `Use data connection box`.

// .. Select *storage* from the dropdown to attach the storage bucket to the workbench.

// . Select the Create Workbench option.

//[NOTE]
// Depending on the notebook image selected, it can take between 2-20 minutes for the container image to be fully deployed. The Open Link will be available when our container is fully deployed.



//== Jupyter Notebooks

// video::llm_jupyter_v3.mp4[width=640]

//== Open JupyterLab

//JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. For a demonstration of JupyterLab and its features, https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html#what-will-happen-to-the-classic-notebook[you can view this video.]


//Return to the fraud detection workbench dashboard in the OpenShift AI console.

// . Select the *Open* link to the right of the status section.

//image::oai_open_jupyter.png[width=640]

// . When the new window opens, use the OpenShift admin user & password to login to JupyterLab.

// . Click the *Allow selected permissions* button to complete login to the notebook.


//[NOTE]
//If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes & up to 20+ minutes depending on the notebook image we opted to choose.


//== Inside JupyterLab

//This takes us to the JupyterLab screen where we can select multiple options / tools / to work to begin our data science experimentation.

//Our first action is to clone a git repository that contains a collection of LLM projects including the notebook we are going to use to interact with the LLM.

//Clone the github repository to interact with the Ollama Framework from this location:
//https://github.com/rh-aiservices-bu/llm-on-openshift.git

// . Copy the URL link above

// . Click on the Clone a Repo Icon above explorer section window.

//image::clone_a_repo.png[width=640]

// . Paste the link into the *clone a repo* pop up, make sure the *included submodules are checked*, then click the clone.


//image::navigate_ollama_notebook.png[width=640]

// . Explore the notebook, and then continue.
4 changes: 2 additions & 2 deletions modules/LABENV/pages/minio-install.adoc
@@ -202,7 +202,7 @@ From the OCP Dashboard:

. Select Networking / Routes from the navigation menu.

. This will display two routes, one for the UI & another for the API. (if the routes are not visible, make sure you have the project selected that matches your data sicence project created earlier)
. This will display two routes, one for the UI & another for the API. (if the routes are not visible, make sure you have the project selected that matches your data science project created earlier)


. For the first step, select the UI route and paste it or open in a new browser tab or window.
@@ -228,4 +228,4 @@ Once logged into the MinIO Console:
.. *models* (optional)


This completes the pre-work to configure the data scicence pipeline lab environment. With our S3 Compatible storage ready to go, let's head to next section of the course and learn more about DSP concepts.
This completes the pre-work to configure the data science pipeline lab environment. With our S3 Compatible storage ready to go, let's head to the next section of the course and learn more about DSP concepts.
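
For readers who prefer to script the bucket setup, here is a minimal, hedged sketch using boto3 against the MinIO API route. The endpoint URL, credentials, and bucket names are placeholders, not values taken from this lab.

[source,python]
----
# Illustrative only: create the lab buckets through the MinIO API route.
# Endpoint, credentials, and bucket names below are assumptions -- substitute your own.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://<minio-api-route>",        # the API route, not the UI route
    aws_access_key_id="<minio-access-key>",
    aws_secret_access_key="<minio-secret-key>",
)

for bucket in ("storage", "pipelines", "models"):    # assumed bucket names
    s3.create_bucket(Bucket=bucket)
    print(f"created bucket: {bucket}")
----
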
4 changes: 2 additions & 2 deletions modules/chapter1/pages/dsp-concepts.adoc
@@ -28,9 +28,9 @@ image::pipeline_dag_overview.gif[width=600]

A data science pipeline is typically implemented to improve the repeatability of a data science experiment. While the larger experimentation process may include steps such as data exploration, where data scientists seek to create a fundamental understanding of the characteristics of the data, data science pipelines tend to focus on turning a viable experiment into a repeatable solution that can be iterated on.

A data science pipeline, may also fit within the context of a larger pipeline that manages the complete lifecycle of an application, and the data science pipeline is responsible for the process of training the machine learning model.
A data science pipeline may also fit within the context of a larger pipeline that manages the complete lifecycle of an application, and the data science pipeline is responsible for the process of training the machine learning model.

Data science pipelines may consists of several key activities that are performed in a structured sequence to train a machine learning model. These activities may include:
Data science pipelines may consist of several key activities that are performed in a structured sequence to train a machine learning model. These activities may include:

* *Data Collection*: Gathering the data from various sources, such as databases, APIs, spreadsheets, or external datasets.

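These activities map naturally onto pipeline tasks. As a hedged illustration (not code from this course), a minimal Kubeflow Pipelines v2 sketch with a data-collection task feeding a training task might look like the following; the component names, base image, and file contents are assumptions.

[source,python]
----
# Minimal, illustrative KFP v2 pipeline: collect data, then train on it.
from kfp import dsl

@dsl.component(base_image="python:3.11")
def collect_data(raw_data: dsl.Output[dsl.Dataset]):
    # Placeholder: a real task would pull from a database, API, or object store.
    with open(raw_data.path, "w") as f:
        f.write("feature,label\n0.1,0\n0.9,1\n")

@dsl.component(base_image="python:3.11")
def train_model(raw_data: dsl.Input[dsl.Dataset], model: dsl.Output[dsl.Model]):
    # Placeholder: a real task would preprocess the data and fit a model.
    with open(model.path, "w") as f:
        f.write("trained-model")

@dsl.pipeline(name="example-training-pipeline")
def training_pipeline():
    collected = collect_data()
    train_model(raw_data=collected.outputs["raw_data"])
----
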
4 changes: 2 additions & 2 deletions modules/chapter1/pages/dsp-intro.adoc
@@ -18,9 +18,9 @@ Enabling data scientists and data engineers manage the complexity of the end-to-

. *Version control and documentation:* You can use version control systems to track changes in your pipeline's code and configuration, ensuring that you can roll back to previous versions if needed. A well-structured pipeline encourages better documentation of each step.

=== Machine learning lifecycles & DevOps
=== Machine learning life cycles & DevOps

Machine learning lifecycles can vary in complexity and may involve additional steps depending on the use case, such as hyperparameter optimization, cross-validation, and feature selection. The goal of a machine learning pipeline is to automate and standardize these processes, making it easier to develop and maintain ML models for various applications.
Machine learning life cycles can vary in complexity and may involve additional steps depending on the use case, such as hyperparameter optimization, cross-validation, and feature selection. The goal of a machine learning pipeline is to automate and standardize these processes, making it easier to develop and maintain ML models for various applications.

Machine learning pipelines started to be integrated with DevOps practices to enable continuous integration and deployment (CI/CD) of machine learning models. This integration emphasized the need for reproducibility, version control and monitoring in ML pipelines. This integration is referred to as machine learning operations, or *MLOps*, which helps data science teams effectively manage the complexity of ML orchestration. In a real-time deployment, the pipeline responds to a request within milliseconds.

13 changes: 3 additions & 10 deletions modules/chapter2/pages/data-science-pipeline-app.adoc
@@ -28,7 +28,7 @@ This multi-tenancy capability does require that each user or group needs their o

While a *DataSciencePipelineApplication* is a namespace scoped object, workbenches and pods running in other namespaces can still interact with the pipeline instance if they have the correct permissions.

Only one dpsa deployment can exist per data science project. (nampespace)
Only one DSPA deployment can exist per data science project (namespace).
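
Since only one instance is allowed per namespace, it can be useful to check whether one already exists before creating another. A hedged sketch with the Kubernetes Python client follows; the CRD group, version, and plural are assumptions -- confirm them with `oc api-resources` on your cluster.

[source,python]
----
# Illustrative only: list existing DataSciencePipelineApplication objects in a project.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()
dspas = api.list_namespaced_custom_object(
    group="datasciencepipelinesapplications.opendatahub.io",  # assumed CRD group
    version="v1alpha1",                                        # assumed CRD version
    namespace="fraud-detection",
    plural="datasciencepipelinesapplications",                 # assumed CRD plural
)
print([item["metadata"]["name"] for item in dspas.get("items", [])])
----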

== Lab Exercise: Create a Data Science Pipeline Instance

@@ -75,26 +75,19 @@ image::pipeline_server_setup.gif[width=600]
=== Create a Data Science Pipeline Application

. A new Data connection should now be listed in the `Data connections` section.
//+
//image::create-dspa-verify-data-connection.png[]

. Switch to the pipelines tab in the data science project.

. Click on the `Configure pipeline server` in the `Pipelines` section of the Data Science Project view.
//+
//image::create-dspa-create-pipeline-server.png[]

. Click the key icon in the right side of the `Access Key` field, and select the `pipelines` data connection. The fields in the form are automatically populated.
//+
//image::create-dspa-configure-pipeline-server.png[]

. Select the option to use the default database stored in the cluster

.. There is an option to specifiy the details of an external database required for the datasciencepipelineapplication.
.. There is an option to specify the details of an external database to be used by the DataSciencePipelineApplication.

. Click `Configure pipeline server`. After several seconds, the loading icon should complete and the `Pipelines` section will now show an option to `Import pipeline`.
//+
//image::create-dspa-verify-pipeline-server.png[]


The *DataSciencePipelineApplication* has now successfully been configured and is ready for use.
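
As an optional sanity check, the new pipeline server can be reached programmatically with the kfp SDK. This is a hedged sketch; the route URL and token are placeholders to be taken from your own cluster.

[source,python]
----
# Illustrative only: confirm the pipeline server responds to API calls.
import kfp

client = kfp.Client(
    host="https://<ds-pipeline-route-for-your-project>",  # Data Science Pipelines API route
    existing_token="<openshift-api-token>",               # e.g. the output of `oc whoami -t`
)
print(client.list_pipelines())
----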

4 changes: 2 additions & 2 deletions modules/chapter2/pages/rhoai-resources.adoc
@@ -55,7 +55,7 @@ If the *OPEN* link for the notebook is grayed out, the notebook container is sti

This takes us to the JupyterLab screen, where we can select from multiple options and tools to begin our data science experimentation.

Our first action is to clone a git repository that contains notebooks including an sample notebook to familize yourself with the Jupiter notebook environment.
Our first action is to clone a git repository that contains notebooks including a sample notebook to familiarize yourself with the Jupyter notebook environment.

image::clone_repo_jupyter.gif[width=600]

@@ -67,7 +67,7 @@ https://github.com/rh-aiservices-bu/fraud-detection.git

. Copy the URL link above

. Click on the Clone a Repo Icon above explorer section window.
. Click on the Clone a Repo Icon above the explorer section window.
//+
//image::clone_a_repo.png[width=640]

24 changes: 10 additions & 14 deletions modules/chapter3/pages/elyra-pipelines.adoc
@@ -21,7 +21,7 @@ If the pipeline server is deployed post workbench creation, the runtime configur
.. you will need the following required fields:

... Runtime: *Display Name*
... Data Science Pipelines API Endpoint: found in tne Network / Routes section of OCP Dashboard
... Data Science Pipelines API Endpoint: found in the Network / Routes section of the OCP Dashboard
... Data Science Pipeline Engine Type: *Argo* (pre-configured)
... Cloud Object Storage Endpoint: S3 compatible storage (same as data connection endpoint)
... Cloud Object Storage Bucket Name: Name of the S3 bucket
@@ -55,7 +55,7 @@ Let's now use Elyra to package the nodes into a pipeline and submit it to the Da

=== Review opening JupyterLab

Once the `fraud-detection` workbench has successfully started, we will being the process of exploring and building our pipeline.
Once the `fraud-detection` workbench has successfully started, we will begin the process of exploring and building our pipeline.

. Ensure that the `fraud-detection` workbench is in `Running` state. Click the `Open` link on the far right of the work bench menu. Log in to the workbench as the `admin` user. If you are running the workbench for the first time, click `Allow selected permissions` in the `Authorize Access` page to open the Jupyter Notebook interface.

@@ -75,7 +75,7 @@ image::elyra_pipeline_nodes.gif[width=600]

. Click on the `Pipeline Editor` tile in the launcher menu. This opens up Elyra's visual pipeline editor. You will use the visual pipeline editor to drag-and-drop files from the file browser onto the canvas area. These files then define the individual tasks of your pipeline.

. Rename the pipeline file to `fraud-detection-elyra.pipeline: Right click the untiled pipeline name, choose rename, and then select `Save Pipeline` in the top toolbar.
. Rename the pipeline file to `fraud-detection-elyra.pipeline`: Right-click the untitled pipeline name, choose rename, and then select `Save Pipeline` in the top toolbar.

. Drag the `experiment_train.ipynb` notebook onto the empty canvas. This will allow the pipeline to ingest the data we want to classify, pre-process the data, train a model, and run a sample test to validate the model is working as intended.
+
@@ -89,7 +89,7 @@ image::elyra_pipeline_nodes.gif[width=600]
+
//image::pipeline-3.png[]
+
You should now see the two nodes connected through a solid line. We have now defined a simple pipeline with two tasks, which are executed sequentially, first experiment_train to produce a model articfact, then save-model to move the model to workbench S3 storage.
You should now see the two nodes connected through a solid line. We have now defined a simple pipeline with two tasks, which are executed sequentially, first experiment_train to produce a model artifact, then save-model to move the model to workbench S3 storage.
+
[NOTE]
====
@@ -130,14 +130,13 @@ image::experiment_node_config_2.gif[width=600]

. In the `File Dependencies` section, you can declare one or more _input files_. These input files are consumed by this pipeline task as the data needed to train the model.

. Under file dependencies *click add*, next select browse and choose the data/card_transdata.csv file which provides a sampling of credit card to be used.
. Under file dependencies *click add*, then select browse and choose the data/card_transdata.csv file which provides a sampling of credit card transaction data to be used to train the model.

. In the `Outputs` section, you can declare one or more _output files_. These output files are created by this pipeline task and are made available to all subsequent tasks.

. Click `Add` in the `Outputs` section and input `models/fraud/1/model.onnx`. This ensures that the downloaded model artifact is available to downstream tasks, including the `save_models` task.
+
//image::pipeline-config-5.png[]
+


[NOTE]
====
By default, all files within a containerized task are removed after its execution, so declaring files explicitly as output files is one way to ensure that they can be reused in downstream tasks.
@@ -147,7 +146,7 @@ Output files are automatically managed by Data Science Pipelines, and stored in
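
To make the role of the declared output concrete, here is a self-contained, hedged sketch (not the course notebook) of the kind of code that would write the artifact to `models/fraud/1/model.onnx`; the toy model and random data are assumptions. Whatever the task writes to that declared path is what Data Science Pipelines stores and hands to the downstream `save_model` task.

[source,python]
----
# Illustrative only: train a toy model and write it to the declared output path.
import os
import numpy as np
from sklearn.linear_model import LogisticRegression
from skl2onnx import to_onnx

X = np.random.rand(100, 7).astype(np.float32)   # stand-in for card_transdata features
y = (np.random.rand(100) > 0.9).astype(int)     # stand-in fraud labels
clf = LogisticRegression(max_iter=200).fit(X, y)

os.makedirs("models/fraud/1", exist_ok=True)
onx = to_onnx(clf, X[:1])                        # convert the fitted model to ONNX
with open("models/fraud/1/model.onnx", "wb") as f:
    f.write(onx.SerializeToString())
----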

=== Set Kubernetes Secrets for Storage Access

. Click on the `save_model` node. Then select open panel to view the "Node Properties" configuration panel. If not, right-click on the node and select `Open Properties`.
. Click on the `save_model` node, then select `Open Panel` to view the "Node Properties" configuration panel. If the panel does not open, right-click on the node and select `Open Properties`.

. Next we will configure the data connection to the `my-storage` bucket as a Kubernetes secret.
.. By default, these secrets are created in the environment variable section of the pipeline properties.
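
As a hedged sketch of what the `save_model` task can then do with those injected variables, the snippet below reads the data-connection credentials from the environment and copies the model artifact into the bucket. The environment variable names follow the usual OpenShift AI data-connection convention, which is an assumption here.

[source,python]
----
# Illustrative only: upload the model artifact using secret-injected credentials.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
s3.upload_file(
    "models/fraud/1/model.onnx",
    os.environ.get("AWS_S3_BUCKET", "my-storage"),  # bucket from the data connection
    "models/fraud/1/model.onnx",
)
----
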
@@ -254,7 +253,7 @@ In the `Scheduled` tab you're able to view runs of the fraud-detection-elyra pip
[WARNING]
====
Pipeline versioning is implemented in Data Science Pipelines.
If you change or resumit an Elyra pipeline that you have already submitted before, a new version is automatically created and executed.
If you change or resubmit an Elyra pipeline that you have already submitted before, a new version is automatically created and executed.
====

@@ -268,7 +267,5 @@ Let's finally peek behind the scenes and inspect the S3 bucket that Elyra and Da
--
* `pipelines`: A folder used by Data Science Pipelines to store all pipeline definitions in YAML format.
* `artifacts`: A folder used by Data Science Pipelines to store the metadata of each pipeline task for each pipeline run.
* One folder for each pipeline run with name `[pipeline-name]-[timestamp]`. These folders are managed by Elyra and contain all file dependencies, log files, and output files of each task.
* A folder for each pipeline run with name `[pipeline-name]-[timestamp]`. These folders are managed by Elyra and contain all file dependencies, log files, and output files of each task.
--

[NOTE]
@@ -278,8 +277,5 @@ The logs from the Pipeline submitted from Elyra will show generic task informati
To view logs from the execution of our code, you can find the log files from our tasks in the runs in the Data Science Pipelines bucket.
====

//image::pipelines-bucket.png[title=Data Science Pipeline Bucket contents]

//image::pipeline-artifacts.png[title=Data Science Pipeline Run Artifacts]

Now that we have seen how to work with Data Science Pipelines through Elyra, let's take a closer look at the Kubeflow Pipelines SDK.
10 changes: 5 additions & 5 deletions modules/chapter4/pages/kfp-import.adoc
@@ -10,7 +10,7 @@ This course does not delve into the details of how to use the SDK. Instead, it p

=== Prerequisites

* Continue to use the `fraud-detection` Data Science Project that you created in the previous section. We won't need a workbench in this section, but you should have completed all lab exercises up through Data Science Pipelines / OpenShift AI Resources section of the course in order for the environment to support pipeline creation.
* Continue to use the `fraud-detection` Data Science Project that you created in the previous section. We won't need a workbench in this section, but you should have completed all lab exercises up through the Data Science Pipelines / OpenShift AI Resources section of the course in order for the environment to support pipeline creation.

image::import_pipeline_yaml.gif[width=600]

@@ -41,7 +41,7 @@ There are three tabs in DSP dashboard:

* Graph - the DAG view of the pipeline defined
* Summary - shows the pipeline spec and version IDs
* Pipeline spec - is the import yaml file that defined the pipeline.
* Pipeline spec - the imported YAML file that defines the pipeline.

The pipeline is now available to be executed, but currently there have been no *one-off or scheduled runs* for this pipeline; it has only been defined in the system.
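
Although the next lab steps create a run through the dashboard, the same thing can be done programmatically with the kfp SDK. The sketch below is hedged: the host, token, package file name, and parameter names are assumptions for illustration only.

[source,python]
----
# Illustrative only: upload the pipeline package and start a one-off run.
import kfp

client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<token>")

client.upload_pipeline(
    pipeline_package_path="fraud-detection-example.yaml",
    pipeline_name="fraud-detection-example",
)

run = client.create_run_from_pipeline_package(
    "fraud-detection-example.yaml",
    arguments={
        "data_url": "https://<data-host>/card_transdata.csv",  # hypothetical parameter name
        "epochs": 2,                                            # hypothetical parameter name
    },
)
print(run.run_id)
----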

@@ -73,9 +73,9 @@ To create a _run_ for the fraud-detection-example pipeline we just imported.

. Select the option to create a new run.

To execute the run, we need to imput some information:
To execute the run, we need to input some information:

. Run Type: Scheduled runs are exectued from different dashboard - skip this step
. Run Type: Scheduled runs are executed from a different dashboard - skip this step

. Define the project and experiment name:

@@ -101,7 +101,7 @@ In this case there are two required parameters, which allow this pipeline to hav

. The first parameter is the URL of the data file to be imported during the run.

. The second paramenter is number of epochs.
. The second parameter is the number of epochs.
+
[NOTE]
The epochs value is an important hyperparameter for the algorithm. It specifies the number of epochs, that is, complete passes of the entire training dataset through the algorithm's training (learning) process.
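
To make the idea concrete, here is a tiny, self-contained sketch (not course code) in which the epochs value controls how many complete passes the training loop makes over the dataset.

[source,python]
----
# Illustrative only: "epochs" = full passes over the training set.
import numpy as np

X = np.random.rand(64, 4)
y = (X.sum(axis=1) > 2).astype(int)
w = np.zeros(4)

epochs = 2                          # the pipeline's epochs parameter plays this role
for _ in range(epochs):             # one full pass over the dataset per epoch
    for xi, yi in zip(X, y):
        pred = 1 / (1 + np.exp(-xi @ w))   # logistic prediction
        w += 0.1 * (yi - pred) * xi        # simple SGD update
----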
