diff --git a/modules/LABENV/pages/index.adoc b/modules/LABENV/pages/index.adoc index f8b3c29..dd50f5a 100644 --- a/modules/LABENV/pages/index.adoc +++ b/modules/LABENV/pages/index.adoc @@ -21,9 +21,9 @@ When ordering this catalog item in RHDP: . Click order -For Red Hat partners who do not have access to RHDP, you need to provision an OpenShift AI cluster on-premises, or in the supported cloud environments by following the product documentation. at https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.12/html/installing_and_uninstalling_openshift_ai_self-managed/index[Product Documentation for installing Red Hat OpenShift AI 2.12]. +For Red Hat partners who do not have access to RHDP, you need to provision an OpenShift AI cluster on-premises, or in the supported cloud environments by following the product documentation at https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.12/html/installing_and_uninstalling_openshift_ai_self-managed/index[Product Documentation for installing Red Hat OpenShift AI 2.12]. -The OCP environment will provide the foundation infrastructure for RHOAI. Once logged into the OCP dashboard, we need to install the Operators to enable RHOAI components in the OCP platform. +The OCP environment will provide the foundation infrastructure for Red Hat OpenShift AI (RHOAI). Once logged into the OCP dashboard, we need to install the Operators to enable RHOAI components in the OCP platform. == Operators and Red Hat OpenShift Container Platform @@ -41,8 +41,6 @@ This exercise uses the Red Hat Demo Platform; specifically the OpenShift Contain . Login to the Red Hat OpenShift using a user which has the _cluster-admin_ role assigned. -. It’s sufficient to install all prerequisite operators with default settings, no additional configuration is necessary. - . Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. 
In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators. [*] You do not have to wait for the previous Operator to complete before installing the next. For this lab you can skip the installation of the optional operators as there is no accelerator required. diff --git a/modules/LABENV/pages/minio-install.adoc b/modules/LABENV/pages/minio-install.adoc index caeba3e..32ace95 100644 --- a/modules/LABENV/pages/minio-install.adoc +++ b/modules/LABENV/pages/minio-install.adoc @@ -19,13 +19,12 @@ To Deploy MinIO, we will utilize the OpenShift Dashboard. image::minio_install.gif[width=600] - - . Click on the Project Selection list dropdown and select the "fraud-detection" project or the data science project you created in the previous step. - . Then Select the + (plus) icon from the top right of the dashboard. . In the new window, we will paste the following YAML file. In the YAML below its recommended to change the default user name & password. + . Click on the Project Selection list dropdown and select the "fraud-detection" project or the data science project you created in the previous step. + ```yaml --- @@ -207,6 +206,8 @@ From the OCP Dashboard: . For the first step, select the UI route and paste it or open in a new browser tab or window. + + . If you see a landing page with the message *application not available*, refresh the page a few times as the service is still loading. . The displayed page is the MinIO Dashboard. Log in with the username/password combination you set, or the defaults listed below. 
diff --git a/modules/chapter1/images/pipeline_dag_overview.gif b/modules/chapter1/images/pipeline_dag_overview.gif new file mode 100644 index 0000000..cc2031f Binary files /dev/null and b/modules/chapter1/images/pipeline_dag_overview.gif differ diff --git a/modules/chapter1/nav.adoc b/modules/chapter1/nav.adoc index a130e73..c484d76 100644 --- a/modules/chapter1/nav.adoc +++ b/modules/chapter1/nav.adoc @@ -1,3 +1,3 @@ * xref:dsp-intro.adoc[] ** xref:dsp-concepts.adoc[] -** xref:section1.adoc[] \ No newline at end of file +//** xref:section1.adoc[] \ No newline at end of file diff --git a/modules/chapter1/pages/dsp-concepts.adoc b/modules/chapter1/pages/dsp-concepts.adoc index 57a8cea..09444ca 100644 --- a/modules/chapter1/pages/dsp-concepts.adoc +++ b/modules/chapter1/pages/dsp-concepts.adoc @@ -7,6 +7,23 @@ A pipeline is an execution graph of tasks, commonly known as a _DAG_ (Directed A A DAG is a directed graph without any cycles, i.e. direct loops. ==== +image::pipeline_dag_overview.gif[width=600] + +== Specific Data Science Pipeline terminology in OpenShift AI DSP + + . *Pipeline* - is a workflow definition containing the steps and their input and output artifacts. + + . *Run* - is a single execution of a pipeline. A run can be a one-off execution of a pipeline, or pipelines can be scheduled as a recurring run. + + . *Task* - is a self-contained pipeline component that represents an execution stage in the pipeline. + + . *Artifact* - Steps have the ability to create artifacts, which are objects that can be persisted after the execution of the step completes. Other steps may use those artifacts as inputs and some artifacts may be useful references after a pipeline run has completed. Artifacts are automatically stored by Data Science Pipelines in S3-compatible storage. + + . *Experiment* - is a logical grouping of runs for the purpose of comparing different pipelines. + + . 
*Execution* - is an instance of a Task/Component + + == Why data science pipelines A data science pipeline is typically implemented to improve the repeatability of a data science experiment. While the larger experimentation process may include steps such as data exploration, where data scientists seek to create a fundamental understanding of the characteristics of the data, data science pipelines tend to focus on turning a viable experiment into a repeatable solution that can be iterated on. @@ -29,3 +46,14 @@ Data science pipelines may consists of several key activities that are performed A single pipeline may include the ability to train multiple models, complete complex hyperparameter searches, or more. Data Scientists can use a well crafted pipeline to quickly iterate on a model, adjust how data is transformed, test different algorithms, and more. While the steps described above describe a common pattern for model training, different use cases and projects may have vastly different requirements and the tools and framework selected for creating a data science pipeline should help to enable a flexible design. +=== Technical Knowledge + +OpenShift AI uses Kubeflow pipelines with Argo workflows as the engine. Kubeflow provides a rich set of tools for managing ML workloads, while Argo workflows offer powerful automation capabilities. Together, they enable us to create robust, scalable, and manageable pipelines for AI model development and serving. + +Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts. + +Additionally, pipelines can support control flows to handle complex dependencies between tasks. 
Once a pipeline is defined, executing it becomes a simple RUN command, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully. + +In summary, data science pipelines are an essential tool for automating and managing the ML lifecycle, enabling data scientists to create end-to-end workflows, reduce human error, and ensure consistent, high-quality results. + +Let's explore how to build and deploy these powerful pipelines using OpenShift AI data science pipelines. \ No newline at end of file diff --git a/modules/chapter1/pages/section1.adoc b/modules/chapter1/pages/section1.adoc index d9b82a9..0489189 100644 --- a/modules/chapter1/pages/section1.adoc +++ b/modules/chapter1/pages/section1.adoc @@ -16,51 +16,61 @@ . *Execution* - is an instance of a Task/Component -=== Data Science Pipelines -[cols="1,1,1,1"] -|=== -|OpenShift AI Resource Name | Kubernetes Resource Name | Custom Resource | Description +== Managing Data Science Pipelines 2.0 -|Data Science Pipeline Application -|datasciencepipelinesapplications.datasciencepipelinesapplications.opendatahub.io -|Yes -|DSPA's create an instance of Data Science Pipelines. DSPA's require a data connection and an S3 bucket to create the instance. DSPA's are namespace scoped to prevent leaking data across multiple projects. +=== Configuring a pipeline server -|Pipelines -|N/A -|N/A -|When developing a pipeline, depending on the tool, users may generate a YAML based PipelineRun object that is then uploaded into the Dashboard to create an executable pipeline. Even though this yaml object is a valid Tekton PipelineRun it is intended to be uploaded to the Dashboard, and not applied directly to the cluster. +Before you can successfully create a pipeline in OpenShift AI, you must configure a pipeline server. This task includes configuring where your pipeline artifacts and data are stored. 
+ + * You have an existing S3-compatible object storage bucket and you have configured write access to your S3 bucket on your storage account. + * You have created a data science project that you can add a pipeline server to. + * If you are configuring a pipeline server with an external database: + ** Red Hat recommends that you use MySQL version 8.x. + ** Red Hat recommends that you use at least MariaDB version 10.5. -|Pipeline Runs -|pipelineruns.tekton.dev -|Yes -|A pipeline can be executed in a number of different ways, including from the Dashboard, which will result in the creation of a pipelinerun. +=== Defining a Pipeline 2.0 +Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. After compiling the pipeline, you can import the YAML file into the OpenShift AI dashboard to configure its execution settings. -|=== +You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information about creating pipelines in JupyterLab, see Creating Pipelines in Elyra in the section below. For more information about the Elyra JupyterLab extension, see the https://elyra.readthedocs.io/en/v2.0.0/getting_started/overview.html[Elyra documentation]. -=== Technical Knowledge -OpenShift AI uses Kubeflow pipelines with Argo workflows as the engine. Kubeflow provides a rich set of tools for managing ML workloads, while Argo workflows offer powerful automation capabilities. Together, they enable us to create robust, scalable, and manageable pipelines for AI model development and serving. +=== Importing a Pipeline 2.0 +To help you begin working with data science pipelines in OpenShift AI, you can import a YAML file containing your pipeline’s code to an active pipeline server, or you can import the YAML file from a URL. 
+This file contains a Kubeflow pipeline compiled by using the Kubeflow compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run. -Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts. -Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple RUN command, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully. +=== Pipeline Actions -In summary, data science pipelines are an essential tool for automating and managing the ML lifecycle, enabling data scientists to create end-to-end workflows, reduce human error, and ensure consistent, high-quality results. +OpenShift AI data science pipelines support the following actions: -Let's explore how to build and deploy these powerful pipelines using OpenShift AI data science pipelines. + . Creating + . Scheduling + . Executing + . Viewing + . Archiving + . Restoring + . Deleting + . Stopping + . Duplicating -== Specific Data Science Pipeline terminology in OpenShift AI DSP +=== Working with pipeline logs +You can review and analyze the logs for each step in a triggered pipeline run. +To help you troubleshoot and audit your pipelines, the log viewer in the OpenShift AI dashboard supports: - . *Pipeline* - is a workflow definition containing the steps and their input and output artifacts. + * Viewing logs + * Downloading logs - . *Run* - is a single execution of a pipeline. A run can be a one off execution of a pipeline, or pipelines can be scheduled as a recurring run. - . 
*Task* - is a self-contained pipeline component that represents an execution stage in the pipeline. +=== Technical Knowledge - . *Artifact* - Steps have the ability to create artifacts, which are objects that can be persisted after the execution of the step completes. Other steps may use those artifacts as inputs and some artifacts may be useful references after a pipeline run has completed. Artifacts automatically stored by Data Science Pipelines in S3 compatible storage. +OpenShift AI uses Kubeflow pipelines with Argo workflows as the engine. Kubeflow provides a rich set of tools for managing ML workloads, while Argo workflows offer powerful automation capabilities. Together, they enable us to create robust, scalable, and manageable pipelines for AI model development and serving. - . *Experiment* - is a logical grouping of runs for the purpose of comparing different pipelines +Pipelines can include various components, such as data ingestion, data preprocessing, model training, evaluation, and deployment. These components can be configured to run in a specific order, and the pipeline can be executed multiple times to produce different versions of models or artifacts. + +Additionally, pipelines can support control flows to handle complex dependencies between tasks. Once a pipeline is defined, executing it becomes a simple RUN command, and the status of each execution can be tracked and monitored, ensuring that the desired outputs are produced successfully. + +In summary, data science pipelines are an essential tool for automating and managing the ML lifecycle, enabling data scientists to create end-to-end workflows, reduce human error, and ensure consistent, high-quality results. + +Let's explore how to build and deploy these powerful pipelines using OpenShift AI data science pipelines. - . 
*Execution* - is an instance of a Task/Component diff --git a/modules/chapter2/images/dsp_workbench.gif b/modules/chapter2/images/dsp_workbench.gif new file mode 100644 index 0000000..4093751 Binary files /dev/null and b/modules/chapter2/images/dsp_workbench.gif differ diff --git a/modules/chapter2/nav.adoc b/modules/chapter2/nav.adoc index 3b88c77..2ccbe53 100644 --- a/modules/chapter2/nav.adoc +++ b/modules/chapter2/nav.adoc @@ -1,3 +1,4 @@ * xref:index.adoc[] ** xref:managing-dsp-pipelines.adoc[] -** xref:data-science-pipeline-app.adoc[] \ No newline at end of file +** xref:data-science-pipeline-app.adoc[] +** xref:rhoai-resources.adoc[] \ No newline at end of file diff --git a/modules/chapter3/pages/rhoai-resources.adoc b/modules/chapter2/pages/rhoai-resources.adoc similarity index 76% rename from modules/chapter3/pages/rhoai-resources.adoc rename to modules/chapter2/pages/rhoai-resources.adoc index cc7ffe1..d4755bd 100644 --- a/modules/chapter3/pages/rhoai-resources.adoc +++ b/modules/chapter2/pages/rhoai-resources.adoc @@ -38,29 +38,35 @@ Depending on the notebook image selected and the deployment size, it can take be JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. For a demonstration of JupyterLab and its features, https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html#what-will-happen-to-the-classic-notebook[you can view this video.] -Return to the ollama-model workbench dashboard in the OpenShift AI console. +Return to the fraud-detection workbench dashboard in the OpenShift AI console. - . Select the *Open* link to the right of the status section. + . Select the *Open* link to the right of the status section of the fraud-detection workbench + image::oai_open_jupyter.png[width=640] . When the new window opens, use the OpenShift admin user & password to login to JupyterLab. + . 
A landing page will prompt for Access Authorization. Make sure the boxes are checked for: + .. user:info + .. user:check-access . Click the *Allow selected permissions* button to complete login to the notebook. [NOTE] -If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes & up to 20+ minutes depending on the notebook image we opted to choose. +If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take from a few minutes to more than 20 minutes, depending on the notebook image and resources we chose. == Inside JupyterLab This takes us to the JupyterLab screen where we can select multiple options / tools / to work to begin our data science experimentation. -Our first action is to clone a git repository that contains a collection of LLM projects including the notebook we are going to use to interact with the LLM. +Our first action is to clone a git repository that contains notebooks, including an example notebook to familiarize yourself with the Jupyter notebook environment. + +[NOTE] +==== +Add github repo here +==== -Clone the github repository to interact with the Ollama Framework from this location: -https://github.com/rh-aiservices-bu/llm-on-openshift.git . Copy the URL link above @@ -70,9 +76,9 @@ image::clone_a_repo.png[width=640] . Paste the link into the *clone a repo* pop up, make sure the *included submodules are checked*, then click the clone. - . Navigate to the llm-on-openshift/examples/notebooks/langchain folder: + . Navigate to the XYZ_ADD_CORRECT_FOLDER_HERE folder: - . Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_ + . 
Then open the file: ABC_ADD_CORRECT_FILE + image::navigate_ollama_notebook.png[width=640] diff --git a/modules/chapter3/nav.adoc b/modules/chapter3/nav.adoc index 67b8da8..90604a1 100644 --- a/modules/chapter3/nav.adoc +++ b/modules/chapter3/nav.adoc @@ -1,4 +1,4 @@ * xref:index.adoc[] -** xref:rhoai-resources.adoc[] +//** xref:rhoai-resources.adoc[] ** xref:elyra-pipelines.adoc[] //** xref:section3.adoc[] \ No newline at end of file