Commit 7e7cdca: Setup and Serve part

lordofthejars committed Oct 11, 2024
1 parent a0e87c0 commit 7e7cdca

Showing 4 changed files with 338 additions and 39 deletions.
49 changes: 34 additions & 15 deletions documentation/modules/ROOT/pages/01-setup.adoc
@@ -4,26 +4,45 @@ include::_attributes.adoc[]
[#prerequisite]
== Prerequisite CLI tools

For this deep dive you need the `ilab` CLI tool installed.
It handles the main tuning workflow.
Currently, it supports Linux systems and Apple Silicon Macs (M1/M2/M3), as well as Windows with WSL2.

The installation instructions differ depending on your operating system and on whether you want to use `ilab` with or without a GPU.

Moreover, you install the InstructLab CLI using Python (as it is the easiest way), and you might use tools like `pyenv` to isolate the installation.

For this reason, we recommend you take a look at https://github.com/instructlab/instructlab?tab=readme-ov-file#-installing-ilab[Installing ilab] and install `ilab` in the way that is most convenient for you.

IMPORTANT: At this time you need to use Python 3.10 or 3.11; no other Python version is supported.
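
If you are not sure which interpreter is active, check it before creating the virtual environment. The following is a minimal sketch; the `pyenv` lines are optional and assume a recent `pyenv` installation that resolves version prefixes:

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
# must report 3.10.x or 3.11.x
python3 --version

# optional: use pyenv to install and pin a compatible version
pyenv install 3.11
pyenv local 3.11
----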

The following snippet shows the installation on an Apple Silicon Mac:

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
mkdir instructlab && cd instructlab

# create and activate an isolated Python virtual environment
python3 -m venv --upgrade-deps venv
source venv/bin/activate

# clear any cached llama_cpp_python wheel so it is rebuilt for this machine
pip cache remove llama_cpp_python

# install InstructLab with Apple Metal (MPS) acceleration
pip install 'instructlab[mps]==0.19.3'
----

To check that `ilab` is installed correctly, run the following command:

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
ilab --version
----

[.console-output]
[source, bash,subs="+macros,+attributes"]
----
ilab, version 0.19.3
----

You should see the `ilab` version printed; at the time of writing, it is `version 0.19.3`.
144 changes: 123 additions & 21 deletions documentation/modules/ROOT/pages/02-deploy.adoc
@@ -1,43 +1,145 @@
= Serving Models
include::_attributes.adoc[]

[#service]
== Initializing InstructLab

With `ilab` installed, we can initialize our tuning environment with the `ilab config init` command.
This clones the taxonomy repository, which contains community-provided knowledge examples for training the model, and generates a default configuration file.

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
ilab config init
----

TIP: You could scaffold your taxonomy repository with your organization defaults, but for now, we'll stick with the default one.

[.console-output]
[source, bash]
----
Welcome to InstructLab CLI. This guide will help you to setup your environment.
Please provide the following values to initiate the environment [press Enter for defaults]:
Path to taxonomy repo [/Users/asotobue/.local/share/instructlab/taxonomy]:
./taxonomy seems to not exist or is empty. Should I clone https://github.com/instructlab/taxonomy.git for you? [Y/n]:
Cloning https://github.com/instructlab/taxonomy.git...
Path to your model [/Users/asotobue/.cache/instructlab/models/merlinite-7b-lab-Q4_K_M.gguf]:
Generating `/Users/asotobue/.config/instructlab/config.yaml`
Please choose a train profile to use.
Train profiles assist with the complexity of configuring InstructLab training for specific GPU hardware.
You can still take advantage of hardware acceleration for training even if your hardware is not listed.
[0] No profile (CPU, Apple Metal, AMD ROCm)
[1] Nvidia A100/H100 x2 (A100_H100_x2.yaml)
[2] Nvidia A100/H100 x4 (A100_H100_x4.yaml)
[3] Nvidia A100/H100 x8 (A100_H100_x8.yaml)
[4] Nvidia L40 x4 (L40_x4.yaml)
[5] Nvidia L40 x8 (L40_x8.yaml)
[6] Nvidia L4 x8 (L4_x8.yaml)
...
----

The most important file is the configuration file, which defines the foundational model we'll be training and includes defaults such as parameters for training and serving.
The file is placed by default at `<home>/.config/instructlab/config.yaml`.

In this example, we use *merlinite-7b* as the model, but you could use *Granite*, *Mistral*, *Llama*, or any other supported model (in `gguf` format).
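
If you want to review or tweak those defaults before training or serving, you can simply print the generated file (the exact path appears in the `ilab config init` output above):

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
# print the generated configuration with the training and serving defaults
cat ~/.config/instructlab/config.yaml
----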

== Downloading, serving, and testing a model with InstructLab

=== Downloading a model

Before fine-tuning the model, let's test it with its default training.
To get started, download the pre-trained and quantized https://huggingface.co/ibm/merlinite-7b[Merlinite] model with the `ilab model download` command.

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
ilab model download
----

[.console-output]
[source, bash,subs="+macros,+attributes"]
----
Downloading model from instructlab/merlinite-7b-lab-GGUF@main to models...
Downloading 'merlinite-7b-lab-Q4_K_M.gguf' to 'models/.huggingface/download/merlinite-7b-lab-Q4_K_M.gguf.9ca044d727db34750e1aeb04e3b18c3cf4a8c064a9ac96cf00448c506631d16c.incomplete'
INFO 2024-06-11 23:21:23,255 file_download.py:1877 Downloading 'merlinite-7b-lab-Q4_K_M.gguf' to 'models/.huggingface/download/merlinite-7b-lab-Q4_K_M.gguf.9ca044d727db34750e1aeb04e3b18c3cf4a8c064a9ac96cf00448c506631d16c.incomplete'
merlinite-7b-lab-Q4_K_M.gguf: 2%|▊ | 105M/4.37G [01:23<57:18, 1.24MB/s]
----

Now, let's serve the model so it can be inferenced from your local machine.

=== Serving a model

To serve a model with InstructLab, use the `ilab model serve` command.

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
ilab model serve
----

[.console-output]
[source, bash,subs="+macros,+attributes"]
----
INFO 2024-06-11 23:27:21,994 lab.py:340 Using model 'models/merlinite-7b-lab-Q4_K_M.gguf' with -1 gpu-layers and 4096 max context size.
INFO 2024-06-11 23:27:40,984 server.py:206 Starting server process, press CTRL+C to shutdown server...
INFO 2024-06-11 23:27:40,984 server.py:207 After application startup complete see http://127.0.0.1:8000/docs for API.
----

Now the model is deployed locally and you can interact with it.
You have three options:

* InstructLab exposes the model using the OpenAI API, so you can develop an application using, for example, LangChain, and interact with it (see the sketch after this list).
* Navigate to http://127.0.0.1:8000/docs to visit the Swagger UI of the model and interact with it.
* Use `ilab model chat` command.
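
For instance, with the server still running, you can hit the OpenAI-style endpoint directly. This is a minimal sketch; the `/v1/chat/completions` path and the `model` value are assumptions based on the OpenAI-compatible API the server exposes:

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
# send a chat request to the locally served model (assumed OpenAI-compatible endpoint)
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "models/merlinite-7b-lab-Q4_K_M.gguf",
        "messages": [{"role": "user", "content": "What languages are spoken in Canada?"}]
      }'
----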

=== Testing a model

Let's use the last approach to interact with the model.

Open a new terminal window, navigate to your InstructLab directory, and enter your virtual environment again by running `source venv/bin/activate`.

Then run `ilab model chat` to enter a simple interface for conversing with the LLM.

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
source venv/bin/activate
ilab model chat
----

[.console-output]
[source, bash,subs="+macros,+attributes"]
----
╭───────────────────────────────────────────────────────
│ Welcome to InstructLab Chat w/ MODELS/MERLINITE-7B-LAB-Q4_K_M.GGUF (type /h for help) │
╰───────────────────────────────────────────────────────
>>> What languages are spoken in Canada?
╭──────────────────────────────────────── models/merlinite-7b-lab-Q4_K_M.gguf
│ Canadian society is multilingual, with English and French being the two official languages recognized at the federal level.
----

But then ask the following question: *What is the price of a new flux capacitor for a DeLorean car?*
You'll receive a polite answer saying that the model has no knowledge to answer this question.

[.console-input]
[source, bash,subs="+macros,+attributes"]
----
What is the price of a new Flux capacitor for DeLorean car?
----

[.console-output]
[source, bash,subs="+macros,+attributes"]
----
──────────────────────────────────────────────────────────────────────────╮
│ I understand that you're asking about the cost of a flux capacitor for a specific model
....
----

So, obviously, we need to fine-tune our model so it has knowledge about the Back to the Future movie and the DeLorean car.

Then, type `exit` to stop the interactive chat window.
Also, stop serving the model by pressing kbd:[Ctrl+C] in the serving terminal.

Let's move to the next section to learn how to fine-tune a model.
