Commit

English edits (#7)
* English edits

* Update README.md

Co-authored-by: guy1992l <83535508+guy1992l@users.noreply.github.com>

* minoe edits

---------

Co-authored-by: guy1992l <83535508+guy1992l@users.noreply.github.com>
jillnogold and guy1992l authored Jun 2, 2024
1 parent 4b54e81 commit e44a87d
Showing 2 changed files with 19 additions and 27 deletions.
36 changes: 18 additions & 18 deletions README.md
@@ -2,22 +2,22 @@

<img src="./images/call-center-readme.png" alt="huggingface-mlrun" style="width: 600px"/>

-In this demo we will be showcasing how we used LLMs to turn call center conversation audio files of customers and agents into valueable data in a single workflow orchastrated by MLRun.
+This demo showcases how to use LLMs to turn audio files from call center conversations between customers and agents into valuable data, all in a single workflow orchestrated by MLRun.

-MLRun will be automating the entire workflow, auto-scale resources as needed and automatically log and parse values between the workflow different steps.
+MLRun automates the entire workflow, auto-scales resources as needed, and automatically logs and parses values between the different workflow steps.

-By the end of this demo you will see the potential power of LLMs for feature extraction, and how easy it is being done using MLRun!
+By the end of this demo you will see the potential power of LLMs for feature extraction, and how easily you can do this with MLRun!

-We will use:
-* [**OpenAI's Whisper**](https://openai.com/research/whisper) - To transcribe the audio calls into text.
-* [**Flair**](https://flairnlp.github.io/) and [**Microsoft's Presidio**](https://microsoft.github.io/presidio/) - To recognize PII for filtering it out.
-* [**HuggingFace**](https://huggingface.co/) - as the main machine learning framework to get the model and tokenizer for the features extraction. The demo uses [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as the LLM to asnwer questions.
-* and [**MLRun**](https://www.mlrun.org/) - as the orchastraitor to operationalize the workflow.
+This demo uses:
+* [**OpenAI's Whisper**](https://openai.com/research/whisper) &mdash; To transcribe the audio calls into text.
+* [**Flair**](https://flairnlp.github.io/) and [**Microsoft's Presidio**](https://microsoft.github.io/presidio/) - To recognize PII so it can be filtered out.
+* [**HuggingFace**](https://huggingface.co/) &mdash; The main machine-learning framework to get the model and tokenizer for the features extraction. The demo uses [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as the LLM to answer questions.
+* and [**MLRun**](https://www.mlrun.org/) &mdash; as the orchestrator to operationalize the workflow.

-The demo contains a single [notebook](./notebook.ipynb) that covers the entire demo.
+The demo contains a single [notebook](./notebook.ipynb) that encompasses the entire demo.

-Most of the functions are being imported from [MLRun's hub](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html) - a wide range of functions that can be used for a variety of use cases. You can find all the python source code under [/src](./src) and links to the used functions from the hub in the notebook.

+Most of the functions are imported from [MLRun's function hub](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html), which contains a wide range of functions that can be used for a variety of use cases. All functions used in the demo include links to their source in the hub. All of the python source code is under [/src](./src).
Enjoy!

___
@@ -29,25 +29,25 @@ This project can run in different development environments:
* Inside GitHub Codespaces
* Other managed Jupyter environments

-### Install the code and mlrun client
+### Install the code and the mlrun client

To get started, fork this repo into your GitHub account and clone it into your development environment.

To install the package dependencies (not required in GitHub codespaces) use:

make install-requirements

-If you prefer to use Conda use this instead (to create and configure a conda env):
+If you prefer to use Conda, use this instead (to create and configure a conda env):

make conda-env

> Make sure you open the notebooks and select the `mlrun` conda environment
-### Install or connect to MLRun service/cluster
+### Install or connect to the MLRun service/cluster

The MLRun service and computation can run locally (minimal setup) or over a remote Kubernetes environment.

-If your development environment support docker and have enough CPU resources run:
+If your development environment supports Docker and there are sufficient CPU resources, run:

make mlrun-docker

@@ -57,10 +57,10 @@ If your environment is minimal, run mlrun as a process (no UI):

[conda activate mlrun &&] make mlrun-api

-For MLRun to run properly you should set your client environment, this is not required when using **codespaces**, the mlrun **conda** environment, or **iguazio** managed notebooks.
+For MLRun to run properly you should set your client environment. This is not required when using **codespaces**, the mlrun **conda** environment, or **iguazio** managed notebooks.

Your environment should include `MLRUN_ENV_FILE=<absolute path to the ./mlrun.env file> ` (point to the mlrun .env file
-in this repo), see [mlrun client setup](https://docs.mlrun.org/en/latest/install/remote.html) instructions for details.
+in this repo); see [mlrun client setup](https://docs.mlrun.org/en/latest/install/remote.html) instructions for details.

-> Note: You can also use a remote MLRun service (over Kubernetes), instead of starting a local mlrun,
-> edit the [mlrun.env](./mlrun.env) and specify its address and credentials
+> Note: You can also use a remote MLRun service (over Kubernetes): instead of starting a local mlrun:
+> edit the [mlrun.env](./mlrun.env) and specify its address and credentials.
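
Reviewer note: the README text above asks for `MLRUN_ENV_FILE` to hold an absolute path to the repo's `mlrun.env` file. A minimal Python sketch of that step follows, assuming the variable is set in the process before the MLRun client starts. The helper name `point_client_at_env_file` is hypothetical, uses only the standard library, and is not part of MLRun.

```python
import os
from pathlib import Path


def point_client_at_env_file(env_path):
    """Set MLRUN_ENV_FILE to the absolute path of an existing env file.

    Hypothetical helper for illustration only; it does not call MLRun.
    """
    # Resolve to an absolute path, as the README requires.
    absolute = Path(env_path).expanduser().resolve()
    if not absolute.is_file():
        raise FileNotFoundError(f"env file not found: {absolute}")
    os.environ["MLRUN_ENV_FILE"] = str(absolute)
    return str(absolute)
```

A notebook could call `point_client_at_env_file("./mlrun.env")` in its first cell. Failing early on a missing file avoids a harder-to-diagnose connection error later.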
10 changes: 1 addition & 9 deletions notebook.ipynb
@@ -154,7 +154,7 @@
"\n",
"> Note: Multiple GPUs (`gpus` > 1) automatically deploy [OpenMPI](https://www.open-mpi.org/) jobs for **better performance and GPU utilization**.\n",
"\n",
-"There are not many functions under the source directory. That's because most of the code in this project is imported from [**MLRun's Functions Hub**](https://www.mlrun.org/hub/) &mdash; a collection of reusable functions and assets that are optimized and tested to simplify and accelate the move to production!"
+"There are not many functions under the source directory. That's because most of the code in this project is imported from [**MLRun's Function hub**](https://www.mlrun.org/hub/) &mdash; a collection of reusable functions and assets that are optimized and tested to simplify and accelate the move to production!"
]
},
{
@@ -1167,14 +1167,6 @@
"* [x] **Anonymization** - Anonymize the text before inferring.\n",
"* [x] **Analysis** - Perform question answering for feature extraction using Falcon-40B."
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "2f13c10d-9f21-4c1a-8c62-b49c31880ca4",
-"metadata": {},
-"outputs": [],
-"source": []
}
],
"metadata": {
