From e44a87d85c7864c8cb2a497e0392115f252dc1fd Mon Sep 17 00:00:00 2001
From: jillnogold <88145832+jillnogold@users.noreply.github.com>
Date: Sun, 2 Jun 2024 16:50:35 +0300
Subject: [PATCH] English edits (#7)
* English edits
* Update README.md
Co-authored-by: guy1992l <83535508+guy1992l@users.noreply.github.com>
* minor edits
---------
Co-authored-by: guy1992l <83535508+guy1992l@users.noreply.github.com>
---
README.md | 36 ++++++++++++++++++------------------
notebook.ipynb | 10 +---------
2 files changed, 19 insertions(+), 27 deletions(-)
diff --git a/README.md b/README.md
index a9822bc..e5ad481 100644
--- a/README.md
+++ b/README.md
@@ -2,22 +2,22 @@
-In this demo we will be showcasing how we used LLMs to turn call center conversation audio files of customers and agents into valueable data in a single workflow orchastrated by MLRun.
+This demo showcases how to use LLMs to turn audio files from call center conversations between customers and agents into valuable data, all in a single workflow orchestrated by MLRun.
-MLRun will be automating the entire workflow, auto-scale resources as needed and automatically log and parse values between the workflow different steps.
+MLRun automates the entire workflow, auto-scales resources as needed, and automatically logs and parses values between the different workflow steps.
-By the end of this demo you will see the potential power of LLMs for feature extraction, and how easy it is being done using MLRun!
+By the end of this demo you will see the potential power of LLMs for feature extraction, and how easily you can do this with MLRun!
-We will use:
-* [**OpenAI's Whisper**](https://openai.com/research/whisper) - To transcribe the audio calls into text.
-* [**Flair**](https://flairnlp.github.io/) and [**Microsoft's Presidio**](https://microsoft.github.io/presidio/) - To recognize PII for filtering it out.
-* [**HuggingFace**](https://huggingface.co/) - as the main machine learning framework to get the model and tokenizer for the features extraction. The demo uses [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as the LLM to asnwer questions.
-* and [**MLRun**](https://www.mlrun.org/) - as the orchastraitor to operationalize the workflow.
+This demo uses:
+* [**OpenAI's Whisper**](https://openai.com/research/whisper) - To transcribe the audio calls into text.
+* [**Flair**](https://flairnlp.github.io/) and [**Microsoft's Presidio**](https://microsoft.github.io/presidio/) - To recognize PII so it can be filtered out.
+* [**HuggingFace**](https://huggingface.co/) - The main machine-learning framework to get the model and tokenizer for the feature extraction. The demo uses [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as the LLM to answer questions.
+* [**MLRun**](https://www.mlrun.org/) - The orchestrator to operationalize the workflow.
-The demo contains a single [notebook](./notebook.ipynb) that covers the entire demo.
+The demo consists of a single [notebook](./notebook.ipynb) that covers the entire workflow.
-Most of the functions are being imported from [MLRun's hub](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html) - a wide range of functions that can be used for a variety of use cases. You can find all the python source code under [/src](./src) and links to the used functions from the hub in the notebook.
+Most of the functions are imported from [MLRun's function hub](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html), which contains a wide range of functions that can be used for a variety of use cases. The notebook links to the hub source of each function it uses. All of the Python source code is under [/src](./src).
Enjoy!
___
@@ -29,7 +29,7 @@ This project can run in different development environments:
* Inside GitHub Codespaces
* Other managed Jupyter environments
-### Install the code and mlrun client
+### Install the code and the mlrun client
To get started, fork this repo into your GitHub account and clone it into your development environment.
@@ -37,17 +37,17 @@ To install the package dependencies (not required in GitHub codespaces) use:
make install-requirements
-If you prefer to use Conda use this instead (to create and configure a conda env):
+If you prefer to use Conda, use this instead (to create and configure a conda env):
make conda-env
> Make sure you open the notebooks and select the `mlrun` conda environment
-### Install or connect to MLRun service/cluster
+### Install or connect to the MLRun service/cluster
The MLRun service and computation can run locally (minimal setup) or over a remote Kubernetes environment.
-If your development environment support docker and have enough CPU resources run:
+If your development environment supports Docker and has sufficient CPU resources, run:
make mlrun-docker
@@ -57,10 +57,10 @@ If your environment is minimal, run mlrun as a process (no UI):
[conda activate mlrun &&] make mlrun-api
-For MLRun to run properly you should set your client environment, this is not required when using **codespaces**, the mlrun **conda** environment, or **iguazio** managed notebooks.
+For MLRun to run properly you should set your client environment. This is not required when using **codespaces**, the mlrun **conda** environment, or **iguazio** managed notebooks.
Your environment should include `MLRUN_ENV_FILE= ` (point to the mlrun .env file
-in this repo), see [mlrun client setup](https://docs.mlrun.org/en/latest/install/remote.html) instructions for details.
+in this repo); see [mlrun client setup](https://docs.mlrun.org/en/latest/install/remote.html) instructions for details.
-> Note: You can also use a remote MLRun service (over Kubernetes), instead of starting a local mlrun,
-> edit the [mlrun.env](./mlrun.env) and specify its address and credentials
+> Note: You can also use a remote MLRun service (over Kubernetes). Instead of starting a local mlrun,
+> edit the [mlrun.env](./mlrun.env) and specify its address and credentials.
diff --git a/notebook.ipynb b/notebook.ipynb
index 2753526..02b6ead 100644
--- a/notebook.ipynb
+++ b/notebook.ipynb
@@ -154,7 +154,7 @@
"\n",
"> Note: Multiple GPUs (`gpus` > 1) automatically deploy [OpenMPI](https://www.open-mpi.org/) jobs for **better performance and GPU utilization**.\n",
"\n",
- "There are not many functions under the source directory. That's because most of the code in this project is imported from [**MLRun's Functions Hub**](https://www.mlrun.org/hub/) — a collection of reusable functions and assets that are optimized and tested to simplify and accelate the move to production!"
+ "There are not many functions under the source directory. That's because most of the code in this project is imported from [**MLRun's Function hub**](https://www.mlrun.org/hub/), a collection of reusable functions and assets that are optimized and tested to simplify and accelerate the move to production!"
]
},
{
@@ -1167,14 +1167,6 @@
"* [x] **Anonymization** - Anonymize the text before inferring.\n",
"* [x] **Analysis** - Perform question answering for feature extraction using Falcon-40B."
]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "2f13c10d-9f21-4c1a-8c62-b49c31880ca4",
- "metadata": {},
- "outputs": [],
- "source": []
}
],
"metadata": {