From aa92d5eaf609c7a8d3937e61794415f598d5c3da Mon Sep 17 00:00:00 2001
From: "Tiancheng Zhao (Tony)"
Date: Fri, 10 Jan 2025 15:37:22 -0800
Subject: [PATCH 1/5] Fix typos in README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 5444fab4..4306af43 100644
--- a/README.md
+++ b/README.md
@@ -90,7 +90,7 @@ For more information about the container.yaml configuration, please refer to the

 ## 🤖 Example Projects

-### Video QA Agents
+### 1. Video QA Agents
 Build a system that can answer any questions about uploaded videos with video understanding agents. See Details [here](examples/video_understanding/README.md).
 More about the video understanding agent can be found in [paper](https://arxiv.org/abs/2406.16620).

@@ -98,7 +98,7 @@ More about the video understanding agent can be found in [paper](https://arxiv.o

-### Mobile Personal Assistant
+### 2. Mobile Personal Assistant
 Build your personal mulitmodal assistant just like Google Astral in 2 minutes. See Details [here](docs/tutorials/agent_with_app.md).

@@ -106,7 +106,7 @@ Build your personal mulitmodal assistant just like Google Astral in 2 minutes. S

 ### 3. Agentic Operators

-We define reusable agent agentic workflows, e.g. CoT, ReAct, and etc as **agent operators**. This project compares various recently proposed reasoning agent operators with the same LLM choice and test datasets. How do they perform? See details [here](docs/concepts/agent_operators.md).
+We define reusable agentic workflows, e.g. CoT, ReAct, and etc as **agent operators**. This project compares various recently proposed reasoning agent operators with the same LLM choice and test datasets. How do they perform? See details [here](docs/concepts/agent_operators.md).

 | **Algorithm** | **LLM** | **Average** | **gsm8k-score** | **gsm8k-cost($)** | **AQuA-score** | **AQuA-cost($)** |
 | :-----------------: | :------------: | :-------------: | :---------------: | :-------------------: | :------------------------------------: | :---: |
@@ -157,4 +157,4 @@ If you find our repository beneficial, please cite our paper:
   journal={arXiv preprint arXiv:2406.16620},
   year={2024}
 }
-```
\ No newline at end of file
+```

From 1e9965e11006d440a34b1439765d6323bfa4056c Mon Sep 17 00:00:00 2001
From: Chaitanya Awasthi
Date: Sat, 18 Jan 2025 01:27:16 +0530
Subject: [PATCH 2/5] First commit

---
 OmAgent | 1 +
 1 file changed, 1 insertion(+)
 create mode 160000 OmAgent

diff --git a/OmAgent b/OmAgent
new file mode 160000
index 00000000..01d7864b
--- /dev/null
+++ b/OmAgent
@@ -0,0 +1 @@
+Subproject commit 01d7864b9b556f40e9682b4cfd51a16939ef4c0c

From 5a49ce6e862493d23f72a457288b7b00a2f97936 Mon Sep 17 00:00:00 2001
From: Chaitanya Awasthi
Date: Sat, 18 Jan 2025 04:22:30 +0530
Subject: [PATCH 3/5] adding upstream

---
 OmAgent | 1 +
 1 file changed, 1 insertion(+)
 create mode 160000 OmAgent

diff --git a/OmAgent b/OmAgent
new file mode 160000
index 00000000..01d7864b
--- /dev/null
+++ b/OmAgent
@@ -0,0 +1 @@
+Subproject commit 01d7864b9b556f40e9682b4cfd51a16939ef4c0c

From 496ab53347d697fe828202131093e8b66b54fd18 Mon Sep 17 00:00:00 2001
From: Chaitanya Awasthi
Date: Sat, 18 Jan 2025 23:40:17 +0530
Subject: [PATCH 4/5] Initial commit

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4306af43..f103e79a 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@

 ## 📖 Introduction

-OmAgent is python library for building multimodal language agents with ease. We try to keep the library **simple** without too much overhead like other agent framework.
+OmAgent is python library for building multimodal language agents with ease. We try to keep the library **simple** without too much overhead like other agent frameworks.
 - We wrap the complex engineering (worker orchestration, task queue, node optimization, etc.) behind the scene and only leave you with a super-easy-to-use interface to define your agent.
 - We further enable useful abstractions for reusable agent components, so you can build complex agents aggregating from those basic components.
 - We also provides features required for multimodal agents, such as native support for VLM models, video processing, and mobile device connection to make it easy for developers and researchers building agents that can reason over not only text, but image, video and audio inputs. 
From 7748684571fe1b582dd1255594e9e722db9c207e Mon Sep 17 00:00:00 2001
From: Chaitanya Awasthi
Date: Sun, 19 Jan 2025 03:17:29 +0530
Subject: [PATCH 5/5] Fixing typos in documentation

---
 OmAgent   | 1 -
 README.md | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)
 delete mode 160000 OmAgent

diff --git a/OmAgent b/OmAgent
deleted file mode 160000
index 01d7864b..00000000
--- a/OmAgent
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 01d7864b9b556f40e9682b4cfd51a16939ef4c0c
diff --git a/README.md b/README.md
index f103e79a..e7b1a4a4 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@

 ## 📖 Introduction

-OmAgent is python library for building multimodal language agents with ease. We try to keep the library **simple** without too much overhead like other agent frameworks.
+OmAgent is Python library for building multimodal language agents with ease. We try to keep the library **simple** without too much overhead like other agent frameworks.
 - We wrap the complex engineering (worker orchestration, task queue, node optimization, etc.) behind the scene and only leave you with a super-easy-to-use interface to define your agent.
 - We further enable useful abstractions for reusable agent components, so you can build complex agents aggregating from those basic components.
 - We also provides features required for multimodal agents, such as native support for VLM models, video processing, and mobile device connection to make it easy for developers and researchers building agents that can reason over not only text, but image, video and audio inputs.