10 changes: 5 additions & 5 deletions README.md
@@ -18,7 +18,7 @@


## 📖 Introduction
-OmAgent is python library for building multimodal language agents with ease. We try to keep the library **simple** without too much overhead like other agent framework.
+OmAgent is Python library for building multimodal language agents with ease. We try to keep the library **simple** without too much overhead like other agent frameworks.
- We wrap the complex engineering (worker orchestration, task queue, node optimization, etc.) behind the scene and only leave you with a super-easy-to-use interface to define your agent.
- We further enable useful abstractions for reusable agent components, so you can build complex agents aggregating from those basic components.
- We also provides features required for multimodal agents, such as native support for VLM models, video processing, and mobile device connection to make it easy for developers and researchers building agents that can reason over not only text, but image, video and audio inputs.
@@ -90,23 +90,23 @@ For more information about the container.yaml configuration, please refer to the
<img src="docs/images/simpleVQA_webpage.png" width="400"/>

## 🤖 Example Projects
-### Video QA Agents
+### 1. Video QA Agents
Build a system that can answer any questions about uploaded videos with video understanding agents. See Details [here](examples/video_understanding/README.md).
More about the video understanding agent can be found in [paper](https://arxiv.org/abs/2406.16620).
<p >
<img src="docs/images/OmAgent.png" width="500"/>
</p>


-### Mobile Personal Assistant
+### 2. Mobile Personal Assistant
Build your personal mulitmodal assistant just like Google Astral in 2 minutes. See Details [here](docs/tutorials/agent_with_app.md).
<p >
<img src="docs/images/readme_app.png" width="200"/>
</p>


### 3. Agentic Operators
-We define reusable agent agentic workflows, e.g. CoT, ReAct, and etc as **agent operators**. This project compares various recently proposed reasoning agent operators with the same LLM choice and test datasets. How do they perform? See details [here](docs/concepts/agent_operators.md).
+We define reusable agentic workflows, e.g. CoT, ReAct, and etc as **agent operators**. This project compares various recently proposed reasoning agent operators with the same LLM choice and test datasets. How do they perform? See details [here](docs/concepts/agent_operators.md).

| **Algorithm** | **LLM** | **Average** | **gsm8k-score** | **gsm8k-cost($)** | **AQuA-score** | **AQuA-cost($)** |
| :-----------------: | :------------: | :-------------: | :---------------: | :-------------------: | :------------------------------------: | :---: |
@@ -157,4 +157,4 @@ If you find our repository beneficial, please cite our paper:
journal={arXiv preprint arXiv:2406.16620},
year={2024}
}
```
```