Vision Agent

Introduction

An overview of all the agents, how they communicate, and how human-in-the-loop works.

vision_agent_v2: A conversational agent that performs a single action per response. Actions are predefined JSON commands, executed externally to maintain control over inputs and code execution.

vision_agent_planner_v2: A planning agent capable of executing Python code over multiple turns using available tools. It explores and tests steps to develop a plan.

vision_agent_coder_v2: A coding agent that generates and tests the final code. It can use the planner autonomously or rely on the finalized plan context provided by the planner.
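The "predefined JSON commands, executed externally" idea described for vision_agent_v2 can be illustrated with a small allow-list dispatcher. This is only a sketch under stated assumptions: the action names (`generate_code`, `respond`), the `name`/`args` command shape, and the handler functions are all hypothetical and are not taken from the Vision Agent codebase.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical handlers; the real action set is not documented in this README.
def generate_code(args: Dict[str, Any]) -> str:
    return f"generating code for: {args.get('prompt', '')}"


def respond(args: Dict[str, Any]) -> str:
    return args.get("text", "")


# Only actions listed here can run, so the caller controls what gets executed.
ALLOWED_ACTIONS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "generate_code": generate_code,
    "respond": respond,
}


def execute_action(raw: str) -> str:
    """Parse one JSON command (assumed shape: {"name": ..., "args": {...}})
    and run it only if its name is on the allow-list."""
    command = json.loads(raw)
    name = command.get("name")
    handler = ALLOWED_ACTIONS.get(name)
    if handler is None:
        raise ValueError(f"unknown action: {name!r}")
    return handler(command.get("args", {}))
```

Because the agent only emits a command and the surrounding program decides whether and how to run it, inputs can be validated before any code executes.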

Communication

The agents communicate through AgentMessage objects. The planner and coder agents return PlanContext and CodeContext objects, respectively.

_______________
|VisionAgentV2|
---------------
       |                       ____________________
       -----(AgentMessage)---> |VisionAgentCoderV2|
                               --------------------
                                         |                        ______________________
                                         -----(AgentMessage)----> |VisionAgentPlannerV2|
                                                                  ----------------------
                               ____________________                         |
                               |VisionAgentCoderV2| <----(PlanContext)-------
                               --------------------
_______________                          |
|VisionAgentV2|<-----(CodeContext)--------
---------------

AgentMessage and Contexts

The AgentMessage extends basic chat functionality with additional roles such as conversation, planner, and coder, which are subtypes of the assistant role. These roles correspond to VisionAgentV2, VisionAgentPlannerV2, and VisionAgentCoderV2, respectively. Observations are the results of Python code that the planner executes internally. The VisionAgentPlannerV2 produces a PlanContext containing the finalized plan, including instructions and code snippets, which the VisionAgentCoderV2 uses to generate a CodeContext with the final code and supplementary details.
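As a rough illustration of the shapes described above, the following sketch models the message and context objects with dataclasses. The `role`, `content`, and `media` fields match the AgentMessage examples in this README; every field on `PlanContext` and `CodeContext`, and the `to_chat_role` helper, are assumptions made for illustration and may not match the library's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Roles the README describes as subtypes of the assistant role.
ASSISTANT_SUBROLES = {"conversation", "planner", "coder"}


@dataclass
class AgentMessage:
    """Stand-in for the library's AgentMessage (role, content, media)."""
    role: str
    content: str
    media: Optional[List[str]] = None


@dataclass
class PlanContext:
    """Hypothetical fields: a finalized plan with instructions and code."""
    plan: str
    instructions: List[str] = field(default_factory=list)
    code: str = ""


@dataclass
class CodeContext:
    """Hypothetical fields: final code plus supplementary details."""
    code: str
    test: str = ""


def to_chat_role(message: AgentMessage) -> str:
    """Collapse agent-specific roles to 'assistant' for a plain chat API;
    all other roles (user, observation, ...) pass through unchanged."""
    return "assistant" if message.role in ASSISTANT_SUBROLES else message.role
```

A helper like `to_chat_role` is one way to feed an agent conversation to a model API that only understands the standard user and assistant roles.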

Callbacks

If you want to receive intermediate messages, you can use the update_callback argument, which is available in all the V2 constructors. The agent asynchronously sends AgentMessage objects to the callback function you provide. You can see an example of how to run this in app.py.
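Because the callback is invoked asynchronously, it helps to make it thread-safe. The sketch below shows one possible collector that could be passed as update_callback. `MessageCollector` and `fake_agent_run` are hypothetical names: the latter merely simulates an agent emitting messages from a worker thread, and the dictionary message shape is an assumption rather than the library's documented payload.

```python
import queue
import threading
from typing import Any, Callable, List


class MessageCollector:
    """Thread-safe sink that can be handed to an agent as update_callback."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()

    def __call__(self, message: Any) -> None:
        # Called by the agent for every intermediate message.
        self._queue.put(message)

    def drain(self) -> List[Any]:
        """Return all messages received so far, in arrival order."""
        items: List[Any] = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


def fake_agent_run(update_callback: Callable[[Any], None]) -> None:
    """Hypothetical stand-in for an agent: emits messages from another thread."""

    def work() -> None:
        for role in ("planner", "observation", "coder"):
            update_callback({"role": role, "content": f"{role} step"})

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
```

With the real agents, the collector instance would be passed as the update_callback constructor argument in place of the simulated run.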

Human-in-the-loop

The Human-in-the-Loop (HITL) feature allows users to interact with agents during a conversation using the interaction and interaction_response roles in AgentMessage. Enable it by setting hil=True in VisionAgentV2. It also requires update_callback, which is used to collect messages and pass them back to VisionAgentV2. When the planner agent needs human input, it generates an InteractionContext, which propagates to the user via VisionAgentV2, and the planner exits. The last AgentMessage collected via update_callback will have the role interaction and contain a JSON string enclosed in <interaction> tags:

AgentMessage(
    role="interaction",
    content="<interaction>{\"prompt\": \"Should I use owl_v2_image or countgd_counting?\"}</interaction>",
    media=None,
)

The user can then add an additional AgentMessage with the role interaction_response and the response they want to give:

AgentMessage(
    role="interaction_response",
    content="{\"function_name\": \"owl_v2_image\"}",
    media=None,
)

You can see an example of how this works in chat-app/src/components/ChatSection.tsx under the handleSubmit function.
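The exchange above can be handled on the client side roughly as follows. This is a minimal sketch, assuming messages are handled as plain dictionaries with the same `role`, `content`, and `media` keys shown in the examples; `parse_interaction` and `build_interaction_response` are hypothetical helper names, not functions from the library.

```python
import json
import re
from typing import Any, Dict, Optional

# Matches the JSON payload wrapped in <interaction> tags.
INTERACTION_RE = re.compile(r"<interaction>(.*?)</interaction>", re.DOTALL)


def parse_interaction(content: str) -> Optional[Dict[str, Any]]:
    """Extract and decode the JSON inside <interaction> tags, if present."""
    match = INTERACTION_RE.search(content)
    if match is None:
        return None
    return json.loads(match.group(1))


def build_interaction_response(function_name: str) -> Dict[str, Any]:
    """Build the reply message carrying the user's choice as a JSON string."""
    return {
        "role": "interaction_response",
        "content": json.dumps({"function_name": function_name}),
        "media": None,
    }
```

The parsed prompt can be shown to the user, and the resulting reply appended to the conversation before it is sent back to VisionAgentV2.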

Chatbot and Video Streaming
