Fix issue #6361: [Feature]: Document the App Browser Feature in the OpenHands Documentation Page #6362

Closed
wants to merge 4 commits into from
28 changes: 28 additions & 0 deletions docs/modules/usage/how-to/app-browser.md
@@ -0,0 +1,28 @@
# App Browser

The App Browser is an OpenHands feature that lets you monitor and verify the AI agent's web interactions in real time. When the agent acts in a web browser (for example, navigating to a URL, clicking a button, or filling in a form), the App Browser displays screenshots of what the agent sees, so you can confirm that it is interacting with web pages correctly.

## Features

- **URL Display**: Shows the current URL the agent is visiting
- **Live Screenshots**: Displays real-time screenshots of the web pages the agent is interacting with
- **Visual Verification**: Helps you verify that the agent's web interactions are working as intended

## How It Works

1. When the agent performs web interactions using the `browser` tool, it captures screenshots of the web pages
2. These screenshots are displayed in the App Browser panel in real time
3. You can see exactly what the agent sees, making it easier to debug or verify web interactions
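
The flow above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the actual OpenHands implementation: the `BrowserObservation` and `PanelState` shapes, the field names, and the base64 PNG encoding are all hypothetical.

```typescript
// Hypothetical shape of one browser observation. The field names and the
// base64 PNG encoding are assumptions, not the real OpenHands event schema.
interface BrowserObservation {
  url: string;
  screenshot: string; // base64 image data, or an empty string if none was captured
}

// Hypothetical state backing the App Browser panel.
interface PanelState {
  url: string | null;
  screenshotSrc: string | null;
}

const emptyPanel: PanelState = { url: null, screenshotSrc: null };

// Fold one observation into the panel state: the most recent screenshot wins.
function applyObservation(state: PanelState, obs: BrowserObservation): PanelState {
  if (obs.screenshot === "") {
    // No new image: update the URL but keep the last screenshot on screen.
    return { url: obs.url, screenshotSrc: state.screenshotSrc };
  }
  // Accept either a ready-made data URL or a raw base64 string.
  const src =
    obs.screenshot.indexOf("data:") === 0
      ? obs.screenshot
      : `data:image/png;base64,${obs.screenshot}`;
  return { url: obs.url, screenshotSrc: src };
}

// Two made-up observations, applied in the order the agent produced them.
const observations: BrowserObservation[] = [
  { url: "https://example.com/", screenshot: "AAAA" },
  { url: "https://example.com/form", screenshot: "BBBB" },
];

const finalState = observations.reduce(applyObservation, emptyPanel);
```

Keeping the update as a pure function of the previous state and the new observation means the panel always reflects the latest page the agent saw, regardless of how the observations are delivered.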

## Use Cases

The App Browser is particularly useful when:

- Debugging web automation tasks
- Verifying that the agent is interacting with the correct elements on a page
- Ensuring web scraping or form filling tasks are working correctly
- Monitoring the agent's progress during web-based tasks

## Location

The App Browser panel is part of the OpenHands UI. It displays "No page loaded" when the agent is not performing any web interactions.
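
The empty-state behavior can be modeled as a small view function. The sketch below is illustrative only: `PanelView` and `toPanelView` are hypothetical names, and only the "No page loaded" message comes from the description above.

```typescript
// Message shown while the agent is not browsing (taken from the text above).
const NO_PAGE_MESSAGE = "No page loaded";

// Hypothetical view model: either a placeholder or a page with its screenshot.
type PanelView =
  | { kind: "empty"; message: string }
  | { kind: "page"; url: string; screenshotSrc: string };

// Choose what the panel shows. A missing URL or screenshot means no page is loaded.
function toPanelView(url: string | null, screenshotSrc: string | null): PanelView {
  if (url === null || screenshotSrc === null) {
    return { kind: "empty", message: NO_PAGE_MESSAGE };
  }
  return { kind: "page", url, screenshotSrc };
}

const idle = toPanelView(null, null);
const active = toPanelView("https://example.com/", "data:image/png;base64,AAAA");
```

Modeling the two cases as a discriminated union lets the compiler check that rendering code handles both the placeholder and the loaded page.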
27 changes: 27 additions & 0 deletions docs/modules/usage/how-to/gui-mode.md
@@ -109,6 +109,33 @@ The main interface consists of several key components:
- **Settings Button**: A gear icon that opens the settings modal, allowing you to adjust your configuration at any time.
- **Workspace Panel**: Displays the files and folders in your workspace, allowing you to navigate and view files, or the agent's past commands or web browsing history.

### App Browser Feature

The App Browser is a feature that allows you to monitor and verify the AI agent's web interactions:

- **Purpose**: Enables human users to see and verify that the AI agent is correctly implementing web-based tasks and interactions.
- **Capabilities**:
  - **Real-time Monitoring**: Watch the agent's web interactions as they happen
  - **Visual Verification**: See exactly what the agent sees when interacting with web pages
  - **Quality Assurance**: Verify that the agent is performing the correct actions on web pages

#### Using the App Browser

1. **Accessing the Browser**:
   - The browser view appears in the workspace panel when the agent is performing web interactions
   - You can see the current page and the agent's actions in real time

2. **Common Use Cases**:
   - Verifying that the agent is interacting with the correct web elements
   - Monitoring web automation tasks for accuracy
   - Ensuring web-based tasks are being executed as intended
   - Debugging issues when web interactions aren't working as expected

3. **Browser Controls**:
   - The browser panel shows live screenshots of web pages the agent is interacting with
   - You can see the current URL and page state
   - The chat interface allows you to guide or correct the agent if needed

### Interacting with the AI

1. Type your question, request, or task description in the input box.
5 changes: 5 additions & 0 deletions docs/sidebars.ts
@@ -69,6 +69,11 @@ const sidebars: SidebarsConfig = {
          label: 'Github Actions',
          id: 'usage/how-to/github-action',
        },
        {
          type: 'doc',
          label: 'App Browser',
          id: 'usage/how-to/app-browser',
        },
      ],
    },
    {
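
The new sidebar entry only resolves if its `id` matches the location of the added Markdown file. The sketch below shows one way to check that mapping. It assumes, based on the paths in this diff, that doc ids are relative to `docs/modules` and omit the `.md` extension; `docIdToPath` and `missingDocs` are hypothetical helpers, not part of Docusaurus.

```typescript
// Assumption inferred from this diff: ids are paths under this root, without ".md".
const DOCS_ROOT = "docs/modules";

// Shape of a sidebar doc item as it appears in the hunk above.
interface DocItem {
  type: "doc";
  label: string;
  id: string;
}

// Map a sidebar doc id to the Markdown file it is expected to point at.
function docIdToPath(id: string): string {
  return `${DOCS_ROOT}/${id}.md`;
}

// Return the ids whose Markdown file is not among the known files.
function missingDocs(items: DocItem[], existingFiles: string[]): string[] {
  return items
    .filter((item) => existingFiles.indexOf(docIdToPath(item.id)) === -1)
    .map((item) => item.id);
}

const appBrowserItem: DocItem = {
  type: "doc",
  label: "App Browser",
  id: "usage/how-to/app-browser",
};

// The file added by this change, as listed in the diff header.
const addedFiles = ["docs/modules/usage/how-to/app-browser.md"];
const unresolved = missingDocs([appBrowserItem], addedFiles);
```

Here `unresolved` is empty because the id and the added file line up. A typo in either would surface as a non-empty list.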