An autonomous computer control agent powered by Vision-Language Models
| Date | Version | Changes |
|---|---|---|
| Feb 2026 | v2.0 | Upgraded to Holo2-4B - improved accuracy and reasoning |
| June 2025 | v1.0 | Initial release with Holo1.5-3B |
Computer Use Agent is an autonomous system that can control your computer to complete tasks. Give it a high-level goal like "Search for flights to New York" and watch it navigate, click, type, and interact with your desktop - all powered by a local Vision-Language Model.
The agent uses a tri-role architecture (see the loop sketch after this list):
- Navigator - Analyzes screenshots and decides the next action
- Localizer - Finds exact coordinates of UI elements
- Validator - Confirms actions were successful (optional)
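Putting the three roles together, the control loop looks roughly like the sketch below. All names here are illustrative, not the project's actual API; `capture_screen` and `execute_action` are sketched later in this README:

```python
def run_agent(goal: str, navigator, localizer, validator=None,
              max_steps: int = 20) -> None:
    """Illustrative tri-role loop: navigate -> localize -> execute -> validate."""
    for _ in range(max_steps):
        screenshot = capture_screen()                     # sketched under Speed Presets
        action = navigator.next_action(goal, screenshot)  # role 1: decide what to do
        if action.kind == "done":                         # navigator says goal reached
            return
        if action.target is not None:                     # role 2: ground the target
            action.x, action.y = localizer.locate(action.target, screenshot)
        execute_action(action)                            # sketched under Usage
        if validator is not None:                         # role 3 (optional): verify
            validator.check(goal, action, capture_screen())
```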
| Feature | Description |
|---|---|
| Autonomous Mode | Set a goal and let the agent work independently |
| Manual Testing | Test localization, navigation, and validation separately |
| Speed Presets | Quality, Balanced, Fast, Fastest - trade accuracy for speed |
| Real-time Streaming | Watch the model think and reason live |
| Stop Control | Interrupt the agent at any time |
| Thinking Mode | Enable/disable chain-of-thought reasoning |
| Library | Purpose |
|---|---|
| PyTorch | Deep learning framework |
| Transformers | Model loading and inference |
| Gradio | Web interface |
| PyAutoGUI | Mouse and keyboard control |
| mss | Fast screen capture |
| Pydantic | Data validation |
| bitsandbytes | 8-bit quantization |
Hardware
- NVIDIA GPU with 8GB+ VRAM (tested on an RTX 4070 Laptop GPU)
- 16GB+ RAM recommended
Software
- Python 3.10+
- CUDA 11.8+
- Windows 10/11
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/Computer-Use-Agent.git
  cd Computer-Use-Agent
  ```

- Install dependencies

  ```bash
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
  pip install transformers gradio pyautogui mss pydantic pillow bitsandbytes
  ```

- Download the model

  Download Holo2-4B from Hugging Face and place it in your preferred directory. Update `MODEL_PATH` in `core/model.py`:

  ```python
  MODEL_PATH = r"C:\AI\Holo2-4B"  # change this to your path
  ```

- Run the agent

  ```bash
  python agent.py
  ```

- Open the UI

  Navigate to `http://localhost:7860` in your browser.
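For reference, a minimal sketch of what the loading code in `core/model.py` might look like with 8-bit quantization via bitsandbytes. The model class and settings here are assumptions (they depend on the exact checkpoint architecture), not the project's verbatim code:

```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

MODEL_PATH = r"C:\AI\Holo2-4B"  # change this to your path

# 8-bit quantization keeps a 4B-parameter VLM within an 8GB VRAM budget.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_PATH,
    quantization_config=quant_config,
    torch_dtype=torch.float16,  # compute dtype for the non-quantized layers
    device_map="auto",          # place layers on the GPU automatically
)
```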
- Enter a task description (e.g., "Open Notepad and type Hello World")
- Select speed preset (Balanced recommended)
- Click Start Agent
- A new browser tab opens automatically; the agent will work there (the execution sketch below shows how actions reach the desktop)
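Under the hood, each decided action has to be executed on the real desktop. A minimal executor sketch using PyAutoGUI follows; the `Action` fields (`kind`, `x`, `y`, `text`, `amount`) are hypothetical stand-ins for the classes in `core/actions.py`:

```python
import pyautogui

pyautogui.FAILSAFE = True  # abort by slamming the mouse into a screen corner

def execute_action(action) -> None:
    """Hypothetical executor for navigator actions (not the project's API)."""
    if action.kind == "click":
        pyautogui.click(action.x, action.y)
    elif action.kind == "type":
        pyautogui.write(action.text, interval=0.02)  # small per-key delay
    elif action.kind == "scroll":
        pyautogui.scroll(action.amount)  # positive scrolls up, negative down
    elif action.kind == "key":
        pyautogui.press(action.text)     # e.g. "enter", "tab", "esc"
```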
| Preset | Resolution | Use Case |
|---|---|---|
| Quality | 1280px | Best accuracy, slower |
| Balanced | 896px | Good balance |
| Fast | 768px | Faster, still accurate |
| Fastest | 512px | Maximum speed, may miss small elements |
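The presets cap the longest side of the screenshot before it is sent to the model. As an illustration, a capture-and-downscale sketch using mss and Pillow (the function name and preset mapping are assumptions):

```python
import mss
from PIL import Image

PRESETS = {"Quality": 1280, "Balanced": 896, "Fast": 768, "Fastest": 512}

def capture_screen(preset: str = "Balanced") -> Image.Image:
    """Grab the primary monitor and cap its longest side at the preset size."""
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])  # monitors[0] is the full virtual screen
        img = Image.frombytes("RGB", shot.size, shot.rgb)
    scale = PRESETS[preset] / max(img.size)
    if scale < 1.0:  # only downscale, never upscale
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    return img
```

Note that coordinates predicted on a downscaled image must be scaled back up by the same factor before they are passed to the mouse.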
Use the other tabs to test individual components:
- Localization - Upload screenshot, describe element, get coordinates
- Navigation - Upload screenshot, describe task, get next action
- Validator - Verify if an action was successful
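For example, a single localization call might look like the sketch below, reusing the `processor` and `model` from the loading sketch above. Both the chat-template usage and the `(x, y)` answer format are assumptions; the real prompt templates live in `core/prompts.py`:

```python
import re
from PIL import Image

def locate(element: str, screenshot: Image.Image) -> tuple[int, int]:
    """Hypothetical localizer call: ask the VLM for pixel coordinates."""
    messages = [{"role": "user", "content": [
        {"type": "image", "image": screenshot},
        {"type": "text", "text": f"Return the coordinates of: {element}"},
    ]}]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    answer = processor.decode(out[0, inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    match = re.search(r"\((\d+),\s*(\d+)\)", answer)  # assumed "(x, y)" format
    if match is None:
        raise ValueError(f"No coordinates in model output: {answer!r}")
    return int(match.group(1)), int(match.group(2))
```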
```
Computer-Use-Agent/
├── agent.py              # entry point
├── core/
│   ├── __init__.py
│   ├── model.py          # model loading & inference
│   ├── actions.py        # action classes & execution
│   └── prompts.py        # prompt templates
└── ui/
    ├── __init__.py
    └── gradio_app.py     # web interface
```
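Since the library table lists Pydantic for data validation, `core/actions.py` plausibly models actions as validated schemas. A sketch under that assumption (field names are hypothetical):

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field

class ClickAction(BaseModel):
    kind: Literal["click"] = "click"
    target: str = Field(description="Natural-language description of the element")
    x: Optional[int] = None  # filled in by the localizer
    y: Optional[int] = None

class TypeAction(BaseModel):
    kind: Literal["type"] = "type"
    text: str

class DoneAction(BaseModel):
    kind: Literal["done"] = "done"
    success: bool = True
```

Validating the navigator's output against schemas like these catches malformed actions before they ever reach the mouse and keyboard.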
This project uses Holo2-4B from Hcompany, a Vision-Language Model fine-tuned for GUI understanding and computer control tasks.
| Model | Parameters | Link |
|---|---|---|
| Holo2-4B | 4B | huggingface.co/Hcompany/Holo2-4B |
| Holo1.5-3B | 3B | huggingface.co/Hcompany/Holo1.5-3B |
- Hcompany for the Holo models
Semester Project - Agentic AI
Spring 2025

