
Marways7/cua_desktop_operator_cli_skill



What Is This?

CUA Desktop Operator CLI Skill is a standalone, clone-ready Windows desktop operation skill for AI agents that can execute local shell commands.

The repository root is the skill package. Clone it into a local skills or tools folder, point the agent at SKILL.md, and drive the desktop through python -m desktop_operator_cli ....

agent (OpenClaw / Codex / Claude Code / Cursor / ...)
    └─► local shell command execution
            └─► desktop_operator_cli
                    └─► desktop_operator_core
                            └─► Windows Desktop

Why This Exists

Desktop-agent stacks often assume either brittle scripts or a custom runtime integration layer. This repository targets the missing middle:

  • the agent can run local commands
  • the agent can read JSON
  • the agent can inspect screenshot paths
  • the user still wants a disciplined observe → plan → act → verify loop

That is exactly what desktop_operator_cli provides.


Key Capabilities

  • Launch applications, focus windows, and click at absolute or relative coordinates
  • Send hotkeys, type text, paste multilingual text, scroll, and wait
  • Capture screenshots, window crops, visible-window inventories, and structured state JSON
  • Use bounded UI Automation queries and actions when accessibility metadata is available
  • Reuse built-in macros for search, chat panel toggles, media control, browser navigation, and settings
  • Operate entirely through CLI JSON without any extra protocol bridge
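
For example, relative clicking avoids hard-coding screen pixels: a fractional point inside a window rectangle is mapped to absolute coordinates at call time. A minimal sketch of that mapping, assuming a `(left, top, width, height)` rectangle shape for the window; the actual JSON field layout may differ, so treat `to_absolute` as illustrative only:

```python
def to_absolute(rect: tuple[int, int, int, int], rel_x: float, rel_y: float) -> tuple[int, int]:
    """Map fractional coordinates (0.0-1.0) within a window rect to screen pixels."""
    left, top, width, height = rect  # assumed rect layout, not a documented schema
    return (round(left + rel_x * width), round(top + rel_y * height))

# The center of an 800x600 window whose top-left corner is at (100, 200):
print(to_absolute((100, 200, 800, 600), 0.5, 0.5))  # (500, 500)
```

Because the mapping is resolved per call, the same relative click keeps working after the window moves or is resized.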

Architecture

flowchart TB
    A["AI Agent"] -->|"local shell commands"| B["desktop_operator_cli"]
    B --> C["desktop_operator_core"]
    C --> D["Windows Desktop"]
Layer          Role
Skill layer    Tells the agent when and how to use this skill
CLI layer      Exposes a stable command surface and UTF-8 JSON stdout
Runtime layer  Performs desktop actions, screenshots, UIA, macros, and artifact management
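
The CLI layer's contract (local command in, UTF-8 JSON on stdout) is easy to wrap from any host language. A minimal sketch in Python, assuming only that each command prints a single JSON object to stdout; the `build_command` and `run_cli` helper names are illustrative, not part of the package:

```python
import json
import subprocess
import sys

def build_command(*args: str) -> list[str]:
    """Assemble the module invocation, e.g. build_command("observe")."""
    return [sys.executable, "-m", "desktop_operator_cli", *args]

def run_cli(*args: str) -> dict:
    """Run one CLI command and decode its UTF-8 JSON stdout."""
    proc = subprocess.run(
        build_command(*args),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise on a non-zero exit so failures are not silently swallowed
    )
    return json.loads(proc.stdout)
```

An agent runtime that can already spawn processes needs nothing beyond a wrapper like this, which is the point of the shell-native design.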

Repository Layout

cua_desktop_operator_cli_skill/
├── SKILL.md
├── README*.md
├── LICENSE
├── SECURITY.md
├── requirements.txt
├── agents/
├── references/
│   ├── cli-command-catalog.md
│   ├── shell-agent-setup.md
│   ├── compatibility.md
│   ├── interaction-patterns.md
│   ├── macro-catalog.md
│   └── failure-recovery.md
├── scripts/
│   ├── setup_runtime.ps1
│   ├── run_cli_command.ps1
│   ├── verify_real_tasks.ps1
│   └── verify_real_tasks.py
├── desktop_operator_core/
└── desktop_operator_cli/

Quick Start

Recommended models: this skill works with any shell-capable agent, but the best results usually come from frontier models with stronger computer-use and tool-use performance, such as OpenAI GPT-5.4 and the latest Claude Opus model available in your client.

Step 1 — Clone into a local skill-accessible folder

# For Codex / Claude Code / Cursor
git clone https://github.com/Marways7/cua_desktop_operator_cli_skill "$HOME\.codex\skills\cua_desktop_operator_cli_skill"

# For OpenClaw or any shell-native agent
git clone https://github.com/Marways7/cua_desktop_operator_cli_skill "<your local skills or tools folder>"

Step 2 — Install dependencies

cd cua_desktop_operator_cli_skill
.\scripts\setup_runtime.ps1

Step 3 — Validate the CLI path

python -m desktop_operator_cli observe

Step 4 — Point the agent at SKILL.md

The agent should read the root SKILL.md, then use the CLI commands described there.

No extra protocol wiring is required.


CLI Command Reference

Core commands:

  • observe
  • find-window
  • focus-window
  • click-relative
  • paste-text
  • run-macro
  • validate-state
  • cleanup-artifacts

Advanced commands:

  • execute-actions
  • uia-query
  • uia-click
  • uia-type

Full descriptions: references/cli-command-catalog.md
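
Whatever command is run, the agent's job is to read the JSON printed to stdout. A sketch of picking out interesting fields from an `observe`-style payload; the field names used here (`screenshot_path`, `active_window`, `windows`) are assumptions for illustration, so consult references/cli-command-catalog.md for the real schema:

```python
import json

# A hypothetical observe payload, shaped for illustration only.
sample = """{
  "screenshot_path": "C:/runtime/task-42/observe.png",
  "active_window": {"title": "Untitled - Notepad", "pid": 1234},
  "windows": [
    {"title": "Untitled - Notepad"},
    {"title": "Settings"}
  ]
}"""

state = json.loads(sample)
titles = [w["title"] for w in state["windows"]]
print(state["screenshot_path"])  # where to look before acting
print(titles)                    # candidate targets for find-window / focus-window
```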


Design Principles

Principle               Meaning
Observation first       Look before acting, and validate after mutating steps
Shell-native interface  Agents call local commands and receive structured JSON
Local execution only    No cloud planner, no remote executor
Small reversible steps  Prefer short command sequences over blind batching
Macro when stable       Reuse macros for repeated flows instead of re-planning them every time

Recommended Workflow for Agents

  1. Run observe.
  2. Inspect screenshot path, active window, visible windows, and state JSON.
  3. Narrow the target with find-window or focus-window.
  4. Prefer run-macro, click-relative, or uia-* over fragile absolute clicks.
  5. Validate with observe or validate-state.
  6. After success, run cleanup-artifacts unless the user wants traces preserved.
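
The act-then-verify portion of that loop can be sketched as a small helper. Here `run` is any callable that maps CLI arguments to decoded JSON (such as a subprocess wrapper), and the `active_window`/`title` fields are assumed names used only for illustration:

```python
from typing import Callable

def act_and_verify(run: Callable[..., dict], action: list[str], expected_title: str) -> bool:
    """Perform one small reversible step, then re-observe and verify the result."""
    run(*action)                       # act, e.g. ["run-macro", "open-settings"]
    state = run("observe")             # observe again rather than assuming success
    title = state.get("active_window", {}).get("title", "")
    return expected_title in title     # verify before planning the next step
```

Keeping each step this small makes failures cheap: a false return means re-observing and re-planning one action, not unwinding a long blind batch.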

Artifact Management

Artifacts are stored outside the repository in a task-scoped local runtime folder.

Typical artifacts include:

  • full screenshots
  • target-window crops
  • state JSON payloads
  • execution logs
  • failure screenshots

After a successful task, clean them with:

python -m desktop_operator_cli cleanup-artifacts

Validation

Run:

.\scripts\verify_real_tasks.ps1 --task observe

Available validation targets:

Target    What it tests
observe   Screenshot capture and window detection
notepad   Launch, type, and save in Notepad
browser   Browser launch and navigation
settings  Open Windows Settings via macro
media     Media search/play flow via CLI + macros
chat      Chat panel toggle and message send
all       Run all supported targets in sequence

To keep artifacts for inspection after validation:

.\scripts\verify_real_tasks.ps1 --task all --keep-artifacts

Acknowledgements

We are grateful to the open-source community and the researchers whose work informed this repository. Special thanks to:


License

This project is distributed under the GNU Affero General Public License v3.0.

AGPL is used here so that redistributed or hosted modified versions remain open under the same license.

Copyright (C) 2026 Marways7 and contributors.


Star History

If this project helps you, please consider giving it a star on GitHub.

