
Marways7/cua_desktop_operator_cli_skill



What Is This?

CUA Desktop Operator CLI Skill is a standalone, clone-ready Windows desktop operation skill for AI agents that can execute local shell commands.

The repository root is the skill package. Clone it into a local skills or tools folder, point the agent at SKILL.md, and drive the desktop through python -m desktop_operator_cli ....

agent (OpenClaw / Codex / Claude Code / Cursor / ...)
    └─► local shell command execution
            └─► desktop_operator_cli
                    └─► desktop_operator_core
                            └─► Windows Desktop

Why This Exists

Desktop-agent stacks often assume either brittle scripts or a custom runtime integration layer. This repository targets the missing middle:

  • the agent can run local commands
  • the agent can read JSON
  • the agent can inspect screenshot paths
  • the user still wants a disciplined observe → plan → act → verify loop

That is exactly what desktop_operator_cli provides.


Key Capabilities

  • Launch applications, focus windows, and click at absolute or relative coordinates
  • Send hotkeys, type text, paste multilingual text, scroll, and wait
  • Capture screenshots, window crops, visible-window inventories, and structured state JSON
  • Use bounded UI Automation queries and actions when accessibility metadata is available
  • Reuse built-in macros for search, chat panel toggles, media control, browser navigation, and settings
  • Operate entirely through CLI JSON without any extra protocol bridge
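
For example, relative clicking avoids hard-coding screen pixels: a fractional point inside a window rectangle is mapped to absolute coordinates at call time. A minimal sketch of that mapping, assuming a `(left, top, width, height)` rectangle shape for the window; the actual JSON field layout may differ, so treat `to_absolute` as illustrative only:

```python
def to_absolute(rect: tuple[int, int, int, int], rel_x: float, rel_y: float) -> tuple[int, int]:
    """Map fractional coordinates (0.0-1.0) within a window rect to screen pixels."""
    left, top, width, height = rect  # assumed rect layout, not a documented schema
    return (round(left + rel_x * width), round(top + rel_y * height))

# The center of an 800x600 window whose top-left corner is at (100, 200):
print(to_absolute((100, 200, 800, 600), 0.5, 0.5))  # (500, 500)
```

Because the mapping is resolved per call, the same relative click keeps working after the window moves or is resized.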

Architecture

flowchart TB
    A["AI Agent"] -->|"local shell commands"| B["desktop_operator_cli"]
    B --> C["desktop_operator_core"]
    C --> D["Windows Desktop"]
Layer          Role
Skill layer    Tells the agent when and how to use this skill
CLI layer      Exposes a stable command surface and UTF-8 JSON stdout
Runtime layer  Performs desktop actions, screenshots, UIA, macros, and artifact management
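
The CLI layer's contract (local command in, UTF-8 JSON on stdout) is easy to wrap from any host language. A minimal sketch in Python, assuming only that each command prints a single JSON object to stdout; the `build_command` and `run_cli` helper names are illustrative, not part of the package:

```python
import json
import subprocess
import sys

def build_command(*args: str) -> list[str]:
    """Assemble the module invocation, e.g. build_command("observe")."""
    return [sys.executable, "-m", "desktop_operator_cli", *args]

def run_cli(*args: str) -> dict:
    """Run one CLI command and decode its UTF-8 JSON stdout."""
    proc = subprocess.run(
        build_command(*args),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise on a non-zero exit so failures are not silently swallowed
    )
    return json.loads(proc.stdout)
```

An agent runtime that can already spawn processes needs nothing beyond a wrapper like this, which is the point of the shell-native design.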

Repository Layout

cua_desktop_operator_cli_skill/
├── SKILL.md
├── README*.md
├── LICENSE
├── SECURITY.md
├── requirements.txt
├── agents/
├── references/
│   ├── cli-command-catalog.md
│   ├── shell-agent-setup.md
│   ├── compatibility.md
│   ├── interaction-patterns.md
│   ├── macro-catalog.md
│   └── failure-recovery.md
├── scripts/
│   ├── setup_runtime.ps1
│   ├── run_cli_command.ps1
│   ├── verify_real_tasks.ps1
│   └── verify_real_tasks.py
├── desktop_operator_core/
└── desktop_operator_cli/

Quick Start

Recommended models: this skill works with any shell-capable agent, but the best results usually come from frontier models with stronger computer-use and tool-use performance, such as OpenAI GPT-5.4 and the latest Claude Opus model available in your client.

Step 1 — Clone into a local skill-accessible folder

# For Codex / Claude Code / Cursor
git clone https://github.com/Marways7/cua_desktop_operator_cli_skill "$HOME\.codex\skills\cua_desktop_operator_cli_skill"

# For OpenClaw or any shell-native agent
git clone https://github.com/Marways7/cua_desktop_operator_cli_skill "<your local skills or tools folder>"

Step 2 — Install dependencies

cd cua_desktop_operator_cli_skill
.\scripts\setup_runtime.ps1

Step 3 — Validate the CLI path

python -m desktop_operator_cli observe

Step 4 — Point the agent at SKILL.md

The agent should read the root SKILL.md, then use the CLI commands described there.

No extra protocol wiring is required.


CLI Command Reference

Core commands:

  • observe
  • find-window
  • focus-window
  • click-relative
  • paste-text
  • run-macro
  • validate-state
  • cleanup-artifacts

Advanced commands:

  • execute-actions
  • uia-query
  • uia-click
  • uia-type

Full descriptions: references/cli-command-catalog.md
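
Whatever command is run, the agent's job is to read the JSON printed to stdout. A sketch of picking out interesting fields from an `observe`-style payload; the field names used here (`screenshot_path`, `active_window`, `windows`) are assumptions for illustration, so consult references/cli-command-catalog.md for the real schema:

```python
import json

# A hypothetical observe payload, shaped for illustration only.
sample = """{
  "screenshot_path": "C:/runtime/task-42/observe.png",
  "active_window": {"title": "Untitled - Notepad", "pid": 1234},
  "windows": [
    {"title": "Untitled - Notepad"},
    {"title": "Settings"}
  ]
}"""

state = json.loads(sample)
titles = [w["title"] for w in state["windows"]]
print(state["screenshot_path"])  # where to look before acting
print(titles)                    # candidate targets for find-window / focus-window
```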


Design Principles

Principle               Meaning
Observation first       Look before acting, and validate after mutating steps
Shell-native interface  Agents call local commands and receive structured JSON
Local execution only    No cloud planner, no remote executor
Small reversible steps  Prefer short command sequences over blind batching
Macro when stable       Reuse macros for repeated flows instead of re-planning them every time

Recommended Workflow for Agents

  1. Run observe.
  2. Inspect screenshot path, active window, visible windows, and state JSON.
  3. Narrow the target with find-window or focus-window.
  4. Prefer run-macro, click-relative, or uia-* over fragile absolute clicks.
  5. Validate with observe or validate-state.
  6. After success, run cleanup-artifacts unless the user wants traces preserved.
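
The act-then-verify portion of that loop can be sketched as a small helper. Here `run` is any callable that maps CLI arguments to decoded JSON (such as a subprocess wrapper), and the `active_window`/`title` fields are assumed names used only for illustration:

```python
from typing import Callable

def act_and_verify(run: Callable[..., dict], action: list[str], expected_title: str) -> bool:
    """Perform one small reversible step, then re-observe and verify the result."""
    run(*action)                       # act, e.g. ["run-macro", "open-settings"]
    state = run("observe")             # observe again rather than assuming success
    title = state.get("active_window", {}).get("title", "")
    return expected_title in title     # verify before planning the next step
```

Keeping each step this small makes failures cheap: a false return means re-observing and re-planning one action, not unwinding a long blind batch.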

Artifact Management

Artifacts are stored outside the repository in a task-scoped local runtime folder.

Typical artifacts include:

  • full screenshots
  • target-window crops
  • state JSON payloads
  • execution logs
  • failure screenshots

After a successful task, clean them with:

python -m desktop_operator_cli cleanup-artifacts

Validation

Run:

.\scripts\verify_real_tasks.ps1 --task observe

Available validation targets:

Target    What it tests
observe   Screenshot capture and window detection
notepad   Launch, type, and save in Notepad
browser   Browser launch and navigation
settings  Open Windows Settings via macro
media     Media search/play flow via CLI + macros
chat      Chat panel toggle and message send
all       Run all supported targets in sequence

To keep artifacts for inspection after validation:

.\scripts\verify_real_tasks.ps1 --task all --keep-artifacts

Acknowledgements

We are grateful to the open-source community and the researchers whose work informed this repository. Special thanks to:


License

This project is distributed under the GNU Affero General Public License v3.0.

AGPL is used here so that redistributed or hosted modified versions remain open under the same license.

Copyright (C) 2026 Marways7 and contributors.


Star History

If this project helps you, please consider giving it a star on GitHub.

