CUA Desktop Operator CLI Skill is a standalone, clone-ready Windows desktop operation skill for AI agents that can execute local shell commands.
The repository root is the skill package. Clone it into a local skills or tools folder, point the agent at SKILL.md, and drive the desktop through `python -m desktop_operator_cli ...`.

    agent (OpenClaw / Codex / Claude Code / Cursor / ...)
      └─► local shell command execution
            └─► desktop_operator_cli
                  └─► desktop_operator_core
                        └─► Windows Desktop

Desktop-agent stacks often assume either brittle scripts or a custom runtime integration layer. This repository targets the missing middle:
- the agent can run local commands
- the agent can read JSON
- the agent can inspect screenshot paths
- the user still wants a disciplined observe → plan → act → verify loop
That is exactly what desktop_operator_cli provides.
- Launch applications, focus windows, and click at absolute or relative coordinates
- Send hotkeys, type text, paste multilingual text, scroll, and wait
- Capture screenshots, window crops, visible-window inventories, and structured state JSON
- Use bounded UI Automation queries and actions when accessibility metadata is available
- Reuse built-in macros for search, chat panel toggles, media control, browser navigation, and settings
- Operate entirely through CLI JSON without any extra protocol bridge
flowchart TB
A["AI Agent"] -->|"local shell commands"| B["desktop_operator_cli"]
B --> C["desktop_operator_core"]
C --> D["Windows Desktop"]
| Layer | Role |
|---|---|
| Skill layer | Tells the agent when and how to use this skill |
| CLI layer | Exposes a stable command surface and UTF-8 JSON stdout |
| Runtime layer | Performs desktop actions, screenshots, UIA, macros, and artifact management |
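
As a rough illustration of the CLI layer's contract (one shell command in, UTF-8 JSON on stdout out), an agent-side helper might look like the sketch below. The names `build_argv`, `run_cli`, and `PYTHON` are hypothetical and not part of this repository, and the sketch makes no assumption about which keys the JSON payload contains.

```python
import json
import subprocess
from typing import Any, Callable

# Assumption: "python" on PATH is the interpreter prepared by setup_runtime.ps1.
PYTHON = "python"


def build_argv(command: str, *extra: str) -> list[str]:
    """Build the argv for one call, e.g. python -m desktop_operator_cli observe."""
    return [PYTHON, "-m", "desktop_operator_cli", command, *extra]


def run_cli(command: str, *extra: str,
            runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run one subcommand and decode its stdout as UTF-8 JSON.

    `runner` defaults to subprocess.run and is injectable so the helper
    can be exercised without touching a real desktop.
    """
    completed = runner(build_argv(command, *extra), capture_output=True)
    text = completed.stdout.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Surface the raw output: the payload schema is not assumed here.
        raise RuntimeError(
            f"{command!r} did not print valid JSON: {text[:200]!r}"
        ) from exc
```

On a prepared Windows machine, `run_cli("observe")` would return whatever JSON the CLI prints. Any arguments passed through `extra` should be taken from references/cli-command-catalog.md rather than guessed.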
cua_desktop_operator_cli_skill/
├── SKILL.md
├── README*.md
├── LICENSE
├── SECURITY.md
├── requirements.txt
├── agents/
├── references/
│ ├── cli-command-catalog.md
│ ├── shell-agent-setup.md
│ ├── compatibility.md
│ ├── interaction-patterns.md
│ ├── macro-catalog.md
│ └── failure-recovery.md
├── scripts/
│ ├── setup_runtime.ps1
│ ├── run_cli_command.ps1
│ ├── verify_real_tasks.ps1
│ └── verify_real_tasks.py
├── desktop_operator_core/
└── desktop_operator_cli/
Recommended models: this skill works with any shell-capable agent, but the best results usually come from frontier models with stronger computer-use and tool-use performance, such as OpenAI GPT-5.4 and the latest Claude Opus model available in your client.

    # For Codex / Claude Code / Cursor
    git clone https://github.com/Marways7/cua_desktop_operator_cli_skill "$HOME\.codex\skills\cua_desktop_operator_cli_skill"

    # For OpenClaw or any shell-native agent
    git clone https://github.com/Marways7/cua_desktop_operator_cli_skill "<your local skills or tools folder>"

    cd cua_desktop_operator_cli_skill
    .\scripts\setup_runtime.ps1

    python -m desktop_operator_cli observe

The agent should read the root SKILL.md, then use the CLI commands described there.
No extra protocol wiring is required.
Core commands:
- `observe`
- `find-window`
- `focus-window`
- `click-relative`
- `paste-text`
- `run-macro`
- `validate-state`
- `cleanup-artifacts`

Advanced commands:
- `execute-actions`
- `uia-query`
- `uia-click`
- `uia-type`

Full descriptions: references/cli-command-catalog.md
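
One way an agent harness could catch a mistyped subcommand before shelling out is a small name check, sketched below. The tuples copy only the command names listed in this README, which may not be exhaustive (references/cli-command-catalog.md is the authority), and `check_command` is a hypothetical helper, not something the repository ships.

```python
import difflib

# Command names as listed in this README; the full catalog may contain more.
CORE_COMMANDS = (
    "observe", "find-window", "focus-window", "click-relative",
    "paste-text", "run-macro", "validate-state", "cleanup-artifacts",
)
ADVANCED_COMMANDS = ("execute-actions", "uia-query", "uia-click", "uia-type")
KNOWN_COMMANDS = frozenset(CORE_COMMANDS + ADVANCED_COMMANDS)


def check_command(name: str) -> str:
    """Return `name` unchanged if it is a listed command, else raise ValueError.

    The error message includes the closest listed name, when there is one,
    so an agent can correct a typo without another round trip.
    """
    if name in KNOWN_COMMANDS:
        return name
    close = difflib.get_close_matches(name, sorted(KNOWN_COMMANDS), n=1)
    hint = f"; did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"unknown desktop_operator_cli command {name!r}{hint}")
```

Treat a rejection as a prompt to consult the catalog, not as proof that the command does not exist.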
| Principle | Meaning |
|---|---|
| Observation first | Look before acting, and validate after mutating steps |
| Shell-native interface | Agents call local commands and receive structured JSON |
| Local execution only | No cloud planner, no remote executor |
| Small reversible steps | Prefer short command sequences over blind batching |
| Macro when stable | Reuse macros for repeated flows instead of re-planning them every time |
- Run `observe`.
- Inspect screenshot path, active window, visible windows, and state JSON.
- Narrow the target with `find-window` or `focus-window`.
- Prefer `run-macro`, `click-relative`, or `uia-*` over fragile absolute clicks.
- Validate with `observe` or `validate-state`.
- After success, run `cleanup-artifacts` unless the user wants traces preserved.
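
The workflow above can be sketched as a single observe, act, verify step. This is a minimal illustration under stated assumptions: `operate` is a hypothetical helper, `run` stands for any callable that executes one CLI subcommand and returns its parsed JSON, and `succeeded` is a caller-supplied check, because the state JSON schema is not described here.

```python
from typing import Any, Callable, Sequence


def operate(run: Callable[..., Any],
            action: Sequence[str],
            succeeded: Callable[[Any, Any], bool],
            keep_artifacts: bool = False) -> dict:
    """Perform one small step: observe, act, observe again, then clean up.

    `action` is the subcommand followed by its arguments, exactly as the
    caller wants them passed to the CLI.
    """
    before = run("observe")          # look before acting
    result = run(*action)            # one short, reversible action
    after = run("observe")           # verify by observing again
    ok = succeeded(before, after)    # caller decides what success means
    if ok and not keep_artifacts:
        run("cleanup-artifacts")     # only after a successful step
    return {"ok": ok, "before": before, "result": result, "after": after}
```

Keeping each call to a single action mirrors the "small reversible steps" principle: a failed check leaves the artifacts in place for inspection instead of deleting them.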
Artifacts are stored outside the repository in a task-scoped local runtime folder.
Typical artifacts include:
- full screenshots
- target-window crops
- state JSON payloads
- execution logs
- failure screenshots
After a successful task, clean them with:

    python -m desktop_operator_cli cleanup-artifacts

Run:

    .\scripts\verify_real_tasks.ps1 --task observe

Available validation targets:
| Target | What it tests |
|---|---|
| observe | Screenshot capture and window detection |
| notepad | Launch, type, save in Notepad |
| browser | Browser launch and navigation |
| settings | Open Windows Settings via macro |
| media | Media search/play flow via CLI + macros |
| chat | Chat panel toggle and message send |
| all | Run all supported targets in sequence |
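
For harnesses that launch the verifier from Python rather than from a PowerShell prompt, the command line could be assembled as in the sketch below. `verify_argv` is a hypothetical helper. It assumes Windows PowerShell is available as `powershell` and that the script accepts the `--task` and `--keep-artifacts` arguments exactly as written in this README.

```python
# Target names copied from the table above.
VALIDATION_TARGETS = (
    "observe", "notepad", "browser", "settings", "media", "chat", "all",
)


def verify_argv(target: str,
                keep_artifacts: bool = False,
                script: str = r".\scripts\verify_real_tasks.ps1") -> list[str]:
    """Build the argv that runs the validation script for one target."""
    if target not in VALIDATION_TARGETS:
        raise ValueError(
            f"unknown validation target {target!r}; "
            f"expected one of {', '.join(VALIDATION_TARGETS)}"
        )
    # -File hands the remaining arguments to the script unchanged.
    argv = ["powershell", "-NoProfile", "-File", script, "--task", target]
    if keep_artifacts:
        argv.append("--keep-artifacts")
    return argv
```

The resulting list can be handed to `subprocess.run` from the repository root, since the default `script` path is relative to it.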
To keep artifacts for inspection after validation:

    .\scripts\verify_real_tasks.ps1 --task all --keep-artifacts

We are grateful to the open-source community and the researchers whose work informed this repository. Special thanks to:
- microsoft/cua_skill — for shaping reusable Computer Use Agent skill packaging ideas.
- bytedance/UI-TARS-desktop — for influencing observation-first desktop agent interaction patterns.
This project is distributed under the GNU Affero General Public License v3.0.
AGPL is used here so that redistributed or hosted modified versions remain open under the same license.
Copyright (C) 2026 Marways7 and contributors.
If this project helps you, please consider giving it a star on GitHub.