Native macOS UI Automation with AI-Powered Intelligence
MacPilot is a macOS UI automation framework that combines native Apple technologies (Vision, Accessibility APIs, AppleScript) with AI models so it can interact with your Mac the way a person would. Write instructions in plain English and let MacPilot carry out the automation.
✅ Status: Fully Functional - All core components tested and working!
- 🔄 Process Automation - Automate repetitive UI tasks
- 🧪 UI Testing - Test macOS applications
- 🤖 Desktop RPA - Build robotic process automation
- 🔍 Screen Analysis - Extract data from UI elements
- 🧭 Workflow Automation - Create complex UI workflows
git clone <your-repo-url>
cd MacPilot
make test-framework # Test without API dependencies
make install
# or manually:
pip install -r requirements.txt
Grant Terminal the required permissions (on macOS 13 and later these panes are under System Settings → Privacy & Security):
- System Preferences → Security & Privacy → Accessibility (add Terminal)
- System Preferences → Security & Privacy → Screen Recording (add Terminal)
# List available capabilities
make capabilities
# Take screenshots
make screenshot
# Or use the CLI directly
cd automation_framework
python -m main execute "Take a screenshot"
- AI Integration - Natural language instruction processing (OpenAI or local)
- Vision Framework - Advanced UI element detection
- State Awareness - Real-time system state tracking
- Pattern Recognition - Learned UI interaction patterns
- Self-healing - Automated error recovery
- Apple Vision - Native OCR and element detection
- AppleScript - Deep OS integration
- Accessibility APIs - Comprehensive UI control
- Cocoa/AppKit - Native macOS frameworks
- Core Graphics - Low-level screen capture
- Async Architecture - Built on modern async Python
- Type Safety - Full Pydantic validation
- Actor System - Modular action execution
- State Management - Comprehensive UI state tracking
- Pattern System - Reusable interaction patterns
- Chrome Control - Deep browser automation (10 capabilities)
- Finder Operations - File system automation (10 capabilities)
- System Control - Generic UI operations (11 capabilities)
- Menu Navigation - Application menu control
- Window Management - Window state control
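The "Self-healing" item above is easiest to picture as a retry loop with a recovery hook. The sketch below is a hypothetical illustration, not MacPilot's implementation: `run_with_recovery` and its parameters are assumptions. It retries a failing async UI action, calls a recovery callback between attempts, and backs off exponentially.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Hypothetical helper (not part of MacPilot's API).
async def run_with_recovery(
    action: Callable[[], Awaitable[T]],
    recover: Callable[[Exception], Awaitable[None]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run an async UI action; on failure, call a recovery hook, back off, and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await action()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            await recover(exc)  # e.g. dismiss an unexpected dialog or refocus the target window
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

Keeping recovery as a separate callback lets each actor decide what "healing" means for its application while the retry policy stays shared.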
graph TD
A[Natural Language Instructions] --> B[AI Analysis Layer]
B --> C[Action Planning]
C --> D[Actor System]
D --> E[UI Interaction Layer]
E --> F[State Management]
F --> B
- Actor System - Chrome, Finder, Generic UI actors
- State Management - UI state tracking and validation
- Configuration - YAML and environment-based config
- CLI Interface - Command-line automation execution
- Python API - Programmatic access to all functionality
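The diagram describes a loop from instruction to analysis, planning, actors, UI interaction, and back through state. Here is a minimal single-pass sketch of that flow. It assumes toy stand-ins (`UIState`, `analyze_and_plan`, `EchoActor`, `run_instruction` are all hypothetical) in place of the real AI service, planner, and actors; only the `execute_action(action, **kwargs)` call shape mirrors the actor examples in this README.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any
from urllib.parse import quote_plus

Step = tuple[str, str, dict[str, Any]]  # (actor name, action, params)

@dataclass
class UIState:
    """Toy stand-in for State Management: a log of what has been done so far."""
    history: list[str] = field(default_factory=list)

def analyze_and_plan(instruction: str, state: UIState) -> list[Step]:
    """Toy stand-in for the AI Analysis and Action Planning layers.

    A real implementation would send the instruction and current state to an LLM;
    this one uses a keyword rule so the sketch runs offline.
    """
    if "screenshot" in instruction.lower():
        return [("system", "screenshot", {})]
    return [("chrome", "open_url", {"url": "https://www.google.com/search?q=" + quote_plus(instruction)})]

class EchoActor:
    """Toy actor: describes the action instead of touching the UI."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def execute_action(self, action: str, **kwargs: Any) -> str:
        return f"{self.name}.{action}({kwargs})"

async def run_instruction(instruction: str, actors: dict[str, EchoActor], state: UIState) -> UIState:
    """One pass: plan, execute each step through an actor, record results in state."""
    for actor_name, action, params in analyze_and_plan(instruction, state):
        result = await actors[actor_name].execute_action(action, **params)
        state.history.append(result)  # the updated state is what the next analysis pass would see
    return state
```

For example, running `"Take a screenshot"` through this sketch records `system.screenshot({})` in `state.history`.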
| Document | Description |
|---|---|
| DOCUMENTATION.md | Complete technical documentation (70+ pages) |
| INSTALLATION.md | Installation and troubleshooting guide |
| PROJECT_STATUS.md | Current status and capabilities summary |
| AI_SERVICES_GUIDE.md | Guide to using different AI backends |
| AI_IMPLEMENTATION_SUMMARY.md | AI system implementation details |
| automation_framework/examples/ | Usage examples and patterns |
# Works immediately - direct actor commands
make capabilities
# With OpenAI
export OPENAI_API_KEY="your-key-here"
python -m main execute "Open Chrome and search for Python"
# With a local LLM: install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama2
# Modify services/ai/openai_service.py to use the local endpoint
# See INSTALLATION.md for details
MacPilot supports multiple AI backends that can be swapped without changing your automation code:
- 🌐 OpenAI - Cloud-based GPT integration
- 🏠 Local LLM - Ollama, LM Studio, any OpenAI-compatible API
- 👨‍💻 Human Input - Interactive manual control with rich prompts
- 🔧 Custom APIs - Integrate any HTTP-based AI service
- 📝 Text Processors - Build custom AI logic
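Swapping backends without touching automation code works when that code depends only on a small shared interface. A hypothetical sketch of the idea: an `AIService` protocol plus a name-keyed `AIRegistry`, loosely modeled on the `ai_registry.get_service(...)` call in the Python API example. All class and method names here are assumptions, not MacPilot's actual classes.

```python
import asyncio
from typing import Callable, Protocol

class AIService(Protocol):
    """Minimal interface each backend implements (hypothetical)."""
    name: str

    async def plan(self, instruction: str) -> list[str]: ...

class KeywordService:
    """'Text processor' style backend: splits an instruction on ' and '."""
    name = "keyword"

    async def plan(self, instruction: str) -> list[str]:
        return [part.strip() for part in instruction.split(" and ") if part.strip()]

class HumanService:
    """'Human input' style backend: asks a person for the next step."""
    name = "human"

    def __init__(self, ask: Callable[[str], str] = input) -> None:
        self.ask = ask

    async def plan(self, instruction: str) -> list[str]:
        return [self.ask(f"Next step for {instruction!r}: ")]

class AIRegistry:
    """Name-keyed registry, so callers pick a backend by string."""

    def __init__(self) -> None:
        self._services: dict[str, AIService] = {}

    def register(self, service: AIService) -> None:
        self._services[service.name] = service

    def get_service(self, name: str) -> AIService:
        if name not in self._services:
            raise KeyError(f"unknown AI service {name!r}; available: {self.list_services()}")
        return self._services[name]

    def list_services(self) -> list[str]:
        return sorted(self._services)
```

Because callers only use `plan()`, switching from one backend to another is a different string passed to `get_service`.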
# Use local Ollama
python -m main execute "Take screenshot" --ai-service ollama
# Use human input for complex tasks
python -m main execute "Complex workflow" --ai-service human
# List available AI services
python -m main list-ai-services
# Test AI services interactively
python ai_test_cli.py interactive
import asyncio
from automation_framework.main import AutomationFramework
from automation_framework.services.ai.ai_factory import ai_registry
async def main():
    framework = AutomationFramework()
    # Switch to a local LLM
    ollama_service = ai_registry.get_service("ollama")
    await framework.orchestrator.set_ai_service(ollama_service)
    await framework.execute_instruction("Search for Python tutorials")
asyncio.run(main())
Benefits:
- 🔒 Privacy - Use local models for sensitive data
- 💰 Cost Control - Avoid API costs with local AI
- 🔄 Reliability - Automatic fallback between services
- 🛠️ Flexibility - Easy integration of custom AI systems
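One way to get "automatic fallback between services" is to try each backend in order, with a timeout, and use the first plan that succeeds. This is a hypothetical sketch, not MacPilot's actual fallback logic; `plan_with_fallback` is an assumed name and the backends are plain async callables.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Planner = Callable[[str], Awaitable[list[str]]]

async def plan_with_fallback(
    instruction: str,
    backends: Sequence[tuple[str, Planner]],
    timeout: float = 30.0,
) -> tuple[str, list[str]]:
    """Try each backend in order; return (backend name, plan) from the first that succeeds."""
    errors: list[str] = []
    for name, planner in backends:
        try:
            plan = await asyncio.wait_for(planner(instruction), timeout)
            return name, plan
        except Exception as exc:  # network failure, timeout, malformed output, ...
            errors.append(f"{name}: {exc!r}")  # repr keeps empty-message errors (e.g. timeouts) visible
    raise RuntimeError("all AI services failed: " + "; ".join(errors))
```

Ordering the list as cloud first, then local (or the reverse for privacy-sensitive work) is the whole policy; returning the backend name makes it easy to log which service actually answered.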
# List all available functions
make capabilities
# Take screenshots
make screenshot
# Execute natural language instructions (requires AI)
cd automation_framework
python -m main execute "Open Chrome and go to github.com"
from automation_framework.main import AutomationFramework
import asyncio
async def main():
    framework = AutomationFramework()
    # Simple automation
    await framework.execute_instruction("Take a screenshot")
    # Complex workflows (with AI)
    await framework.execute_instruction("""
    1. Open Chrome and search for 'Python tutorials'
    2. Click the first result
    3. Create a folder on Desktop named 'Research'
    4. Take a screenshot and save to Research folder
    """)
asyncio.run(main())
from automation_framework.actors.chrome.browser import ChromeActorStack
import asyncio
async def chrome_example():
    chrome = ChromeActorStack()
    await chrome.execute_action('open_url', url='https://google.com')
    # Run JavaScript in the current page
    await chrome.execute_action('execute_script', script='window.scrollTo(0, 500)')
asyncio.run(chrome_example())
# Quick framework test (no dependencies)
make test-framework
# Full test suite
make test
# Manual testing
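python test_framework.py
from automation_framework.actors.base import ActorStack
class MyAppActorStack(ActorStack):
    name = "myapp"
    description = "MyApp automation"
    capabilities = {
        'my_action': {
            'params': ['param1'],
            'description': 'Do something with MyApp'
        }
    }
    async def execute_action(self, action: str, **kwargs):
        if action == 'my_action':
            return await self._my_action(kwargs['param1'])
        # Fallback for unknown actions; adapt to how ActorStack reports errors
        raise ValueError(f"Unsupported action: {action}")
    async def _my_action(self, param1: str):
        # Placeholder: put the actual MyApp automation logic here
        ...
automation_framework/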
├── actors/ # Application-specific automation
├── services/ # Core services (AI, state, orchestration)
├── core/ # Configuration and utilities
├── models/ # Pydantic data models
├── config/ # YAML configuration files
└── tests/ # Test suite
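The `MyAppActorStack` example declares its parameters in `capabilities` and then dispatches with an `if` chain. As a sketch of an alternative design, the metadata itself can drive validation and dispatch. `CapabilityActor` is a hypothetical base class written for illustration, not the framework's `ActorStack`; it assumes each action `foo` is implemented by an async method `_foo`.

```python
import asyncio
from typing import Any

class CapabilityActor:
    """Hypothetical base class: checks declared params, then dispatches to `_<action>`."""
    name = "base"
    capabilities: dict[str, dict[str, Any]] = {}

    async def execute_action(self, action: str, **kwargs: Any) -> Any:
        spec = self.capabilities.get(action)
        if spec is None:
            raise ValueError(f"{self.name}: unknown action {action!r}; known: {sorted(self.capabilities)}")
        missing = [p for p in spec.get("params", []) if p not in kwargs]
        if missing:
            raise TypeError(f"{self.name}.{action}: missing params {missing}")
        handler = getattr(self, f"_{action}")  # naming convention assumed by this sketch
        return await handler(**kwargs)

class MyAppActor(CapabilityActor):
    name = "myapp"
    capabilities = {
        "my_action": {"params": ["param1"], "description": "Do something with MyApp"},
    }

    async def _my_action(self, param1: str) -> str:
        return f"MyApp handled {param1}"
```

With this approach, adding a capability means adding one dictionary entry and one `_method`, and `capabilities` stays the single source of truth that a listing command could print.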
- ✅ Open URLs and manage tabs
- ✅ Execute JavaScript in pages
- ✅ Navigate and interact with web content
- ✅ Download files to specified locations
- ✅ Create, move, delete files and folders
- ✅ Spotlight search integration
- ✅ Finder navigation and interaction
- ✅ Click, type, keyboard shortcuts
- ✅ Mouse movements and gestures
- ✅ Window management and screenshots
- ✅ Application state monitoring
make help # Show all available commands
make setup # Run automated setup
make install # Install dependencies
make test-framework # Test framework (no API needed)
make capabilities # List all actor capabilities
make screenshot # Take screenshots
make clean # Clean up cache files
Contributions are welcome! The framework is modular and designed to be extended:
- 📝 Documentation improvements
- 🧪 Testing and bug fixes
- 🎯 New application actors
- 🔄 Pattern implementations
- 🚀 Performance optimizations
- Enhanced local LLM integration
- Web dashboard for monitoring
- Visual workflow builder
- Safari automation support
- Advanced error recovery
- Performance optimizations
- Network request monitoring
- Multi-monitor support
MacPilot is production-ready with:
- ✅ Solid Architecture: Well-designed, modular, extensible
- ✅ Native Integration: Deep macOS framework integration
- ✅ Type Safety: Comprehensive Pydantic validation
- ✅ Error Handling: Robust error recovery mechanisms
- ✅ Documentation: Complete technical documentation
- ✅ Testing: Verified working components
MacPilot is MIT licensed. See LICENSE for details.
- Apple for comprehensive macOS APIs
- OpenAI for AI capabilities
- Python community for excellent tooling
Ready to automate your Mac? Get started with make test-framework!
📖 Full Documentation • 🚀 Installation Guide • 📊 Project Status
import asyncio
from macpilot import MacPilot  # import path assumed; adjust to your installation
async def main():
    pilot = MacPilot()
    # Simple automation
    await pilot.execute("Open Chrome and search for 'Python tutorials'")
    # Complex workflows
    await pilot.execute("""
    1. Find all PDFs in Downloads
    2. Create a folder named 'Documents'
    3. Move PDFs older than 30 days
    4. Create a summary spreadsheet
    """)
if __name__ == "__main__":
    asyncio.run(main())
### Pattern Example
```python
from macpilot.patterns import register_pattern

@register_pattern("login_flow")
async def handle_login(username: str, password: str):
    # Return the ordered UI steps that make up this pattern
    return [
        {"action": "click", "target": "username_field"},
        {"action": "type", "text": username},
        {"action": "click", "target": "password_field"},
        {"action": "type", "text": password},
        {"action": "click", "target": "login_button"}
    ]
```
- User Interface
  - CLI tool for automation scripts
  - Web dashboard for monitoring
  - Visual workflow builder
- Core Features
  - Local LLM support
  - Improved error recovery
  - Performance optimizations
- Documentation
  - API reference
  - Pattern library
  - Example gallery
- Testing
  - Increase test coverage
  - Integration tests
  - Performance benchmarks
- Additional Features
  - Safari automation support
  - Network request monitoring
  - Advanced screen recording
  - Workflow marketplace
Check our Contributing Guide for details.
Made with ❤️ by the MacPilot Team