🤖 MacPilot - Advanced macOS UI Automation Framework

Native macOS UI Automation with AI-Powered Intelligence

Quick Start • Features • Architecture • Documentation • Examples

🌟 What is MacPilot?

MacPilot is a state-of-the-art macOS UI automation framework that combines native Apple technologies with AI intelligence to enable human-like interaction with your Mac. Write instructions in plain English, and let MacPilot handle the automation.

✅ Status: Fully Functional - All core components tested and working!

Perfect For:

🔄 Process Automation - Automate repetitive UI tasks
🧪 UI Testing - Test macOS applications
🤖 Desktop RPA - Build robotic process automation
🔍 Screen Analysis - Extract data from UI elements
🧭 Workflow Automation - Create complex UI workflows

🚀 Quick Start

1. Clone and Test

git clone <your-repo-url>
cd MacPilot
make test-framework  # Test without API dependencies

2. Install Dependencies

make install
# or manually:
pip install -r requirements.txt

3. Grant Permissions

System Preferences → Security & Privacy → Accessibility (add Terminal)
System Preferences → Security & Privacy → Screen Recording (add Terminal)

4. Try It Out

# List available capabilities
make capabilities

# Take screenshots
make screenshot

# Or use the CLI directly
cd automation_framework
python -m main execute "Take a screenshot"

✨ Key Features

🧠 Core Intelligence

AI Integration - Natural language instruction processing (OpenAI or local)
Vision Framework - Advanced UI element detection
State Awareness - Real-time system state tracking
Pattern Recognition - Learned UI interaction patterns
Self-healing - Automated error recovery

🎯 Native Integration

Apple Vision - Native OCR and element detection
AppleScript - Deep OS integration
Accessibility APIs - Comprehensive UI control
Cocoa/AppKit - Native macOS frameworks
Core Graphics - Low-level screen capture

🛠 Developer Experience

Async Architecture - Built on modern async Python
Type Safety - Full Pydantic validation
Actor System - Modular action execution
State Management - Comprehensive UI state tracking
Pattern System - Reusable interaction patterns

🔄 Application Control

Chrome Control - Deep browser automation (10 capabilities)
Finder Operations - File system automation (10 capabilities)
System Control - Generic UI operations (11 capabilities)
Menu Navigation - Application menu control
Window Management - Window state control

🏗️ Architecture

graph TD
    A[Natural Language Instructions] --> B[AI Analysis Layer]
    B --> C[Action Planning]
    C --> D[Actor System]
    D --> E[UI Interaction Layer]
    E --> F[State Management]
    F --> B

Confirmed Working Components ✅

Actor System - Chrome, Finder, Generic UI actors
State Management - UI state tracking and validation
Configuration - YAML and environment-based config
CLI Interface - Command-line automation execution
Python API - Programmatic access to all functionality

📖 Documentation

Document	Description
DOCUMENTATION.md	Complete technical documentation (70+ pages)
INSTALLATION.md	Installation and troubleshooting guide
PROJECT_STATUS.md	Current status and capabilities summary
AI_SERVICES_GUIDE.md	Guide to using different AI backends
AI_IMPLEMENTATION_SUMMARY.md	AI system implementation details
`automation_framework/examples/`	Usage examples and patterns

🔧 Configuration Options

Option 1: No AI (Basic Automation)

# Works immediately - direct actor commands
make capabilities

Option 2: OpenAI Integration

export OPENAI_API_KEY="your-key-here"
python -m main execute "Open Chrome and search for Python"

Option 3: Local AI (Recommended)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama2

# Modify services/ai/openai_service.py to use local endpoint
# See INSTALLATION.md for details

🤖 Flexible AI Integration

MacPilot now supports multiple AI backends that can be easily swapped without changing your automation code:

Available AI Services

🌐 OpenAI - Cloud-based GPT integration
🏠 Local LLM - Ollama, LM Studio, any OpenAI-compatible API
👨‍💻 Human Input - Interactive manual control with rich prompts
🔧 Custom APIs - Integrate any HTTP-based AI service
📝 Text Processors - Build custom AI logic

Quick Examples

# Use local Ollama
python -m main execute "Take screenshot" --ai-service ollama

# Use human input for complex tasks
python -m main execute "Complex workflow" --ai-service human

# List available AI services
python -m main list-ai-services

# Test AI services interactively
python ai_test_cli.py interactive

Runtime Service Switching

from automation_framework.main import AutomationFramework
from automation_framework.services.ai.ai_factory import ai_registry

framework = AutomationFramework()

# Switch to local LLM
ollama_service = ai_registry.get_service("ollama")
await framework.orchestrator.set_ai_service(ollama_service)

await framework.execute_instruction("Search for Python tutorials")

Benefits:

🔒 Privacy - Use local models for sensitive data
💰 Cost Control - Avoid API costs with local AI
🔄 Reliability - Automatic fallback between services
🛠️ Flexibility - Easy integration of custom AI systems

💻 Usage Examples

CLI Usage

# List all available functions
make capabilities

# Take screenshots  
make screenshot

# Execute natural language instructions (requires AI)
cd automation_framework
python -m main execute "Open Chrome and go to github.com"

Python API

from automation_framework.main import AutomationFramework
import asyncio

async def main():
    framework = AutomationFramework()
    
    # Simple automation
    await framework.execute_instruction("Take a screenshot")
    
    # Complex workflows (with AI)
    await framework.execute_instruction("""
        1. Open Chrome and search for 'Python tutorials'
        2. Click the first result
        3. Create a folder on Desktop named 'Research'
        4. Take a screenshot and save to Research folder
    """)

asyncio.run(main())

Direct Actor Usage

from automation_framework.actors.chrome.browser import ChromeActorStack

async def chrome_example():
    chrome = ChromeActorStack()
    await chrome.execute_action('open_url', url='https://google.com')
    await chrome.execute_action('execute_script', script='window.scrollTo(0, 500)')

🧪 Testing

# Quick framework test (no dependencies)
make test-framework

# Full test suite
make test

# Manual testing
python test_framework.py

🛠️ Development

Adding New Actors

from automation_framework.actors.base import ActorStack

class MyAppActorStack(ActorStack):
    name = "myapp"
    description = "MyApp automation"
    
    capabilities = {
        'my_action': {
            'params': ['param1'],
            'description': 'Do something with MyApp'
        }
    }
    
    async def execute_action(self, action: str, **kwargs):
        if action == 'my_action':
            return await self._my_action(kwargs['param1'])

Project Structure

automation_framework/
├── actors/          # Application-specific automation
├── services/        # Core services (AI, state, orchestration)  
├── core/           # Configuration and utilities
├── models/         # Pydantic data models
├── config/         # YAML configuration files
└── tests/          # Test suite

🚀 Current Capabilities

Chrome Browser Automation

✅ Open URLs and manage tabs
✅ Execute JavaScript in pages
✅ Navigate and interact with web content
✅ Download files to specified locations

File System Operations

✅ Create, move, delete files and folders
✅ Spotlight search integration
✅ Finder navigation and interaction

Generic UI Automation

✅ Click, type, keyboard shortcuts
✅ Mouse movements and gestures
✅ Window management and screenshots
✅ Application state monitoring

🔄 Makefile Commands

make help           # Show all available commands
make setup          # Run automated setup
make install        # Install dependencies
make test-framework # Test framework (no API needed)
make capabilities   # List all actor capabilities  
make screenshot     # Take screenshots
make clean          # Clean up cache files

🤝 Contributing

Contributions are welcome! The framework is well-architected and ready for extension:

📝 Documentation improvements
🧪 Testing and bug fixes
🎯 New application actors
🔄 Pattern implementations
🚀 Performance optimizations

🎯 Roadmap

High Priority

Enhanced local LLM integration
Web dashboard for monitoring
Visual workflow builder
Safari automation support

Medium Priority

Advanced error recovery
Performance optimizations
Network request monitoring
Multi-monitor support

📊 Project Status

MacPilot is production-ready with:

✅ Solid Architecture: Well-designed, modular, extensible
✅ Native Integration: Deep macOS framework integration
✅ Type Safety: Comprehensive Pydantic validation
✅ Error Handling: Robust error recovery mechanisms
✅ Documentation: Complete technical documentation
✅ Testing: Verified working components

📜 License

MacPilot is MIT licensed. See LICENSE for details.

🙏 Acknowledgments

Apple for comprehensive macOS APIs
OpenAI for AI capabilities
Python community for excellent tooling

Ready to automate your Mac? Get started with make test-framework!

📖 Full Documentation • 🚀 Installation Guide • 📊 Project Status

async def main(): pilot = MacPilot()

# Simple automation
await pilot.execute("Open Chrome and search for 'Python tutorials'")

# Complex workflows
await pilot.execute("""
    1. Find all PDFs in Downloads
    2. Create a folder named 'Documents'
    3. Move PDFs older than 30 days
    4. Create a summary spreadsheet
""")

if name == "main": asyncio.run(main())


### Pattern Example
```python
from macpilot.patterns import register_pattern

@register_pattern("login_flow")
async def handle_login(username: str, password: str):
    return [
        {"action": "click", "target": "username_field"},
        {"action": "type", "text": username},
        {"action": "click", "target": "password_field"},
        {"action": "type", "text": password},
        {"action": "click", "target": "login_button"}
    ]

📋 Todo & Roadmap

High Priority

Medium Priority

Low Priority

🤝 Contributing

Contributions are welcome! Areas we're focusing on:

📝 Documentation improvements
🧪 Testing and bug fixes
🎯 New application actors
🔄 Pattern implementations
🐛 Performance optimizations

Check our Contributing Guide for details.

📜 License

MacPilot is MIT licensed. See LICENSE for details.

🙏 Acknowledgments

Apple for macOS APIs
OpenAI for GPT models
Python community

Made with ❤️ by the MacPilot Team

🌐 Website • 📖 Documentation • 💬 Discord

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.clinerules		.clinerules
.github/prompts		.github/prompts
__pycache__		__pycache__
automation_framework		automation_framework
.env		.env
.env.example		.env.example
.gitignore		.gitignore
AI_IMPLEMENTATION_SUMMARY.md		AI_IMPLEMENTATION_SUMMARY.md
AI_SERVICES_GUIDE.md		AI_SERVICES_GUIDE.md
DOCUMENTATION.md		DOCUMENTATION.md
Dockerfile		Dockerfile
INSTALLATION.md		INSTALLATION.md
MacPilot_LLM_Cheatsheet.md		MacPilot_LLM_Cheatsheet.md
Makefile		Makefile
PROJECT_STATUS.md		PROJECT_STATUS.md
README.MD		README.MD
ai_context_screenshot_20250616_073637.png		ai_context_screenshot_20250616_073637.png
applescript_ai_service.py		applescript_ai_service.py
chain_commander.py		chain_commander.py
cleanup_bloat.sh		cleanup_bloat.sh
debug_mouse.py		debug_mouse.py
demo_human_ai.py		demo_human_ai.py
demo_simulated_human.py		demo_simulated_human.py
direct_applescript.py		direct_applescript.py
docker-compose.yml		docker-compose.yml
macpilot.py		macpilot.py
macpilot_applescript.py		macpilot_applescript.py
macpilot_cmd.py		macpilot_cmd.py
macpilot_server.py		macpilot_server.py
macpilot_simple_yaml.py		macpilot_simple_yaml.py
macpilot_status.py		macpilot_status.py
macpilot_terminal.py		macpilot_terminal.py
macpilot_yaml.py		macpilot_yaml.py
requirements.txt		requirements.txt
safe_mouse.py		safe_mouse.py
send_command_demo.py		send_command_demo.py
setup.py		setup.py
simplified_mouse_method.py		simplified_mouse_method.py
smart_finder.py		smart_finder.py
system_status_20250616_073637.yaml		system_status_20250616_073637.yaml
test_ai_services.py		test_ai_services.py
test_applescript_direct.py		test_applescript_direct.py
test_coordinates.py		test_coordinates.py
test_direct_applescript.py		test_direct_applescript.py
test_framework.py		test_framework.py
test_mouse_move.py		test_mouse_move.py
test_window_output.py		test_window_output.py

JoshuaWink/MacPilot

Folders and files

Latest commit

History

Repository files navigation

🤖 MacPilot - Advanced macOS UI Automation Framework

🌟 What is MacPilot?

Perfect For:

🚀 Quick Start

1. Clone and Test

2. Install Dependencies

3. Grant Permissions

4. Try It Out

✨ Key Features

🧠 Core Intelligence

🎯 Native Integration

🛠 Developer Experience

🔄 Application Control

🏗️ Architecture

Confirmed Working Components ✅

📖 Documentation

🔧 Configuration Options

Option 1: No AI (Basic Automation)

Option 2: OpenAI Integration

Option 3: Local AI (Recommended)

🤖 Flexible AI Integration

Available AI Services

Quick Examples

Runtime Service Switching

💻 Usage Examples

CLI Usage

Python API

Direct Actor Usage

🧪 Testing

🛠️ Development

Adding New Actors

Project Structure

🚀 Current Capabilities

Chrome Browser Automation

File System Operations

Generic UI Automation

🔄 Makefile Commands

🤝 Contributing

🎯 Roadmap

High Priority

Medium Priority

📊 Project Status

📜 License

🙏 Acknowledgments

📋 Todo & Roadmap

High Priority

Medium Priority

Low Priority

🤝 Contributing

📜 License

🙏 Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages