Skip to content

🤖 AI-powered macOS automation framework - Control your Mac with natural language using GPT models. No code needed, just English instructions!

Notifications You must be signed in to change notification settings

JoshuaWink/MacPilot

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🤖 MacPilot - Advanced macOS UI Automation Framework

GitHub stars License Python Platform PRs Welcome

Native macOS UI Automation with AI-Powered Intelligence

Quick StartFeaturesArchitectureDocumentationExamples

🌟 What is MacPilot?

MacPilot is a state-of-the-art macOS UI automation framework that combines native Apple technologies with AI intelligence to enable human-like interaction with your Mac. Write instructions in plain English, and let MacPilot handle the automation.

✅ Status: Fully Functional - All core components tested and working!

Perfect For:

  • 🔄 Process Automation - Automate repetitive UI tasks
  • 🧪 UI Testing - Test macOS applications
  • 🤖 Desktop RPA - Build robotic process automation
  • 🔍 Screen Analysis - Extract data from UI elements
  • 🧭 Workflow Automation - Create complex UI workflows

🚀 Quick Start

1. Clone and Test

git clone <your-repo-url>
cd MacPilot
make test-framework  # Test without API dependencies

2. Install Dependencies

make install
# or manually:
pip install -r requirements.txt

3. Grant Permissions

  • System Preferences → Security & Privacy → Accessibility (add Terminal)
  • System Preferences → Security & Privacy → Screen Recording (add Terminal)

4. Try It Out

# List available capabilities
make capabilities

# Take screenshots
make screenshot

# Or use the CLI directly
cd automation_framework
python -m main execute "Take a screenshot"

✨ Key Features

🧠 Core Intelligence

  • AI Integration - Natural language instruction processing (OpenAI or local)
  • Vision Framework - Advanced UI element detection
  • State Awareness - Real-time system state tracking
  • Pattern Recognition - Learned UI interaction patterns
  • Self-healing - Automated error recovery

🎯 Native Integration

  • Apple Vision - Native OCR and element detection
  • AppleScript - Deep OS integration
  • Accessibility APIs - Comprehensive UI control
  • Cocoa/AppKit - Native macOS frameworks
  • Core Graphics - Low-level screen capture

🛠 Developer Experience

  • Async Architecture - Built on modern async Python
  • Type Safety - Full Pydantic validation
  • Actor System - Modular action execution
  • State Management - Comprehensive UI state tracking
  • Pattern System - Reusable interaction patterns

🔄 Application Control

  • Chrome Control - Deep browser automation (10 capabilities)
  • Finder Operations - File system automation (10 capabilities)
  • System Control - Generic UI operations (11 capabilities)
  • Menu Navigation - Application menu control
  • Window Management - Window state control

🏗️ Architecture

graph TD
    A[Natural Language Instructions] --> B[AI Analysis Layer]
    B --> C[Action Planning]
    C --> D[Actor System]
    D --> E[UI Interaction Layer]
    E --> F[State Management]
    F --> B
Loading

Confirmed Working Components ✅

  1. Actor System - Chrome, Finder, Generic UI actors
  2. State Management - UI state tracking and validation
  3. Configuration - YAML and environment-based config
  4. CLI Interface - Command-line automation execution
  5. Python API - Programmatic access to all functionality

📖 Documentation

Document Description
DOCUMENTATION.md Complete technical documentation (70+ pages)
INSTALLATION.md Installation and troubleshooting guide
PROJECT_STATUS.md Current status and capabilities summary
AI_SERVICES_GUIDE.md Guide to using different AI backends
AI_IMPLEMENTATION_SUMMARY.md AI system implementation details
automation_framework/examples/ Usage examples and patterns

🔧 Configuration Options

Option 1: No AI (Basic Automation)

# Works immediately - direct actor commands
make capabilities

Option 2: OpenAI Integration

export OPENAI_API_KEY="your-key-here"
python -m main execute "Open Chrome and search for Python"

Option 3: Local AI (Recommended)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama2

# Modify services/ai/openai_service.py to use local endpoint
# See INSTALLATION.md for details

🤖 Flexible AI Integration

MacPilot now supports multiple AI backends that can be easily swapped without changing your automation code:

Available AI Services

  • 🌐 OpenAI - Cloud-based GPT integration
  • 🏠 Local LLM - Ollama, LM Studio, any OpenAI-compatible API
  • 👨‍💻 Human Input - Interactive manual control with rich prompts
  • 🔧 Custom APIs - Integrate any HTTP-based AI service
  • 📝 Text Processors - Build custom AI logic

Quick Examples

# Use local Ollama
python -m main execute "Take screenshot" --ai-service ollama

# Use human input for complex tasks
python -m main execute "Complex workflow" --ai-service human

# List available AI services
python -m main list-ai-services

# Test AI services interactively
python ai_test_cli.py interactive

Runtime Service Switching

from automation_framework.main import AutomationFramework
from automation_framework.services.ai.ai_factory import ai_registry

framework = AutomationFramework()

# Switch to local LLM
ollama_service = ai_registry.get_service("ollama")
await framework.orchestrator.set_ai_service(ollama_service)

await framework.execute_instruction("Search for Python tutorials")

Benefits:

  • 🔒 Privacy - Use local models for sensitive data
  • 💰 Cost Control - Avoid API costs with local AI
  • 🔄 Reliability - Automatic fallback between services
  • 🛠️ Flexibility - Easy integration of custom AI systems

💻 Usage Examples

CLI Usage

# List all available functions
make capabilities

# Take screenshots  
make screenshot

# Execute natural language instructions (requires AI)
cd automation_framework
python -m main execute "Open Chrome and go to github.com"

Python API

from automation_framework.main import AutomationFramework
import asyncio

async def main():
    framework = AutomationFramework()
    
    # Simple automation
    await framework.execute_instruction("Take a screenshot")
    
    # Complex workflows (with AI)
    await framework.execute_instruction("""
        1. Open Chrome and search for 'Python tutorials'
        2. Click the first result
        3. Create a folder on Desktop named 'Research'
        4. Take a screenshot and save to Research folder
    """)

asyncio.run(main())

Direct Actor Usage

from automation_framework.actors.chrome.browser import ChromeActorStack

async def chrome_example():
    chrome = ChromeActorStack()
    await chrome.execute_action('open_url', url='https://google.com')
    await chrome.execute_action('execute_script', script='window.scrollTo(0, 500)')

🧪 Testing

# Quick framework test (no dependencies)
make test-framework

# Full test suite
make test

# Manual testing
python test_framework.py

🛠️ Development

Adding New Actors

from automation_framework.actors.base import ActorStack

class MyAppActorStack(ActorStack):
    name = "myapp"
    description = "MyApp automation"
    
    capabilities = {
        'my_action': {
            'params': ['param1'],
            'description': 'Do something with MyApp'
        }
    }
    
    async def execute_action(self, action: str, **kwargs):
        if action == 'my_action':
            return await self._my_action(kwargs['param1'])

Project Structure

automation_framework/
├── actors/          # Application-specific automation
├── services/        # Core services (AI, state, orchestration)  
├── core/           # Configuration and utilities
├── models/         # Pydantic data models
├── config/         # YAML configuration files
└── tests/          # Test suite

🚀 Current Capabilities

Chrome Browser Automation

  • ✅ Open URLs and manage tabs
  • ✅ Execute JavaScript in pages
  • ✅ Navigate and interact with web content
  • ✅ Download files to specified locations

File System Operations

  • ✅ Create, move, delete files and folders
  • ✅ Spotlight search integration
  • ✅ Finder navigation and interaction

Generic UI Automation

  • ✅ Click, type, keyboard shortcuts
  • ✅ Mouse movements and gestures
  • ✅ Window management and screenshots
  • ✅ Application state monitoring

🔄 Makefile Commands

make help           # Show all available commands
make setup          # Run automated setup
make install        # Install dependencies
make test-framework # Test framework (no API needed)
make capabilities   # List all actor capabilities  
make screenshot     # Take screenshots
make clean          # Clean up cache files

🤝 Contributing

Contributions are welcome! The framework is well-architected and ready for extension:

  • 📝 Documentation improvements
  • 🧪 Testing and bug fixes
  • 🎯 New application actors
  • 🔄 Pattern implementations
  • 🚀 Performance optimizations

🎯 Roadmap

High Priority

  • Enhanced local LLM integration
  • Web dashboard for monitoring
  • Visual workflow builder
  • Safari automation support

Medium Priority

  • Advanced error recovery
  • Performance optimizations
  • Network request monitoring
  • Multi-monitor support

📊 Project Status

MacPilot is production-ready with:

  • Solid Architecture: Well-designed, modular, extensible
  • Native Integration: Deep macOS framework integration
  • Type Safety: Comprehensive Pydantic validation
  • Error Handling: Robust error recovery mechanisms
  • Documentation: Complete technical documentation
  • Testing: Verified working components

📜 License

MacPilot is MIT licensed. See LICENSE for details.

🙏 Acknowledgments

  • Apple for comprehensive macOS APIs
  • OpenAI for AI capabilities
  • Python community for excellent tooling

Ready to automate your Mac? Get started with make test-framework!

📖 Full Documentation🚀 Installation Guide📊 Project Status

async def main(): pilot = MacPilot()

# Simple automation
await pilot.execute("Open Chrome and search for 'Python tutorials'")

# Complex workflows
await pilot.execute("""
    1. Find all PDFs in Downloads
    2. Create a folder named 'Documents'
    3. Move PDFs older than 30 days
    4. Create a summary spreadsheet
""")

if name == "main": asyncio.run(main())


### Pattern Example
```python
from macpilot.patterns import register_pattern

@register_pattern("login_flow")
async def handle_login(username: str, password: str):
    return [
        {"action": "click", "target": "username_field"},
        {"action": "type", "text": username},
        {"action": "click", "target": "password_field"},
        {"action": "type", "text": password},
        {"action": "click", "target": "login_button"}
    ]

📋 Todo & Roadmap

High Priority

  • User Interface

    • CLI tool for automation scripts
    • Web dashboard for monitoring
    • Visual workflow builder
  • Core Features

    • Local LLM support
    • Improved error recovery
    • Performance optimizations

Medium Priority

  • Documentation

    • API reference
    • Pattern library
    • Example gallery
  • Testing

    • Increase test coverage
    • Integration tests
    • Performance benchmarks

Low Priority

  • Additional Features
    • Safari automation support
    • Network request monitoring
    • Advanced screen recording
    • Workflow marketplace

🤝 Contributing

Contributions are welcome! Areas we're focusing on:

  • 📝 Documentation improvements
  • 🧪 Testing and bug fixes
  • 🎯 New application actors
  • 🔄 Pattern implementations
  • 🐛 Performance optimizations

Check our Contributing Guide for details.

📜 License

MacPilot is MIT licensed. See LICENSE for details.

🙏 Acknowledgments

  • Apple for macOS APIs
  • OpenAI for GPT models
  • Python community

Made with ❤️ by the MacPilot Team

🌐 Website📖 Documentation💬 Discord

About

🤖 AI-powered macOS automation framework - Control your Mac with natural language using GPT models. No code needed, just English instructions!

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.0%
  • Other 1.0%