A personal AI assistant with screen sharing capabilities that can see and understand your screen to provide contextual help.
- Frontend UI (95% Complete): Modern, responsive design with authentication, dashboard, and AI assistant interface
- Backend API (95% Complete): Express.js server with WebSocket support, JWT authentication, and AI integration
- AI Integration (85% Complete): Gemini AI API integration with screen analysis and voice synthesis
- Authentication System (100% Complete): JWT-based authentication with password hashing and session management
- Real-time Features (90% Complete): WebSocket connections, screen sharing, and live AI responses
- Data Persistence (100% Complete): MongoDB Atlas integration with user data, AI sessions, and conversation history
Jerry AI Assistant is production-ready, with complete data persistence on MongoDB Atlas, secure authentication, AI session tracking, and a scalable cloud database architecture.
- Live Screen Sharing: Share your screen with AI for real-time analysis
- Cross-Platform: Works on desktop browsers with screen sharing permissions
- Visual Indicators: See when your screen is being shared
- Automatic Updates: Screen updates every 500ms for smooth experience
- Image Compression: Automatic compression for optimal performance
- Context-Aware: AI can see and analyze your shared screen using OCR
- Real-Time Responses: Get instant AI assistance for any task
- Multiple AI Providers: Gemini AI, Hugging Face, and intelligent fallbacks
- Voice Synthesis: Browser TTS using react-speech-kit
- Voice Playback: Audio responses with error handling
- Speech Recognition: Voice input for questions
- Conversation History: All conversations are saved and accessible
- JWT Authentication: Secure token-based authentication
- Password Hashing: bcrypt with salt rounds of 12
- Account Security: Login attempt tracking and automatic lockout
- User Profiles: Complete profile management and preferences
- Session Management: Automatic token refresh and secure logout
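The attempt-tracking and lockout behavior above can be sketched as plain logic. This is a minimal in-memory illustration, not the project's implementation (which stores these counters on the user document in MongoDB); `MAX_ATTEMPTS` and `LOCK_MS` are illustrative values, not the project's actual settings.

```javascript
// Hedged sketch: in-memory login-attempt tracking with automatic lockout.
const MAX_ATTEMPTS = 5;              // illustrative threshold
const LOCK_MS = 15 * 60 * 1000;      // illustrative 15-minute lockout

const attempts = new Map();          // username -> { count, lockedUntil }

function recordFailure(username, now = Date.now()) {
  const entry = attempts.get(username) || { count: 0, lockedUntil: 0 };
  entry.count += 1;
  if (entry.count >= MAX_ATTEMPTS) {
    entry.lockedUntil = now + LOCK_MS; // lock the account
    entry.count = 0;                   // reset counter for the next window
  }
  attempts.set(username, entry);
}

function isLocked(username, now = Date.now()) {
  const entry = attempts.get(username);
  return !!entry && entry.lockedUntil > now;
}

function recordSuccess(username) {
  attempts.delete(username); // a successful login clears the counter
}
```

A successful login resets the counter, so only consecutive failures trigger the lockout.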
- Node.js (v16 or higher)
- npm or yarn
- Modern browser with screen sharing support
- Clone the repository

```bash
git clone <repository-url>
cd jerry
```

- Install dependencies

```bash
# Backend
cd backend
npm install

# Frontend
cd ../frontend
npm install
```

- Set up environment variables
```env
# Backend (.env)
PORT=5000
FRONTEND_URL=http://localhost:3000
MONGODB_URI=your_mongodb_uri

# Authentication (required)
JWT_SECRET=your_super_secret_jwt_key_change_in_production
JWT_REFRESH_SECRET=your_super_secret_refresh_key_change_in_production

# AI services (optional but recommended)
GEMINI_API_KEY=your_gemini_api_key
HUGGING_FACE_API_KEY=your_hugging_face_api_key
```

```env
# Frontend (.env.local)
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
```

💡 Tip: See backend/AI_SETUP.md for detailed setup instructions for AI services and authentication.
💡 Tip: See backend/MONGODB_SETUP.md for detailed MongoDB Atlas setup instructions.
- Start the servers

```bash
# Backend (Terminal 1)
cd backend
npm start

# Frontend (Terminal 2)
cd frontend
npm run dev
```

- Access the application
- Frontend: http://localhost:3000
- Backend: http://localhost:5000
- Sign in to your account or create a new one
- Click "Start AI Assistant" from the dashboard
- Share your screen when prompted (optional but recommended)
- Ask questions via text or voice input
- Get instant AI responses with context from your screen
- Enable voice synthesis to hear AI responses
- Click "Share Screen" in the AI assistant interface
- Select your screen/window when prompted
- AI will analyze your screen using OCR for better context
- Ask questions about what's on your screen
- Get contextual responses based on your screen content
- Context-aware responses based on your screen content
- Voice input for hands-free interaction
- Voice synthesis for audio responses
- Multiple AI providers with automatic fallbacks
- Conversation history for reference
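The "multiple AI providers with automatic fallbacks" behavior can be sketched as a small chain: try each provider in order and return the first successful answer. The provider functions here are stand-ins, not the project's real Gemini or Hugging Face clients.

```javascript
// Hedged sketch: try each AI provider in order, falling back on failure.
async function askWithFallback(providers, question) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await provider(question); // first provider that answers wins
    } catch (err) {
      errors.push(err);                // remember the failure, try the next
    }
  }
  throw new AggregateError(errors, 'All AI providers failed');
}

// Demo with stand-in providers: the first fails, the second answers.
const flaky = async () => { throw new Error('quota exceeded'); };
const echo = async (q) => `echo: ${q}`;
askWithFallback([flaky, echo], 'What is on my screen?').then(console.log);
```

Collecting the individual errors into an `AggregateError` keeps the failure reasons visible when every provider is down.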
- Frontend: Next.js 14 with TypeScript
- Backend: Node.js with Express and Socket.IO
- Real-time: WebSocket communication for instant updates
- Authentication: JWT-based secure authentication
- Database: MongoDB for data persistence
- MediaDevices API: Uses `getDisplayMedia()` for screen capture
- Canvas Capture: Converts video frames to base64 images with compression
- WebSocket Broadcasting: Sends screen data to AI processing
- Real-time Updates: 500ms refresh rate for smooth experience
- Image Optimization: Automatic compression to 1280x720 max resolution
- HTTPS Required: Screen sharing requires secure context
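The resolution cap described above boils down to a small piece of math: scale a captured frame so it fits within 1280x720 while preserving aspect ratio. In the browser the result would feed `canvas.drawImage(...)`; this sketch shows only the sizing logic, not the project's actual capture code.

```javascript
// Hedged sketch: cap capture dimensions at 1280x720, preserving aspect ratio.
function fitWithin(width, height, maxW = 1280, maxH = 720) {
  const scale = Math.min(1, maxW / width, maxH / height); // never upscale
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

A 2560x1440 screen scales cleanly to 1280x720, while a frame already inside the cap is passed through unchanged.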
- JWT Authentication: Secure token-based user authentication
- Password Security: bcrypt hashing with salt rounds
- Account Protection: Automatic lockout after failed attempts
- Data Privacy: Screen data is not stored permanently
- ✅ Chrome/Chromium (v72+)
- ✅ Firefox (v66+)
- ✅ Safari (v13+)
- ✅ Edge (v79+)
- Screen sharing permission
- Microphone (for voice features)
- Camera (optional)
```
jerry/
├── backend/            # Node.js server
│   ├── src/
│   │   ├── controllers/
│   │   ├── routes/
│   │   ├── services/
│   │   ├── models/
│   │   ├── middleware/
│   │   └── utils/
│   └── index.js        # Main server file
├── frontend/           # Next.js application
│   ├── app/            # App router pages
│   ├── components/     # React components
│   └── lib/            # Utility functions
└── README.md
```
- ScreenShare.tsx: Handles screen capture and display
- AIChat.tsx: AI conversation interface
- WebSocket Service: Real-time communication
- AI Service: Multi-provider AI integration
- Auth Service: JWT authentication management
- OCR Service: Tesseract.js text extraction
- Run the test script to verify all components:

```bash
cd backend
node test-workflow.js
```

- Manual Testing:
- Start both servers (backend and frontend)
- Create an account or sign in
- Start the AI assistant
- Share your screen
- Ask the AI a question
- Verify voice responses work
- Test voice input functionality
- User signs in with secure authentication →
- User starts AI assistant from dashboard →
- User shares screen via WebRTC →
- Frontend captures screen snapshot →
- Sends image via WebSocket →
- Backend performs OCR (Tesseract) →
- Extracted text sent to Gemini AI →
- Gemini gives intelligent suggestions →
- Backend sends AI response →
- Frontend shows chat + plays voice using browser TTS
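The workflow above can be sketched as a pipeline with injected stages. The real stages (WebRTC capture, Tesseract OCR, Gemini) are replaced here by stand-in async functions, so this shows the flow of data, not the project's actual services.

```javascript
// Hedged sketch: the capture -> OCR -> AI flow as a pipeline with
// injected stage functions (all stand-ins, not the real implementations).
async function runAssistantPipeline({ captureScreen, runOcr, askAi }) {
  const frame = await captureScreen(); // screen snapshot (e.g. base64 image)
  const text = await runOcr(frame);    // extracted on-screen text
  const answer = await askAi(text);    // AI suggestion from that context
  return answer;                       // sent back to the client over WebSocket
}
```

Injecting the stages keeps each step independently testable, which mirrors how the test script can exercise the workflow end to end.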
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly using the test script
- Submit a pull request
This project is licensed under the MIT License.
For support and questions, please open an issue in the repository.