A personal AI assistant with screen sharing capabilities that can see and understand your screen to provide contextual help.
- Frontend UI (95% Complete): Modern, responsive design with authentication, dashboard, and AI assistant interface
- Backend API (95% Complete): Express.js server with WebSocket support, JWT authentication, and AI integration
- AI Integration (85% Complete): Gemini AI API integration with screen analysis and voice synthesis
- Authentication System (100% Complete): JWT-based authentication with password hashing and session management
- Real-time Features (90% Complete): WebSocket connections, screen sharing, and live AI responses
- Data Persistence (100% Complete): MongoDB Atlas integration with user data, AI sessions, and conversation history
Jerry AI Assistant is production-ready, with complete data persistence on MongoDB Atlas, secure authentication, AI session tracking, and a scalable cloud database architecture.
- Live Screen Sharing: Share your screen with AI for real-time analysis
- Cross-Platform: Works on desktop browsers with screen sharing permissions
- Visual Indicators: See when your screen is being shared
- Automatic Updates: Screen updates every 500ms for smooth experience
- Image Compression: Automatic compression for optimal performance
- Context-Aware: AI can see and analyze your shared screen using OCR
- Real-Time Responses: Get instant AI assistance for any task
- Multiple AI Providers: Gemini AI, Hugging Face, and intelligent fallbacks
- Voice Synthesis: Browser TTS using react-speech-kit
- Voice Playback: Audio responses with error handling
- Speech Recognition: Voice input for questions
- Conversation History: All conversations are saved and accessible
- JWT Authentication: Secure token-based authentication
- Password Hashing: bcrypt with salt rounds of 12
- Account Security: Login attempt tracking and automatic lockout
- User Profiles: Complete profile management and preferences
- Session Management: Automatic token refresh and secure logout
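The attempt-tracking and lockout behavior above can be sketched as plain logic. This is a minimal in-memory illustration, not the project's implementation (which stores these counters on the user document in MongoDB); `MAX_ATTEMPTS` and `LOCK_MS` are illustrative values, not the project's actual settings.

```javascript
// Hedged sketch: in-memory login-attempt tracking with automatic lockout.
const MAX_ATTEMPTS = 5;              // illustrative threshold
const LOCK_MS = 15 * 60 * 1000;      // illustrative 15-minute lockout

const attempts = new Map();          // username -> { count, lockedUntil }

function recordFailure(username, now = Date.now()) {
  const entry = attempts.get(username) || { count: 0, lockedUntil: 0 };
  entry.count += 1;
  if (entry.count >= MAX_ATTEMPTS) {
    entry.lockedUntil = now + LOCK_MS; // lock the account
    entry.count = 0;                   // reset counter for the next window
  }
  attempts.set(username, entry);
}

function isLocked(username, now = Date.now()) {
  const entry = attempts.get(username);
  return !!entry && entry.lockedUntil > now;
}

function recordSuccess(username) {
  attempts.delete(username); // a successful login clears the counter
}
```

A successful login resets the counter, so only consecutive failures trigger the lockout.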
- Node.js (v16 or higher)
- npm or yarn
- Modern browser with screen sharing support
- Clone the repository

```bash
git clone <repository-url>
cd jerry
```

- Install dependencies

```bash
# Backend
cd backend
npm install

# Frontend
cd ../frontend
npm install
```

- Set up environment variables
```env
# Backend (.env)
PORT=5000
FRONTEND_URL=http://localhost:3000
MONGODB_URI=your_mongodb_uri

# Authentication (required)
JWT_SECRET=your_super_secret_jwt_key_change_in_production
JWT_REFRESH_SECRET=your_super_secret_refresh_key_change_in_production

# AI services (optional but recommended)
GEMINI_API_KEY=your_gemini_api_key
HUGGING_FACE_API_KEY=your_hugging_face_api_key
```

```env
# Frontend (.env.local)
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
```

💡 Tip: See backend/AI_SETUP.md for detailed setup instructions for AI services and authentication.
💡 Tip: See backend/MONGODB_SETUP.md for detailed MongoDB Atlas setup instructions.
- Start the servers

```bash
# Backend (Terminal 1)
cd backend
npm start

# Frontend (Terminal 2)
cd frontend
npm run dev
```

- Access the application
- Frontend: http://localhost:3000
- Backend: http://localhost:5000
- Sign in to your account or create a new one
- Click "Start AI Assistant" from the dashboard
- Share your screen when prompted (optional but recommended)
- Ask questions via text or voice input
- Get instant AI responses with context from your screen
- Enable voice synthesis to hear AI responses
- Click "Share Screen" in the AI assistant interface
- Select your screen/window when prompted
- AI will analyze your screen using OCR for better context
- Ask questions about what's on your screen
- Get contextual responses based on your screen content
- Context-aware responses based on your screen content
- Voice input for hands-free interaction
- Voice synthesis for audio responses
- Multiple AI providers with automatic fallbacks
- Conversation history for reference
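The "multiple AI providers with automatic fallbacks" behavior can be sketched as a small chain: try each provider in order and return the first successful answer. The provider functions here are stand-ins, not the project's real Gemini or Hugging Face clients.

```javascript
// Hedged sketch: try each AI provider in order, falling back on failure.
async function askWithFallback(providers, question) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await provider(question); // first provider that answers wins
    } catch (err) {
      errors.push(err);                // remember the failure, try the next
    }
  }
  throw new AggregateError(errors, 'All AI providers failed');
}

// Demo with stand-in providers: the first fails, the second answers.
const flaky = async () => { throw new Error('quota exceeded'); };
const echo = async (q) => `echo: ${q}`;
askWithFallback([flaky, echo], 'What is on my screen?').then(console.log);
```

Collecting the individual errors into an `AggregateError` keeps the failure reasons visible when every provider is down.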
- Frontend: Next.js 14 with TypeScript
- Backend: Node.js with Express and Socket.IO
- Real-time: WebSocket communication for instant updates
- Authentication: JWT-based secure authentication
- Database: MongoDB for data persistence
- MediaDevices API: Uses `getDisplayMedia()` for screen capture
- Canvas Capture: Converts video frames to base64 images with compression
- WebSocket Broadcasting: Sends screen data to AI processing
- Real-time Updates: 500ms refresh rate for smooth experience
- Image Optimization: Automatic compression to 1280x720 max resolution
- HTTPS Required: Screen sharing requires secure context
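The resolution cap described above boils down to a small piece of math: scale a captured frame so it fits within 1280x720 while preserving aspect ratio. In the browser the result would feed `canvas.drawImage(...)`; this sketch shows only the sizing logic, not the project's actual capture code.

```javascript
// Hedged sketch: cap capture dimensions at 1280x720, preserving aspect ratio.
function fitWithin(width, height, maxW = 1280, maxH = 720) {
  const scale = Math.min(1, maxW / width, maxH / height); // never upscale
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

A 2560x1440 screen scales cleanly to 1280x720, while a frame already inside the cap is passed through unchanged.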
- JWT Authentication: Secure token-based user authentication
- Password Security: bcrypt hashing with salt rounds
- Account Protection: Automatic lockout after failed attempts
- Data Privacy: Screen data is not stored permanently
- ✅ Chrome/Chromium (v72+)
- ✅ Firefox (v66+)
- ✅ Safari (v13+)
- ✅ Edge (v79+)
- Screen sharing permission
- Microphone (for voice features)
- Camera (optional)
```
jerry/
├── backend/            # Node.js server
│   ├── src/
│   │   ├── controllers/
│   │   ├── routes/
│   │   ├── services/
│   │   ├── models/
│   │   ├── middleware/
│   │   └── utils/
│   └── index.js        # Main server file
├── frontend/           # Next.js application
│   ├── app/            # App router pages
│   ├── components/     # React components
│   └── lib/            # Utility functions
└── README.md
```
- ScreenShare.tsx: Handles screen capture and display
- AIChat.tsx: AI conversation interface
- WebSocket Service: Real-time communication
- AI Service: Multi-provider AI integration
- Auth Service: JWT authentication management
- OCR Service: Tesseract.js text extraction
- Run the test script to verify all components:

```bash
cd backend
node test-workflow.js
```

- Manual Testing:
- Start both servers (backend and frontend)
- Create an account or sign in
- Start the AI assistant
- Share your screen
- Ask the AI a question
- Verify voice responses work
- Test voice input functionality
- User signs in with secure authentication →
- User starts AI assistant from dashboard →
- User shares screen via WebRTC →
- Frontend captures screen snapshot →
- Sends image via WebSocket →
- Backend performs OCR (Tesseract) →
- Extracted text sent to Gemini AI →
- Gemini gives intelligent suggestions →
- Backend sends AI response →
- Frontend shows chat + plays voice using browser TTS
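The workflow above can be sketched as a pipeline with injected stages. The real stages (WebRTC capture, Tesseract OCR, Gemini) are replaced here by stand-in async functions, so this shows the flow of data, not the project's actual services.

```javascript
// Hedged sketch: the capture -> OCR -> AI flow as a pipeline with
// injected stage functions (all stand-ins, not the real implementations).
async function runAssistantPipeline({ captureScreen, runOcr, askAi }) {
  const frame = await captureScreen(); // screen snapshot (e.g. base64 image)
  const text = await runOcr(frame);    // extracted on-screen text
  const answer = await askAi(text);    // AI suggestion from that context
  return answer;                       // sent back to the client over WebSocket
}
```

Injecting the stages keeps each step independently testable, which mirrors how the test script can exercise the workflow end to end.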
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly using the test script
- Submit a pull request
This project is licensed under the MIT License.
For support and questions, please open an issue in the repository.