diff --git a/CHANGELOG.md b/CHANGELOG.md index 64b3fe2..3cc4041 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,4 +1,5 @@ # ๆ›ดๆ–ฐๆ—ฅๅฟ— +[๐Ÿ‡ฌ๐Ÿ‡ง English Version here](CHANGELOG_en.md) ๆœฌๆ–‡ๆกฃ่ฎฐๅฝ•้กน็›ฎ็š„ๆ‰€ๆœ‰้‡่ฆๅ˜ๆ›ดใ€‚ diff --git a/CHANGELOG_en.md b/CHANGELOG_en.md new file mode 100644 index 0000000..03d6b79 --- /dev/null +++ b/CHANGELOG_en.md @@ -0,0 +1,91 @@ +# Changelog + +This document records all significant changes made to the project. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and the versioning follows [Semantic Versioning](https://semver.org/). + +## [Unreleased] + +### Added +- First open-source release +- Complete GitHub documentation (README, CONTRIBUTING, LICENSE, etc.) +- Docker support +- Environment variable configuration template + +### Changed +- Improved README structure +- Enhanced code comments + +--- + +## [1.0.0] - 2025-01-XX + +### Added +- ๐Ÿšถ **Blind Path Navigation System** + - Real-time tactile paving detection and segmentation + - Intelligent voice guidance + - Obstacle detection and avoidance + - Sharp turn detection and alerts + - Optical flow stabilization + +- ๐Ÿšฆ **Crosswalk Assistance** + - Crosswalk recognition and direction detection + - Traffic light color recognition + - Alignment guidance system + - Safety reminders + +- ๐Ÿ” **Object Recognition and Search** + - YOLO-E open-vocabulary detection + - MediaPipe hand tracking and guidance + - Real-time object tracking + - Grasp action detection + +- ๐ŸŽ™๏ธ **Real-Time Voice Interaction** + - Alibaba Paraformer ASR + - Qwen-Omni-Turbo multimodal dialogue + - Intelligent command parsing + - Context awareness + +- ๐Ÿ“น **Video and Audio Processing** + - Real-time WebSocket streaming + - Audio-video synchronized recording + - IMU data fusion + - Multi-channel audio mixing + +- ๐ŸŽจ **Visualization and Interaction** + - Real-time web monitoring interface + - IMU 3D visualization + - Status dashboard + - Chinese-language interface + +### Tech Stack +- FastAPI + WebSocket +- YOLO11 / YOLO-E +- MediaPipe +- PyTorch + CUDA +- OpenCV +- DashScope API + +### Known Issues +- [ ] Possible lag on low-end GPUs +- [ ] No GPU acceleration support on macOS +- [ ] Some Chinese fonts render incorrectly on Linux + +--- + +## Versioning Guidelines + +### Major +- Incompatible API changes + +### Minor +- Backward-compatible new features + +### Patch +- Backward-compatible bug fixes + +--- + +[Unreleased]: https://github.com/yourusername/aiglass/compare/v1.0.0...HEAD +[1.0.0]: https://github.com/yourusername/aiglass/releases/tag/v1.0.0 diff --git a/PROJECT_STRUCTURE.md b/PROJECT_STRUCTURE.md index 219dfed..f35057f 100644 --- a/PROJECT_STRUCTURE.md +++ b/PROJECT_STRUCTURE.md @@ -1,4 +1,5 @@ # ้กน็›ฎ็ป“ๆž„่ฏดๆ˜Ž +[๐Ÿ‡ฌ๐Ÿ‡ง English Version here](PROJECT_STRUCTURE_en.md) ๆœฌๆ–‡ๆกฃ่ฏฆ็ป†่ฏดๆ˜Ž้กน็›ฎ็š„็›ฎๅฝ•็ป“ๆž„ๅ’Œไธป่ฆๆ–‡ไปถ็š„ไฝœ็”จใ€‚ diff --git a/PROJECT_STRUCTURE_en.md b/PROJECT_STRUCTURE_en.md new file mode 100644 index 0000000..9f11a8c --- /dev/null +++ b/PROJECT_STRUCTURE_en.md @@ -0,0 +1,398 @@ +# Project Structure Guide + +This document explains the projectโ€™s directory layout and the purpose of the main files. 
+ +## ๐Ÿ“ Directory Structure + +``` +rebuild1002/ +โ”œโ”€โ”€ ๐Ÿ“„ Main Application Files +โ”‚ โ”œโ”€โ”€ app_main.py # App entry point (FastAPI service) +โ”‚ โ”œโ”€โ”€ navigation_master.py # Navigation controller (state machine) +โ”‚ โ”œโ”€โ”€ workflow_blindpath.py # Blind-path navigation workflow +โ”‚ โ”œโ”€โ”€ workflow_crossstreet.py # Crosswalk navigation workflow +โ”‚ โ””โ”€โ”€ yolomedia.py # Item search workflow +โ”‚ +โ”œโ”€โ”€ ๐ŸŽ™๏ธ Speech & Audio +โ”‚ โ”œโ”€โ”€ asr_core.py # Speech recognition core +โ”‚ โ”œโ”€โ”€ omni_client.py # Qwen-Omni client +โ”‚ โ”œโ”€โ”€ qwen_extractor.py # Tag extraction (Chinese โ†’ English) +โ”‚ โ”œโ”€โ”€ audio_player.py # Audio player +โ”‚ โ””โ”€โ”€ audio_stream.py # Audio stream manager +โ”‚ +โ”œโ”€โ”€ ๐Ÿค– Models +โ”‚ โ”œโ”€โ”€ yoloe_backend.py # YOLO-E backend (open vocabulary) +โ”‚ โ”œโ”€โ”€ trafficlight_detection.py # Traffic light detection +โ”‚ โ”œโ”€โ”€ obstacle_detector_client.py # Obstacle detector client +โ”‚ โ””โ”€โ”€ models.py # Model definitions +โ”‚ +โ”œโ”€โ”€ ๐ŸŽฅ Video Processing +โ”‚ โ”œโ”€โ”€ bridge_io.py # Thread-safe frame buffers +โ”‚ โ”œโ”€โ”€ sync_recorder.py # A/V synchronized recording +โ”‚ โ””โ”€โ”€ video_recorder.py # Legacy video recorder +โ”‚ +โ”œโ”€โ”€ ๐ŸŒ Web Frontend +โ”‚ โ”œโ”€โ”€ templates/ +โ”‚ โ”‚ โ””โ”€โ”€ index.html # Main UI HTML +โ”‚ โ”œโ”€โ”€ static/ +โ”‚ โ”‚ โ”œโ”€โ”€ main.js # Main JS +โ”‚ โ”‚ โ”œโ”€โ”€ vision.js # Vision stream handling +โ”‚ โ”‚ โ”œโ”€โ”€ visualizer.js # Data visualization +โ”‚ โ”‚ โ”œโ”€โ”€ vision_renderer.js # Rendering +โ”‚ โ”‚ โ”œโ”€โ”€ vision.css # Styles +โ”‚ โ”‚ โ””โ”€โ”€ models/ # 3D models (IMU visualization) +โ”‚ +โ”œโ”€โ”€ ๐ŸŽต Audio Assets +โ”‚ โ”œโ”€โ”€ music/ # System chimes +โ”‚ โ”‚ โ”œโ”€โ”€ converted_ๅ‘ไธŠ.wav +โ”‚ โ”‚ โ”œโ”€โ”€ converted_ๅ‘ไธ‹.wav +โ”‚ โ”‚ โ””โ”€โ”€ ... 
+โ”‚ โ””โ”€โ”€ voice/ # Pre-recorded voice lines +โ”‚ โ”œโ”€โ”€ voice_mapping.json +โ”‚ โ””โ”€โ”€ *.wav +โ”‚ +โ”œโ”€โ”€ ๐Ÿง  Model Files +โ”‚ โ””โ”€โ”€ model/ +โ”‚ โ”œโ”€โ”€ yolo-seg.pt # Blind-path segmentation model +โ”‚ โ”œโ”€โ”€ yoloe-11l-seg.pt # YOLO-E open-vocabulary model +โ”‚ โ”œโ”€โ”€ shoppingbest5.pt # Item recognition model +โ”‚ โ”œโ”€โ”€ trafficlight.pt # Traffic light model +โ”‚ โ””โ”€โ”€ hand_landmarker.task # MediaPipe hand model +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“น Recordings +โ”‚ โ””โ”€โ”€ recordings/ # Auto-saved video & audio +โ”‚ โ”œโ”€โ”€ video_*.avi +โ”‚ โ””โ”€โ”€ audio_*.wav +โ”‚ +โ”œโ”€โ”€ ๐Ÿ› ๏ธ ESP32 Firmware +โ”‚ โ””โ”€โ”€ compile/ +โ”‚ โ”œโ”€โ”€ compile.ino # Arduino main program +โ”‚ โ”œโ”€โ”€ camera_pins.h # Camera pin definitions +โ”‚ โ”œโ”€โ”€ ICM42688.cpp/h # IMU driver +โ”‚ โ””โ”€โ”€ ESP32_VIDEO_OPTIMIZATION.md +โ”‚ +โ”œโ”€โ”€ ๐Ÿงช Tests +โ”‚ โ”œโ”€โ”€ test_recorder.py # Recording tests +โ”‚ โ”œโ”€โ”€ test_traffic_light.py # Traffic light tests +โ”‚ โ”œโ”€โ”€ test_cross_street_blindpath.py # Navigation tests +โ”‚ โ””โ”€โ”€ test_crosswalk_awareness.py # Crosswalk awareness tests +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“š Docs +โ”‚ โ”œโ”€โ”€ README.md # Main project doc +โ”‚ โ”œโ”€โ”€ INSTALLATION.md # Install guide +โ”‚ โ”œโ”€โ”€ CONTRIBUTING.md # Contribution guide +โ”‚ โ”œโ”€โ”€ FAQ.md # Frequently Asked Questions +โ”‚ โ”œโ”€โ”€ CHANGELOG.md # Changelog +โ”‚ โ”œโ”€โ”€ SECURITY.md # Security policy +โ”‚ โ””โ”€โ”€ PROJECT_STRUCTURE.md # This file +โ”‚ +โ”œโ”€โ”€ ๐Ÿณ Docker +โ”‚ โ”œโ”€โ”€ Dockerfile # Docker image +โ”‚ โ”œโ”€โ”€ docker-compose.yml # Docker Compose config +โ”‚ โ””โ”€โ”€ .dockerignore # Docker ignore list +โ”‚ +โ”œโ”€โ”€ โš™๏ธ Config +โ”‚ โ”œโ”€โ”€ .env.example # Environment variable template +โ”‚ โ”œโ”€โ”€ .gitignore # Git ignore list +โ”‚ โ”œโ”€โ”€ requirements.txt # Python deps +โ”‚ โ”œโ”€โ”€ setup.sh # Linux/macOS setup script +โ”‚ โ””โ”€โ”€ setup.bat # Windows setup script +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“„ License +โ”‚ โ””โ”€โ”€ LICENSE # MIT License +โ”‚ +โ””โ”€โ”€ ๐Ÿ”ง GitHub + โ””โ”€โ”€ .github/ + โ”œโ”€โ”€ ISSUE_TEMPLATE/ + โ”‚ โ”œโ”€โ”€ bug_report.md + โ”‚ โ””โ”€โ”€ feature_request.md + โ””โ”€โ”€ pull_request_template.md +``` + +## ๐Ÿ”‘ Key Files Overview + +### Main Application Layer + +#### `app_main.py` +- **Purpose:** FastAPI main service handling all WebSocket connections +- **Key Features:** + - WebSocket routing (`/ws/camera`, `/ws_audio`, `/ws/viewer`, etc.) 
+ - Model loading & initialization + - State coordination & management + - Audio/video stream distribution +- **Depends on:** All other modules +- **Entry point:** `python app_main.py` + +#### `navigation_master.py` +- **Purpose:** Central navigation controller; manages the system state machine +- **Primary States:** + - IDLE โ€” idle + - CHAT โ€” dialogue mode + - BLINDPATH_NAV โ€” tactile path navigation + - CROSSING โ€” crosswalk + - TRAFFIC_LIGHT_DETECTION โ€” traffic-light detection + - ITEM_SEARCH โ€” item search +- **Core Methods:** + - `process_frame()` โ€” per-frame processing + - `start_blind_path_navigation()` โ€” start tactile path navigation + - `start_crossing()` โ€” start crosswalk mode + - `on_voice_command()` โ€” handle voice commands + +### Workflow Modules + +#### `workflow_blindpath.py` +- **Purpose:** Core logic for tactile path navigation +- **Features:** + - Path segmentation & detection + - Obstacle detection + - Turn detection + - Optical-flow stabilization + - Directional guidance generation +- **State Machine:** + - ONBOARDING โ€” getting onto the path + - NAVIGATING โ€” navigating along the path + - MANEUVERING_TURN โ€” handling turns + - AVOIDING_OBSTACLE โ€” obstacle avoidance + +#### `workflow_crossstreet.py` +- **Purpose:** Crosswalk navigation logic +- **Features:** + - Crosswalk detection + - Directional alignment + - Guidance generation +- **Core Methods:** + - `_is_crosswalk_near()` โ€” determine crosswalk proximity + - `_compute_angle_and_offset()` โ€” compute angle and lateral offset + +#### `yolomedia.py` +- **Purpose:** Item search workflow +- **Features:** + - YOLO-E prompt-based detection + - MediaPipe hand tracking + - Optical-flow target tracking + - Hand guidance (direction prompts) + - Grasp-action detection +- **Modes:** + - `SEGMENT`, `FLASH`, `CENTER_GUIDE`, `TRACK` + +### Speech / Voice Modules + +#### `asr_core.py` +- **Purpose:** AliCloud Paraformer ASR (real-time speech recognition) +- **Features:** + - Real-time transcription + - VAD (Voice Activity Detection) + - Result callbacks +- **Key Class:** `ASRCallback` + +#### `omni_client.py` +- **Purpose:** Qwen-Omni-Turbo multimodal dialogue client +- **Features:** + - Streaming dialogue generation + - Image + text inputs + - Speech output +- **Core Function:** `stream_chat()` + +#### `audio_player.py` +- **Purpose:** Unified audio playback manager +- **Features:** + - TTS playback + - Multi-channel audio mixing + - Volume control + - Thread-safe playback +- **Core Functions:** `play_voice_text()`, `play_audio_threadsafe()` + +### Model Backends + +#### `yoloe_backend.py` +- **Purpose:** YOLO-E open-vocabulary backend +- **Features:** + - Prompt setup + - Real-time segmentation + - Target tracking +- **Key Class:** `YoloEBackend` + +#### `trafficlight_detection.py` +- **Purpose:** Traffic-light detection module +- **Detection Methods:** + 1. YOLO model detection + 2. 
HSV color classification (fallback) +- **Output:** Red / Green / Yellow / Unknown + +#### `obstacle_detector_client.py` +- **Purpose:** Obstacle detection client +- **Features:** + - Whitelist category filtering + - In-mask (path) checks + - Object attributes (area, position, risk) + +### Video Processing + +#### `bridge_io.py` +- **Purpose:** Thread-safe frame buffering & distribution +- **Features:** + - Producerโ€“consumer pattern + - Raw frame buffer + - Processed frame fan-out +- **Core Functions:** + - `push_raw_jpeg()` โ€” receive ESP32 frames + - `wait_raw_bgr()` โ€” get raw frame + - `send_vis_bgr()` โ€” send processed frame + +#### `sync_recorder.py` +- **Purpose:** Synchronized audio/video recording +- **Features:** + - Sync record video & audio + - Auto timestamped filenames + - Thread safety +- **Outputs:** `recordings/video_*.avi`, `audio_*.wav` + +### Frontend + +#### `templates/index.html` +- **Purpose:** Web monitoring interface +- **Main Areas:** + - Video stream display + - Status panel + - IMU 3D visualization + - Speech recognition results + +#### `static/main.js` +- **Purpose:** Main JavaScript logic +- **Features:** + - WebSocket connection management + - UI updates + - Event handling + +#### `static/vision.js` +- **Purpose:** Vision stream handling +- **Features:** + - Receive video frames via WebSocket + - Canvas rendering + - FPS calculation + +#### `static/visualizer.js` +- **Purpose:** IMU 3D visualization (Three.js) +- **Features:** + - Receive IMU data + - Real-time pose rendering + - Dynamic lighting effects + +## ๐Ÿ”„ Data Flow + +### Video Stream +``` +ESP32-CAM +โ†’ [JPEG] WebSocket /ws/camera +โ†’ bridge_io.push_raw_jpeg() +โ†’ yolomedia / navigation_master +โ†’ bridge_io.send_vis_bgr() +โ†’ [JPEG] WebSocket /ws/viewer +โ†’ Browser Canvas +``` +### Audio Stream (Upstream) +``` +ESP32-MIC +โ†’ [PCM16] WebSocket /ws_audio +โ†’ asr_core +โ†’ DashScope ASR +โ†’ Recognition Result +โ†’ start_ai_with_text_custom() +``` + +### Audio Stream (Downstream) + +``` +Qwen-Omni / TTS +โ†’ audio_player +โ†’ [PCM16] audio_stream +โ†’ [WAV] HTTP /stream.wav +โ†’ ESP32 Speaker +``` + + +### IMU Data Stream +``` +ESP32-IMU +โ†’ [JSON] UDP 12345 +โ†’ process_imu_and_maybe_store() +โ†’ [JSON] WebSocket /ws +โ†’ visualizer.js (Three.js) +``` + +## ๐ŸŽฏ Key Design Patterns + +### 1. State Machine Pattern +- **Location:** `navigation_master.py` +- **Purpose:** Manage system state transitions +- **States:** IDLE โ†’ CHAT / BLINDPATH_NAV / CROSSING / ... + +### 2. Producerโ€“Consumer Pattern +- **Location:** `bridge_io.py` +- **Purpose:** Decouple video reception and processing +- **Implementation:** Threads + Queues + +### 3. Strategy Pattern +- **Location:** Each `workflow_*.py` +- **Purpose:** Implement different navigation strategies +- **Implementation:** Unified `process_frame()` interface + +### 4. Singleton Pattern +- **Location:** Model loading +- **Purpose:** Share model instances globally +- **Implementation:** Global variables + initialization checks + +### 5. 
Observer Pattern +- **Location:** WebSocket communication +- **Purpose:** Allow multiple clients to subscribe to video streams +- **Implementation:** `camera_viewers: Set[WebSocket]` + +## ๐Ÿ“ฆ Dependencies +``` +app_main.py +โ”œโ”€โ”€ navigation_master.py +โ”‚ โ”œโ”€โ”€ workflow_blindpath.py +โ”‚ โ”‚ โ”œโ”€โ”€ yoloe_backend.py +โ”‚ โ”‚ โ””โ”€โ”€ obstacle_detector_client.py +โ”‚ โ”œโ”€โ”€ workflow_crossstreet.py +โ”‚ โ””โ”€โ”€ trafficlight_detection.py +โ”œโ”€โ”€ yolomedia.py +โ”‚ โ””โ”€โ”€ yoloe_backend.py +โ”œโ”€โ”€ asr_core.py +โ”œโ”€โ”€ omni_client.py +โ”œโ”€โ”€ audio_player.py +โ”œโ”€โ”€ audio_stream.py +โ”œโ”€โ”€ bridge_io.py +โ””โ”€โ”€ sync_recorder.py +``` + +## ๐Ÿš€ Startup Process + +1. **Initialization Phase** (`app_main.py`) + - Load environment variables + - Load navigation models (YOLO, MediaPipe) + - Initialize the audio system + - Start the recording system + - Preload the traffic light detection model + +2. **Service Launch** (FastAPI) + - Register WebSocket routes + - Mount static files + - Start UDP listener (for IMU data) + - Start HTTP service (port 8081) + +3. **Runtime Phase** + - Wait for ESP32 connection + - Receive video/audio/IMU data + - Process user voice commands + - Push real-time processing results + +4. **Shutdown Phase** + - Stop recording (save files) + - Close all WebSocket connections + - Release model resources + - Clean up temporary files + +--- + +**Note:** For detailed implementation of each module, please refer to the corresponding source file comments and docstrings. diff --git a/README.md b/README.md index bb9ddd9..d200ea1 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,5 @@ # AI ๆ™บ่ƒฝ็›ฒไบบ็œผ้•œ็ณป็ปŸ ๐Ÿค–๐Ÿ‘“ +[๐Ÿ‡ฌ๐Ÿ‡ง English Version here](README_en.md)
diff --git a/README_en.md b/README_en.md new file mode 100644 index 0000000..c5a9023 --- /dev/null +++ b/README_en.md @@ -0,0 +1,526 @@ +# AI Smart Glasses System for the Visually Impaired ๐Ÿค–๐Ÿ‘“ + +
+ +An intelligent navigation and assistance system designed for visually impaired individuals, integrating tactile-path navigation, crosswalk assistance, object recognition, and real-time voice interaction. +โš ๏ธ *This project is for research and educational purposes only. Do not use it directly with visually impaired users without professional supervision.* + +[Features](#-features) โ€ข [Quick Start](#-quick-start) โ€ข [System Architecture](#-system-architecture) โ€ข [User Guide](#-user-guide) โ€ข [Developer Documentation](#-developer-documentation) + +
+
+---
+
+## 📋 Table of Contents
+
+- [Features](#-features)
+- [System Requirements](#-system-requirements)
+- [Quick Start](#-quick-start)
+- [System Architecture](#-system-architecture)
+- [User Guide](#-user-guide)
+- [Configuration](#-configuration-guide)
+- [Developer Documentation](#-developer-documentation)
+- [FAQ](#-faq)
+- [Contribution Guidelines](#contribution-guidelines)
+- [License](#-license)
+- [Acknowledgments](#acknowledgments)
+
+---
+
+## ✨ Features
+
+### 🚶 Tactile Path Navigation
+- **Real-time path detection** — detects tactile paving paths with YOLO segmentation.
+- **Smart voice guidance** — provides precise directional prompts (turn left, turn right, go straight, etc.).
+- **Obstacle detection and avoidance** — identifies obstacles ahead and plans avoidance routes.
+- **Turn detection** — detects sharp turns and gives early voice warnings.
+- **Optical flow stabilization** — uses Lucas–Kanade optical flow to stabilize mask tracking.
+
+### 🚦 Crosswalk Assistance
+- **Crosswalk detection** — detects zebra crossings in real time.
+- **Traffic light detection** — identifies light states via color and shape.
+- **Alignment guidance** — helps the user align with the crosswalk center.
+- **Safety prompts** — announces when the light is green so the user can cross safely.
+
+### 🔍 Object Recognition & Search
+- **Voice-based search** — e.g. “find the Red Bull for me.”
+- **Real-time tracking** — YOLO-E open-vocabulary detection + ByteTrack tracking.
+- **Hand guidance** — uses MediaPipe hand detection to guide the user's hand toward the target.
+- **Grasp detection** — detects the grasp motion that confirms the object has been picked up.
+- **Multi-modal feedback** — visual overlay + audio + centering cue.
+
+### 🎙️ Real-Time Voice Interaction
+- **Speech recognition (ASR)** — powered by Alibaba DashScope Paraformer.
+- **Multimodal conversation** — Qwen-Omni-Turbo supports image + text input with speech output.
+- **Smart command parsing** — distinguishes navigation, search, and chat requests.
+- **Context awareness** — ignores commands that are irrelevant to the current mode.
+
+### 📹 Video & Audio Processing
+- **Real-time video streaming** — via WebSocket, supports multiple viewers.
+- **Audio/video recording** — saves synchronized recordings with timestamps.
+- **IMU fusion** — supports ESP32 IMU data for pose estimation.
+- **Multi-channel audio mixing** — mixes system prompts, AI responses, and ambient audio.
+
+### 🎨 Visualization & Interaction
+- **Web dashboard** — view processed video in real time.
+- **3D IMU visualization** — real-time rendering via Three.js.
+- **Status panel** — shows the current navigation state, detections, FPS, etc.
+- **Chinese UI (customizable)** — all UI and speech are currently in Chinese and can be localized.
+
+---
+
+## 💻 System Requirements
+
+### Hardware
+**Server / Development**
+- CPU: Intel i5 or above (i7/i9 recommended)
+- GPU: NVIDIA GPU with CUDA 11.8+ (RTX 3060 or higher recommended)
+- RAM: 8 GB (min) / 16 GB (recommended)
+- Storage: 10 GB free
+
+**Client (optional)**
+- ESP32-CAM or WebSocket camera
+- Microphone
+- Speaker/headphones
+
+### Software
+- OS: Windows 10/11, Ubuntu 20.04+, macOS 10.15+
+- Python 3.9 – 3.11
+- CUDA 11.8 or higher (for GPU)
+- Browser: Chrome 90+, Firefox 88+, Edge 90+
+
+### API Keys
+- **Alibaba DashScope API Key** (required)
+  Used for speech recognition and Qwen-Omni dialogue.
+  → Obtain one from the Alibaba Cloud DashScope console.
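+
+Once you have a key, you can quickly confirm that Python can see it with the snippet below. This is an illustrative check, not project code; it assumes the `python-dotenv` package (`pip install python-dotenv`), a common way to load `.env` files like the one created in the Quick Start.
+
+```python
+# check_key.py - hypothetical helper; verifies the DashScope key is readable.
+import os
+
+from dotenv import load_dotenv  # third-party: python-dotenv (assumption)
+
+load_dotenv()  # reads DASHSCOPE_API_KEY from a local .env file, if present
+print("DASHSCOPE_API_KEY is", "set" if os.getenv("DASHSCOPE_API_KEY") else "missing")
+```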
+
+---
+
+## 🚀 Quick Start
+
+### 1. Clone the Repository
+
+```bash
+git clone https://github.com/yourusername/aiglass.git
+cd aiglass/rebuild1002
+```
+
+### 2. Install Dependencies
+
+```bash
+python -m venv venv
+# Windows
+venv\Scripts\activate
+# Linux/macOS
+source venv/bin/activate
+pip install -r requirements.txt
+```
+
+#### Install CUDA and cuDNN (GPU Acceleration)
+
+Please refer to the [NVIDIA CUDA Toolkit Installation Guide](https://developer.nvidia.com/cuda-downloads).
+
+### 3. Download Model Files
+
+Place the following model files into the `model/` directory:
+
+| Model File | Purpose | Size | Download Link |
+| ---------------------- | ------------------------- | ----- | ------------- |
+| `yolo-seg.pt` | Blind path segmentation | ~50MB | [To be added] |
+| `yoloe-11l-seg.pt` | Open vocabulary detection | ~80MB | [To be added] |
+| `shoppingbest5.pt` | Object recognition | ~30MB | [To be added] |
+| `trafficlight.pt` | Traffic light detection | ~20MB | [To be added] |
+| `hand_landmarker.task` | Hand detection | ~15MB | [MediaPipe Models](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker#models) |
+
+### 4. Configure API Key
+
+Create a `.env` file:
+
+```bash
+# .env
+DASHSCOPE_API_KEY=your_api_key_here
+```
+
+Or modify it directly in the code (not recommended):
+
+```python
+# app_main.py, line 50
+API_KEY = "your_api_key_here"
+```
+
+### 5. Start the System
+
+```bash
+python app_main.py
+```
+
+The service listens on port 8081 (http://0.0.0.0:8081).
+Open http://localhost:8081 in your browser to access the real-time monitoring interface.
+
+### 6. Connect Devices (optional)
+
+If you are using an **ESP32-CAM**, follow these steps:
+
+1. Flash `compile/compile.ino` onto the ESP32.
+2. Update the Wi-Fi configuration so it connects to the same network as the server.
+3. The ESP32 will automatically connect to the WebSocket endpoint.
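+
+#### Testing Without Hardware (optional)
+
+If you do not have an ESP32 on hand, you can sanity-check the video pipeline by streaming frames from an ordinary webcam. The script below is an illustrative sketch, not part of the project: it assumes the third-party `websockets` package (`pip install websockets`) and targets the `/ws/camera` endpoint listed in the WebSocket endpoint table further down.
+
+```python
+# fake_camera.py - hypothetical helper, not shipped with the project.
+# Streams JPEG frames from a local webcam to the server, mimicking the
+# binary JPEG protocol that the ESP32-CAM uses on /ws/camera.
+import asyncio
+
+import cv2
+import websockets
+
+async def stream(url: str = "ws://localhost:8081/ws/camera") -> None:
+    cap = cv2.VideoCapture(0)  # local webcam stands in for the ESP32-CAM
+    async with websockets.connect(url) as ws:
+        while True:
+            ok, frame = cap.read()
+            if not ok:
+                break
+            ok, jpeg = cv2.imencode(".jpg", frame)
+            if ok:
+                await ws.send(jpeg.tobytes())  # one binary JPEG per message
+            await asyncio.sleep(1 / 15)  # roughly 15 FPS
+
+if __name__ == "__main__":
+    asyncio.run(stream())
+```
+
+If everything is wired up, the processed stream should appear in the web dashboard just as it would with real hardware.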
+ +--- + +## ๐Ÿ— System Architecture + +### ๐Ÿงฉ Overall Architecture +``` +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ Client Layer โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ ESP32-CAM โ”‚ โ”‚ Browser โ”‚ โ”‚ Mobile โ”‚ โ”‚ +โ”‚ โ”‚ (Video/Audio)โ”‚ โ”‚ (Monitoring) โ”‚ โ”‚ (Voice Ctrl) โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ +โ”‚ WebSocket โ”‚ HTTP/WS โ”‚ WebSocket +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ FastAPI Main Service (app_main.py) โ”‚ โ”‚ +โ”‚ โ”‚ - WebSocket routing management โ”‚ โ”‚ +โ”‚ โ”‚ - Audio/video stream distribution โ”‚ โ”‚ +โ”‚ โ”‚ - State coordination โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ ASR Module โ”‚ โ”‚ Omni Dialog โ”‚ โ”‚ Audio Playerโ”‚ โ”‚ +โ”‚ โ”‚ (asr_core) โ”‚ โ”‚(omni_client)โ”‚ โ”‚(audio_player)โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ”‚ โ”‚ +โ”‚ Application Layer โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ +โ”‚ โ”‚ โ”‚ +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ Navigation Control Layer โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ NavigationMaster (navigation_master.py) โ”‚ โ”‚ +โ”‚ โ”‚ - State machine: IDLE / CHAT / BLINDPATH_NAV / โ”‚ โ”‚ +โ”‚ โ”‚ CROSSING / TRAFFIC_LIGHT / ITEM_SEARCH โ”‚ โ”‚ +โ”‚ โ”‚ - Mode switching and coordination โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ Blind Path โ”‚ โ”‚ Cross Street โ”‚ โ”‚ Item Search โ”‚ โ”‚ +โ”‚ โ”‚(blindpath) โ”‚ โ”‚(crossstreet) โ”‚ โ”‚(yolomedia) โ”‚ โ”‚ +โ”‚ 
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ +โ”‚ โ”‚ โ”‚ +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ Model Inference Layer โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ YOLO Segmentation โ”‚ โ”‚ YOLO-E Detection โ”‚ โ”‚ MediaPipe โ”‚ โ”‚ +โ”‚ โ”‚ (Tactile/Crosswalk)โ”‚ โ”‚ (Open Vocabulary)โ”‚ โ”‚ (Hand Tracking)โ”‚โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ Traffic Light โ”‚ โ”‚ Optical Flow โ”‚ โ”‚ +โ”‚ โ”‚ (HSV + YOLO) โ”‚ โ”‚ (Lucas-Kanade)โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ +โ”‚ +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ External Services Layer โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ AliCloud DashScope API โ”‚ โ”‚ +โ”‚ โ”‚ - Paraformer ASR (Real-time Speech-to-Text) โ”‚ โ”‚ +โ”‚ โ”‚ - Qwen-Omni-Turbo (Multimodal Dialogue) โ”‚ โ”‚ +โ”‚ โ”‚ - Qwen-Turbo (Tag Extraction) โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ +``` + + +### ๐Ÿ”ง Core Module Overview + +| Module | File | Function | +|---------|------|-----------| +| **Main Application** | `app_main.py` | FastAPI service, WebSocket management, and system coordination | +| **Navigation Master** | `navigation_master.py` | State machine management, mode switching, voice throttling | +| **Blind Path Navigation** | `workflow_blindpath.py` | Tactile path detection, obstacle avoidance, and turn guidance | +| **Crosswalk Navigation** | `workflow_crossstreet.py` | Crosswalk detection, traffic light recognition, and alignment guidance | +| **Item Search** | `yolomedia.py` | Object detection, hand guidance, and grasp confirmation | +| **Speech Recognition (ASR)** | `asr_core.py` | Real-time ASR, VAD (Voice Activity Detection), and command parsing | +| **Speech Synthesis (TTS)** | `omni_client.py` | Qwen-Omni streaming speech generation | +| **Audio Playback** | `audio_player.py` | Multi-channel mixing, TTS playback, and volume control | +| **Video Recording** | `sync_recorder.py` | Audio and 
video synchronized recording | +| **Bridge I/O** | `bridge_io.py` | Thread-safe frame buffering and distribution | + + +## ๐Ÿ“– User Guide + +### ๐ŸŽ™๏ธ Voice Commands + +The system supports the following voice commands โ€” **no wake word required**: + +#### ๐Ÿงญ Navigation Control +``` +"Start navigation" / "Blind path navigation" โ†’ Start tactile path navigation +"Stop navigation" / "End navigation" โ†’ Stop tactile path navigation +"Start crossing" / "Help me cross the street" โ†’ Start crosswalk navigation mode +"Stop crossing" / "End crossing" โ†’ Stop crosswalk mode +``` +#### ๐Ÿšฆ Traffic Light Detection + +``` +"Detect traffic light" / "Check the traffic light" โ†’ Start traffic light detection +"Stop detection" / "Stop traffic light" โ†’ Stop traffic light detection +``` + +#### ๐Ÿ” Item Search +``` +"Find [item name]" โ†’ Start object search +Examples: + +"Find Red Bull" + +"Find AD Calcium Milk" + +"Find mineral water" +"Found it" / "Got it" โ†’ Confirm that the item has been found or picked up +``` + +#### ๐Ÿ’ฌ Smart Interaction +``` +"Tell me what this is" โ†’ Capture image and identify object +"Can I eat this?" โ†’ Ask about an item +Any other question โ†’ Start AI conversation +``` + +### ๐Ÿงญ Navigation Status Overview + +The system includes the following main states (automatically switched): + +1. **IDLE** โ€“ *Idle State* + - Waiting for user commands + - Displays raw video stream + +2. **CHAT** โ€“ *Dialogue Mode* + - Engages in multimodal conversation with the AI + - Navigation temporarily paused + +3. **BLINDPATH_NAV** โ€“ *Tactile Path Navigation* + - **ONBOARDING**: Guiding onto the tactile path + - **ROTATION**: Aligning with the path direction + - **TRANSLATION**: Moving to the center of the path + - **NAVIGATING**: Following the tactile path + - Real-time direction correction + - Obstacle detection and avoidance + - **MANEUVERING_TURN**: Handling turns + - **AVOIDING_OBSTACLE**: Obstacle avoidance + +4. **CROSSING** โ€“ *Crosswalk Mode* + - **SEEKING_CROSSWALK**: Searching for crosswalk + - **WAIT_TRAFFIC_LIGHT**: Waiting for green light + - **CROSSING**: Crossing the street + - **SEEKING_NEXT_BLINDPATH**: Searching for the tactile path on the other side + +5. **ITEM_SEARCH** โ€“ *Item Search Mode* + - Detects target object in real time + - Guides userโ€™s hand toward the object + - Confirms successful grasp + +6. 
**TRAFFIC_LIGHT_DETECTION** โ€“ *Traffic Light Detection* + - Detects real-time traffic light status + - Provides voice feedback for color changes + +--- + +### ๐Ÿ–ฅ๏ธ Web Monitoring Interface + +Open your browser and visit `http://localhost:8081` to view: + +- **Real-Time Video Stream** โ€“ Displays processed video with navigation overlays +- **Status Panel** โ€“ Shows current mode, detection info, and FPS +- **IMU Visualization** โ€“ 3D rendering of device orientation +- **Speech Recognition Results** โ€“ Displays recognized speech and AI responses + +--- + +### ๐Ÿ”Œ WebSocket Endpoints + +| Endpoint | Purpose | Data Format | +|-----------|----------|-------------| +| `/ws/camera` | ESP32 camera streaming | Binary (JPEG) | +| `/ws/viewer` | Browser video viewer | Binary (JPEG) | +| `/ws_audio` | ESP32 audio upload | Binary (PCM16) | +| `/ws_ui` | UI status updates | JSON | +| `/ws` | IMU data input | JSON | +| `/stream.wav` | Audio stream download | Binary (WAV) | + + +## โš™๏ธ Configuration Guide + +### ๐ŸŒ Environment Variables + +Create a `.env` file and set the following parameters: + +```bash +# AliCloud API +DASHSCOPE_API_KEY=sk-xxxxx + +# Model paths (optional โ€” can be omitted if using default) +BLIND_PATH_MODEL=model/yolo-seg.pt +OBSTACLE_MODEL=model/yoloe-11l-seg.pt +YOLOE_MODEL_PATH=model/yoloe-11l-seg.pt + +# Navigation parameters +AIGLASS_MASK_MIN_AREA=1500 # Minimum mask area +AIGLASS_MASK_MORPH=3 # Morphological kernel size +AIGLASS_MASK_MISS_TTL=6 # Frame tolerance for mask loss +AIGLASS_PANEL_SCALE=0.65 # Data panel scale + +# Audio configuration +TTS_INTERVAL_SEC=1.0 # Interval between voice prompts +ENABLE_TTS=true # Enable text-to-speech output +``` +### ๐Ÿง  Modify Model Paths + +If the model files are not located in their default paths, you can update the corresponding files as follows: + +```python +# workflow_blindpath.py +seg_model_path = "your/custom/path/yolo-seg.pt" + +# yolomedia.py +YOLO_MODEL_PATH = "your/custom/path/shoppingbest5.pt" +HAND_TASK_PATH = "your/custom/path/hand_landmarker.task" +``` + +### โš™๏ธ Performance Tuning + +Adjust the following parameters based on your hardware performance: + +```python +# yolomedia.py +HAND_DOWNSCALE = 0.8 # Hand detection downscale (smaller = faster, but less accurate) +HAND_FPS_DIV = 1 # Frame skipping for hand detection (2 = every 2 frames, 3 = every 3 frames) + +# workflow_blindpath.py +FEATURE_PARAMS = dict( + maxCorners=600, # Number of optical flow feature points (fewer = faster) + qualityLevel=0.001, # Minimum quality of feature points + minDistance=5 # Minimum distance between feature points +) +``` +## ๐Ÿ› ๏ธ Developer Documentation + +### ๐Ÿ—ฃ๏ธ Adding a New Voice Command + +1. In `app_main.py`, inside the `start_ai_with_text_custom()` function, add: + +```python +# Check for a new custom command +if "your new keyword" in user_text: + # Execute custom logic + print("[CUSTOM] New command triggered") + await ui_broadcast_final("[System] New feature activated") + return +``` +2. To modify the command filtering rules: + +```python +# Update the allowed_keywords list +allowed_keywords = ["help me see", "help me find", "your new keyword"] +``` + +### ๐Ÿš€ Extending Navigation Functionality + +1. 
In `workflow_blindpath.py`, add a new state:
+
+```python
+# Initialize inside BlindPathNavigator.__init__()
+self.your_new_state_var = False
+
+# Handle it inside process_frame()
+def process_frame(self, image):
+    if self.your_new_state_var:
+        # Custom processing logic
+        guidance_text = "New state guidance"
+        # ...
+```
+
+2. In `navigation_master.py`, add a new state to the state machine:
+
+```python
+class NavigationMaster:
+    def start_your_new_mode(self):
+        self.state = "YOUR_NEW_MODE"
+        # Initialization logic
+```
+
+### 🤖 Integrating a New Model
+
+1. Create a model wrapper class:
+
+```python
+# your_model_wrapper.py
+class YourModelWrapper:
+    def __init__(self, model_path):
+        self.model = load_your_model(model_path)
+
+    def detect(self, image):
+        # Inference logic
+        return results
+```
+
+2. Load the model in `app_main.py`:
+
+```python
+your_model = YourModelWrapper("model/your_model.pt")
+```
+
+3. Call it within the corresponding workflow:
+
+```python
+results = your_model.detect(image)
+```
+
+### 🧩 Debugging Tips
+
+1. **Enable detailed logging:**
+
+```python
+# At the top of app_main.py
+import logging
+logging.basicConfig(level=logging.DEBUG)
+```
+
+2. **Check for frame-rate bottlenecks:**
+
+```python
+# yolomedia.py
+PERF_DEBUG = True  # Print processing time per frame
+```
+
+3. **Test individual modules:**
+
+```bash
+# Test tactile path navigation
+python test_cross_street_blindpath.py
+
+# Test traffic light detection
+python test_traffic_light.py
+
+# Test recording feature
+python test_recorder.py
+```
+
+## ❓ FAQ
+TBD - this section appears in the table of contents of the original Chinese README, but its content has not been written yet.
+
+## Contribution Guidelines
+TBD - this section appears in the table of contents of the original Chinese README, but its content has not been written yet.
+
+## 📄 License
+
+This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details.
+
+## Acknowledgments
+TBD - this section appears in the table of contents of the original Chinese README, but its content has not been written yet.
\ No newline at end of file
diff --git a/assets/images/6dd19750-57af-4560-a007-9a7059956b53_en.png b/assets/images/6dd19750-57af-4560-a007-9a7059956b53_en.png
new file mode 100644
index 0000000..19d66db
Binary files /dev/null and b/assets/images/6dd19750-57af-4560-a007-9a7059956b53_en.png differ
diff --git a/assets/images/bc7d1aac-a9e9-4ef8-9d67-224708d0c9fd_en2.png b/assets/images/bc7d1aac-a9e9-4ef8-9d67-224708d0c9fd_en2.png
new file mode 100644
index 0000000..d454870
Binary files /dev/null and b/assets/images/bc7d1aac-a9e9-4ef8-9d67-224708d0c9fd_en2.png differ
diff --git a/assets/images/e8dec4a6-8fa6-4d94-bd66-4e9864b67daf_en.png b/assets/images/e8dec4a6-8fa6-4d94-bd66-4e9864b67daf_en.png
new file mode 100644
index 0000000..35b4a63
Binary files /dev/null and b/assets/images/e8dec4a6-8fa6-4d94-bd66-4e9864b67daf_en.png differ