Use the VLM hosted in the CV_MCP_Server HuggingFace Space to "see" objects through the robot's camera.
CV_Robot_MCP is a Python-based project that enables a robot to visually interpret its environment using a Vision-Language Model (VLM). The system communicates with the CV_MCP_Server HuggingFace Space to analyze images captured by the robot's camera and return meaningful object descriptions.
🎬 Watch the demo video here
- Vision-Language Model Integration: Seamlessly connects your robot's camera feed with a VLM via the HuggingFace Space.
- Python Implementation: Entirely written in Python for flexibility and ease of customization.
- Object Recognition: Identifies and describes objects in the robot's field of view.
- Python 3.10+
- Access to the CV_MCP_Server HuggingFace Space
- Robot hardware with camera support (optional for testing with sample images)
Clone this repository:
```bash
git clone https://github.com/OppaAI/CV_Robot_MCP.git
cd CV_Robot_MCP
```

Install the Python requirements:

```bash
pip install -r requirements.txt
```

Generate a HuggingFace token and enter it into the `.env` file.
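For reference, a `.env`-based setup is usually loaded along the lines of the sketch below. The use of `python-dotenv` and the variable name `HF_TOKEN` are assumptions here; check the repository's `.env` template for the actual key name.

```python
# Minimal sketch: load the HuggingFace token from a local .env file.
# Assumes python-dotenv is installed and the variable is named HF_TOKEN
# (hypothetical -- the project's .env may use a different key).
import os
from dotenv import load_dotenv

load_dotenv()                      # reads .env from the current directory
hf_token = os.getenv("HF_TOKEN")   # hypothetical variable name
if not hf_token:
    raise RuntimeError("HF_TOKEN is missing; add it to your .env file")
```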
- Start your robot and ensure the camera is functional.
- Run the main script to capture images and send them to the CV_MCP_Server (a sketch of this flow follows these steps):
```bash
python cv_robot.py
```
- View the object descriptions returned by the VLM.
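As a rough illustration of what the capture-and-describe loop can look like, the sketch below grabs one frame with OpenCV and sends it to the Space with `gradio_client`. It is not the repository's exact code: the Space ID `OppaAI/CV_MCP_Server`, the `/predict` endpoint name, and the single-image parameter are assumptions; consult the Space's "Use via API" page and `cv_robot.py` for the real names.

```python
# Rough sketch of capturing a frame and asking the VLM Space to describe it.
# Assumptions: OpenCV for the camera, gradio_client for the Space call,
# a Space named "OppaAI/CV_MCP_Server", and an API endpoint "/predict".
import os
import cv2
from gradio_client import Client, handle_file

cap = cv2.VideoCapture(0)              # default camera
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Could not read a frame from the camera")

cv2.imwrite("frame.jpg", frame)        # save the frame so it can be uploaded

client = Client("OppaAI/CV_MCP_Server", hf_token=os.getenv("HF_TOKEN"))
description = client.predict(handle_file("frame.jpg"), api_name="/predict")
print(description)                     # the VLM's description of visible objects
```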
- Server URL: Update the HuggingFace Space URL in your code/config files if the endpoint changes.
- Camera Settings: Modify resolution, frame rate, or source in your Python scripts as needed.
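If the project captures frames with OpenCV (an assumption), camera resolution, frame rate, and source can typically be adjusted through `cv2.VideoCapture` properties, as in this illustrative snippet:

```python
# Example of adjusting camera settings, assuming OpenCV is used for capture.
# Values are illustrative; actual support depends on the camera driver.
import cv2

cap = cv2.VideoCapture(0)                  # change the index to switch sources
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)    # resolution width in pixels
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)    # resolution height in pixels
cap.set(cv2.CAP_PROP_FPS, 15)              # requested frame rate
```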
Contributions are welcome! Please open issues or submit pull requests for feature requests, bugs, or improvements.
This project is licensed under the MIT License. See LICENSE for details.
- CV_MCP_Server HuggingFace Space
- HuggingFace for hosting the VLM deployment
- VLM used in the HF Space: Qwen2.5-VL-7B-Instruct
