This repository was archived by the owner on Dec 3, 2025. It is now read-only.

CV_Robot_MCP

Use the VLM hosted in the CV_MCP_Server HuggingFace Space to let a robot "see" objects through its camera.

Overview

CV_Robot_MCP is a Python project that lets a robot visually interpret its environment using a Vision-Language Model (VLM). It sends images captured by the robot's camera to the CV_MCP_Server HuggingFace Space for analysis and receives descriptions of the objects in view.

🎬 Demo video

🎬 Watch the demo video here

Features

  • Vision-Language Model Integration: Sends the robot's camera images to a VLM hosted in the HuggingFace Space.
  • Python Implementation: Entirely written in Python for flexibility and ease of customization.
  • Object Recognition: Identifies and describes objects in the robot's field of view.
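
As a rough illustration of how a camera frame might be packaged for a remote VLM, the sketch below base64-encodes raw image bytes into a JSON payload. The helper names (`build_vlm_payload`, `extract_image`) and the field names (`prompt`, `image_b64`) are hypothetical; the actual request format expected by CV_MCP_Server is not documented in this README.

```python
import base64
import json


def build_vlm_payload(image_bytes: bytes,
                      prompt: str = "Describe the objects in this image.") -> str:
    """Return a JSON string carrying the image as base64 text (hypothetical schema)."""
    if not image_bytes:
        raise ValueError("image_bytes is empty; nothing was captured")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"prompt": prompt, "image_b64": encoded})


def extract_image(payload: str) -> bytes:
    """Inverse of build_vlm_payload: recover the raw image bytes."""
    return base64.b64decode(json.loads(payload)["image_b64"])
```

Base64 is used here only because it lets binary image data travel inside a text format such as JSON; the real client may upload files in a different way.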

Getting Started

Prerequisites

Installation

Clone this repository:

git clone https://github.com/OppaAI/CV_Robot_MCP.git
cd CV_Robot_MCP

Install Python requirements:

pip install -r requirements.txt

Generate a HuggingFace access token and add it to the .env file.
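
One possible way to read the token at startup is sketched below using only the standard library. The variable name `HF_TOKEN` and the helpers `parse_env_text` and `load_token` are assumptions for illustration; check the repository's code for the name it actually expects. The project may also rely on a package such as python-dotenv instead.

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_token(env_path: str = ".env", var_name: str = "HF_TOKEN") -> str:
    """Return the token from the process environment or, failing that, the .env file."""
    token = os.environ.get(var_name)
    if not token and os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as fh:
            token = parse_env_text(fh.read()).get(var_name)
    if not token:
        raise RuntimeError(f"{var_name} is not set; add it to {env_path}")
    return token
```

Keeping the token in .env rather than in source code means it stays out of version control, provided .env is listed in .gitignore.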

Usage

  1. Start your robot and ensure the camera is functional.
  2. Run the main script to capture images and send them to the CV_MCP_Server:
    python cv_robot.py
  3. View the object descriptions returned by the VLM.
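
The steps above amount to a capture, send, and display loop. The sketch below shows that shape with the camera and the VLM call passed in as plain Python values, so it runs without hardware or network access. `describe_frames` and its parameters are hypothetical and are not taken from `cv_robot.py`.

```python
from typing import Callable, Iterable, List, Optional


def describe_frames(
    frames: Iterable[Optional[bytes]],
    describe: Callable[[bytes], str],
    max_frames: int = 3,
) -> List[str]:
    """Send up to max_frames non-empty frames to describe() and collect the replies."""
    descriptions: List[str] = []
    for frame in frames:
        if len(descriptions) >= max_frames:
            break
        if not frame:  # camera returned nothing for this tick; skip it
            continue
        descriptions.append(describe(frame))
    return descriptions


if __name__ == "__main__":
    # Stand-ins for the real camera and VLM call, for illustration only.
    fake_frames = [b"frame-1", None, b"frame-2"]
    fake_describe = lambda frame: f"received {len(frame)} bytes"
    for text in describe_frames(fake_frames, fake_describe):
        print(text)
```

In a real setup, `frames` would come from the robot's camera and `describe` would wrap the request to the HuggingFace Space.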

Configuration

  • Server URL: Update the HuggingFace Space URL in your code/config files if the endpoint changes.
  • Camera Settings: Modify resolution, frame rate, or source in your Python scripts as needed.
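
If you prefer to keep these settings in one place rather than scattered through the scripts, a small configuration object is one option. The sketch below is an assumption about how that could look, not how the repository is organized: the class `RobotVisionConfig`, the environment variable names, and the placeholder URL are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class RobotVisionConfig:
    server_url: str                     # HuggingFace Space endpoint
    camera_source: Union[int, str] = 0  # device index or stream URL
    width: int = 640
    height: int = 480
    fps: int = 15


def config_from_env(env: Mapping[str, str] = os.environ) -> RobotVisionConfig:
    """Build a config from environment-style key/value pairs, with defaults."""
    source = env.get("CAMERA_SOURCE", "0")
    return RobotVisionConfig(
        # Placeholder URL; replace with the real Space endpoint.
        server_url=env.get("CV_MCP_SERVER_URL", "https://example.invalid/space"),
        # A purely numeric source is treated as a device index.
        camera_source=int(source) if source.isdigit() else source,
        width=int(env.get("CAMERA_WIDTH", "640")),
        height=int(env.get("CAMERA_HEIGHT", "480")),
        fps=int(env.get("CAMERA_FPS", "15")),
    )
```

With this approach, changing the endpoint or camera resolution means editing an environment variable (or the .env file) instead of the Python source.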

Contributing

Contributions are welcome! Please open issues or submit pull requests for feature requests, bugs, or improvements.

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgements
