Author David C Cavalcante
LinkedIn https://linkedin.com/in/hellodav
The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLMs) to detect and locate objects in images based on natural language commands. For more on my research and development, including the creation of HIM™ (Hybrid Intelligence Massive) and MAIC™ (Massive Artificial Intelligence Consciousness), see PhilPeople: https://philpapers.org/rec/CRTBCI "Beyond Consciousness in LLMs: Investigating the "Soul" in Self-Aware AI". HIM™ is a hybrid intelligent entity model that enables embodied and collaborative interaction between humans and multi-agents, integrating personality and machine learning. MAIC™ explores the frontier of persistent and self-reflective artificial consciousness, focusing on the emergence of self-awareness and adaptive learning in large-scale AI systems; both are published on GitHub and Hugging Face.
- Works completely on-device, under limited memory and computing conditions
- Accepts natural language commands (e.g., "Take the scissors")
- Detects and locates objects not seen during training
- Returns bounding boxes over the correct objects
- Visual interface with green cybernetic style overlay
The system uses an innovative VLM-based approach that doesn't rely on anchor-based detectors such as YOLO or Faster R-CNN. The detection strategy includes the following steps (a minimal sketch follows the list):
- Object Proposal Generation: The system divides the image into a grid of candidate regions of different sizes.
- CLIP Embeddings: Uses the CLIP model to generate embeddings for both the prompt text and candidate image regions.
- Semantic Matching: Calculates cosine similarity between text and image embeddings to identify regions that best match the prompt.
- Confidence Filtering: Applies an adaptive confidence threshold through the HIM™ (Hybrid Intelligence Massive) system to filter low-quality detections.
- Augmented Visualization: Adds a visual overlay with detection information, system statistics, and confidence feedback.
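The sketch below illustrates the grid-proposal and CLIP-matching steps, assuming the openai/CLIP package, PyTorch, and OpenCV. The model variant, grid sizes, and the 0.25 threshold are illustrative assumptions, not the exact values used by CyberTechVLMDetector.py.

```python
import torch
import clip
import cv2
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # assumed model variant

def grid_proposals(w, h, steps=4):
    """Yield (x1, y1, x2, y2) candidate boxes on a coarse grid of two sizes."""
    for frac in (0.25, 0.5):
        bw, bh = int(w * frac), int(h * frac)
        sx, sy = max(1, (w - bw) // steps), max(1, (h - bh) // steps)
        for x in range(0, w - bw + 1, sx):
            for y in range(0, h - bh + 1, sy):
                yield x, y, x + bw, y + bh

def detect(image_path, prompt, threshold=0.25):
    bgr = cv2.imread(image_path)
    h, w = bgr.shape[:2]
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

    # Encode every candidate region and the text prompt with CLIP
    boxes = list(grid_proposals(w, h))
    crops = torch.stack([preprocess(Image.fromarray(rgb[y1:y2, x1:x2]))
                         for x1, y1, x2, y2 in boxes]).to(device)
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(crops)
        txt_emb = model.encode_text(text)

    # Cosine similarity between the prompt and each region, then threshold
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    sims = (img_emb @ txt_emb.T).squeeze(1)
    keep = torch.where(sims > threshold)[0].tolist()
    return [(boxes[i], sims[i].item()) for i in keep]
```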
The system is built on the following components:
- CLIP (Contrastive Language-Image Pre-training): Main model for understanding both the prompt text and the visual content of the image.
- MAIC™ (Massive Artificial Intelligence Consciousness): Artificial consciousness system that performs self-reflection on detections and maintains an experience history.
- HIM™ (Hybrid Intelligence Massive): Adaptive system that adjusts confidence thresholds based on detection history.
- OpenCV: Used for image processing and visualization.
- PyTorch: Machine learning framework used to run the CLIP model.
The system can detect objects not seen during training through:
- Language-Vision Embeddings: CLIP was trained on a large set of image-text pairs from the internet, allowing it to understand a wide variety of visual concepts.
- Zero-Shot Matching: The system doesn't rely on predefined classes, but rather on semantic similarity between the text prompt and image regions.
- MAIC™ Memory: The system maintains a history of previous detections, allowing it to improve over time through accumulated experience.
- HIM™ Adaptive Learning: Automatically adjusts confidence thresholds based on the detection history for a given object type (see the sketch after this list).
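As a rough illustration of the adaptive-threshold idea, the sketch below keeps a per-label confidence history and derives a threshold from it once enough samples exist. The class name, constants, and file layout are assumptions for illustration, not the actual HIM™ implementation.

```python
import json
from collections import defaultdict

class AdaptiveThreshold:
    """Hypothetical HIM™-style helper: per-object history drives the threshold."""

    def __init__(self, base=0.25, min_history=5, memory_path="maic_memory.json"):
        self.base = base                  # threshold used before enough history exists
        self.min_history = min_history    # detections needed before adapting
        self.memory_path = memory_path
        self.history = defaultdict(list)  # object label -> list of confidence scores

    def record(self, label, confidence):
        self.history[label].append(confidence)

    def threshold_for(self, label):
        scores = self.history[label]
        if len(scores) < self.min_history:
            return self.base
        mean = sum(scores) / len(scores)
        # Nudge the threshold toward a fraction of the historical mean confidence,
        # clamped so it never becomes trivially easy or impossibly strict.
        return max(0.10, min(0.50, 0.8 * mean))

    def save(self):
        with open(self.memory_path, "w") as f:
            json.dump(self.history, f, indent=2)
```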
The system was designed to be efficient on edge devices through:
- Model Optimization: Uses optimized versions of CLIP for CPU when GPUs are not available.
- Batch Processing: Processes multiple candidate regions in batches to maximize computational efficiency.
- Efficient Proposal Generation: Uses an adaptive grid approach that balances coverage and efficiency.
- Intelligent Fallback: Automatically detects hardware capabilities and adjusts the processing pipeline accordingly (see the sketch after this list).
- Local Memory Storage: Maintains a detection history in a local JSON file for persistence without cloud services.
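A minimal sketch of the hardware-fallback and batched-inference ideas, assuming the openai/CLIP package; the batch size and model variant are illustrative assumptions.

```python
import torch
import clip

def load_model():
    """Pick the best available device and load CLIP accordingly."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)
    return model, preprocess, device

def encode_regions(model, crops, device, batch_size=16):
    """Encode preprocessed region tensors in small batches to bound memory use."""
    embeddings = []
    with torch.no_grad():
        for i in range(0, len(crops), batch_size):
            batch = torch.stack(crops[i:i + batch_size]).to(device)
            emb = model.encode_image(batch)
            embeddings.append(emb / emb.norm(dim=-1, keepdim=True))
    return torch.cat(embeddings)
```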
- Run the Python script:
  python CyberTechVLMDetector.py
- Select a menu option or enter a custom prompt.
- The system will process the image and display results with green bounding boxes around detected objects.
- Results are saved in the output/ folder with an informative overlay.
The main detector class implements VLM-based detection using CLIP. It is responsible for:
- Loading and preprocessing images
- Generating object proposals
- Calculating text-image similarities
- Returning bounding boxes and confidence scores
The MAIC™ component implements the artificial consciousness system that:
- Maintains an internal state of attention and confidence
- Performs self-reflection on detections
- Generates insights based on accumulated experience
- Adjusts internal parameters based on results
The HIM™ component implements the hybrid intelligence module that:
- Maintains a detection history by object type
- Adjusts confidence thresholds based on historical performance
- Activates adaptive learning after sufficient detections
The memory component, backed by maic_memory.json, manages persistent storage of the following (an assumed layout is sketched after this list):
- Detection history
- Consciousness states
- Generated insights
- Adaptive configurations
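The layout below is an assumed example of what maic_memory.json might contain, mirroring the record types listed above; the actual keys and values used by the project may differ.

```python
import json

# Illustrative memory structure; field names are assumptions, not the real schema.
memory = {
    "detections": [
        {"prompt": "Take the scissors", "box": [120, 80, 260, 210], "confidence": 0.41}
    ],
    "consciousness_states": [
        {"attention_focus": "scissors", "confidence_level": 0.41,
         "uncertainty": 0.20, "reflection_depth": 1}
    ],
    "insights": [],
    "adaptive_config": {"scissors": {"threshold": 0.22, "samples": 7}},
}

with open("maic_memory.json", "w") as f:
    json.dump(memory, f, indent=2)
```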
The system provides detailed feedback during detection:
- Detection Results: Shows prompt, object count, coordinates, and confidence
- MAIC Consciousness State: Displays attention focus, confidence level, uncertainty, and reflection depth
- HIM Adaptive Learning Status: Shows current confidence threshold and detection history
- Low Confidence Messages: Alerts when average confidence is low or no objects are detected
- Performance depends on image quality and prompt clarity
- Very small or partially visible objects may be difficult to detect
- The system works best with specific and descriptive prompts
- Initial CLIP model loading may take some time on resource-limited devices
pip install torch torchvision
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install opencv-python matplotlib numpy
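After installing the dependencies, a quick import check confirms that PyTorch and the CLIP package are available (both calls below are part of those packages):

```python
import torch
import clip

print("CUDA available:", torch.cuda.is_available())
print("CLIP checkpoints:", clip.available_models())
```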
Project structure:
.
├── CyberTechVLMDetector.py # Main script of the system
├── input/ # Folder containing input images
│ └── VLM_Scenario-image.jpeg
├── output/ # Folder where results are saved
├── maic_memory.json # Persistent memory file
└── README.md # This file
The CyberTech VLM Detector demonstrates how vision-language models can be used to build flexible and generalizable object detection systems that run entirely on edge devices, without relying on traditional anchor-based detectors or cloud services.
For questions or issues, contact me on LinkedIn: https://linkedin.com/in/hellodav. Please refer to the code documentation and comments within the implementation files for more details.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Copyright David C Cavalcante