L2-07: SOTA Foundation Vision Models Benchmarking for Visual Recognition

Overview

This project benchmarks state-of-the-art (SOTA) foundation vision models on a variety of visual recognition tasks, including image classification, object detection, and semantic segmentation.

Core Visual Recognition Tasks

Core Visual Recognition Tasks	Description	Examples
Image Classification	Assigns a label to an image.	Classifying an image as "cat" or "dog."
Object Detection	Detects and localizes objects in an image.	Detecting cars and pedestrians in a street scene.
Semantic Segmentation	Classifies each pixel into a category.	Separating road, sky, and pedestrians in an image.
Instance Segmentation	Identifies individual instances of objects and their masks.	Labeling each pedestrian in a crowd separately.
Image Captioning	Generates a textual description of an image.	"A dog playing in the park."
Action Recognition	Identifies actions in an image or video.	Recognizing someone is "running" or "jumping."
Pose Estimation	Estimates joint locations of humans or animals.	Detecting body pose in yoga poses.
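
This README does not say which metrics the benchmark reports. As a rough sketch, assuming the commonly used top-1 accuracy for classification, box IoU for detection, and mean IoU for semantic segmentation, the three tasks named in the overview could be scored as follows. The function names and input layouts are illustrative assumptions, not part of this project.

```python
def top1_accuracy(predicted, actual):
    """Fraction of images whose predicted label matches the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height is clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_mask, true_mask, num_classes):
    """Mean per-class IoU over flat lists of per-pixel class ids."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred_mask, true_mask))
        union = sum(p == c or t == c for p, t in zip(pred_mask, true_mask))
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` compares two 2x2 boxes that overlap in a 1x1 square, so the intersection is 1 and the union is 7. Real benchmarks usually aggregate box IoU further (for instance into mean average precision), which this sketch leaves out.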

Advanced Visual Recognition Tasks

Advanced Visual Recognition Tasks	Description	Examples
Image Segmentation (General)	Divides an image into meaningful regions with pixel-level accuracy.	Separating a cat from the background.
Depth Estimation	Predicts depth for each pixel in an image.	Estimating distances in a 3D scene.
3D Reconstruction from Images	Reconstructs a 3D model from multiple images.	Building a 3D model of a building from photos.
OCR (Optical Character Recognition)	Recognizes and extracts text from images.	Reading a street sign in a photograph.
Image Super-Resolution	Enhances the resolution of an image.	Upscaling a low-resolution image to higher resolution.
Image Inpainting	Fills in missing or corrupted parts of an image.	Restoring damaged areas in an old photograph.
Image Style Transfer	Transfers the style of one image to another.	Applying Van Gogh’s painting style to a photo.

Video-Based Visual Recognition Tasks

Video-Based Visual Recognition Tasks	Description	Examples
Video Classification	Classifies video sequences based on content.	Identifying a video as "sports" or "news."
Object Tracking	Continuously tracks objects across frames.	Following a car in a traffic video.
Video Action Recognition	Recognizes actions in a video sequence.	Identifying a soccer player "kicking a ball."
Video Segmentation	Performs segmentation across video frames.	Segmenting a moving car from the background.
Visual Odometry	Estimates camera motion from a sequence of images.	Estimating a self-driving car's movement.
3D Object Detection from Video	Detects objects and estimates their 3D positions in video.	Detecting pedestrians in a video from a self-driving car.
Action Detection	Identifies specific actions or events in a video stream.	Detecting "running" in a surveillance video.
Video Captioning	Generates textual descriptions for video content.	"A person is playing guitar in the park."
Video Summarization	Creates a condensed version of a video by highlighting key scenes.	Summarizing a 10-minute soccer match into key highlights.
Video Prediction	Predicts future frames in a video sequence.	Anticipating the next frame in a moving car video.

Specialized Tasks

Specialized Tasks	Description	Examples
Self-Supervised Learning (SSL)	Learns features from unlabeled data.	Pretraining a model on large video datasets without labels.
Zero-Shot Classification	Classifies new categories not seen during training.	Recognizing new objects in images using CLIP.
Multi-Modal Image-Text Analysis	Combines image and text for analysis tasks.	Answering questions about image content.
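
To make the zero-shot row concrete, here is a minimal sketch of the scoring step that CLIP-style models rely on: the image and one text prompt per candidate class are embedded into a shared space, and the class whose prompt is most similar to the image wins. The vectors below are hand-written toy values standing in for real encoder outputs, and `zero_shot_classify` is a hypothetical helper, not an API of CLIP or of this project.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the best-matching label and the similarity score of every label."""
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings (assumption): a real pipeline would get these from the
# model's text encoder and image encoder.
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.2, 0.8, 0.1]

best_label, scores = zero_shot_classify(image_embedding, label_embeddings)
print(best_label)
```

No class-specific training happens here: adding a new category only requires adding a new prompt and its text embedding.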

Emerging Research Areas

Emerging Research Areas	Description	Examples
Multi-Modal Learning	Combines visual data with other modalities like text or sound.	Combining video and audio for sentiment analysis.
Few-Shot Learning	Learns to recognize new classes from few labeled examples.	Training on a new animal species with just a few images.
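
One simple way to picture few-shot recognition on top of a frozen foundation model is nearest-prototype classification: average the embeddings of the few labeled examples per class, then assign a query to the closest average. This is only one possible approach, chosen here for illustration; the README does not state which few-shot method the project uses, and the class names and 2-D vectors below are made up.

```python
def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(query, prototypes):
    """Return the label whose prototype lies closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Two labeled examples per new class (toy embeddings, assumption).
support = {
    "okapi": [[1.0, 0.0], [0.8, 0.2]],
    "quokka": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support)
print(classify([0.7, 0.3], prototypes))
```

The quality of this scheme depends almost entirely on the embeddings, which is why it is often used as a quick probe of how well a pretrained vision model separates unseen classes.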

Contributing

Contributions to this project are welcome. You can add new projects, improve existing ones, or fix bugs and errors.

Please follow these steps to contribute:

1. Fork this repository and clone it to your local machine.
2. Create a new branch with a descriptive name for your contribution.
3. Add your code and files to the branch and commit your changes.
4. Push your branch to your forked repository and create a pull request to the main repository.
5. Wait for your pull request to be reviewed and merged.

References

SOTA Vision Foundation Models Benchmarking Resources:

Built-In Tools

Vision Foundation Models resources:

Recent Advances in Vision Foundation Models CVPR 2024: https://cvpr.thecvf.com/virtual/2023/tutorial/18558
Foundation Models for Vision - HF collection by @merve: https://huggingface.co/collections/merve/foundation-models-for-vision-6516d5c6af977f435be43ace
Roboflow: https://roboflow.com/model-feature/foundation-vision
The Tenyks Blogger: https://medium.com/@tenyks_blogger/the-foundation-models-reshaping-computer-vision-b299a91527fb
