This repository has been archived by the owner on Jan 4, 2025. It is now read-only.

BinaryAlien/MLVOT

Individual Report

Introduction

This individual report aims to

  • outline my experience with the project,
  • clearly describe the tasks I undertook throughout the project,
  • and identify the challenges I encountered and explain how I addressed them.

Part 1: Single Object Tracking with Kalman (Centroid-Tracker)

Objective

Implement object tracking using a pre-existing object detection algorithm and integrate the Kalman Filter for smooth and accurate tracking.

Tasks

The main tasks I undertook for this part are the following:

  1. Implement a KalmanFilter class with predict and update methods.
  2. Implement an ObjectTracker class to manage object tracking.
  3. Implement a Visualizer class to visualize the tracking results.

KalmanFilter

The KalmanFilter class is responsible for estimating and predicting the state of a moving object based on noisy measurements.

Attributes

Name Description
dt Time step of one cycle used to estimate the state (sampling period)
u Accelerations in the x and y directions
x State vector representing the object's position and velocity
A State transition matrix
B Control matrix
H Measurement mapping matrix
P State covariance matrix
Q Process noise covariance matrix
R Measurement noise covariance matrix

Methods

  1. predict projects the current state estimate forward in time to predict the next state.
  2. update uses a new measurement to adjust the state estimate.
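
As an illustration of how the attributes above fit together, here is a minimal sketch of a constant-velocity Kalman Filter using NumPy. It is not the project's actual code: the constructor parameters (`u_x`, `u_y`, `std_acc`, `x_std_meas`, `y_std_meas`), the initial values of `x` and `P`, and the exact form of `Q` are assumptions chosen for the example.

```python
import numpy as np


class KalmanFilter:
    """Sketch of a 2D constant-velocity Kalman Filter (parameters are assumed)."""

    def __init__(self, dt, u_x, u_y, std_acc, x_std_meas, y_std_meas):
        self.dt = dt
        # Control input: accelerations along x and y, as a 2x1 column vector.
        self.u = np.array([[u_x], [u_y]], dtype=float)
        # State [x, y, vx, vy]^T stored as a 4x1 column vector, not a row vector.
        self.x = np.zeros((4, 1))
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.B = np.array([[dt**2 / 2, 0],
                           [0, dt**2 / 2],
                           [dt, 0],
                           [0, dt]], dtype=float)
        # Only the position is measured.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.array([[dt**4 / 4, 0, dt**3 / 2, 0],
                           [0, dt**4 / 4, 0, dt**3 / 2],
                           [dt**3 / 2, 0, dt**2, 0],
                           [0, dt**3 / 2, 0, dt**2]]) * std_acc**2
        self.R = np.diag([x_std_meas**2, y_std_meas**2])
        self.P = np.eye(4)

    def predict(self):
        # Project the state and its covariance one time step forward.
        self.x = self.A @ self.x + self.B @ self.u
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def update(self, z):
        # Correct the prediction with a measured position z = (x, y).
        z = np.asarray(z, dtype=float).reshape(2, 1)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x
```

A tracker would typically call `predict()` once per frame and then `update()` with the detected centroid whenever a detection is available.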

ObjectTracker

The ObjectTracker class is responsible for managing detection information as well as Kalman Filter estimation and prediction for the tracked object.

Attributes

Name Description
kalman_filter Instance of KalmanFilter to use at each time step
detect Function to call to detect the object in a frame
state Current object detection, estimation and prediction

Methods

The class has a single step method called at each frame responsible for

  • detecting the object in the given frame using self.detect,
  • predicting the object's state using self.kalman_filter,
  • and updating self.kalman_filter with the new detection.

Visualizer

The Visualizer class is responsible for drawing object state, prediction and estimation on the target frame to visualize tracking results.

Attributes

The class has a single attribute, tracker, which is the ObjectTracker instance from which state information is extracted.

Methods

The class has a single public method show responsible for drawing

  • the object detection as a green circle,
  • the object state prediction as a blue bounding box,
  • and the object state estimation as a red bounding box.

Challenges

  • KalmanFilter
    • Understanding the shapes of the matrices and vectors in order to replicate the equations seen in class.
    • Taking care to use a column vector rather than a row vector for the state vector x.
  • Visualizer
    • Understanding the image format used by default by OpenCV (i.e. BGR instead of RGB).
    • Using OpenCV to read a video frame by frame.

Part 2: IoU-based Tracking (Bounding-Box Tracker)

Objective

Develop a simple IoU-based tracker and extend it for multiple object tracking.

Tasks

The main tasks I undertook for this part are the following:

  1. Implement a bbox module to manipulate bounding boxes.
  2. Implement a track module to manage and export tracks.
  3. Implement a visualize module to handle visualization and video exports.

bbox

The bbox module is composed of the following:

  1. A BoundingBox dataclass representing a bounding box in the detection CSV.
  2. An intersection_over_union function to compute the IoU of two bounding boxes.

BoundingBox

  • Dataclass which represents a bounding box by its top-left and bottom-right points.
  • The properties width and height return the bounding box's dimensions along each axis.
  • The property area computes the area of the bounding box.

intersection_over_union

Function which takes 2 bounding boxes as input and returns the ratio of their intersection over their union.
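
The following is a minimal sketch of what such a module could look like. The field names (`left`, `top`, `right`, `bottom`) are assumptions, since the report does not list them. Clamping `width` and `height` to zero is one way to handle boxes that do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box given by its top-left and bottom-right points."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        # Clamped so that an "inverted" box (no overlap) has zero width.
        return max(0.0, self.right - self.left)

    @property
    def height(self) -> float:
        return max(0.0, self.bottom - self.top)

    @property
    def area(self) -> float:
        return self.width * self.height


def intersection_over_union(a: BoundingBox, b: BoundingBox) -> float:
    """Return the IoU of two boxes, or 0.0 if their union is empty."""
    overlap = BoundingBox(
        left=max(a.left, b.left),
        top=max(a.top, b.top),
        right=min(a.right, b.right),
        bottom=min(a.bottom, b.bottom),
    )
    intersection = overlap.area
    union = a.area + b.area - intersection
    return intersection / union if union > 0 else 0.0
```

For example, the boxes (0, 0, 2, 2) and (1, 1, 3, 3) share a 1 x 1 region and cover a union of area 7, so their IoU is 1/7.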

track

The track module is composed of the following:

  1. A Track dataclass representing a track in the CSV.
  2. A TrackManager class to manage the tracks across video frames.
  3. A TrackHistory class to handle exporting tracks to a file.

Track

Dataclass responsible for storing track information: frame number, ID and bounding box.

TrackManager

Class responsible for creating, updating and deleting tracks across the frames of a video.

Attributes

Name Description
tracks List of current tracks.
next_track_id Object ID for the next upcoming track.

Methods

The class has a single public method step called at each frame responsible for

  • matching detections with tracks,
  • updating matched tracks,
  • creating new tracks for unmatched detections,
  • and deleting unmatched tracks.

Algorithm

The manager algorithm can be described as follows:

  1. Match the tracks with the detections.
    • Compute a similarity matrix where $c_{i,j}$ is the IoU between the $i$-th track and $j$-th detection.
    • Compute associations using the Hungarian algorithm with scipy.optimize.linear_sum_assignment.
  2. Update the matched tracks.
    • Get the track with the corresponding ID.
    • Replace its bounding box with the associated detection.
    • Include the track in the new tracks list.
  3. Create new tracks for unmatched detections.
    • Create a new track from the detection, using self.next_track_id as its ID.
    • Increment self.next_track_id to ensure unique IDs for subsequent tracks.
  4. Replace self.tracks with the new tracks list.
    • Any previous track not included in the new list is dropped (i.e. unmatched tracks are removed).
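
The matching step can be sketched as follows. This is an illustrative example, not the project's code: the `match` helper and its `threshold` parameter are hypothetical. Since `scipy.optimize.linear_sum_assignment` minimizes cost by default, the sketch passes `maximize=True` so that it can work directly on a similarity matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match(similarity, threshold=0.3):
    """Associate tracks (rows) with detections (columns).

    `similarity` must be a 2D array of shape (n_tracks, n_detections).
    `threshold` is an assumed minimum score below which a pair is rejected.
    """
    similarity = np.asarray(similarity, dtype=float)
    n_tracks, n_detections = similarity.shape

    matches = []
    if n_tracks and n_detections:
        # Hungarian algorithm; maximize=True because higher IoU is better.
        rows, cols = linear_sum_assignment(similarity, maximize=True)
        matches = [(int(i), int(j)) for i, j in zip(rows, cols)
                   if similarity[i, j] >= threshold]

    matched_tracks = {i for i, _ in matches}
    matched_detections = {j for _, j in matches}
    # Anything not assigned (or assigned below the threshold) is unmatched.
    unmatched_tracks = [i for i in range(n_tracks) if i not in matched_tracks]
    unmatched_detections = [j for j in range(n_detections)
                            if j not in matched_detections]
    return matches, unmatched_tracks, unmatched_detections
```

Unmatched detections would then become new tracks, while unmatched tracks are simply left out of the new tracks list.
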
TrackHistory

Class responsible for storing and exporting tracks.

Methods

  • dump: Dumps the tracks to a CSV file.
  • extend: Stores a list of tracks.
  • push: Stores a track.

visualize

The visualize module is composed of the following:

  • A Canvas class to draw on a given frame.
  • A Visualizer class to visualize tracking results.
  • A Video class responsible for tracking results video exports.

Canvas

Class responsible for drawing on a given frame.

Attributes

Name Description
data Frame on which to draw.

Methods

  • draw_bbox: Draws a given bounding box.
  • draw_text: Draws some text.

Visualizer

Class responsible for drawing bounding boxes and IDs of tracks.

Methods

  • draw: Draws the current tracks stored by a track manager on the given frame.
  • show: Opens a window to visualize the current frame (possibly with track drawings).

Video

Class responsible for simplifying the use of cv2.VideoWriter.

Methods

  • write: Creates an instance of cv2.VideoWriter if necessary and writes the given frame.
  • release: Releases the cv2.VideoWriter instance.

Challenges

  • bbox: Handling the case where two bounding boxes do not overlap, which makes the computed intersection dimensions negative.
    • The intersection's width and height are set to zero whenever they would be negative.
  • TrackManager
    • Understanding the scipy.optimize.linear_sum_assignment function.
    • Understanding how to determine which detections and tracks are left unmatched.

Part 3: Kalman-Guided IoU Tracking (Bounding-Box Tracker)

Objective

Extend the IoU-based multiple object tracker (MOT) with the Hungarian algorithm by adding a Kalman Filter.

Tasks

The main tasks I undertook for this part are the following:

  1. Adapt the BoundingBox class to compute its centroid.
  2. Adapt the KalmanFilter class to handle bounding boxes through their centroids.
  3. Adapt the TrackManager class to use Kalman Filter predictions.
  4. Adapt the visualize module to handle visualization of predicted centroids.

BoundingBox

Add a property center which computes the centroid of the bounding box based on its coordinates.

KalmanFilter

Change the update method to take in a BoundingBox and use its center property as the measurement.

TrackManager

  • Add a filter_params argument to the constructor to initialize the Kalman Filters of new tracks.
  • Add a filters dictionary attribute which associates a track ID with its corresponding KalmanFilter instance.
  • Change the step method to account for creating and updating the KalmanFilter instances of each track.
  • Use predictions from the corresponding KalmanFilter instances instead of the last known detection bounding boxes for computing the similarity score between tracks and detections.

visualize

  • Add a draw_cross method to the Canvas class in order to draw an X shape at a given point.
  • Adapt the Visualizer class to draw a cross at the corresponding centroids predicted by the KalmanFilter instances for the current tracks.

Challenges

  • As this part consists mostly of adapting the existing pipeline, I did not really encounter any major challenges apart from correctly translating coordinates between centroids and bounding box corner points.
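
The coordinate translation mentioned above can be sketched as a pair of small helpers. Both function names are hypothetical. Re-centering a box of the track's last known width and height on the predicted centroid is an assumption about how a predicted centroid could be turned back into a box for the IoU computation; the report does not state the exact approach.

```python
def center_of(box):
    """Return the centroid (cx, cy) of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def box_from_center(cx, cy, width, height):
    """Build a (left, top, right, bottom) box of the given size around a centroid."""
    half_w, half_h = width / 2, height / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Example: move the last known box so that it is centered on a predicted centroid.
last_box = (10.0, 20.0, 50.0, 100.0)
predicted_center = (35.0, 62.0)
width = last_box[2] - last_box[0]
height = last_box[3] - last_box[1]
predicted_box = box_from_center(*predicted_center, width, height)
```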

Part 4: Appearance-Aware IoU-Kalman Object Tracker

Objective

Extend the IoU-Kalman tracker to include object re-identification (ReID).

Tasks

The main tasks I undertook for this part are the following:

  1. Implement a reident module to compute appearance features of a patch.
  2. Adapt the TrackManager to account for appearance features of patches.

reident

This module is composed of the following:

  • An ObjectIdentifier class responsible for computing appearance features of a patch.
  • An extract_patch function to extract a patch from a frame.
  • A normalize_patch function to normalize a given patch.

ObjectIdentifier

Class responsible for computing appearance features using a lightweight pre-trained model.

Attributes

Name Description
feature_extractor Sequential layer of MobileNet v2 excluding the classifier head.
device Device on which to run the computation.

Methods

The class has a single public method __call__ called on a frame and a bounding box responsible for:

  • Extracting the corresponding patch through extract_patch.
  • Normalizing the corresponding patch through normalize_patch.
  • Computing the appearance features using the feature_extractor attribute.

extract_patch

This function is responsible for cropping a frame to the given bounding box.

normalize_patch

This function is responsible for preprocessing the patch by

  • resizing the input patch to 224 x 224,
  • converting the colors from BGR to RGB,
  • and normalizing the values for the model using a Z-score.
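
The three preprocessing steps can be sketched with NumPy alone. Two assumptions are made here: the mean and standard deviation are the usual ImageNet channel statistics, which the report does not confirm, and a simple nearest-neighbor resize (the hypothetical `resize_nearest` helper) stands in for whatever resizing routine the project actually uses.

```python
import numpy as np

# Assumed: standard ImageNet channel statistics, in RGB order.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(patch, size=(224, 224)):
    """Nearest-neighbor resize of an (H, W, C) array to size = (rows, cols)."""
    height, width = patch.shape[:2]
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return patch[rows[:, None], cols]


def normalize_patch(patch_bgr):
    """Resize, convert BGR to RGB, scale to [0, 1] and apply a per-channel Z-score."""
    resized = resize_nearest(patch_bgr)
    rgb = resized[..., ::-1]  # reverse the channel axis: BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD
```

The result is an array of shape (224, 224, 3); a PyTorch model would additionally expect the channel axis to be moved to the front and a batch dimension to be added.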

TrackManager

  • Add an identifier attribute and constructor argument to store an ObjectIdentifier instance.
  • Add a track_features dictionary attribute which associates a track ID with its feature vector.
  • Add a weights attribute which provides weights for IoU and appearance features scores.
  • Change the step method to account for updating the feature vectors of each track.
  • Use a weighted sum of the IoU score and the appearance score as the similarity score between tracks and detections.
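
As a rough sketch of the combined score, the appearance term below is the cosine similarity between two feature vectors. The function names and the default weights are hypothetical; the actual weights used in the project are not given in this report.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denominator) if denominator > 0 else 0.0


def combined_score(iou, appearance, weights=(0.5, 0.5)):
    """Weighted sum of the IoU score and the appearance score."""
    w_iou, w_appearance = weights
    return w_iou * iou + w_appearance * appearance
```

Each entry of the similarity matrix passed to the Hungarian algorithm would then be `combined_score(iou, cosine_similarity(track_feature, detection_feature), weights)`.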

Challenges

  • Correctly extracting the feature layers of MobileNet.
    • I initially extracted only the features of the last convolutional layer, which led to poor results.
    • I got much better results by leveraging the features of the last dense layer before the classifier head.
  • Finding the best distance function.
    • I tried several distance functions empirically, and cosine similarity seemed to perform best in my case.

Part 5: Appearance-Aware IoU-Kalman Object Tracker: Detector Extension

Objective

Integrate a more efficient, lightweight deep learning-based object detector for pedestrian detection.

Tasks

The main tasks I undertook for this part are the following:

  1. Adapt the Detector script to generate the detections CSV using a lightweight YOLO model.

Detector

The class LightweightDetector is responsible for loading a lightweight YOLO model and using it to infer bounding boxes of the "person" class for a given frame.

Attributes

Name Description
frames_dir The root directory of input frames.
model The YOLO model to use.

Methods

The only method is predict which takes in the index of a frame and does the following:

  1. Loads the corresponding frame from the self.frames_dir directory.
  2. Runs inference on the frame using the model stored in self.model.
  3. Converts the resulting bounding boxes to our own BoundingBox class and returns them.

The predict method can then be called for each frame to generate a detection CSV file.

Challenges

  • Understanding how to import and use YOLO models.
