This repository has been archived by the owner on Jan 4, 2025. It is now read-only.



Individual Report


This individual report aims to

  • outline my experience with the project,
  • clearly describe the tasks I undertook throughout the project,
  • and identify the challenges I encountered and explain how I addressed them.

Part 1: Single Object Tracking with Kalman (Centroid-Tracker)


Implement object tracking using a pre-existing object detection algorithm and integrate the Kalman Filter for smooth and accurate tracking.


The main tasks I undertook for this part are the following:

  1. Implement a KalmanFilter class with predict and update methods.
  2. Implement an ObjectTracker class to manage object tracking.
  3. Implement a Visualizer class to visualize the tracking results.


The KalmanFilter class is responsible for estimating and predicting the state of a moving object based on noisy measurements.


Name | Description
--- | ---
dt | Duration of one cycle (sampling period) used to estimate the state
u | Accelerations in the x and y directions
x | State vector representing the object's position and velocity
A | State transition matrix
B | Control matrix
H | Measurement mapping matrix
P | State covariance matrix
Q | Process noise covariance matrix
R | Measurement noise covariance matrix


  1. predict projects the current state estimate forward in time to predict the next state.
  2. update uses a new measurement to adjust the state estimate.
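
The two methods can be sketched as follows. This is a minimal illustration assuming a constant-velocity model with the state ordered as [px, py, vx, vy]; the constructor signature, the default noise values and the simplified diagonal Q are assumptions, not the code from this project.

```python
import numpy as np


class KalmanFilter:
    """Minimal 2D filter sketch; state x = [px, py, vx, vy]^T (column vector)."""

    def __init__(self, dt=0.1, u_x=0.0, u_y=0.0, std_acc=1.0, std_meas=0.1):
        self.u = np.array([[u_x], [u_y]])   # control input (accelerations)
        self.x = np.zeros((4, 1))           # state stored as a column vector
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.B = np.array([[dt**2 / 2, 0],
                           [0, dt**2 / 2],
                           [dt, 0],
                           [0, dt]])
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.P = np.eye(4)
        self.Q = std_acc**2 * np.eye(4)     # simplified process noise (assumption)
        self.R = std_meas**2 * np.eye(2)

    def predict(self):
        # Project the state and its covariance one time step forward.
        self.x = self.A @ self.x + self.B @ self.u
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Correct the prediction with a measured position z = (px, py).
        z = np.asarray(z, dtype=float).reshape(2, 1)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Keeping x as a (4, 1) array rather than a flat array is what lets the matrix products above match the textbook equations shape for shape.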


The ObjectTracker class is responsible for managing detection information as well as Kalman Filter estimation and prediction for the tracked object.


Name | Description
--- | ---
kalman_filter | Instance of KalmanFilter to use at each time step
detect | Function to call to detect the object in a frame
state | Current object detection, estimation and prediction


The class has a single method, step, which is called at each frame and is responsible for

  • detecting the object in the given frame using self.detect,
  • predicting the object's state using self.kalman_filter,
  • and updating the self.kalman_filter state.
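
A minimal sketch of such a step method is shown below. The layout of the state dictionary and the choice to skip the update when nothing is detected are assumptions made for illustration; any object exposing predict and update can play the role of the filter.

```python
class ObjectTracker:
    """Sketch of a tracker that chains detection, prediction and update."""

    def __init__(self, kalman_filter, detect):
        self.kalman_filter = kalman_filter
        self.detect = detect
        self.state = {"detection": None, "prediction": None, "estimation": None}

    def step(self, frame):
        detection = self.detect(frame)             # may be None (assumption)
        prediction = self.kalman_filter.predict()  # a priori position
        estimation = None
        if detection is not None:
            # Only correct the filter when a measurement is available.
            estimation = self.kalman_filter.update(detection)
        self.state = {
            "detection": detection,
            "prediction": prediction,
            "estimation": estimation,
        }
        return self.state
```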


The Visualizer class is responsible for drawing object state, prediction and estimation on the target frame to visualize tracking results.


The class has a single attribute, tracker, which is the ObjectTracker instance from which state information is extracted.


The class has a single public method, show, which is responsible for drawing

  • the object detection as a green circle,
  • the object state prediction as a blue bounding box,
  • and the object state estimation as a red bounding box.


The main challenges I encountered for this part are the following:

  • KalmanFilter
    • Understand the shape of the data in order to replicate the equations seen in class.
    • Take care to use a column vector rather than a row vector for the state vector x.
  • Visualizer
    • Understand the image format used by default (i.e. BGR instead of RGB).
    • Use OpenCV to read a video frame by frame.

Part 2: IOU-based Tracking (Bounding-Box Tracker)


Develop a simple IoU-based tracker and extend it for multiple object tracking.


The main tasks I undertook for this part are the following:

  1. Implement a bbox module to manipulate bounding boxes.
  2. Implement a track module to manage and export tracks.
  3. Implement a visualize module to handle visualization and video exports.


The bbox module is composed of the following:

  1. A BoundingBox dataclass representing a bounding box in the detection CSV.
  2. An intersection_over_union function to compute the IoU of two bounding boxes.

BoundingBox

  • Dataclass which represents a bounding box by its top-left and bottom-right points.
  • The properties width and height return the bounding box's dimensions along each axis.
  • The property area computes the area of the bounding box.

intersection_over_union

Function which takes two bounding boxes as input and returns the ratio of the area of their intersection to the area of their union.
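
The bbox module can be sketched as follows. The field names (left, top, right, bottom) are hypothetical; clamping width and height at zero is one way to handle boxes that do not overlap, since the "intersection" box then has negative dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Top-left corner (left, top) and bottom-right corner (right, bottom).
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        # Clamped so that an inverted box has zero width.
        return max(0.0, self.right - self.left)

    @property
    def height(self) -> float:
        return max(0.0, self.bottom - self.top)

    @property
    def area(self) -> float:
        return self.width * self.height


def intersection_over_union(a: BoundingBox, b: BoundingBox) -> float:
    # The intersection is itself a box; it is inverted when a and b are disjoint.
    inter = BoundingBox(
        max(a.left, b.left),
        max(a.top, b.top),
        min(a.right, b.right),
        min(a.bottom, b.bottom),
    )
    union = a.area + b.area - inter.area
    return inter.area / union if union > 0 else 0.0
```

For example, two 2 x 2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, which gives an IoU of 1/7.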


The track module is composed of the following:

  1. A Track dataclass representing a track in the CSV.
  2. A TrackManager class to manage the tracks across video frames.
  3. A TrackHistory class to handle exporting tracks to a file.

Track: dataclass responsible for storing track information: frame number, ID and bounding box.


TrackManager: class responsible for creating, updating and deleting tracks across the frames of a video.


Name | Description
--- | ---
tracks | List of current tracks.
next_track_id | Object ID for the next upcoming track.


The class has a single public method, step, which is called at each frame and is responsible for

  • matching detections with tracks,
  • updating matched tracks,
  • creating new tracks for unmatched detections,
  • and deleting unmatched tracks.


The manager algorithm can be described as follows:

  1. Match the tracks with the detections.
    • Compute a similarity matrix where $c_{i,j}$ is the IoU between the $i$-th track and $j$-th detection.
    • Compute associations using the Hungarian algorithm with scipy.optimize.linear_sum_assignment.
  2. Update the matched tracks.
    • Get the track with the corresponding ID.
    • Replace its bounding box with the associated detection.
    • Include the track in the new tracks list.
  3. Create new tracks for unmatched detections.
    • Create new track based on the detection with self.next_track_id as its ID.
    • Increment self.next_track_id to ensure unique IDs across following tracks.
  4. Replace the self.tracks current tracks with the new tracks list.
    • Any previous track not included is lost (i.e. unmatched tracks are removed).
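
The matching step of this algorithm can be sketched as follows. Note that linear_sum_assignment minimizes cost by default, so a similarity matrix must either be negated or passed with maximize=True. The match helper and the IoU threshold used to reject weak pairs are assumptions for illustration; the report does not state how unmatched pairs were decided.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match(similarity, threshold=0.3):
    """Split tracks and detections into matched pairs and leftovers.

    similarity[i, j] is the IoU between track i and detection j
    (expected as a 2D array-like).
    """
    similarity = np.asarray(similarity, dtype=float)
    n_tracks, n_detections = similarity.shape
    # Maximize the total IoU instead of minimizing a cost.
    rows, cols = linear_sum_assignment(similarity, maximize=True)
    # The solver always pairs as many rows as it can, so weak pairs are
    # filtered out afterwards with a threshold (hypothetical value).
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if similarity[r, c] >= threshold]
    matched_tracks = {r for r, _ in matches}
    matched_detections = {c for _, c in matches}
    unmatched_tracks = [i for i in range(n_tracks) if i not in matched_tracks]
    unmatched_detections = [j for j in range(n_detections)
                            if j not in matched_detections]
    return matches, unmatched_tracks, unmatched_detections
```

Unmatched detections would then start new tracks, and unmatched tracks would simply be left out of the new tracks list.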

TrackHistory: class responsible for storing and exporting tracks.


  • dump: Dumps the tracks to a CSV file.
  • extend: Stores a list of tracks.
  • push: Stores a track.
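
A minimal sketch of these three methods using the standard csv module is given below. The Track fields and the column order of the exported file are hypothetical, since the report does not specify the CSV layout.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class Track:
    frame: int
    track_id: int
    bbox: tuple  # hypothetical layout: (left, top, width, height)


@dataclass
class TrackHistory:
    tracks: list = field(default_factory=list)

    def push(self, track):
        # Store a single track.
        self.tracks.append(track)

    def extend(self, tracks):
        # Store several tracks at once, e.g. all tracks of one frame.
        self.tracks.extend(tracks)

    def dump(self, path):
        # Write one row per stored track: frame, ID, then the box values.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for t in self.tracks:
                writer.writerow([t.frame, t.track_id, *t.bbox])
```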


The visualize module is composed of the following:

  • A Canvas class to draw on a given frame.
  • A Visualizer class to visualize tracking results.
  • A Video class responsible for tracking results video exports.

Canvas: class responsible for drawing on a given frame.


Name | Description
--- | ---
data | Frame on which to draw.


  • draw_bbox: Draws a given bounding box.
  • draw_text: Draws some text.

Visualizer: class responsible for drawing the bounding boxes and IDs of tracks.


  • draw: Draws the current tracks stored by a track manager on the given frame.
  • show: Opens a window to visualize the current frame (possibly with track drawings).

Video: class responsible for simplifying the use of cv2.VideoWriter.


  • write: Creates an instance of cv2.VideoWriter if necessary and writes the given frame.
  • release: Releases the cv2.VideoWriter instance.


The main challenges I encountered for this part are the following:

  • bbox
    • Handle the case where two bounding boxes do not overlap, which yields negative intersection dimensions.
    • The dimensions are set to zero whenever they would be negative.
  • TrackManager
    • Understand the scipy.optimize.linear_sum_assignment function.
    • Understand how to determine which detections or tracks are unmatched.

Part 3: Kalman-Guided IoU Tracking (Bounding-Box Tracker)


Extend the IoU-based MOT with the Hungarian algorithm by adding a Kalman filter.


The main tasks I undertook for this part are the following:

  1. Adapt the BoundingBox class to compute its centroid.
  2. Adapt the KalmanFilter class to handle bounding boxes through their centroids.
  3. Adapt the TrackManager class to use Kalman Filter predictions.
  4. Adapt the visualize module to handle visualization of predicted centroids.


Add a property center to BoundingBox which computes the centroid of the bounding box based on its coordinates.
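
The translation between corner points and centroids can be sketched as follows. The field names and the bbox_from_center helper are hypothetical; the helper shows the reverse conversion, which is useful when a predicted centroid has to be turned back into a box of known size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self):
        # Midpoint of the two corner points.
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def bbox_from_center(cx, cy, width, height):
    # Rebuild the corner points around a centroid.
    return BoundingBox(cx - width / 2, cy - height / 2,
                       cx + width / 2, cy + height / 2)
```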


Change the update method of KalmanFilter to take in a BoundingBox and use its center property as the measurement.


  • Add a filter_params argument to the TrackManager constructor to initialize the Kalman filters of new tracks.
  • Add a filters dictionary attribute which associates a track ID with its corresponding KalmanFilter instance.
  • Change the step method to account for creating and updating the KalmanFilter instances of each track.
  • Use predictions from the corresponding KalmanFilter instances instead of the last known detection bounding boxes for computing the similarity score between tracks and detections.


  • Add a draw_cross function to the Canvas class in order to draw an X shape at a given point.
  • Adapt the Visualizer class to draw a cross at the corresponding centroids predicted by the KalmanFilter instances for the current tracks.


  • As this part consists mostly of adapting the existing pipeline, I did not really encounter any major challenges apart from correctly translating coordinates between centroids and bounding box corner points.

Part 4: Appearance-Aware IoU-Kalman Object Tracker


Extend the IoU-Kalman tracker to include object re-identification (ReID).


The main tasks I undertook for this part are the following:

  1. Implement a reident module to compute appearance features of a patch.
  2. Adapt the TrackManager to account for appearance features of patches.


This module is composed of the following:

  • An ObjectIdentifier class responsible for computing appearance features of a patch.
  • An extract_patch function to extract a patch from a frame.
  • A normalize_patch function to normalize a given patch.

ObjectIdentifier: class responsible for computing appearance features using a lightweight pre-trained model.


Name | Description
--- | ---
feature_extractor | Sequential layer of MobileNet v2 excluding the classifier head.
device | Device on which to run the computation.


The class has a single public method, __call__, which is called on a frame and a bounding box and is responsible for:

  • Extracting the corresponding patch through extract_patch.
  • Normalizing the corresponding patch through normalize_patch.
  • Computing the appearance features using the feature_extractor attribute.

The extract_patch function is responsible for cropping a frame to the given bounding box.


The normalize_patch function is responsible for preprocessing the patch by

  • resizing the input patch to 224 x 224,
  • converting the colors from BGR to RGB,
  • and normalizing the values for the model using a Z-score.
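
The two helper functions can be sketched with NumPy alone. Several choices here are assumptions: the argument order of extract_patch, the use of the usual ImageNet channel statistics for the Z-score, the channels-first output, and a nearest-neighbour resize that stands in for an image-library resize so the example has no extra dependency. The patch is expected to be non-empty.

```python
import numpy as np

# Channel statistics commonly used with ImageNet pre-trained models (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def extract_patch(frame, left, top, right, bottom):
    # Clamp the box to the frame so slicing never wraps around.
    h, w = frame.shape[:2]
    left, right = max(0, int(left)), min(w, int(right))
    top, bottom = max(0, int(top)), min(h, int(bottom))
    return frame[top:bottom, left:right]


def normalize_patch(patch, size=224):
    # Nearest-neighbour resize to size x size (stand-in for a proper resize).
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = patch[rows][:, cols]
    # BGR -> RGB, then scale to [0, 1] and apply the per-channel Z-score.
    rgb = resized[..., ::-1].astype(np.float32) / 255.0
    normalized = (rgb - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-first layout, as expected by most PyTorch models.
    return normalized.transpose(2, 0, 1)
```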


  • Add an identifier attribute and constructor argument to TrackManager to store an ObjectIdentifier instance.
  • Add a track_features dictionary attribute which associates a track ID with its feature vector.
  • Add a weights attribute which provides the weights for the IoU and appearance scores.
  • Change the step method to account for updating the feature vectors of each track.
  • Use a weighted sum of the IoU score and the appearance score to compute the similarity score between tracks and detections.
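
The combined score can be sketched as follows, using cosine similarity as the appearance score. The function names and the default weights are hypothetical; the actual weights used in the project are not stated in this report.

```python
import numpy as np


def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors; 0.0 if either is null.
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def combined_score(iou, track_feature, detection_feature, weights=(0.5, 0.5)):
    # Weighted sum of the geometric (IoU) and appearance (cosine) scores.
    w_iou, w_appearance = weights
    appearance = cosine_similarity(track_feature, detection_feature)
    return w_iou * iou + w_appearance * appearance
```

The resulting scores can fill the similarity matrix in place of the raw IoU before the assignment step.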


The main challenges I encountered for this part are the following:

  • Correctly extracting the feature layers of MobileNet.
    • I initially extracted only the features of the last convolution layer, which led to poor results.
    • I got much better results by leveraging the features of the last dense layer before the classifier head.
  • Finding the best distance function.
    • I tried different distances empirically, and cosine similarity seemed to perform best in my case.

Part 5: Appearance-Aware IoU-Kalman Object Tracker: Detector Extension


Integrate a more efficient, lightweight deep learning-based object detector for pedestrian detection.


The main tasks I undertook for this part are the following:

  1. Adapt the Detector script to generate the detections CSV using a lightweight YOLO model.


The class LightweightDetector is responsible for loading a lightweight YOLO model and using it to infer bounding boxes of the "person" class for a given frame.


Name | Description
--- | ---
frames_dir | The root directory of input frames.
model | The YOLO model to use.


The only method is predict which takes in the index of a frame and does the following:

  1. Loads the corresponding frame from the self.frames_dir directory.
  2. Runs inference on the frame using the model stored in self.model.
  3. Converts the resulting bounding boxes to our own BoundingBox class and returns them.

The predict method can then be called for each frame to generate a detection CSV file.


The main challenge I encountered for this part is the following:

  • Understand how to import and use YOLO models.

