This individual report aims to
- outline my experience with the project,
- clearly describe the tasks I undertook throughout the project,
- and identify the challenges I encountered and explain how I addressed them.
Implement object tracking using a pre-existing object detection algorithm and integrate the Kalman Filter for smooth and accurate tracking.
The main tasks I undertook for this part are the following:
- Implement a `KalmanFilter` class with `predict` and `update` methods.
- Implement an `ObjectTracker` class to manage object tracking.
- Implement a `Visualizer` class to visualize the tracking results.
The `KalmanFilter` class is responsible for estimating and predicting the state of a moving object based on noisy measurements.
Attributes
| Name | Description |
|---|---|
| `dt` | Time for one cycle used to estimate the state (sampling rate) |
| `u` | Accelerations in the x and y directions |
| `x` | State vector representing the object's position and velocity |
| `A` | State transition matrix |
| `B` | Control matrix |
| `H` | Measurement mapping matrix |
| `P` | State covariance matrix |
| `Q` | Process noise covariance matrix |
| `R` | Measurement noise covariance matrix |
Methods
- `predict`: projects the current state estimate forward in time to predict the next state.
- `update`: uses a new measurement to adjust the state estimate.
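To make the structure concrete, below is a minimal sketch of such a filter, assuming a constant-velocity model with acceleration as the control input; the dimensions and noise parameters are illustrative rather than the exact values used in the project.

```python
import numpy as np

class KalmanFilter:
    """Minimal sketch: 2D position + velocity state, acceleration control input.
    Attribute names follow the table above; parameter values are illustrative."""

    def __init__(self, dt=0.1, u=(0.0, 0.0), std_acc=1.0, std_meas=1.0):
        self.dt = dt
        self.u = np.array([[u[0]], [u[1]]])   # accelerations in x and y
        self.x = np.zeros((4, 1))             # state [x, y, vx, vy]^T (column vector!)
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]])     # state transition matrix
        self.B = np.array([[0.5 * dt**2, 0],
                           [0, 0.5 * dt**2],
                           [dt, 0],
                           [0, dt]])          # control matrix
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]])     # measurement mapping (position only)
        self.P = np.eye(4)                    # state covariance
        self.Q = np.eye(4) * std_acc**2      # process noise covariance
        self.R = np.eye(2) * std_meas**2     # measurement noise covariance

    def predict(self):
        # Project the state estimate and covariance forward in time.
        self.x = self.A @ self.x + self.B @ self.u
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def update(self, z):
        # Correct the prediction with a measurement z = [[zx], [zy]].
        S = self.H @ self.P @ self.H.T + self.R       # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
        return self.x
```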
The `ObjectTracker` class is responsible for managing detection information as well as Kalman Filter estimation and prediction for the tracked object.
Attributes
| Name | Description |
|---|---|
| `kalman_filter` | Instance of `KalmanFilter` to use at each time step |
| `detect` | Function to call to detect the object in a frame |
| `state` | Current object detection, estimation and prediction |
Methods
The class has a single `step` method, called at each frame, responsible for
- detecting the object in the given frame using `self.detect`,
- predicting the object's state using `self.kalman_filter`,
- and updating the `self.kalman_filter` state.
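A minimal sketch of this loop, assuming the `KalmanFilter` sketch above and a `detect(frame)` callable that returns an (x, y) position; the tuple layout of `state` is an illustrative assumption:

```python
import numpy as np

class ObjectTracker:
    """Sketch of the per-frame tracking loop described above."""

    def __init__(self, kalman_filter, detect):
        self.kalman_filter = kalman_filter
        self.detect = detect
        self.state = None  # (detection, prediction, estimation)

    def step(self, frame):
        detection = self.detect(frame)              # 1. detect the object
        prediction = self.kalman_filter.predict()   # 2. predict its next state
        z = np.array([[detection[0]], [detection[1]]])
        estimation = self.kalman_filter.update(z)   # 3. correct with the measurement
        self.state = (detection, prediction, estimation)
        return self.state
```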
The `Visualizer` class is responsible for drawing the object state, prediction and estimation on the target frame to visualize tracking results.
Attributes
The class has a single attribute `tracker`, an instance of `ObjectTracker` from which to extract state information.
Methods
The class has a single public method `show`, responsible for drawing
- the object detection as a green circle,
- the object state prediction as a blue bounding box,
- and the object state estimation as a red bounding box.
KalmanFilter
- Understand the shape of data to replicate the equations seen in class.
- Be careful to use a column vector instead of a row vector for the state vector `x`.
Visualizer
- Understand the image format OpenCV uses by default (i.e. BGR instead of RGB).
- Use OpenCV to read a video frame by frame.
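For reference, a minimal frame-by-frame reading loop with OpenCV could look as follows (the video path is hypothetical):

```python
import cv2

cap = cv2.VideoCapture("input.mp4")  # hypothetical path
while True:
    ok, frame = cap.read()           # frame is a BGR numpy array
    if not ok:
        break                        # end of video
    # Convert only when an RGB image is actually needed.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
cap.release()
```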
Develop a simple IoU-based tracker and extend it for multiple object tracking.
The main tasks I undertook for this part are the following:
- Implement a `bbox` module to manipulate bounding boxes.
- Implement a `track` module to manage and export tracks.
- Implement a `visualize` module to handle visualization and video exports.
The `bbox` module is composed of the following:
- A `BoundingBox` dataclass representing a bounding box in the detection CSV.
- An `intersection_over_union` function to compute the IoU of two bounding boxes.
- Dataclass which represents a bounding box by its top-left and bottom-right points.
- The properties `width` and `height` return the bounding box's dimensions along each axis.
- The property `area` computes the area of the bounding box.
Function which takes two bounding boxes as input and returns the ratio of their intersection over their union.
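A minimal sketch of the module, assuming corner-point fields named `x1`, `y1`, `x2`, `y2` (the actual field names may differ); note the clamping to zero that the challenges section below comes back to:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Bounding box stored by its top-left and bottom-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height

def intersection_over_union(a: BoundingBox, b: BoundingBox) -> float:
    # max(0, ...) clamps negative intersection dimensions to zero
    # (the non-overlapping case mentioned in the challenges below).
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0
```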
The `track` module is composed of the following:
- A `Track` dataclass representing a track in the CSV.
- A `TrackManager` class to manage the tracks across video frames.
- A `TrackHistory` class to handle exporting tracks to a file.
Dataclass responsible for storing track information: frame number, ID and bounding box.
Class responsible for creating, updating and deleting tracks across the frames of a video.
Attributes
| Name | Description |
|---|---|
| `tracks` | List of current tracks. |
| `next_track_id` | Object ID for the next upcoming track. |
Methods
The class has a single public method `step`, called at each frame, responsible for
- matching detections with tracks,
- updating matched tracks,
- creating new tracks for unmatched detections,
- and deleting unmatched tracks.
Algorithm
The manager algorithm can be described as follows:

1. Match the tracks with the detections.
   - Compute a similarity matrix where $c_{i,j}$ is the IoU between the $i$-th track and the $j$-th detection.
   - Compute associations using the Hungarian algorithm with `scipy.optimize.linear_sum_assignment`.
2. Update the matched tracks.
   - Get the track with the corresponding ID.
   - Replace its bounding box with the associated detection.
   - Include the track in the new tracks list.
3. Create new tracks for unmatched detections.
   - Create a new track based on the detection, with `self.next_track_id` as its ID.
   - Increment `self.next_track_id` to ensure unique IDs across following tracks.
4. Replace the `self.tracks` current tracks with the new tracks list.
   - Any previous track not included is lost (i.e. unmatched tracks are removed).
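A sketch of the matching step (1), assuming each track exposes a `bbox` field and reusing the `intersection_over_union` function from the `bbox` module; the IoU threshold is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

from bbox import intersection_over_union  # the project's own module

def match(tracks, detections, iou_threshold=0.3):
    """Return (matches, unmatched_track_indices, unmatched_detection_indices)."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    # Similarity matrix: c[i, j] = IoU between the i-th track and j-th detection.
    c = np.array([[intersection_over_union(t.bbox, d) for d in detections]
                  for t in tracks])
    # The Hungarian algorithm maximizes the total IoU (hence maximize=True).
    rows, cols = linear_sum_assignment(c, maximize=True)
    matches = [(i, j) for i, j in zip(rows, cols) if c[i, j] >= iou_threshold]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_detections = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_detections
```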
Class responsible for storing and exporting tracks.
Methods
- `dump`: Dumps the tracks to a CSV file.
- `extend`: Stores a list of tracks.
- `push`: Stores a track.
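A minimal sketch, assuming `Track` has `frame`, `track_id` and `bbox` fields (these names are assumptions based on the description above):

```python
import csv

class TrackHistory:
    """Sketch of track storage and CSV export; the column layout is illustrative."""

    def __init__(self):
        self.tracks = []

    def push(self, track):
        self.tracks.append(track)

    def extend(self, tracks):
        self.tracks.extend(tracks)

    def dump(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for t in self.tracks:
                writer.writerow([t.frame, t.track_id,
                                 t.bbox.x1, t.bbox.y1, t.bbox.x2, t.bbox.y2])
```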
The `visualize` module is composed of the following:
- A `Canvas` class to draw on a given frame.
- A `Visualizer` class to visualize tracking results.
- A `Video` class responsible for tracking results video exports.
Class responsible for drawing on a given frame.
Attributes
| Name | Description |
|---|---|
| `data` | Frame on which to draw. |
Methods
- `draw_bbox`: Draws a given bounding box.
- `draw_text`: Draws some text.
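A minimal sketch of these helpers using OpenCV drawing primitives; the colors are illustrative BGR tuples, and the bounding box fields follow the earlier `BoundingBox` sketch:

```python
import cv2

class Canvas:
    """Sketch of the drawing helpers described above."""

    def __init__(self, data):
        self.data = data  # frame (BGR numpy array) on which to draw

    def draw_bbox(self, bbox, color=(0, 255, 0), thickness=2):
        cv2.rectangle(self.data,
                      (int(bbox.x1), int(bbox.y1)),
                      (int(bbox.x2), int(bbox.y2)),
                      color, thickness)

    def draw_text(self, text, position, color=(0, 255, 0)):
        cv2.putText(self.data, text, (int(position[0]), int(position[1])),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
```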
Class responsible for drawing bounding boxes and IDs of tracks.
Methods
- `draw`: Draws the current tracks stored by a track manager on the given frame.
- `show`: Opens a window to visualize the current frame (possibly with track drawings).
Class responsible for simplifying the use of `cv2.VideoWriter`.
Methods
- `write`: Creates an instance of `cv2.VideoWriter` if necessary and writes the given frame.
- `release`: Releases the `cv2.VideoWriter` instance.
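A sketch of this lazy-initialization pattern; the codec and frame rate are illustrative defaults:

```python
import cv2

class Video:
    """Sketch of a lazy cv2.VideoWriter wrapper, sized from the first frame."""

    def __init__(self, path, fps=30.0):
        self.path = path
        self.fps = fps
        self.writer = None

    def write(self, frame):
        # Create the writer on first use, using the first frame's dimensions.
        if self.writer is None:
            h, w = frame.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            self.writer = cv2.VideoWriter(self.path, fourcc, self.fps, (w, h))
        self.writer.write(frame)

    def release(self):
        if self.writer is not None:
            self.writer.release()
            self.writer = None
```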
bbox
- Handle the case where bounding boxes have negative intersections.
  - The intersection dimensions are set to zero whenever they are negative.
TrackManager
- Understanding the `scipy.optimize.linear_sum_assignment` function.
- Understanding how to determine when detections or tracks are unmatched.
Extend the IoU-based MOT tracker with the Hungarian algorithm by adding a Kalman Filter.
The main tasks I undertook for this part are the following:
- Adapt the `BoundingBox` class to compute its centroid.
- Adapt the `KalmanFilter` class to handle bounding boxes through their centroids.
- Adapt the `TrackManager` class to use Kalman Filter predictions.
- Adapt the `visualize` module to handle visualization of predicted centroids.
Add a property `center` which computes the centroid of the bounding box based on its coordinates.
Change the `update` method to take in a `BoundingBox` and use its `center` property for predictions.
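A sketch of these two adaptations, reusing the corner-point `BoundingBox` fields and the Part 1 `KalmanFilter` sketch; each fragment is marked with the class it belongs to:

```python
import numpy as np

@property
def center(self):  # added to BoundingBox
    # Centroid is the midpoint of the top-left and bottom-right corners.
    return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

def update(self, bbox):  # replaces KalmanFilter.update from the Part 1 sketch
    # Use the centroid of the detected bounding box as the measurement;
    # the correction step itself is unchanged.
    cx, cy = bbox.center
    z = np.array([[cx], [cy]])
    S = self.H @ self.P @ self.H.T + self.R       # innovation covariance
    K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
    self.x = self.x + K @ (z - self.H @ self.x)
    self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
    return self.x
```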
- Add a `filter_params` argument to the constructor to initialize the Kalman filters of new tracks.
- Add a `filters` dictionary attribute which associates a track ID with its corresponding `KalmanFilter` instance.
- Change the `step` method to account for creating and updating the `KalmanFilter` instances of each track.
- Use predictions from the corresponding `KalmanFilter` instances, instead of the last known detection bounding boxes, when computing the similarity score between tracks and detections.
- Add a `draw_cross` function to the `Canvas` class in order to draw an X shape at a given point (see the sketch after this list).
- Adapt the `Visualizer` class to draw a cross at the corresponding centroids predicted by the `KalmanFilter` instances for the current tracks.
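A sketch of the `draw_cross` helper, written as a method of the `Canvas` class sketched earlier; the size and color defaults are illustrative:

```python
import cv2

def draw_cross(self, point, size=5, color=(255, 0, 0), thickness=2):  # Canvas method
    # Draw an X shape centered on the given (x, y) point.
    x, y = int(point[0]), int(point[1])
    cv2.line(self.data, (x - size, y - size), (x + size, y + size), color, thickness)
    cv2.line(self.data, (x - size, y + size), (x + size, y - size), color, thickness)
```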
- As this part consists mostly of adapting the existing pipeline, I did not encounter any major challenges apart from correctly translating coordinates between centroids and bounding box corner points.
Extend the IoU-Kalman tracker to include object re-identification (ReID).
The main tasks I undertook for this part are the following:
- Implement a `reident` module to compute appearance features of a patch.
- Adapt the `TrackManager` to account for appearance features of patches.
This module is composed of the following:
- An `ObjectIdentifier` class responsible for computing appearance features of a patch.
- An `extract_patch` function to extract a patch from a frame.
- A `normalize_patch` function to normalize a given patch.
Class responsible for computing appearance features using a lightweight pre-trained model.
Attributes
| Name | Description |
|---|---|
| `feature_extractor` | Sequential layer of MobileNet v2 excluding the classifier head. |
| `device` | Device on which to run the computation. |
Methods
The class has a single public method `__call__`, called on a frame and a bounding box, responsible for:
- Extracting the corresponding patch through `extract_patch`.
- Normalizing the corresponding patch through `normalize_patch`.
- Computing the appearance features using the `feature_extractor` attribute.
This function is responsible for cropping a frame to the given bounding box.
This function is responsible for preprocessing the patch by
- resizing the input patch to 224 x 224,
- converting the colors from BGR to RGB,
- and normalizing the values for the model using a Z-score.
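Putting the three pieces together, here is a sketch assuming torchvision's MobileNet v2 and ImageNet normalization statistics; the exact layer selection and preprocessing in the project may differ:

```python
import cv2
import numpy as np
import torch
from torch import nn
from torchvision import models

class ObjectIdentifier:
    """Sketch of appearance-feature extraction with a pre-trained MobileNet v2."""

    def __init__(self, device="cpu"):
        self.device = torch.device(device)
        mobilenet = models.mobilenet_v2(weights="DEFAULT")
        # All convolutional layers plus global pooling, dropping the classifier
        # head, yield a 1280-dimensional appearance vector.
        self.feature_extractor = nn.Sequential(
            mobilenet.features,
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        ).to(self.device).eval()

    @torch.no_grad()
    def __call__(self, frame, bbox):
        patch = extract_patch(frame, bbox)
        tensor = normalize_patch(patch).to(self.device)
        return self.feature_extractor(tensor).squeeze(0).cpu().numpy()

def extract_patch(frame, bbox):
    # Crop the frame to the bounding box (integer pixel coordinates).
    return frame[int(bbox.y1):int(bbox.y2), int(bbox.x1):int(bbox.x2)]

def normalize_patch(patch):
    # Resize to the model's expected input and convert BGR -> RGB.
    patch = cv2.resize(patch, (224, 224))
    patch = cv2.cvtColor(patch, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    # Z-score normalization with the ImageNet statistics.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    patch = (patch - mean) / std
    # HWC -> NCHW tensor for PyTorch.
    return torch.from_numpy(patch.transpose(2, 0, 1)).unsqueeze(0)
```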
- Add an `identifier` attribute and constructor argument to store an `ObjectIdentifier` instance.
- Add a `track_features` dictionary attribute which associates a track ID with its feature vector.
- Add a `weights` attribute which provides the weights for the IoU and appearance feature scores.
- Change the `step` method to account for updating the feature vectors of each track.
- Use a weighted sum of the IoU score and the appearance score when computing the similarity score between tracks and detections.
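A sketch of the combined score, assuming `weights` is an (IoU, appearance) pair and using the cosine similarity mentioned in the challenges below:

```python
import numpy as np

from bbox import intersection_over_union  # the project's own module

def cosine_similarity(a, b):
    # Small epsilon guards against division by zero for degenerate vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def similarity(track_bbox, track_features, detection_bbox, detection_features,
               weights=(0.5, 0.5)):
    """Weighted sum of the IoU score and the appearance score; the default
    weights are illustrative."""
    w_iou, w_app = weights
    iou = intersection_over_union(track_bbox, detection_bbox)
    app = cosine_similarity(track_features, detection_features)
    return w_iou * iou + w_app * app
```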
- Correctly extracting the feature layers of MobileNet.
  - I initially extracted only the features of the last convolution layer, which led to poor results.
  - I got much better results by leveraging the features of the last dense layer before the classifier head.
- Finding the best distance function.
  - I tried different distances empirically, and cosine similarity seemed to perform best in my case.
Integrate a more efficient, lightweight deep learning-based object detector for pedestrian detection.
The main tasks I undertook for this part are the following:
- Adapt the `Detector` script to generate the detections CSV using a lightweight YOLO model.
The `LightweightDetector` class is responsible for loading a lightweight YOLO model and using it to infer bounding boxes of the "person" class for a given frame.
Attributes
| Name | Description |
|---|---|
| `frames_dir` | The root directory of input frames. |
| `model` | The YOLO model to use. |
Methods
The only method is `predict`, which takes in the index of a frame and does the following:
- Loads the corresponding frame from the `self.frames_dir` directory.
- Runs inference on the frame using the model stored in `self.model`.
- Converts the resulting bounding boxes to our own `BoundingBox` class and returns them.
The `predict` method can then be called for each frame to generate a detection CSV file.
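A sketch of such a detector using the Ultralytics API; the frame naming scheme and the `yolov8n.pt` weights are illustrative assumptions:

```python
from pathlib import Path

import cv2
from ultralytics import YOLO

from bbox import BoundingBox  # our own dataclass from the bbox module

class LightweightDetector:
    """Sketch of pedestrian detection with a lightweight YOLO model."""

    def __init__(self, frames_dir, weights="yolov8n.pt"):
        self.frames_dir = Path(frames_dir)
        self.model = YOLO(weights)

    def predict(self, frame_index):
        # Frame file naming is a hypothetical convention.
        frame = cv2.imread(str(self.frames_dir / f"frame_{frame_index:06d}.jpg"))
        # classes=[0] restricts detection to the COCO "person" class.
        results = self.model(frame, classes=[0], verbose=False)
        boxes = []
        for box in results[0].boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            boxes.append(BoundingBox(x1, y1, x2, y2))
        return boxes
```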
- Understand how to import and use YOLO models.
- To do so, I learnt to use the Ultralytics framework through its official documentation.