This project detects suspicious behavior of a person by analyzing that person's facial aspects. The implementation primarily targets a lab monitoring system.
The overview of the system is shown below.
As shown in the image above, the user (student) works on a computer running a background application, while the officials access a web application. The background application runs real-time detection algorithms that identify suspicious behavior of the user and raise instant warnings for the officials. These algorithms focus on certain important aspects of the face, along with some external aspects, to make detection possible. Detection details and warnings are sent from the computer to a remote server, which updates a database and renders the details to the web application. The officials are notified of warnings through this web application.
Note that there can be unexpected delays and changes to this plan.
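As a rough illustration of the client-to-server flow described above, the sketch below posts a detection event from the background application to the remote server over HTTP. The endpoint URL, route, and payload field names are placeholder assumptions, not the project's actual API.

```python
import json
import time
import urllib.request

# Hypothetical endpoint; the real server address and route are project-specific.
SERVER_URL = "http://example-lab-server.local/api/warnings"

def send_warning(student_id: str, event: str) -> None:
    """Post a suspicious-behavior event from the client app to the remote server."""
    payload = {
        "student_id": student_id,   # assumed field name
        "event": event,             # e.g. "multiple_faces", "no_face"
        "timestamp": time.time(),
    }
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # server stores the event and pushes it to the web app

# Example: the detector raised a multi-face event for one student.
# send_warning("student_42", "multiple_faces")
```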
Face detection is an essential feature of this monitoring system. While it could serve as an attendance-recording procedure at the initial stage of the monitoring process, the student's presence throughout the monitoring session is what matters most. The system must also ensure that the student works alone, which is where the detection of multiple faces comes into play.
Each frame of the webcam input video is processed for faces using the MediaPipe Face Detection model. Face detection serves two checks: detecting that no face is present, and detecting multiple faces in the input frame. Given an input frame, the model first yields a binary outcome: whether at least one face is detected. If so, the system then checks whether one face or multiple faces were found. As per the needs of the system, both the detection of multiple faces and the detection of no face are treated as suspicious and are notified to the officials.
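A minimal sketch of this check, assuming the Python MediaPipe Face Detection solution and a 0.5 confidence threshold (a reasonable default, not confirmed by the source):

```python
import cv2
import mediapipe as mp

# One detector instance reused across frames (created once, not per frame).
_detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)

def classify_frame(frame_bgr) -> str:
    """Return 'no_face', 'single_face', or 'multiple_faces' for a BGR webcam frame."""
    # MediaPipe expects RGB input, while OpenCV captures BGR.
    results = _detector.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    count = len(results.detections) if results.detections else 0
    if count == 0:
        return "no_face"          # suspicious: student absent
    if count == 1:
        return "single_face"      # expected state
    return "multiple_faces"       # suspicious: another person in frame

# Example usage:
# capture = cv2.VideoCapture(0)
# ok, frame = capture.read()
# if ok:
#     print(classify_frame(frame))
```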
Two alternative face detection strategies were also evaluated:

- Haar Cascade Classifier (HCC)
- Multi-task Cascaded Convolutional Networks (MTCNN)
Detection Strategy | Limitations | Positives | Improvements
---|---|---|---
Haar Cascade Classifier | Detects non-faces as faces in some instances; no detection of faces in low lighting | Simple and lightweight | Asynchronous programming; multi-threading
Multi-Task CNN | Inability to limit the distance of detection | | Asynchronous programming; multi-threading
MediaPipe Face Detection Model | | Lightweight object detection; effective GPU utilization; quality prediction; allows estimation of face rotation (roll angle) | |
This system computes a good estimate of the orientation angle of a student's face using 3D coordinate geometry. A 3D coordinate frame is built from the 3D facial landmark coordinates obtained with the MediaPipe Pose Estimation model. The nose landmark N and the left and right ear landmarks L and R are extracted as 3D points, i.e. (x, y, z) coordinate tuples, from the model's results for each webcam input frame. The estimation then proceeds as follows (a code sketch follows the list):
- Get the nose point coordinates (N)
- Get the left and right ear point coordinates (L & R)
- Get the 3D line vector LR
- Get the 3D midpoint M of the segment LR
- Get the 3D line vector NM
- Get the 3D plane P perpendicular to the camera axis
- Find the angle between the line NM and the plane P
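A minimal sketch of these steps, under three assumptions not stated in the source: M is the midpoint of LR, the plane P perpendicular to the camera has the z-axis as its normal, and MediaPipe Pose's NOSE, LEFT_EAR, and RIGHT_EAR landmarks are used. The angle between a line and a plane is arcsin of the normalized dot product of the line vector with the plane normal.

```python
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose

def head_angle_degrees(landmarks) -> float:
    """Angle between the nose-to-ear-midpoint line NM and a plane P
    perpendicular to the camera axis (normal assumed to be the z-axis)."""
    def point(idx):
        lm = landmarks[idx]
        return np.array([lm.x, lm.y, lm.z])

    N = point(mp_pose.PoseLandmark.NOSE)
    L = point(mp_pose.PoseLandmark.LEFT_EAR)
    R = point(mp_pose.PoseLandmark.RIGHT_EAR)

    M = (L + R) / 2.0               # assumed: midpoint of the 3D segment LR
    NM = M - N                      # line vector from nose to M
    n = np.array([0.0, 0.0, 1.0])   # assumed plane normal: the camera axis

    # Angle between a line and a plane: arcsin(|NM . n| / (|NM| |n|)).
    sin_angle = abs(NM.dot(n)) / (np.linalg.norm(NM) * np.linalg.norm(n))
    return float(np.degrees(np.arcsin(np.clip(sin_angle, 0.0, 1.0))))

# Example usage:
# with mp_pose.Pose() as pose:
#     results = pose.process(rgb_frame)   # rgb_frame: an RGB webcam frame
#     if results.pose_landmarks:
#         angle = head_angle_degrees(results.pose_landmarks.landmark)
```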
A speaking detection model, pre-trained using the HOG-based dlib face detector, is utilized to predict whether lip movement is observed. Each prediction window is processed as follows (a code sketch follows the list):
- Collect a sequence of 25 frames.
- For each video frame in this sequence:
  - Detect the face in the frame using a face detector (MediaPipe Face Mesh model).
  - From the landmark predictor, fetch the points that mark the inner edges of the top and bottom lip.
  - Calculate the average pixel separation between each point pair and store this distance in the lip separation sequence.
- Once all 25 frames are processed this way, perform min-max scaling over the 25-length sequence.
- Feed this normalized lip separation sequence into the RNN.
- The RNN outputs a 2-element tuple (speech, silence) representing the likelihood that the speaker was speaking or silent during the preceding 25 video frames.
- Repeat the process for the next 25-frame window of the input video.
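A sketch of the windowing and normalization stages follows. The inner-lip landmark index pairs and the `rnn_model` name are assumptions for illustration, since the source does not list them.

```python
import numpy as np
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

# Assumed inner-lip landmark index pairs (top, bottom) on the MediaPipe Face Mesh;
# 13/14 is the inner lip center, the other pairs flank it.
LIP_PAIRS = [(13, 14), (82, 87), (312, 317)]
WINDOW = 25  # frames per prediction window

def lip_separation(landmarks, frame_height: int) -> float:
    """Average pixel distance between the assumed inner-lip point pairs."""
    gaps = [abs(landmarks[top].y - landmarks[bottom].y) * frame_height
            for top, bottom in LIP_PAIRS]
    return float(np.mean(gaps))

def normalized_sequence(frames) -> np.ndarray:
    """Build and min-max scale the 25-length lip separation sequence."""
    seq = []
    with mp_face_mesh.FaceMesh(max_num_faces=1) as mesh:
        for frame in frames:  # frames: 25 RGB frames
            results = mesh.process(frame)
            if results.multi_face_landmarks:
                lm = results.multi_face_landmarks[0].landmark
                seq.append(lip_separation(lm, frame.shape[0]))
            else:
                seq.append(0.0)  # no face found: treat separation as zero
    seq = np.array(seq)
    span = seq.max() - seq.min()
    return (seq - seq.min()) / span if span > 0 else np.zeros_like(seq)

# The normalized window would then be fed to the pre-trained RNN, e.g.:
# probs = rnn_model.predict(normalized_sequence(frames).reshape(1, WINDOW, 1))
# probs[0] ~ (p_speech, p_silence) for the 25-frame window.
```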