Hey everyone,
I have given much thought to implementing this feature; here are my ideas.
Considering that we already have pose data (facial landmarks extracted using MediaPipe), we can use either a rule-based method or a DNN-based method.
Rule-based method:
A rule-based method is one where you define explicit rules and thresholds for each decision. For example: if the distance between the upper-eyelid and lower-eyelid landmarks is less than 0.1, the left eye is closed.
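To make this concrete, here is a minimal sketch of such a rule (the landmark indices and the 0.1 threshold are illustrative assumptions, not tuned values):

```python
# A minimal sketch of the rule-based check, assuming normalized MediaPipe
# Face Mesh coordinates; indices and threshold are assumptions, not final.
import numpy as np

UPPER_LID = 159  # assumed index of a left upper-eyelid landmark
LOWER_LID = 145  # assumed index of the matching lower-eyelid landmark
EYE_CLOSED_THRESHOLD = 0.1

def left_eye_closed(landmarks: np.ndarray) -> bool:
    """landmarks: (478, 2) array of normalized (x, y) coordinates."""
    dist = np.linalg.norm(landmarks[UPPER_LID] - landmarks[LOWER_LID])
    return dist < EYE_CLOSED_THRESHOLD
```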
The advantage of this method is that it needs only a tiny amount of data. However, it lacks robustness: what if the head is tilted upward and the distance between the eyelids falls below the threshold even though the eye is open?
Deep learning model:
A deep learning model trained on many samples would be able to do the task; it would generalize better and thus perform well on new data.
However, training a DNN model requires a lot of labeled data, for example, samples of non-blinking and blinking sequences, which we do not have. What we do have is a large dataset of unlabeled faces and a very small dataset of labeled actions.
So what I am proposing is to use unsupervised pretraining (on the large unlabeled data) followed by supervised fine-tuning (on the smaller labeled data).
The proposed method is a masked autoencoder, as described here.
The masked autoencoder will take a pose sequence as input with dimensions N × 64 (frames) × 478 (landmarks) × 2 (coordinates).
The first step is to embed the sequences, which will be done using a temporal dilated CNN (see paper).
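For illustration, here is a sketch of what that embedding step could look like (the embedding dimension, kernel size, and dilation rates are my assumptions; the paper's exact architecture may differ):

```python
# Sketch of a temporal dilated-CNN embedding: each landmark's (x, y)
# trajectory over 64 frames is treated as a 1D signal and embedded.
import torch
import torch.nn as nn

class TemporalDilatedEmbedding(nn.Module):
    def __init__(self, in_coords=2, embed_dim=64, dilations=(1, 2, 4)):
        super().__init__()
        layers, ch = [], in_coords
        for d in dilations:
            # padding=d keeps the temporal length unchanged for kernel size 3
            layers += [nn.Conv1d(ch, embed_dim, kernel_size=3,
                                 dilation=d, padding=d), nn.GELU()]
            ch = embed_dim
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (N, 64 frames, 478 landmarks, 2 coords)
        N, T, L, C = x.shape
        x = x.permute(0, 2, 3, 1).reshape(N * L, C, T)  # one signal per landmark
        x = self.net(x)                                  # (N*L, embed_dim, T)
        return x.reshape(N, L, -1, T).permute(0, 3, 1, 2)  # (N, T, L, embed_dim)
```

One embedding vector per landmark per frame keeps the token layout compatible with the per-landmark masking described next.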
Next, each embedded landmark will get a class encoding (analogous to a positional encoding). Then, a portion of the landmark tokens will be masked out (omitted).
Next, the remaining embedded landmarks will be fed into a transformer encoder-decoder model, which will eventually reconstruct the original pose sequence.
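Here is a rough sketch of how the class encoding, masking, and encoder-decoder stages could fit together (the mask ratio, model sizes, and one-token-per-landmark-per-frame layout are all assumptions):

```python
# Sketch of an MAE-style pretraining model over landmark tokens.
import torch
import torch.nn as nn

class MaskedPoseAutoencoder(nn.Module):
    def __init__(self, embed_dim=64, num_landmarks=478, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # "class encoding": a learned embedding identifying each token's landmark
        self.landmark_embed = nn.Embedding(num_landmarks, embed_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
            num_layers=4)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(embed_dim, 2)  # reconstruct (x, y) per token

    def forward(self, tokens, landmark_ids):
        # tokens: (N, S, embed_dim) with S = frames * landmarks
        # landmark_ids: (N, S) landmark index of each token
        x = tokens + self.landmark_embed(landmark_ids)
        N, S, D = x.shape
        keep = int(S * (1 - self.mask_ratio))
        perm = torch.rand(N, S, device=x.device).argsort(dim=1)
        keep_idx = perm[:, :keep]  # indices of the visible (unmasked) tokens
        visible = torch.gather(x, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
        enc = self.encoder(visible)
        # scatter encoded tokens back; masked slots get the shared mask token
        full = self.mask_token.expand(N, S, D).clone()
        full.scatter_(1, keep_idx.unsqueeze(-1).expand(-1, -1, D), enc)
        return self.head(self.decoder(full))  # (N, S, 2) reconstruction
```

During pretraining, a reconstruction loss (e.g., MSE against the original coordinates, typically computed on the masked tokens only) would push the encoder to learn useful motion features.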
Finally, after the model is pretrained, we will collect labeled data from the crowd and use it to fine-tune the pretrained encoder on a classification task.
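As a sketch, fine-tuning could amount to attaching a small classification head to the pretrained encoder (the class count, pooling choice, and names here are hypothetical, reusing the modules sketched above):

```python
# Sketch of the fine-tuning stage: reuse the pretrained encoder, no masking.
import torch.nn as nn

class ActionClassifier(nn.Module):
    def __init__(self, pretrained_encoder, embed_dim=64, num_classes=2):
        super().__init__()
        self.encoder = pretrained_encoder  # weights from MAE pretraining
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens):
        # tokens: (N, S, embed_dim) embedded landmark tokens, none masked
        feats = self.encoder(tokens)                # (N, S, embed_dim)
        return self.classifier(feats.mean(dim=1))   # mean-pool, then classify
```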