3VGC

A Tri-Modal Video Genre Classification Dataset

0. Regroup for data loading - Friday

Audio - LSTM (features extracted manually) and 2D CNN (features extracted by the CNN)
Video - 3D CNN (already exists); tune hyperparameters, etc.
Text (optional) - train a CNN, Transformer, or LSTM.
Speech to text
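
For the audio LSTM item above, "extract features manually" could look roughly like the sketch below: cut the waveform into overlapping frames and compute a small feature vector per frame, giving a (frames x features) sequence an LSTM can consume. Everything here is an assumption for illustration, not a decided design: the function names, the 16 kHz sample rate, the 25 ms / 10 ms framing, and the choice of log energy plus zero-crossing rate (MFCCs or similar would be the more usual choice in practice).

```python
import math

def frame_signal(samples, frame_len, hop_len):
    """Split a 1-D list of samples into overlapping frames (hypothetical helper)."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]

def frame_features(frame):
    """Return [log energy, zero-crossing rate] for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]

def extract_sequence(samples, frame_len=400, hop_len=160):
    """Turn a waveform into a list of per-frame feature vectors.

    Defaults assume 16 kHz audio: 400 samples = 25 ms, 160 samples = 10 ms.
    """
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop_len)]

# Synthetic stand-in for a real clip: 1 second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

seq = extract_sequence(tone)
print(len(seq), len(seq[0]))  # number of frames, features per frame
```

The resulting list of per-frame vectors is the shape of input an LSTM expects (time steps x features); the 2D CNN route would instead take a spectrogram-like image and learn its own features.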