Love the work! I am just having difficulty understanding the architecture of the SI + DI model.
From what I can see in the architecture of the resnext.mat model, there is a temporal max pooling layer just before the softmax layer, and its inputs are the merged conv7 features and Video2. I am assuming the merged conv7 features come from running the dynamic image through the ResNeXt model. Where does Video2 come from?
Are we supposed to pass the whole video or just a single frame from the video clip?
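For reference, here is a sketch of how I currently understand the temporal max pooling step (NumPy, with hypothetical tensor shapes, not the repo's actual code): per-frame conv7 feature maps are pooled element-wise over the time axis, collapsing the clip into a single feature tensor before the softmax.

```python
import numpy as np

# Hypothetical shapes: T frames, C channels, H x W spatial feature maps.
T, C, H, W = 10, 2048, 7, 7
conv7_per_frame = np.random.rand(T, C, H, W)  # one conv7 feature map per frame

# Temporal max pooling: element-wise max across the T frames,
# producing one C x H x W tensor for the whole clip.
pooled = conv7_per_frame.max(axis=0)
print(pooled.shape)  # (2048, 7, 7)
```

If this is right, it would suggest the layer expects features from multiple frames rather than a single one, which is why I am confused about what exactly to feed in.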