Skip to content

Temporal Max Pooling #22

@vinitmuchhala

Description

@vinitmuchhala

Love the work, I am just having difficulty understanding the architecture for the SI + DI model.
From what I see in the architecture of the resnext.mat model, the model uses a temporal max pooling layer just before the softmax layer. It says the input to the temporal max pooling layer are the merged conv7 features and Video2. I am assuming the merged conv7 features come from running the dynamic image through the ResNext model. Where does the Video2 come from?
Are we supposed to pass the whole video or just a single frame from the video clip?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions