This repository will contain the official implementation of our paper:
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
NExT-Vid is a method for video representation learning that trains a model to predict the next frame in an autoregressive framework.
We are currently preparing the code for public release. Please stay tuned!