
kelechi-c/moonvid


moonvid 🎥🌒

kaggle notebook

A hacky video captioning framework that combines a small VLM (moondream v2) with a small text-only language model (Llama 3.2 1B).

method

A video is essentially a sequence of frames. moonvid uses a vision language model to generate a text caption for individual frames, then uses a language model to merge those per-frame captions into one coherent caption for the whole video.
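The two-stage pipeline described above can be sketched in plain Python. This is a minimal illustration, not the code from the repository or the Kaggle notebook. It assumes frames are sampled uniformly, and every name here (`sample_frame_indices`, `build_merge_prompt`, `caption_video`, `caption_frame`, `merge`) as well as the prompt wording is hypothetical. In a real run, `caption_frame` would wrap the moondream v2 VLM and `merge` would wrap Llama 3.2 1B; they are left as plain callables so the sketch makes no claims about either model's API.

```python
from typing import Any, Callable, List, Sequence


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick evenly spaced frame indices (assumption: uniform sampling)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Take the midpoint of each equal-width segment of the video.
    return [int(step * i + step / 2) for i in range(num_samples)]


def build_merge_prompt(captions: Sequence[str]) -> str:
    """Hypothetical prompt asking a text LLM to fuse ordered frame captions."""
    lines = [f"Frame {i + 1}: {c.strip()}" for i, c in enumerate(captions)]
    return (
        "The following captions describe frames of one video, in order.\n"
        + "\n".join(lines)
        + "\nWrite a single coherent caption describing the whole video."
    )


def caption_video(
    frames: Sequence[Any],
    caption_frame: Callable[[Any], str],  # stand-in for the VLM (moondream v2)
    merge: Callable[[str], str],  # stand-in for the text LLM (Llama 3.2 1B)
    num_samples: int = 8,
) -> str:
    """Caption sampled frames individually, then merge into one caption."""
    indices = sample_frame_indices(len(frames), num_samples)
    captions = [caption_frame(frames[i]) for i in indices]
    return merge(build_merge_prompt(captions))


if __name__ == "__main__":
    # Stub models so the control flow can be run without any weights.
    fake_frames = [f"frame-{i}" for i in range(10)]
    result = caption_video(
        fake_frames,
        caption_frame=lambda f: f"a picture of {f}",
        merge=lambda prompt: prompt,  # echo the prompt instead of calling an LLM
        num_samples=4,
    )
    print(result)
```

Keeping the two models behind callables makes it easy to swap either stage (e.g. a different VLM or a larger merge model) without touching the sampling or prompting logic.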

This isn't an 'SOTA' model, just an experiment.

Acknowledgments

About

hacky video captioning with moondream
