A hacky video captioning framework using a small vision language model (moondream v2) and a small text language model (Llama 3.2-1B).
Uses the vision language model to generate text captions from individual frames of a video (which is essentially a sequence of frames), then uses the text language model to merge those per-frame captions into a single coherent caption (a rough sketch follows the model list below). \
This isn't an 'SOTA' model, just an experiment.
- moondream v2 — a lightweight vision language model by vikhyat
- Meta AI's Llama 3.2-1B-Instruct — a small but capable instruction-tuned text language model
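
The pipeline roughly looks like the sketch below. It is only a sketch, not the exact implementation: the Hugging Face repo names (`vikhyatk/moondream2`, `meta-llama/Llama-3.2-1B-Instruct`), the OpenCV-based frame sampling, and the moondream `encode_image`/`answer_question` calls are assumptions and may differ from the actual code or the model revision you install.

```python
# Sketch: caption sampled frames with moondream v2, then merge with Llama 3.2-1B-Instruct.
# Repo names, frame sampling, and the moondream API shown here are assumptions.
import cv2
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


def sample_frames(video_path: str, num_frames: int = 8) -> list[Image.Image]:
    """Grab `num_frames` evenly spaced frames from the video as PIL images."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if ok:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames


def caption_frames(frames: list[Image.Image]) -> list[str]:
    """Caption each frame independently with moondream v2."""
    model_id = "vikhyatk/moondream2"  # assumed HF repo name
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    captions = []
    for frame in frames:
        enc = model.encode_image(frame)  # API of earlier moondream2 revisions
        captions.append(model.answer_question(enc, "Describe this image.", tokenizer))
    return captions


def merge_captions(captions: list[str]) -> str:
    """Ask Llama 3.2-1B-Instruct to fuse the per-frame captions into one caption."""
    llm = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")
    messages = [
        {"role": "system", "content": "Merge these frame captions into one coherent video caption."},
        {"role": "user", "content": "Frame captions:\n" + "\n".join(f"- {c}" for c in captions)},
    ]
    out = llm(messages, max_new_tokens=128)
    return out[0]["generated_text"][-1]["content"]


if __name__ == "__main__":
    frames = sample_frames("input.mp4")
    print(merge_captions(caption_frames(frames)))
```

The number of sampled frames is the main knob: more frames give the merge step more detail to work with, at the cost of more VLM passes.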