Powered by llama3, Whisper, PaddleOCR, bge-base-en-v1.5, KeyBert, xlm-roberta_punctuation_fullstop_truecase, and paraphrase-multilingual-MiniLM-L12-v2, we build an agent that supports online Q&A, video segmentation, and inter-class quizzes for understanding multiple educational videos. We hope it will expand the functionality and effectiveness of online education.
We use the videos from link as an example (you can download them from link), and you can find a demo of VidMentor here.
├── 📂 checkpoints #save model checkpoints
├── 📂 videos #save all original videos
├── 📂 asset #save necessary files
├── 📂 backend
│ ├── 📄 backend_audio.py #extract audio info into database
│ ├── 📄 backend_search.py #support search and answer in website demo
│ ├── 📄 backend_visual.py #extract visual info into database
│ ├── 📄 backend_llm.py #support building llm agents
├── 📂 database #save all videos' data
├── 📂 utils
│ ├── 📄 tamplate.py #provide different templates for different llm agents
│ ├── 📄 trees.py #provide tools to generate mind map
│ ├── 📄 utils.py #provide some useful common tools
├── 📂 models
│ ├── 📄 bgemodel.py #bgemodel method
│ ├── 📄 llm_model.py #llm model method
│ ├── 📄 whisper_model.py #whisper model method
│ ├── 📄 keybert_model.py #keybert method
│ ├── 📄 punctuator_model.py #punctuator model method
├── 📄 README.md #readme file
├── 📄 TUTORIAL.md #tutorial for vidmentor
├── 📄 requirements.txt #packages requirement
├── 📄 st_demo.py #run streamlit website demo
├── 📄 download_ckpt.py #download all models to local storage
├── 📄 build_database.py #build database
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://github.com/Kailuo-Lai/VidMentor.git
conda create -n vidmentor python=3.9
conda activate vidmentor
cd VidMentor
pip install -r requirements.txt
- Download Graphviz from link.
- Add Graphviz to your system path.
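A quick way to confirm that the Graphviz step worked is to check whether the `dot` executable is visible on the system path. The snippet below is a minimal sketch using only the Python standard library; the helper names `find_graphviz_dot` and `graphviz_version` are hypothetical and are not part of VidMentor.

```python
import shutil
import subprocess
from typing import Optional


def find_graphviz_dot() -> Optional[str]:
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


def graphviz_version(dot_path: str) -> str:
    """Return the version banner reported by `dot -V`."""
    # `dot -V` usually writes its banner to stderr, so fall back to stdout only if stderr is empty.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True, check=False)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    dot = find_graphviz_dot()
    if dot is None:
        print("Graphviz 'dot' was not found on PATH; revisit the two steps above.")
    else:
        print(f"Found dot at {dot}: {graphviz_version(dot)}")
```

If the script reports that `dot` is missing, open a new terminal after editing the path, since already-open shells may not pick up the change.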
python download_ckpt.py
- Build llama.cpp from link.
- Quantize the llama3 weight in the checkpoints folder following the instructions from link.
- Change the argument --llm_version in st_demo.py and build_database.py to the output file name of the quantized llama3 weight.
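To find the file name to pass as `--llm_version`, it can help to list the quantized weights that ended up in the checkpoints folder. The sketch below assumes the quantized output uses the `.gguf` extension (the format written by recent llama.cpp builds) and that it lives somewhere under `checkpoints`; the helper `list_quantized_weights` is hypothetical and not part of VidMentor.

```python
from pathlib import Path
from typing import List


def list_quantized_weights(checkpoint_dir: str = "checkpoints") -> List[str]:
    """Return the sorted file names of all .gguf files found under checkpoint_dir."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # Nothing to list if the folder does not exist yet.
        return []
    # Assumption: quantized weights carry the .gguf extension.
    return sorted(p.name for p in root.rglob("*.gguf"))


if __name__ == "__main__":
    names = list_quantized_weights()
    if not names:
        print("No .gguf files found under 'checkpoints'.")
    for name in names:
        # Candidate value for the --llm_version argument.
        print(f"--llm_version {name}")
```

Run it from the repository root. If your llama.cpp build writes a different extension, adjust the glob pattern accordingly.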
You can find the tutorial of VidMentor🦙 here.
We are grateful for the following awesome projects:
- llama3: An open-source large language model created by Meta
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle
- KeyBert: A minimal method for keyword extraction with BERT
- bge-base-en-v1.5: A general embedding model created by BAAI
- paraphrase-multilingual-MiniLM-L12-v2: A multilingual text embedding model
- xlm-roberta_punctuation_fullstop_truecase: An xlm-roberta model fine-tuned to restore punctuation
Thanks to all the contributors who have helped to make this project better!
- Yifan Wu 💻
- Kailuo 💻
- chenminghao 💻