A Step-by-Step Guide to Augmenting Digitized Historical Images with LAVIS, groundingDINO and Segment Anything
The project in its original state can be found here.
A WebUI for augmenting digitized historical images by generating captions, grounding the captions in the image, and segmenting their content.
- Simple WebUI
- Caption generation using BLIP and BLIP2 (see the captioning sketch after this list)
- Translation of captions to English using Helsinki-NLP (see the translation sketch below)
- Grounding of captions using 🦕 groundingDINO (see the grounding sketch below)
- Segmentation of images using Segment Anything and Agnostic segmentation (see the segmentation sketch below)
- Visualization of the results
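Captions are produced by a BLIP model loaded through LAVIS. The following is a minimal sketch, assuming the `blip_caption` model from the LAVIS model zoo and a local image file named `example.jpg`; the exact model name and options used by the WebUI may differ.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a BLIP captioning model and its matching image preprocessor from LAVIS
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generate a caption for the image
captions = model.generate({"image": image})
print(captions[0])
```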
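Translation relies on a Helsinki-NLP MarianMT checkpoint from the Hugging Face Hub. A minimal sketch, assuming French-to-English translation with the `Helsinki-NLP/opus-mt-fr-en` model (the checkpoint actually used by the WebUI may differ):

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed checkpoint: a French-to-English MarianMT model from Helsinki-NLP
model_name = "Helsinki-NLP/opus-mt-fr-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

caption_fr = "Un groupe de personnes devant la cathédrale."
batch = tokenizer([caption_fr], return_tensors="pt", padding=True)

# Translate and decode back to plain text
generated = model.generate(**batch)
caption_en = tokenizer.decode(generated[0], skip_special_tokens=True)
print(caption_en)
```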
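Grounding uses the inference helpers shipped with the groundingDINO repository. A minimal sketch, assuming a locally downloaded SwinT config and checkpoint (the paths, caption, and thresholds below are placeholders):

```python
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

# Placeholder paths: adjust to where the config and weights live in your setup
model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth",
)

image_source, image = load_image("example.jpg")

# Ground the (translated) caption in the image
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="a group of people in front of the cathedral",
    box_threshold=0.35,
    text_threshold=0.25,
)

annotated = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated.jpg", annotated)
```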
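Segmentation passes the grounded boxes to Segment Anything's predictor. A minimal sketch, assuming the ViT-H SAM checkpoint and a single box prompt in pixel coordinates (the checkpoint path and box values are placeholders):

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Assumed checkpoint: the ViT-H weights from the Segment Anything repository
sam = sam_model_registry["vit_h"](checkpoint="weights/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Use a bounding box from the grounding step (x0, y0, x1, y1 in pixels)
box = np.array([100, 150, 400, 500])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)
```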
- Clone the repository and its submodules
  ```bash
  git clone --recurse-submodules https://github.com/tgieruc/Heritage-in-the-digital-age.git
  ```
- Install the dependencies
  ```bash
  bash setup.sh
  ```
- Run the server
  ```bash
  python3 webui.py
  ```
Thanks to the following people and projects for their help and their work:
- The caption generation pipeline: Chenkai Wang
- The English to French translation model: MarianMT
- Captioning: LAVIS
- Phrase Grounding: GLIP, MDETR, groundingDINO
- The NLP model for ranking the expressions: DistilBERT
- One segmentation model was created using the Segmentation Models library
- The other segmentation models come from Segment Anything
You can reach me here 😊