GitHub - CREESTL/Img2AudioStory: Turn image to text. Write a story based on the text. Read the story.

Overview

Turn an image into text, write a story based on that text, then read the story aloud.

Load a local .jpg image
Generate a text description of the image with a HuggingFace pipeline based on the Salesforce/blip-image-captioning-large model
Generate more text (a story) from the image description with a HuggingFace pipeline based on the GPT2 model
Convert the story to a .flac audio file using the HuggingFace Inference API with the kan-bayashi_ljspeech_vits model
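
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of img2story.py: the function names, the `gpt2` and `espnet/kan-bayashi_ljspeech_vits` model IDs, the Inference API URL, the `HUGGINGFACEHUB_API_TOKEN` environment variable, and the `story.flac` output name are all assumptions made for the example. It assumes the `transformers` and `requests` packages are installed.

```python
import os

# Model identifiers. The captioning model is named in the overview;
# the other two IDs are assumed spellings of the models it mentions.
CAPTION_MODEL = "Salesforce/blip-image-captioning-large"
STORY_MODEL = "gpt2"
TTS_MODEL = "espnet/kan-bayashi_ljspeech_vits"
# Assumed endpoint layout for the hosted Inference API.
API_URL = f"https://api-inference.huggingface.co/models/{TTS_MODEL}"


def image_to_text(image_path):
    """Steps 1-2: load a local image and caption it."""
    from transformers import pipeline  # imported lazily, heavy dependency

    captioner = pipeline("image-to-text", model=CAPTION_MODEL)
    return captioner(image_path)[0]["generated_text"]


def text_to_story(caption, max_new_tokens=100):
    """Step 3: let GPT-2 continue the caption into a short story."""
    from transformers import pipeline

    generator = pipeline("text-generation", model=STORY_MODEL)
    result = generator(caption, max_new_tokens=max_new_tokens)
    return result[0]["generated_text"]


def story_to_audio(story, out_path="story.flac", token=None):
    """Step 4: send the story to the Inference API and save the audio bytes."""
    import requests

    # Hypothetical variable name; use whatever holds your API token.
    token = token or os.environ["HUGGINGFACEHUB_API_TOKEN"]
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(API_URL, headers=headers, json={"inputs": story})
    response.raise_for_status()
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.content)
    return out_path


def img_to_audio_story(image_path, caption_fn=image_to_text,
                       story_fn=text_to_story, audio_fn=story_to_audio):
    """Chain the stages; each one can be swapped out, e.g. for testing."""
    caption = caption_fn(image_path)
    story = story_fn(caption)
    return audio_fn(story)
```

Under these assumptions, calling `img_to_audio_story("moscow.jpg")` would download the two local models on first use, call the hosted text-to-speech model, and return the path of the written audio file. Passing the stages as parameters keeps the chaining logic separate from the model calls, so it can be exercised without network access.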

.gitignore
README.md
img2story.py
moscow.jpg
requirements.txt