Context:
I was recently reading “Desert Solitaire” by Edward Abbey, in which he paints the Moab scenery with a powerful and passionate brush. The descriptions are so dense and detailed that I could not visualize the scenes as quickly as I would have liked. So I came up with the idea of generating images from the highlighted text.
Idea:
- The user highlights text in the reader and then sees an option to generate an image
- The plugin uses the highlighted text as the prompt for an image generation model (Stability AI / DALL-E / Titan, etc.)
- The user sees the generated images
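The step from highlight to prompt needs a little cleanup: e-reader highlights often contain line breaks and stray whitespace, and image models cap prompt length. A minimal sketch of that step (the function name and the 400-character limit are my own assumptions, not part of the plugin):

```python
import re

MAX_PROMPT_CHARS = 400  # assumed limit; adjust per model


def build_prompt(highlight: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Collapse whitespace in a highlighted passage and truncate at a word boundary."""
    text = re.sub(r"\s+", " ", highlight).strip()
    if len(text) <= limit:
        return text
    # cut at the last space before the limit so words are not split mid-word
    return text[:limit].rsplit(" ", 1)[0]
```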
My Setup:
- I am running a Stable Diffusion model on EC2. Follow this Git repository to set it up: https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Create an SSH tunnel to the EC2 instance: `ssh -N -L 192.168.86.43:7860:127.0.0.1:7860 -i ec2_key.pem username@ip-address`
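With the tunnel up, the webui's REST API is reachable on port 7860 (the webui has to be launched with its `--api` flag for this to work). A sketch of the txt2img call the plugin makes — the `/sdapi/v1/txt2img` endpoint and the `prompt`/`steps`/`images` fields come from the AUTOMATIC1111 API, but the host and parameter values here are assumptions:

```python
import base64
import json
import urllib.request

API_URL = "http://127.0.0.1:7860"  # local end of the SSH tunnel (assumed host)


def txt2img_payload(prompt: str) -> dict:
    """Build the JSON body for AUTOMATIC1111's /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "steps": 20,   # sampling steps; assumed value
        "width": 512,
        "height": 512,
    }


def generate_image(prompt: str, out_path: str = "highlight.png") -> str:
    """POST the highlighted text as a prompt and save the first returned image."""
    req = urllib.request.Request(
        f"{API_URL}/sdapi/v1/txt2img",
        data=json.dumps(txt2img_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # the API returns base64-encoded PNGs in result["images"]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result["images"][0]))
    return out_path
```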
- I am running KOReader in an Android emulator on a Mac. In the `koreader/plugins` directory, create a folder named `visualizebookgpt.koplugin` and copy the files from this repo into that plugin folder
Demo:
visualize_book_gpt-1.mp4
TODO:
- Add support for OpenAI (DALL-E) API calls
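For the DALL-E item, the request would go to OpenAI's image generation endpoint. A rough sketch of what that call might look like — the model name and parameters are assumptions, and running it requires an `OPENAI_API_KEY` environment variable:

```python
import json
import os
import urllib.request

OPENAI_URL = "https://api.openai.com/v1/images/generations"


def dalle_payload(prompt: str) -> dict:
    """Request body for OpenAI's image generation endpoint."""
    return {
        "model": "dall-e-3",   # assumed model choice
        "prompt": prompt,
        "n": 1,
        "size": "1024x1024",
    }


def generate_dalle(prompt: str) -> str:
    """Return the URL of the generated image (needs OPENAI_API_KEY set)."""
    req = urllib.request.Request(
        OPENAI_URL,
        data=json.dumps(dalle_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["url"]
```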