InsightLens is an AI-powered image analysis tool designed to deliver quick, insightful, and contextually accurate information about images. Powered by Google Generative AI (Gemini), Streamlit, and various AI libraries, InsightLens generates captions, offers detailed descriptions with emojis, and allows users to ask questions about image content.
Whether you're using InsightLens to enhance content creation, explore visual storytelling, or analyze images for insights, this tool provides a seamless and interactive experience that’s both informative and engaging.
Experience InsightLens in action! 👉🏻
Explore the story behind every image with InsightLens!
- Features
- How It Works
- Installation
- Usage
- Technologies Used
- Results
- Conclusion
- Future Enhancements
- License
- Contact
- Automatic Captioning: Generates a brief, one-line caption for uploaded images.
- Detailed Descriptions: Provides concise summaries that highlight the primary content and context of the image.
- Image Q&A: Users can ask questions about the image's content, with responses powered by Gemini AI.
- Interactive User Interface: InsightLens is designed with animations, style effects, and balloons for a lively experience.
- Privacy by Design: InsightLens does not store any images or questions asked, ensuring a secure and private interaction every time.
- Upload an Image: The user uploads an image in .jpg, .jpeg, or .png format.
- Automatic Captioning: InsightLens auto-generates a caption using Gemini AI.
- Detailed Summaries: A structured, emoji-enhanced description provides a deeper understanding of the image.
- Interactive Q&A: Users can ask specific questions about the image, and the AI responds with insightful answers.
- Visual Enhancements: InsightLens offers a polished user experience with glowing titles, fade-in effects, and celebratory balloons upon successful interactions.
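The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the prompt wording, the function names `is_supported`, `analyze_image`, and `gemini_generate`, and the model name are all assumptions. The model call is passed in as a plain callable so the flow can be tried without an API key.

```python
from pathlib import Path
from typing import Callable, Optional

# File types named in the upload step above.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

# Hypothetical prompt wording; the app's real prompts are not shown in this README.
CAPTION_PROMPT = "Write a brief, one-line caption for this image."
DESCRIPTION_PROMPT = (
    "Describe this image in a short, structured summary. "
    "Use emojis to highlight the main subjects and context."
)


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in .jpg, .jpeg, or .png (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def analyze_image(
    generate: Callable[[str, object], str],
    image: object,
    filename: str,
    question: Optional[str] = None,
) -> dict:
    """Run the caption, description, and optional Q&A steps for one image.

    `generate(prompt, image)` is any callable that sends a prompt plus an image
    to a model and returns the reply text.
    """
    if not is_supported(filename):
        raise ValueError(f"Unsupported file type: {filename}")

    result = {
        "caption": generate(CAPTION_PROMPT, image).strip(),
        "description": generate(DESCRIPTION_PROMPT, image).strip(),
    }
    # The Q&A step only runs when the user actually typed a question.
    if question and question.strip():
        qa_prompt = f"Answer this question about the image: {question.strip()}"
        result["answer"] = generate(qa_prompt, image).strip()
    return result


def gemini_generate(prompt: str, image: object) -> str:
    """One possible `generate` callable, backed by the google-generativeai package.

    Assumes GEMINI_API_KEY is set in the environment and that `image` is a
    PIL image; the model name below is an assumption.
    """
    import os
    import google.generativeai as genai  # imported lazily so the rest runs without it

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content([prompt, image]).text
```

With Pillow installed and a key configured, a call might look like `analyze_image(gemini_generate, Image.open("photo.jpg"), "photo.jpg", "What is in the background?")`. Passing the model call in as an argument also makes it easy to swap in a stub during testing.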
- Clone the repository: git clone https://github.com/hk-kumawat/InsightLens.git
- Install dependencies: pip install -r requirements.txt
- Set up environment variables:
  - Obtain an API key from Google Generative AI.
  - Create a .env file in the root directory and add: GEMINI_API_KEY=your_gemini_api_key
  - Replace your_gemini_api_key with your actual Gemini API key.
- Run the Streamlit app: streamlit run app.py
- Upload and explore: Upload an image to receive a caption and a detailed description, then ask questions about its content.
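For reference, the environment-variable step might be read on the Python side roughly as follows. This is a sketch under assumptions: the helper name `load_gemini_api_key` is hypothetical, and the real app.py may load the key differently. It fails early with a clear message if the key is missing or still set to the placeholder.

```python
import os


def load_gemini_api_key(env_file: str = ".env") -> str:
    """Return GEMINI_API_KEY, reading the .env file first when python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        # By default this does not override variables already set in the environment.
        load_dotenv(dotenv_path=env_file)
    except ImportError:
        pass  # fall back to whatever is already in the environment

    key = os.getenv("GEMINI_API_KEY", "").strip()
    if not key or key == "your_gemini_api_key":
        raise RuntimeError(
            "GEMINI_API_KEY is missing or still set to the placeholder; "
            "add your real key to the .env file."
        )
    return key
```

Checking for the placeholder value catches the common mistake of copying the sample line into .env without replacing it.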
- Programming Language: Python
- Libraries:
  - streamlit: creates the user interface.
  - Pillow: handles image processing.
  - python-dotenv: manages environment variables.
  - google-generativeai: generates captions and descriptions and answers questions.
- API:
  - Gemini API by Google Generative AI: powers the core captioning, description generation, and Q&A functionality.
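Several of the libraries listed above are imported under a different name than the one used with pip (for example, Pillow is imported as PIL). The following optional sketch checks whether each one can be found in the current environment; the helper name `check_dependencies` is hypothetical and not part of the project.

```python
from importlib import util

# pip package name -> module name used in `import` statements
IMPORT_NAMES = {
    "streamlit": "streamlit",
    "Pillow": "PIL",
    "python-dotenv": "dotenv",
    "google-generativeai": "google.generativeai",
}


def check_dependencies() -> dict:
    """Map each pip package name to True if its module can be found."""
    status = {}
    for package, module in IMPORT_NAMES.items():
        try:
            status[package] = util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised for a dotted name when the parent package is missing.
            status[package] = False
    return status


if __name__ == "__main__":
    for package, ok in check_dependencies().items():
        print(f"{package}: {'installed' if ok else 'missing'}")
```

Running this after `pip install -r requirements.txt` gives a quick confirmation that the install step worked before launching the app.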
InsightLens successfully analyzes images, providing an insightful, one-line caption along with a structured, emoji-based description and engaging Q&A responses. This AI-powered analysis is useful in applications ranging from social media to education and creative design.
In the above example, InsightLens provides a brief caption, structured description, and accurate answers to user questions about the image content.
The InsightLens project exemplifies the potential of AI in image analysis, creating a valuable tool for visually rich applications. By integrating Google Generative AI (Gemini) and Streamlit, InsightLens enables users to explore image content in a meaningful and interactive way.
Real-world applications for InsightLens include content creation (captions and descriptions for social media), education (visual storytelling), and research (image-based inquiries), making it a versatile tool for both personal and professional use.
- Multi-turn Conversation: Enable the assistant to maintain conversation context across multiple interactions.
- Advanced Emotion Detection: Expand sentiment capabilities to identify a wider range of emotional tones in image context.
- Integration with External Services: Extend InsightLens’s functionality to connect with APIs for additional insights (e.g., related news or facts about image objects).
- Voice Interaction: Add voice input/output for a more dynamic user experience.
This project is licensed under the MIT License — see the LICENSE file for details.
Feel free to reach out for collaborations or questions:
- 💻 — Explore more projects by Harshal Kumawat.
- 🌐 — Let's connect professionally.
- 📧 — Reach out for inquiries or collaboration.
Where visuals meet intelligence—thank you for discovering what InsightLens can do! Let's keep exploring new horizons together. 🌍🔍
"Every image has a story; let InsightLens help you discover it, one image at a time." - Harshal Kumawat