🖼️ Pydantic-AI MultiModal Image Processor

A demonstration project showcasing how to use MultiModal capabilities with Pydantic-AI to process and analyze images with OpenAI's multimodal LLMs. This project focuses on resume analysis and information extraction from images; feel free to customize it for your needs.

✨ Features

  • 🔄 Image processing using OpenAI's multimodal LLMs
  • 📊 Structured data extraction using Pydantic models
  • 🎨 Support for multiple image formats
  • 📄 Resume information extraction including:
    • 🔗 LinkedIn profile
    • 💻 GitHub profile
    • 📧 Email
    • 💼 Work experience
    • 🎓 Education
    • 🛠️ Skills
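
The fields above map naturally onto a Pydantic model for structured extraction. The actual model in app.py may differ; this is a hedged sketch of what the resume schema could look like, with all field names hypothetical:

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class ResumeInfo(BaseModel):
    """Hypothetical schema for structured resume extraction."""

    linkedin: Optional[str] = None  # 🔗 LinkedIn profile URL
    github: Optional[str] = None    # 💻 GitHub profile URL
    email: Optional[str] = None     # 📧 Contact email
    work_experience: List[str] = Field(default_factory=list)  # 💼 one entry per role
    education: List[str] = Field(default_factory=list)        # 🎓 degrees and institutions
    skills: List[str] = Field(default_factory=list)           # 🛠️ skill keywords


# The LLM's JSON output can then be validated directly:
info = ResumeInfo.model_validate(
    {"email": "jane@example.com", "skills": ["Python", "Pydantic"]}
)
print(info.email)
```

Using a schema like this lets Pydantic reject malformed model output instead of silently passing it downstream.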

📋 Prerequisites

  • 🐍 Python 3.8+
  • 🔑 OpenAI API key
  • 📦 Git

🚀 Installation

  1. Clone the repository:
git clone https://github.com/rawheel/Pydantic-ai-MultiModal-Example.git
cd Pydantic-ai-MultiModal-Example
  2. Set up a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
pip install -r requirements.txt
  3. Create a .env file with your environment variables:
LLM_MODEL=gpt-4o-mini
OPENAI_API_KEY=your_openai_api_key_here
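
config.py presumably reads these variables at startup. A minimal stdlib-only sketch of that pattern (the real file may instead use a helper such as python-dotenv to load the .env file):

```python
import os

# Hedged sketch: the actual config.py may differ.
# Fall back to a sensible default when LLM_MODEL is unset.
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o-mini")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")

if not OPENAI_API_KEY:
    # Warn early with a clear message instead of a cryptic API error later.
    print("Warning: OPENAI_API_KEY is not set; add it to your .env file")
```

Centralizing configuration this way keeps secrets out of the source tree and makes the model name swappable without code changes.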

📝 Usage

Basic usage with web image URLs:

from app import ImageSummarizer
# Initialize summarizer
summarizer = ImageSummarizer()
# Example image URLs
image_urls = [
    'https://example.com/path/to/image.jpg',
    # Add more image URLs as needed
]
# Run analysis
summary = summarizer.summarize(image_urls, "summarize the resume")
print(summary)
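
summarize takes web URLs. If you want to try a local image instead, one common workaround (not part of this repo; a hedged, stdlib-only sketch with a hypothetical helper name) is to base64-encode the file as a data: URL, which OpenAI's vision endpoints accept:

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local image file as a data: URL (hypothetical helper)."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Example: a few PNG header bytes stand in for a real image file here.
Path("sample.png").write_bytes(b"\x89PNG\r\n\x1a\n")
print(to_data_url("sample.png")[:30])
```

The resulting string can be passed in image_urls in place of an https:// URL.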

Run the example script:

python app.py

📁 Project Structure

├── app.py            # Main application file
├── config.py         # Configuration settings
├── requirements.txt  # Project dependencies
├── .env             # Environment variables (create this)
└── README.md        # Project documentation

⚙️ Configuration

The project uses environment variables for configuration. Available options:

  • LLM_MODEL: The OpenAI model to use (example: "gpt-4o-mini")
  • OPENAI_API_KEY: Your OpenAI API key

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Raheel Siddiqui


⭐️ If you find this project useful, please consider giving it a star!
