Note
This toolkit supports Google's newest Gemini 2.0 and 1.5 models, as well as the experimental models (as of December 13, 2024).
The Gemini AI Toolkit is the easiest way for developers to build with Google's Gemini AI models. It offers seamless integration for chat, text generation, and multimodal interactions, allowing you to process and analyze text, images, audio, video, code, and more, all in one comprehensive package with minimal dependencies.
- Multimodal Interaction: Effortlessly process and analyze a wide array of file types, including PDFs, images, videos, audio files, text documents, and code snippets, unlocking new dimensions of AI-assisted understanding.
- Interactive Chat: Engage in dynamic, context-aware conversations with Gemini, enabling real-time dialogue that adapts to your needs.
- Smart File Handling: Seamlessly upload and process files from local paths or URLs, with automatic temporary storage management to keep your workspace clutter-free.
- Command Support: Utilize intuitive commands to control the toolkit's functionality, enhancing efficiency and user experience.
- Customizable Parameters: Tailor your AI interactions by enabling structured JSON output for automated processing, using streaming responses for faster interactions, and adjusting temperature, token limits, safety thresholds, and more to suit your needs.
- Lightweight Design: Enjoy a streamlined experience with minimal dependencies, primarily the `requests` package, making setup and deployment a breeze.
- Installation
- API Key Configuration
- Usage
- Special Commands
- Advanced Configuration
- Supported Models
- Error Handling and Safety
- Supported File Types
- Caching and Cleanup
- Contributing
- Reporting Issues
- Submitting Pull Requests
- Versioning and Changelog
- Security
- License
1. Clone the repository:

   ```bash
   git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git
   ```

2. Navigate to the repository folder:

   ```bash
   cd gemini-ai-toolkit
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
- Obtain an API key from Google AI Studio.

You have three options for managing your API key:
1. Setting it as an environment variable on your device (recommended for everyday use):

   - Navigate to your terminal.
   - Add your API key like so:

     ```bash
     export GEMINI_API_KEY=your_api_key
     ```

   This method allows the API key to be loaded automatically when using the wrapper or CLI.

2. Using a `.env` file (recommended for development):

   - Install python-dotenv if you haven't already:

     ```bash
     pip install python-dotenv
     ```

   - Create a `.env` file in the project's root directory, or rename `example.env` in the root folder to `.env` and replace `your_api_key_here` with your API key.
   - Add your API key to the `.env` file like so:

     ```
     GEMINI_API_KEY=your_api_key
     ```

   This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.

3. Direct input:

   If you prefer not to use a `.env` file, you can directly pass your API key as an argument to the CLI or the wrapper functions.

   CLI:

   ```bash
   --api_key "your_api_key"
   ```

   Wrapper:

   ```python
   api_key="your_api_key"
   ```

   This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.
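For reference, here is a minimal sketch of how these three options can fit together in code. It assumes python-dotenv is installed and is not necessarily how the toolkit resolves the key internally:

```python
import os

from dotenv import load_dotenv  # provided by the optional python-dotenv package


def resolve_api_key(explicit_key=None):
    # Illustrative precedence only: a directly passed key wins; otherwise the
    # GEMINI_API_KEY environment variable is used (optionally populated from a
    # .env file by python-dotenv).
    load_dotenv()  # harmless no-op when no .env file is present
    return explicit_key or os.environ.get("GEMINI_API_KEY")
```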
For processing multiple input types including audio, video, text, images, code, and a wide range of files. This mode allows you to upload files (from local paths or URLs), chat with the AI about the content, and maintain a knowledge base throughout the conversation.

CLI:

```bash
python cli.py --multimodal --prompt "Analyze both of these files and provide a summary of each, one by one. Don't overlook any details." --files file1.jpg https://example.com/file2.pdf
```

Wrapper:

```python
from gemini import Multimodal

Multimodal().run(prompt="Analyze both of these files and provide a summary of each, one by one. Don't overlook any details.", files=["file1.jpg", "https://example.com/file2.pdf"])
```
For interactive conversations with the AI model.

CLI:

```bash
python cli.py --chat
```

Wrapper:

```python
from gemini import Chat

Chat().run()
```
For generating text based on a prompt or a set of instructions.

CLI:

```bash
python cli.py --text --prompt "Write a story about a magic backpack."
```

Wrapper:

```python
from gemini import Text

Text().run(prompt="Write a story about a magic backpack.")
```
During interaction with the toolkit, you can use the following special commands:
- `/exit` or `/quit`: End the conversation and exit the program.
- `/clear`: Clear the conversation history (useful for saving API credits).
- `/upload`: Upload a file for multimodal processing.
  - Usage: `/upload file_path_and_or_url [optional prompt]`
  - Example: `/upload file1.jpg https://example.com/file2.pdf Analyze the files and provide a summary of each`
| Description | CLI Flags | CLI Usage | Wrapper Usage |
|---|---|---|---|
| Chat mode | `-c`, `--chat` | `--chat` | See mode usage above. |
| Text mode | `-t`, `--text` | `--text` | See mode usage above. |
| Multimodal mode | `-m`, `--multimodal` | `--multimodal` | See mode usage above. |
| User prompt | `-p`, `--prompt` | `--prompt "Your prompt here"` | `prompt="Your prompt here"` |
| File inputs | `-f`, `--files` | `--files file1.jpg https://example.com/file2.pdf` | `files=["file1.jpg", "https://example.com/file2.pdf"]` |
| Enable streaming | `-s`, `--stream` | `--stream` | `stream=True` |
| Enable JSON output | `-js`, `--json` | `--json` | `json=True` |
| API key | `-ak`, `--api_key` | `--api_key "your_api_key"` | `api_key="your_api_key"` |
| Model name | `-md`, `--model` | `--model "gemini-2.0-flash-exp"` | `model="gemini-2.0-flash-exp"` |
| System prompt | `-sp`, `--system_prompt` | `--system_prompt "Set custom instructions"` | `system_prompt="Set custom instructions"` |
| Max tokens | `-mt`, `--max_tokens` | `--max_tokens 1024` | `max_tokens=1024` |
| Temperature | `-tm`, `--temperature` | `--temperature 0.7` | `temperature=0.7` |
| Top-p | `-tp`, `--top_p` | `--top_p 0.9` | `top_p=0.9` |
| Top-k | `-tk`, `--top_k` | `--top_k 40` | `top_k=40` |
| Candidate count | `-cc`, `--candidate_count` | `--candidate_count 1` | `candidate_count=1` |
| Stop sequences | `-ss`, `--stop_sequences` | `--stop_sequences ["\n", "."]` | `stop_sequences=["\n", "."]` |
| Safety categories | `-sc`, `--safety_categories` | `--safety_categories ["HARM_CATEGORY_HARASSMENT"]` | `safety_categories=["HARM_CATEGORY_HARASSMENT"]` |
| Safety thresholds | `-st`, `--safety_thresholds` | `--safety_thresholds ["BLOCK_NONE"]` | `safety_thresholds=["BLOCK_NONE"]` |
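Several of these options can be combined in a single call. The sketch below mirrors the keyword names in the "Wrapper Usage" column; it assumes `Chat().run()` accepts them together in one call (depending on the wrapper's design, some may instead belong to the class constructor):

```python
from gemini import Chat

# Sketch combining several options from the table above; keyword names follow
# the "Wrapper Usage" column, and their acceptance by run() is assumed.
Chat().run(
    model="gemini-2.0-flash-exp",
    system_prompt="You are a concise technical assistant.",
    max_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    stream=True,
)
```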
| Model | Inputs | Context Length |
|---|---|---|
| `gemini-2.0-flash-exp` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash-8b` | Text, images, audio, video | 8192 |
| `gemini-1.5-pro` | Text, images, audio, video | 8192 |
| `gemini-1.0-pro` (set to be deprecated on 2/15/2025) | Text | 2048 |
| Model | Inputs | Context Length |
|---|---|---|
| `gemini-exp-1114` | Text, images, audio, video | 8192 |
| `gemini-1.5-pro-exp-0827` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash-8b-exp-0924` | Text, images, audio, video | 8192 |
Note
The availability of specific models may be subject to change. Always refer to Google's official documentation for the most up-to-date information on model availability and capabilities. See base models docs here and experimental model docs here.
The Gemini AI Toolkit now includes robust error handling to help you diagnose and resolve issues quickly. Here are some common error codes and their solutions:
| HTTP Code | Status | Description | Solution |
|---|---|---|---|
| 400 | INVALID_ARGUMENT | Malformed request body | Check the API reference for the correct format and supported versions |
| 400 | FAILED_PRECONDITION | API not available in your country | Enable billing on your project in Google AI Studio |
| 403 | PERMISSION_DENIED | API key lacks permissions | Verify your API key and access rights |
| 404 | NOT_FOUND | Resource not found | Check that all parameters are valid for your API version |
| 429 | RESOURCE_EXHAUSTED | Rate limit exceeded | Ensure you're within model rate limits or request a quota increase |
| 500 | INTERNAL | Unexpected error on Google's side | Retry after a short wait; report persistent issues |
| 503 | UNAVAILABLE | Service temporarily overloaded/down | Retry after a short wait; report persistent issues |
For rate limit errors (429), the toolkit will automatically pause for 15 seconds before retrying the request.
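As an illustration of that behavior (not the toolkit's internal code), a fixed-pause retry loop built on the `requests` package could look like this:

```python
import time

import requests


def post_with_retry(url, payload, headers, max_retries=3, pause_seconds=15):
    # Illustrative only: retry after a fixed pause whenever the API answers
    # with HTTP 429 (RESOURCE_EXHAUSTED), mirroring the behavior described above.
    response = None
    for _ in range(max_retries):
        response = requests.post(url, json=payload, headers=headers, timeout=60)
        if response.status_code != 429:
            break
        time.sleep(pause_seconds)
    return response
```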
The Gemini AI Toolkit supports a wide range of file types for multimodal processing. Here are the supported file extensions:
| Category | File Extensions |
|---|---|
| Images | `jpg`, `jpeg`, `png`, `webp`, `gif`, `heic`, `heif` |
| Videos | `mp4`, `mpeg`, `mpg`, `mov`, `avi`, `flv`, `webm`, `wmv`, `3gp` |
| Audio | `wav`, `mp3`, `aiff`, `aac`, `ogg`, `flac` |
| Text/Documents | `txt`, `html`, `css`, `js`, `ts`, `csv`, `md`, `py`, `json`, `xml`, `rtf`, `pdf` |
Note
Google's Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours.
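If you want to validate files before handing them to the toolkit, a simple check against the extensions above might look like this (an illustrative helper, not part of the toolkit's API):

```python
# Extensions from the table above, grouped by category.
SUPPORTED_EXTENSIONS = {
    "images": {"jpg", "jpeg", "png", "webp", "gif", "heic", "heif"},
    "videos": {"mp4", "mpeg", "mpg", "mov", "avi", "flv", "webm", "wmv", "3gp"},
    "audio": {"wav", "mp3", "aiff", "aac", "ogg", "flac"},
    "text_documents": {"txt", "html", "css", "js", "ts", "csv", "md", "py", "json", "xml", "rtf", "pdf"},
}


def is_supported(path):
    # Compare a file's extension against the supported set before uploading.
    ext = path.rsplit(".", 1)[-1].lower()
    return any(ext in group for group in SUPPORTED_EXTENSIONS.values())
```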
The Gemini AI Toolkit implements a caching mechanism for downloaded files to improve performance and reduce unnecessary network requests. Here's how it works:
- When a file is downloaded from a URL, it's stored in a temporary cache folder (`.gemini_ai_toolkit_cache`).
- The file will be used to process the request and will be stored locally due to Google's upload requirements.
- The cache is automatically cleaned up at the end of each session to prevent accumulation of temporary files.
You don't need to manage this cache manually, but it's good to be aware of its existence, especially if you're processing large files or have limited storage space.
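For context, the download-and-cleanup flow can be sketched roughly as follows. This is an illustration of the behavior described above, not the toolkit's actual implementation:

```python
import os
import shutil

import requests

CACHE_DIR = ".gemini_ai_toolkit_cache"


def download_to_cache(url):
    # Save a remote file into the temporary cache folder so it can be
    # uploaded and processed locally.
    os.makedirs(CACHE_DIR, exist_ok=True)
    filename = os.path.basename(url.split("?")[0]) or "download.tmp"
    local_path = os.path.join(CACHE_DIR, filename)
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(local_path, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
    return local_path


def cleanup_cache():
    # Remove the entire cache folder at the end of a session.
    shutil.rmtree(CACHE_DIR, ignore_errors=True)
```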
Contributions are welcome!
Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:
- Check if the issue has already been reported.
- Use the Bug Report template to create a detailed report.
- Submit the report here.
Your report will help us make the project better for everyone.
Got an idea for a new feature? Feel free to suggest it. Here's how:
- Check if the feature has already been suggested or implemented.
- Use the Feature Request template to create a detailed request.
- Submit the request here.
Your suggestions for improvements are always welcome.
Stay up-to-date with the latest changes and improvements in each version:
- CHANGELOG.md provides detailed descriptions of each release.
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from publicly disclosing any vulnerability until it has been reported and addressed.
Licensed under the MIT License. See LICENSE for details.