An open-source research assistant built on Google's Gemini AI that performs deep, multi-layered research on any topic.
- Automated deep research with adjustable breadth and depth
- Follow-up question generation for better context
- Concurrent processing of multiple research queries
- Comprehensive final report generation with citations
- Three research modes: fast, balanced, and comprehensive
- Progress tracking and detailed logging
- Source tracking and citation management
- Python 3.9+
- Google Gemini API key
- Docker (if using dev container)
- VS Code with Dev Containers extension (if using dev container)
You can set up this project in one of two ways: with a VS Code dev container, or manually in a local Python environment.
- Open the project in VS Code
- When prompted, click "Reopen in Container" or run the "Dev Containers: Reopen in Container" command
- Create a `.env` file in the root directory and add your Gemini API key: `GEMINI_KEY=your_api_key_here`
- Clone the repository:

      git clone <repository-url>
      cd open-gemini-deep-research

- Create and activate a virtual environment (recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Create a `.env` file in the root directory and add your Gemini API key: `GEMINI_KEY=your_api_key_here`
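As a rough illustration of how the key could be picked up at runtime, here is a minimal standard-library sketch. The helper names `load_env_file` and `get_gemini_key` are hypothetical, and the project itself may load `GEMINI_KEY` differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_KEY") or load_env_file(path).get("GEMINI_KEY")
    if not key:
        raise RuntimeError("GEMINI_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error from the API later on.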
Run the main script with your research query:
python main.py "your research query here"
- `--mode`: Research mode (choices: fast, balanced, comprehensive) [default: balanced]
- `--num-queries`: Number of queries to generate [default: 3]
- `--learnings`: List of previous learnings [optional]
Example:
python main.py "Impact of artificial intelligence on healthcare" --mode comprehensive --num-queries 5
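The documented options map naturally onto `argparse`. The sketch below is an assumption about how such a command line could be declared, not a copy of `main.py`; the function name `build_parser` is hypothetical, and only the option names, choices, and defaults come from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run deep research on a query")
    parser.add_argument("query", help="The research query")
    parser.add_argument(
        "--mode",
        choices=["fast", "balanced", "comprehensive"],
        default="balanced",
        help="Research mode",
    )
    parser.add_argument("--num-queries", type=int, default=3,
                        help="Number of queries to generate")
    # nargs="*" lets several learnings follow the flag as separate arguments.
    parser.add_argument("--learnings", nargs="*", default=[],
                        help="Previous learnings to seed the research")
    return parser


# Parse an explicit argument list, equivalent to the example command above.
args = build_parser().parse_args(
    ["Impact of artificial intelligence on healthcare",
     "--mode", "comprehensive", "--num-queries", "5"]
)
print(args.mode, args.num_queries)
```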
The script will:
- Analyze your query for optimal research parameters
- Ask follow-up questions for clarification
- Conduct multi-layered research
- Generate a comprehensive report saved as `final_report.md`
- Show progress updates throughout the process
open-gemini-deep-research/
├── .devcontainer/
│ └── devcontainer.json
├── src/
│ ├── __init__.py
│ └── deep_research.py
├── .env
├── .gitignore
├── dockerfile
├── main.py
├── README.md
└── requirements.txt
The application offers three research modes that affect how deeply and broadly the research is conducted:
- Fast Mode
  - Performs quick, surface-level research
  - Maximum of 3 concurrent queries
  - No recursive deep diving
  - Typically generates 2-3 follow-up questions per query
  - Best for time-sensitive queries or initial exploration
  - Processing time: ~1-3 minutes
- Balanced Mode (Default)
  - Provides moderate depth and breadth
  - Maximum of 7 concurrent queries
  - No recursive deep diving
  - Generates 3-5 follow-up questions per query
  - Explores main concepts and their immediate relationships
  - Processing time: ~3-6 minutes
  - Recommended for most research needs
- Comprehensive Mode
  - Conducts exhaustive, in-depth research
  - Maximum of 5 initial queries, but includes recursive deep diving
  - Each query can spawn sub-queries that go deeper into the topic
  - Generates 5-7 follow-up questions with recursive exploration
  - Explores primary, secondary, and tertiary relationships
  - Includes counter-arguments and alternative viewpoints
  - Processing time: ~5-12 minutes
  - Best for academic or detailed analysis
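One way to picture the three modes is as a small lookup table. The sketch below encodes the limits from the descriptions above; the names `ModeSettings`, `MODES`, and `cap_queries` are hypothetical and are not taken from the project's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSettings:
    """Illustrative per-mode limits taken from the mode descriptions."""
    max_queries: int   # upper bound on queries (concurrent or initial)
    recursive: bool    # whether sub-queries are explored recursively
    follow_ups: tuple  # (min, max) follow-up questions per query


MODES = {
    "fast": ModeSettings(max_queries=3, recursive=False, follow_ups=(2, 3)),
    "balanced": ModeSettings(max_queries=7, recursive=False, follow_ups=(3, 5)),
    "comprehensive": ModeSettings(max_queries=5, recursive=True, follow_ups=(5, 7)),
}


def cap_queries(mode, requested):
    """Clamp a requested query count to the mode's maximum (at least 1)."""
    return max(1, min(requested, MODES[mode].max_queries))


print(cap_queries("fast", 10))  # prints 3
```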
- Query Analysis
  - Analyzes initial query to determine optimal research parameters
  - Assigns breadth (1-10 scale) and depth (1-5 scale) values
  - Adjusts parameters based on query complexity and chosen mode
- Query Generation
  - Creates unique, non-overlapping search queries
  - Uses semantic similarity checking to avoid redundant queries
  - Maintains query history to prevent duplicates
  - Adapts number of queries based on mode settings
- Research Tree Building
  - Implements a tree structure to track research progress
  - Each query gets a unique UUID for tracking
  - Maintains parent-child relationships between queries
  - Tracks query order and completion status
  - Provides detailed progress visualization through JSON tree structure
- Deep Research (Comprehensive Mode)
  - Implements recursive research strategy
  - Each query can generate one follow-up query
  - Reduces breadth at deeper levels (breadth/2)
  - Maintains visited URLs to avoid duplicates
  - Combines learnings from all levels
- Report Generation
  - Synthesizes findings into a coherent narrative
  - Minimum 3000-word detailed report
  - Includes inline citations and source tracking
  - Organizes information by relevance and relationship
  - Adds creative elements like scenarios and analogies
  - Maintains factual accuracy while being engaging
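The recursive strategy used in comprehensive mode can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: `generate_queries` and `search` are caller-supplied stand-ins for the Gemini-backed steps, and the function name `deep_research` and its signature are hypothetical. It shows breadth halving at each level, one follow-up per query, a shared set of visited URLs, and learnings combined across levels.

```python
def deep_research(topic, breadth, depth, generate_queries, search, visited=None):
    """Recursively research `topic`, halving breadth at each deeper level.

    `generate_queries(topic, n)` returns n query strings and `search(query)`
    returns (learnings, urls, follow_up_query); both are stand-ins.
    """
    visited = set() if visited is None else visited
    learnings = []
    for query in generate_queries(topic, breadth):
        found, urls, follow_up = search(query)
        learnings.extend(found)
        visited.update(urls)  # a set drops URLs that were already seen
        # Only descend while depth remains; each query yields one follow-up.
        if depth > 1 and follow_up:
            deeper, _ = deep_research(
                follow_up, max(1, breadth // 2), depth - 1,
                generate_queries, search, visited,
            )
            learnings.extend(deeper)
    # Deduplicate learnings while preserving first-seen order.
    return list(dict.fromkeys(learnings)), visited


# Fake stand-ins so the sketch runs without network access.
def fake_generate(topic, n):
    return [f"{topic} #{i}" for i in range(n)]


def fake_search(query):
    learning = f"learning about {query}"
    url = f"https://example.com/{len(query)}"
    return [learning], [url], f"{query} (deeper)"


learnings, urls = deep_research("solar", breadth=4, depth=2,
                                generate_queries=fake_generate, search=fake_search)
print(len(learnings), len(urls))
```

Passing the same `visited` set down through the recursion is what lets deeper levels skip sources that shallower levels already collected.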
- Uses Google's Gemini AI for:
  - Query analysis and generation
  - Content processing and synthesis
  - Semantic similarity checking
  - Report generation
- Implements concurrent processing for queries
- Uses progress tracking system with tree visualization
- Maintains research tree structure for relationship mapping
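Concurrent query processing with an upper bound is commonly written with `asyncio` and a semaphore. The sketch below assumes that approach purely for illustration; the source does not say which concurrency mechanism the project uses, and `run_queries` and `fake_process` are hypothetical names.

```python
import asyncio


async def run_queries(queries, process, max_concurrent):
    """Run `process(query)` for every query, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(query):
        async with semaphore:
            return await process(query)

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(guarded(q) for q in queries))


async def fake_process(query):
    await asyncio.sleep(0.01)  # stands in for a network call
    return query.upper()


results = asyncio.run(run_queries(["a", "b", "c"], fake_process, max_concurrent=2))
print(results)  # prints ['A', 'B', 'C']
```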
The research tree is implemented through the `ResearchProgress` class that tracks:
- Query relationships (parent-child)
- Query completion status
- Learnings per query
- Query order
- Unique IDs for each query
The complete research tree structure is automatically saved to `research_tree.json` when generating the final report, allowing for later analysis or visualization of the research process.
Example tree structure:

    {
      "query": "root query",
      "id": "uuid-1",
      "status": "completed",
      "depth": 2,
      "learnings": ["learning 1", "learning 2"],
      "sub_queries": [
        {
          "query": "sub-query 1",
          "id": "uuid-2",
          "status": "completed",
          "depth": 1,
          "learnings": ["learning 3"],
          "sub_queries": [],
          "parent_query": "root query"
        }
      ],
      "parent_query": null
    }
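A tree with this shape could be built from a small node class. The sketch below is not the actual `ResearchProgress` class: `QueryNode` and its methods are hypothetical, the `"in_progress"` default status is an assumption (the example only shows `"completed"`), and only the field names are taken from the JSON example.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryNode:
    """One node of the tree; field names follow the JSON example."""
    query: str
    depth: int
    parent_query: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "in_progress"  # assumed default, not from the source
    learnings: List[str] = field(default_factory=list)
    sub_queries: List["QueryNode"] = field(default_factory=list)

    def add_sub_query(self, query):
        """Attach a child one level deeper and return it."""
        child = QueryNode(query=query, depth=self.depth - 1, parent_query=self.query)
        self.sub_queries.append(child)
        return child

    def to_dict(self):
        """Convert this node and its descendants to plain JSON-ready dicts."""
        return {
            "query": self.query,
            "id": self.id,
            "status": self.status,
            "depth": self.depth,
            "learnings": self.learnings,
            "sub_queries": [child.to_dict() for child in self.sub_queries],
            "parent_query": self.parent_query,
        }


root = QueryNode(query="root query", depth=2)
child = root.add_sub_query("sub-query 1")
child.learnings.append("learning 3")
child.status = "completed"
print(json.dumps(root.to_dict(), indent=2))
```

Writing the same dictionary with `json.dump` to a file would produce output along the lines of `research_tree.json`.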