A comprehensive platform that scrapes targeted tweets from X (Twitter), categorizes them using AI to identify pain points and product requests, and presents insights through an interactive dashboard for startup ideation.
What: Automated system that scrapes tweets matching specific keywords and categorizes them into:
- Product Requests: Tweets asking for tools, apps, or solutions that don't exist
- Pain Points: Tweets expressing frustration about existing problems
- Advice Requests: Tweets asking for recommendations or guidance
- Wish Lists: Tweets wishing certain tools or features existed
Why: Identify unmet market needs and early startup opportunities from real user frustrations and requests.
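The four categories above can be modeled as a small enum so that free-text labels coming back from the model are checked before they are stored. This is a minimal sketch: `TweetCategory` and `parse_category` are hypothetical names, and how categorizer.py actually represents categories is not shown in this README.

```python
from enum import Enum
from typing import Optional


class TweetCategory(Enum):
    """The four categories named in this README (enum name is hypothetical)."""
    PRODUCT_REQUEST = "Product Request"
    PAIN_POINT = "Pain Point"
    ADVICE_REQUEST = "Advice Request"
    WISH_LIST = "Wish List"


def parse_category(label: str) -> Optional[TweetCategory]:
    """Map a free-text label onto a known category.

    Returns None for labels that match no category, so callers can
    skip or log unexpected model output instead of crashing.
    """
    # Normalize underscores, repeated whitespace, and letter case.
    normalized = " ".join(label.replace("_", " ").split()).lower()
    # Accept simple plurals such as "pain points" or "wish lists".
    if normalized.endswith("s"):
        normalized = normalized[:-1]
    for category in TweetCategory:
        if category.value.lower() == normalized:
            return category
    return None
```

Returning `None` rather than raising keeps one odd label from aborting a whole batch; the caller decides whether to drop or log it.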
The system follows a modular architecture with three main components:
- Scraper (apify_scraper.py): Uses Apify's Twitter Scraper to fetch tweets daily based on configured keywords
- Categorizer (categorizer.py): Processes tweets through Gemini 2.5 Flash to classify and extract insights
- Dashboard (dashboard.py): Interactive web interface built with Dash for visualizing results
| Component | Technology | Purpose |
|---|---|---|
| Scraper | Apify Client + Twitter Scraper | Fetch tweets with configurable filters |
| Categorizer | Google Gemini 2.5 Flash | AI-powered tweet analysis and classification |
| Dashboard | Dash + Plotly + Bootstrap | Interactive web visualization |
| Storage | JSON files | Structured data persistence |
| Automation | Python scripts | Daily job orchestration |
- Automated Daily Scraping: Configurable keyword-based tweet collection
- AI-Powered Analysis: Intelligent categorization using Gemini 2.5 Flash
- Interactive Dashboard: Dark/light mode with date selection and filtering
- Structured Data: JSON-based storage with analysis results
- Error Handling: Comprehensive logging and error recovery
- Responsive Design: Mobile-friendly dashboard interface
gummy-x/
├── apify_scraper.py # Twitter/X scraping using Apify
├── categorizer.py # AI-powered tweet analysis
├── dashboard.py # Interactive web dashboard
├── daily_job.py # Automated daily execution
├── scraper_config.json # Scraping configuration
├── requirements.txt # Python dependencies
├── data/
│ ├── scraped/ # Raw tweet data (DDMMYY_data.json)
│ └── analysis/ # Processed analysis results (DDMMYY_analysis.json)
├── logs/ # Application logs
└── gummyx/ # Virtual environment
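The DDMMYY file naming shown in the tree can be sketched as a pair of helpers: one that builds a date-stamped path and one that lists the dates that already have an analysis file. The function names are hypothetical and the project's own scripts may build these paths differently.

```python
from datetime import date, datetime
from pathlib import Path
from typing import List


def dated_path(base: Path, kind: str, day: date) -> Path:
    """Build e.g. data/scraped/260725_data.json for kind='data'."""
    folder = "scraped" if kind == "data" else "analysis"
    return base / folder / f"{day.strftime('%d%m%y')}_{kind}.json"


def available_dates(analysis_dir: Path) -> List[date]:
    """Return the dates that have an analysis file, oldest first."""
    found = []
    for path in analysis_dir.glob("*_analysis.json"):
        stamp = path.name.split("_", 1)[0]
        try:
            found.append(datetime.strptime(stamp, "%d%m%y").date())
        except ValueError:
            continue  # Ignore files that do not follow the DDMMYY pattern.
    return sorted(found)
```

Parsing the stamp into a real date matters because DDMMYY names do not sort chronologically as strings: 010825 (1 August 2025) sorts before 260725 (26 July 2025).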
- Python 3.11+
- Apify account and API token
- Google Gemini API key
- Clone the repository:

      git clone <repository-url>
      cd gummy-x

- Create and activate virtual environment:

      python -m venv gummyx
      source gummyx/bin/activate  # On Windows: gummyx\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Configure environment variables: Create a .env file in the project root:

      GEMINI_API_KEY=your_gemini_api_key_here
      APIFY_TOKEN=your_apify_token_here
      GITHUB_TOKEN=your_github_token_here  # Optional, for automated commits

- Configure scraping parameters: Edit scraper_config.json to customize:
  - Keywords and search terms
  - Date ranges
  - Engagement filters (likes, replies, retweets)
  - Language and location settings
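Before running anything, it can help to confirm that the keys from the .env step are actually set. The sketch below assumes the values have already been loaded into the process environment (how the project loads the .env file is not shown in this README); `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping

# GITHUB_TOKEN is documented as optional, so it is not checked here.
REQUIRED_VARS = ("GEMINI_API_KEY", "APIFY_TOKEN")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

A script could call `missing_env_vars()` at startup and exit with a clear message if the list is non-empty, instead of failing later inside an API call.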
- Run scraping:

      python apify_scraper.py

- Analyze tweets:

      python categorizer.py

- Launch dashboard:

      python dashboard.py

  The dashboard will be available at http://localhost:8050
Run the complete pipeline:

    python daily_job.py

This will:
- Execute scraping with current configuration
- Process tweets through AI analysis
- Save results to date-stamped files
- Optionally commit changes to Git
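The first two steps above amount to running the scraper and then the categorizer in order. A rough sketch of that orchestration follows, under the assumption that each stage is launched as a separate script; daily_job.py may instead import the modules directly, and `run_pipeline` is a hypothetical name. The optional Git commit is left out.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Assumed order: scrape first, then categorize (script names from this README).
PIPELINE_STEPS = ("apify_scraper.py", "categorizer.py")


def run_script(script: str) -> int:
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable[[str], int] = run_script,
) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for script in steps:
        if runner(script) != 0:
            break  # Later steps depend on the output of earlier ones.
        completed.append(script)
    return completed
```

Passing the runner as a parameter lets the ordering and stop-on-failure logic be exercised without calling Apify or Gemini.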
Key configuration options:
- words_or: Array of keywords to search for (OR logic)
- words_and: Array of required keywords (AND logic)
- min_likes, min_replies, min_retweets: Engagement filters
- since, until: Date range for scraping
- lang: Language filter (default: "en")
- maxItems: Maximum tweets to collect
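The options above can be checked before a scraping run starts. The sketch below parses config text, fills in defaults, and rejects obviously invalid values. Only the "en" default for lang is documented; the other defaults and the `load_config` name are illustrative assumptions, and since, until, and maxItems are passed through unchanged because their expected formats are not specified here.

```python
import json
from typing import Any, Dict

# Only lang="en" is a documented default; the rest are assumptions.
DEFAULTS: Dict[str, Any] = {
    "words_or": [],
    "words_and": [],
    "min_likes": 0,
    "min_replies": 0,
    "min_retweets": 0,
    "lang": "en",
}


def load_config(raw_json: str) -> Dict[str, Any]:
    """Parse config text, fill defaults, and reject obviously bad values."""
    config = {**DEFAULTS, **json.loads(raw_json)}
    for key in ("words_or", "words_and"):
        if not isinstance(config[key], list):
            raise ValueError(f"{key} must be an array of keywords")
    if not config["words_or"] and not config["words_and"]:
        raise ValueError("at least one keyword is required")
    for key in ("min_likes", "min_replies", "min_retweets"):
        if not isinstance(config[key], int) or config[key] < 0:
            raise ValueError(f"{key} must be a non-negative integer")
    return config
```

For example, `load_config(Path("scraper_config.json").read_text())` would return the merged settings or raise `ValueError` with the offending key.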
- Date Selection: Choose from available analysis dates
- Theme Toggle: Switch between light and dark modes
- Interactive Cards: Expandable product request details
- Tweet Examples: View source tweets for each category
- Error Handling: Graceful handling of missing data
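The product request cards need the analysis entries in some display order. One plausible choice is to show the most urgent requests first. The sketch below assumes the analysis file layout documented in this README (a product_requests list whose items carry an urgency_level of High, Medium, or Low); `sort_requests_by_urgency` is a hypothetical helper and dashboard.py may order cards differently.

```python
from typing import Any, Dict, List

URGENCY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def sort_requests_by_urgency(analysis: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return product requests with High urgency first.

    Unknown or missing urgency levels sort last; ties keep file order
    because sorted() is stable.
    """
    requests = analysis.get("product_requests", [])
    return sorted(
        requests,
        key=lambda item: URGENCY_ORDER.get(
            item.get("urgency_level"), len(URGENCY_ORDER)
        ),
    )
```

Falling back to an empty list when product_requests is absent matches the "graceful handling of missing data" goal: the dashboard can render an empty state instead of raising.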
{
"id": "tweet_id",
"text": "tweet_content",
"user_handle": "@username",
"created_at": "2025-07-26T10:30:00Z",
"engagement_score": 15,
"url": "https://twitter.com/..."
}

{
"total_tweets_analyzed": 150,
"product_requests": [
{
"category": "Product Request",
"description": "Tool description",
"pain_point": "Problem being solved",
"target_audience": "Who needs this",
"urgency_level": "High/Medium/Low",
"tweets": [...]
}
],
"token_usage": {"input": 5000, "output": 2000}
}

All components log to the logs/ directory:
- apify_scraper.log: Scraping operations
- categorizer.log: AI analysis operations
- dashboard.log: Dashboard interactions
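A per-component log file like the ones listed above can be set up with the standard logging module. This is a minimal sketch: `get_component_logger` and the message format are assumptions, not the project's actual logging setup.

```python
import logging
from pathlib import Path


def get_component_logger(name: str, log_dir: Path = Path("logs")) -> logging.Logger:
    """Return a logger that writes to <log_dir>/<name>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # Avoid duplicate handlers on repeated calls.
        handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this helper, `get_component_logger("categorizer").info("analysis started")` would append a timestamped line to logs/categorizer.log.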
This project is licensed under the MIT License - see the LICENSE file for details.