A Python script that scrapes Reddit user data, analyzes posting patterns, and generates comprehensive user personas with detailed citations.
Features:
- Scrapes Reddit user posts and comments using PRAW (Python Reddit API Wrapper)
- Analyzes user behavior patterns, interests, and activity
- Generates detailed user personas with evidence-based characteristics
- Provides citations with direct links to supporting posts/comments
- Outputs comprehensive text-based persona reports
Requirements:
- Python 3.7+
- Reddit API credentials (client ID, client secret)
- Required Python packages (see Installation section)
Installation:
- Clone this repository:
    git clone https://github.com/yourusername/reddit-persona-generator.git
    cd reddit-persona-generator
- Install required packages:
    pip install -r requirements.txt
- Set up Reddit API credentials:
  - Go to https://www.reddit.com/prefs/apps
  - Create a new application (script type)
  - Create a .env file in the project root with your credentials:
      REDDIT_CLIENT_ID=your_client_id_here
      REDDIT_CLIENT_SECRET=your_client_secret_here
      REDDIT_USER_AGENT=PersonaGenerator/1.0
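A minimal sketch of how a script might read these three values and build a PRAW client. The helper names load_reddit_config and make_reddit_client are hypothetical and not taken from the project's script; load_dotenv() from python-dotenv and praw.Reddit(client_id=..., client_secret=..., user_agent=...) are the real library calls.

```python
import os

# python-dotenv is listed as a dependency; degrade gracefully if it is absent.
try:
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Hypothetical helper names; the variable names match the .env file above.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_config(env=None):
    """Return the credentials as praw.Reddit keyword arguments, or raise if any is missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies KEY=value pairs from .env into os.environ
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Reddit credentials in .env: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


def make_reddit_client():
    """Build a read-only PRAW client (no username/password) from the .env values."""
    import praw  # imported lazily so the config check works without praw installed

    return praw.Reddit(**load_reddit_config())
```

Checking for missing variables up front tends to give a clearer message than a failed authentication request later on.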
Usage:
- Run the script:
    python reddit_persona_generator.py
- Enter a Reddit profile URL when prompted, for example:
    https://www.reddit.com/user/spez
    https://www.reddit.com/user/username
- The script will:
  - Scrape the user's recent posts and comments
  - Analyze their activity patterns and interests
  - Generate a comprehensive persona report
  - Save the results to username_persona.txt, named after the user (for example, spez_persona.txt)
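The first two steps above, turning the profile URL into a username and pulling recent activity, might look roughly like this. It is a sketch under stated assumptions: extract_username and collect_activity are hypothetical names, the 3 to 20 character username pattern is an assumed approximation of Reddit's naming rules, and the redditor argument is expected to behave like PRAW's Redditor (for example reddit.redditor(username)), whose submissions.new() and comments.new() listings are used here.

```python
import re
from urllib.parse import urlparse

# Assumed username rule: 3-20 letters, digits, underscores or hyphens.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_-]{3,20}$")
REDDIT = "https://www.reddit.com"


def extract_username(url):
    """Return 'spez' for a profile URL such as https://www.reddit.com/user/spez/."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    on_reddit = host == "reddit.com" or host.endswith(".reddit.com")
    # Accept /user/<name> and the short /u/<name> form; ignore trailing segments.
    if on_reddit and len(parts) >= 2 and parts[0] in ("user", "u"):
        if USERNAME_RE.match(parts[1]):
            return parts[1]
    raise ValueError("Invalid Reddit profile URL")


def _record(kind, item, text):
    """Flatten a PRAW submission or comment into a plain dict with a citation URL."""
    return {
        "kind": kind,
        "subreddit": str(item.subreddit),  # PRAW's Subreddit prints as its name
        "text": text,
        "created_utc": item.created_utc,
        "score": item.score,
        "url": REDDIT + item.permalink,  # permalink is a path like /r/<sub>/comments/...
    }


def collect_activity(redditor, limit=100):
    """Return the user's recent posts and comments as dicts, newest first."""
    items = [
        _record("post", post, f"{post.title}\n{post.selftext}".strip())
        for post in redditor.submissions.new(limit=limit)
    ]
    items += [
        _record("comment", comment, comment.body)
        for comment in redditor.comments.new(limit=limit)
    ]
    items.sort(key=lambda item: item["created_utc"], reverse=True)
    return items
```

With a PRAW client the two combine as collect_activity(reddit.redditor(extract_username(url))). Keeping the collector independent of how the client is built also makes it easy to test with stand-in objects.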
The generated persona includes:
- Basic Information: Account age, karma, activity statistics
- Interests: Primary subreddits and topics of engagement
- Personality Traits: Behavioral patterns derived from content analysis
- Activity Patterns: Peak activity times and posting frequency
- Expertise Areas: Topics where the user demonstrates knowledge
- Citations: Direct links to posts/comments supporting each characteristic
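One way to keep every characteristic tied to its evidence is to store the citation URLs alongside each trait and render them together. The sketch below is hypothetical: the Trait and Persona classes and the report layout are illustrative rather than the project's actual format, and only the username_persona.txt file name comes from this README.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Trait:
    """One persona characteristic and the posts/comments that support it."""
    label: str
    description: str
    citations: List[str] = field(default_factory=list)  # permalink URLs


@dataclass
class Persona:
    username: str
    # Section title (e.g. "Interests") -> traits in that section, in insertion order.
    sections: Dict[str, List[Trait]] = field(default_factory=dict)

    def add(self, section: str, trait: Trait) -> None:
        self.sections.setdefault(section, []).append(trait)

    def render(self) -> str:
        """Plain-text report: one block per section, each trait followed by its sources."""
        lines = [f"User persona: u/{self.username}", ""]
        for section, traits in self.sections.items():
            lines.append(section.upper())
            for trait in traits:
                lines.append(f"- {trait.label}: {trait.description}")
                lines.extend(f"    Source: {url}" for url in trait.citations)
            lines.append("")
        return "\n".join(lines)

    def save(self, directory: str = ".") -> Path:
        """Write the report to <directory>/<username>_persona.txt and return the path."""
        path = Path(directory) / f"{self.username}_persona.txt"
        path.write_text(self.render(), encoding="utf-8")
        return path
```

Storing citations on the trait itself, rather than in a separate list, means a characteristic cannot be printed without the links that justify it.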
The repository includes sample persona files for the provided test users:
- kojied_persona.txt
- Hungry-Move-6603_persona.txt
Analysis methods:
- Analyzes posting frequency and timing patterns
- Identifies primary subreddits and topics of interest
- Performs content analysis for personality trait detection
- Tracks karma patterns and engagement levels
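Posting frequency, timing and subreddit counts can be derived from the collected items with the standard library alone. This sketch assumes each item is a dict with "subreddit" and "created_utc" (Unix seconds, as PRAW reports them) keys; activity_summary is a hypothetical helper, hours are reported in UTC, and the personality-trait content analysis is not covered.

```python
from collections import Counter
from datetime import datetime, timezone


def activity_summary(items, top_n=5):
    """Summarize activity dicts that carry 'subreddit' and 'created_utc' keys."""
    if not items:  # users with no public activity (see Empty or minimal output)
        return {"total": 0, "top_subreddits": [], "peak_hours_utc": [], "per_week": 0.0}
    subreddits = Counter(item["subreddit"] for item in items)
    hours = Counter(
        datetime.fromtimestamp(item["created_utc"], tz=timezone.utc).hour for item in items
    )
    timestamps = [item["created_utc"] for item in items]
    # Treat spans shorter than one day as one day to avoid inflated rates.
    span_days = max((max(timestamps) - min(timestamps)) / 86400, 1.0)
    return {
        "total": len(items),
        "top_subreddits": subreddits.most_common(top_n),  # [(name, count), ...]
        "peak_hours_utc": [hour for hour, _ in hours.most_common(3)],
        "per_week": round(len(items) / span_days * 7, 2),
    }
```

Converting to local time would need the user's time zone, which Reddit does not expose, so UTC hours are the safer default for peak-activity estimates.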
Privacy and ethics:
- Only analyzes publicly available Reddit data
- Provides citations to original sources
- Respects Reddit's API terms of service
- Does not store or redistribute user data
Project structure:
    persona_generator.py      # Main script
    requirements.txt          # Python dependencies
    .env                      # API credentials (not included)
    README.md                 # This file
    sample_outputs/           # Example persona files
    ├── kojied_persona.txt
    ├── Hungry-Move-6603_persona.txt
    └── spez_persona.txt
Dependencies:
- praw: Python Reddit API Wrapper
- python-dotenv: environment variable management
- textblob: text processing and sentiment analysis
- requests: HTTP requests
- beautifulsoup4: HTML parsing (if needed)
The script respects Reddit's API rate limits:
- Maximum 100 requests per minute
- Built-in delays between requests
- Graceful error handling for rate limit issues
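PRAW already reads Reddit's rate-limit headers and waits between requests on its own, so extra handling mainly matters when a request still fails. A generic retry helper with exponential backoff is sketched below as an assumption, not as the script's actual mechanism; with_backoff is a hypothetical name, and in practice retry_on would be narrowed to the exceptions worth retrying (for example prawcore's TooManyRequests) rather than every Exception.

```python
import time


def with_backoff(func, retries=3, base_delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retry_on exception wait base_delay * 2**attempt and try again.

    The sleep function is injectable so the helper can be tested without waiting.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
```

A call would wrap one listing fetch, for example with_backoff(lambda: list(redditor.comments.new(limit=100))), so a transient failure retries only that request rather than the whole run.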
Test the script with the example profile URLs listed above.
Troubleshooting:
- "Invalid Reddit profile URL" error
  - Ensure the URL follows the format: https://www.reddit.com/user/username
  - Check that the username exists and is public
- API authentication errors
  - Verify that your .env file contains the correct credentials
  - Ensure your Reddit app is configured as a "script" type
- Rate limiting
  - The script includes built-in delays
  - If you hit rate limits, wait a few minutes before retrying
- Empty or minimal output
  - The user may have very few public posts or comments
  - Some users have private profiles or deleted content
Created as part of a data analysis assignment focusing on social media user behavior analysis.