A Python-based service that monitors GitHub events and provides metrics via a REST API.
- Real-time monitoring of GitHub events (WatchEvent, PullRequestEvent, IssuesEvent)
- REST API endpoints for metrics:
  - Average time between pull requests for a repository
  - Event counts by type with time-based filtering
  - Visual representation of PR intervals
- Rate limit handling for GitHub API
- Efficient event storage and deduplication
- Beautiful visualization of PR metrics
flowchart TB
User(("End User")) -- Sends request for data --> API["REST API Service"]
API -- gets data --> Storage[("Event Storage")]
Collector["Event Collector"] -- Collects events --> GitHub["GitHub API"]
Collector -- sends events --> Storage
style User stroke-width:2px
style API stroke-width:2px
style Storage fill:#fbb,stroke:#333,stroke-width:2px
style Collector stroke-width:2px
style GitHub fill:#bfb,stroke:#333,stroke-width:2px
- Clone the repository:
    git clone <repository-url>
    cd <repository-directory>
- Create and activate a virtual environment:
    python -m venv .venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    # Create .env file
    GITHUB_TOKEN=your_github_token # Optional but recommended
    POLL_INTERVAL=60 # Seconds between GitHub API polls
- Run the service:
    uvicorn app.api:app --reload

GET /metrics/{owner}/{repo}/avg-pr-interval
Returns the average time between pull requests for a repository.
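
As an illustration of the metric, the following is a minimal sketch of how an average PR interval could be computed from a list of PR creation timestamps. The function name `average_pr_interval` and its input shape are assumptions made for this example, not the service's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def average_pr_interval(opened_at: Sequence[datetime]) -> Optional[timedelta]:
    """Mean gap between consecutive PR timestamps, or None with fewer than two."""
    if len(opened_at) < 2:
        return None  # an interval needs at least two pull requests
    ordered = sorted(opened_at)
    # The sum of consecutive gaps telescopes to (last - first).
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Usage: three PRs opened at minute 0, 10 and 40 give gaps of 10 and 30 minutes.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
sample = [base, base + timedelta(minutes=10), base + timedelta(minutes=40)]
print(average_pr_interval(sample))  # 0:20:00
```

Returning `None` for fewer than two pull requests is one possible design choice; an API could equally respond with an error or a zero value.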
GET /metrics/{owner}/{repo}/event-counts?offset=10
Returns event counts by type for the last offset minutes, where offset is the query parameter (10 in the example above).
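
The time-based filtering can be sketched as follows. This is a hedged example that assumes events for one repository are available as `(event_type, created_at)` tuples with UTC timestamps; the `Event` alias and the `count_events` function are hypothetical names, not part of the service's code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical in-memory record: (event_type, created_at in UTC).
Event = Tuple[str, datetime]


def count_events(events: Iterable[Event], offset_minutes: int,
                 now: Optional[datetime] = None) -> Dict[str, int]:
    """Count events per type that were created within the last offset_minutes."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=offset_minutes)
    # Only events at or after the cutoff contribute to the counts.
    return dict(Counter(etype for etype, created in events if created >= cutoff))
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than one that always reads the system clock.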
GET /viz/{owner}/{repo}/pr-intervals.png?days=10
Returns a PNG visualization of PR intervals over the specified number of days.
- Background task polls GitHub's public events API every POLL_INTERVAL seconds (60 in the example configuration)
- Filters for WatchEvent, PullRequestEvent, and IssuesEvent
- Handles rate limiting using GitHub API headers
- Deduplicates events using event IDs
- Events are stored with repository and timestamp information
- Efficient querying for specific repositories and time ranges
- Automatic cleanup of old events
- Two-panel visualization for PR metrics:
  - PR creation timeline (scatter plot)
  - PR interval distribution with mean and median lines
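
The collection steps listed above (type filtering, deduplication by event ID, and rate-limit handling) could look roughly like the sketch below. The function names and the `seen_ids` set are assumptions for illustration. The header names `X-RateLimit-Remaining` and `X-RateLimit-Reset` are the ones GitHub documents for its REST API, with the reset value given in epoch seconds.

```python
import time
from typing import Any, List, Mapping, Iterable, Optional, Set

TRACKED_TYPES = {"WatchEvent", "PullRequestEvent", "IssuesEvent"}


def select_new_events(payload: Iterable[Mapping[str, Any]],
                      seen_ids: Set[str]) -> List[Mapping[str, Any]]:
    """Keep tracked event types whose ID has not been seen; record new IDs."""
    fresh = []
    for event in payload:
        if event.get("type") not in TRACKED_TYPES:
            continue  # ignore event types the service does not monitor
        event_id = event.get("id")
        if event_id is None or event_id in seen_ids:
            continue  # skip malformed or already-stored events
        seen_ids.add(event_id)
        fresh.append(event)
    return fresh


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to pause when the rate limit is exhausted, else 0."""
    # Assumption: header keys use this exact casing; HTTP client libraries
    # usually expose a case-insensitive mapping instead of a plain dict.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))  # epoch seconds
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A polling loop could call `select_new_events` on each response body and sleep for `seconds_until_reset(...)` before the next request whenever that value is greater than zero.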
- GitHub API Availability:
  - The service assumes the GitHub API is generally available
  - Rate limiting is handled gracefully
- Time Zones:
  - All timestamps are stored and processed in UTC
  - API responses maintain UTC timestamps
- Performance:
  - The service is designed for moderate load
  - Event collection runs every 60 seconds
  - Visualization is generated on-demand
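
To illustrate the UTC assumption stated above, here is a minimal sketch of normalizing an ISO 8601 timestamp (the format GitHub uses, for example `2024-01-01T12:00:00Z`) to a timezone-aware UTC `datetime`. The helper name `parse_github_timestamp` is hypothetical, and treating naive values as UTC is an assumption of this example.

```python
from datetime import datetime, timezone


def parse_github_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp string into a timezone-aware UTC datetime."""
    # fromisoformat() accepts a trailing 'Z' only on Python 3.11+, so normalize it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Converting at the boundary means every comparison inside the service works on aware UTC values, which avoids mixing naive and aware datetimes.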
- Add authentication for API endpoints
- Implement event data persistence
- Add more metrics and visualizations
- Add monitoring and alerting
- Implement data retention policies
- Add caching for frequently accessed metrics