feat: Complete Greenhouse API implementation with CLI tools #25

Open
rothnic wants to merge 4 commits into Pickle-Pixel:main from rothnic:feat/greenhouse-support

Conversation

rothnic commented Feb 27, 2026

Summary

This PR implements complete Greenhouse ATS support using the official Greenhouse Job Board API, adding 129 pre-configured AI/ML employers and comprehensive CLI management tools.

What's Included

Core Greenhouse Integration

  • Official API Integration: Uses boards-api.greenhouse.io/v1/boards/{token}/jobs endpoint for reliable, structured job data
  • Full Job Descriptions: Retrieves complete job descriptions with ?content=true parameter
  • Rich Job Data: Captures job_id, updated_at, offices, department, and full descriptions
  • User Customization: Supports ~/.applypilot/greenhouse.yaml for custom employer lists
  • Robust Error Handling: Retry logic for rate limits and graceful degradation
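As a rough illustration of the API call and retry behavior described above, here is a minimal sketch. It is not the code in this PR: it uses the standard library's urllib rather than httpx so that it runs on its own, and the function names, retry count, and backoff delays are all assumptions chosen for the example.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

API_BASE = "https://boards-api.greenhouse.io/v1/boards"


def jobs_url(token: str, with_content: bool = True) -> str:
    """Build the Job Board API URL for one employer's board token."""
    url = f"{API_BASE}/{urllib.parse.quote(token, safe='')}/jobs"
    return f"{url}?content=true" if with_content else url


def http_get(url: str) -> tuple[int, str]:
    """Default transport: return (status, body) instead of raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        return exc.code, ""


def fetch_jobs(token, get=http_get, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Fetch all jobs for a board, backing off exponentially on HTTP 429."""
    url = jobs_url(token)
    for attempt in range(max_retries + 1):
        status, body = get(url)
        if status == 200:
            return json.loads(body).get("jobs", [])
        if status == 429 and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # hypothetical delays: 1s, 2s, 4s
            continue
        break
    return []  # graceful degradation: a failing employer yields no jobs
```

Passing the transport (`get`) and `sleep` as parameters is a design choice for this sketch only; it lets the retry path be exercised in tests without network access or real waiting.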

CLI Management Tools (5 Commands)

# Verify a company works with our implementation
applypilot greenhouse verify stripe

# Discover correct slugs from company names or URLs
applypilot greenhouse discover "Notion"
applypilot greenhouse discover --url https://careers.company.com

# Validate your entire employer configuration
applypilot greenhouse validate

# List the configured employers
applypilot greenhouse list-employers

# Add a specific job by URL
applypilot greenhouse add-job https://boards.greenhouse.io/stripe/jobs/12345 --dry-run
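For add-job, the board token and job ID have to be recovered from the pasted URL. One way this might be done is sketched below; the helper name `parse_job_url` and the exact set of accepted hosts are assumptions for illustration, not taken from the PR's code.

```python
import re
from urllib.parse import urlparse

# Path shape seen in the examples above: /<board_token>/jobs/<numeric_id>
JOB_PATH = re.compile(r"^/(?P<token>[^/]+)/jobs/(?P<job_id>\d+)/?$")
# Assumed host allow-list; both hosts are mentioned in this PR.
KNOWN_HOSTS = {"boards.greenhouse.io", "job-boards.greenhouse.io"}


def parse_job_url(url: str):
    """Return (board_token, job_id) for a Greenhouse job URL, or None."""
    parsed = urlparse(url)
    if parsed.hostname not in KNOWN_HOSTS:
        return None
    match = JOB_PATH.match(parsed.path)
    if match is None:
        return None
    return match["token"], int(match["job_id"])
```

With the URL from the example above, `parse_job_url("https://boards.greenhouse.io/stripe/jobs/12345")` returns `("stripe", 12345)`; URLs on other hosts or without a job ID return `None`.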

Pre-configured Employers

  • 129 Greenhouse ATS employers organized by category:
    • Core AI/ML: Scale AI, Anthropic, Cohere, etc.
    • Infrastructure: Stripe, Datadog, Cloudflare, MongoDB, etc.
    • Fintech: Robinhood, Coinbase, Plaid, Brex, etc.
    • Other categories, including Healthcare

Test Coverage

25 tests passing

  • API response parsing
  • Job storage and deduplication
  • Location and query filtering
  • End-to-end integration
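To make the "API response parsing" item concrete, the sketch below flattens one raw job object into the fields listed in the summary. The input key names (`id`, `absolute_url`, `departments`, `offices`, `content`) reflect my understanding of the Job Board API response and should be treated as assumptions; the function name and output schema are hypothetical and may differ from greenhouse.py.

```python
import html


def normalize_job(raw: dict, company: str) -> dict:
    """Flatten one raw API job dict into the fields the summary lists."""
    departments = raw.get("departments") or []
    return {
        "company": company,
        "job_id": raw.get("id"),
        "title": raw.get("title", ""),
        "url": raw.get("absolute_url", ""),
        "location": (raw.get("location") or {}).get("name", ""),
        "updated_at": raw.get("updated_at"),
        "offices": [o.get("name", "") for o in raw.get("offices") or []],
        # Assumption: keep only the first department as the primary one.
        "department": departments[0].get("name", "") if departments else "",
        # Assumption: the description arrives HTML-escaped, so unescape it once.
        "description": html.unescape(raw.get("content") or ""),
    }
```

Every lookup falls back to an empty value, so a partial or malformed record still produces a well-formed row instead of raising.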

Verified Working

  • ✅ Successfully fetches 173+ jobs from Scale AI
  • ✅ CLI commands all functional
  • ✅ Jobs integrate with enrichment/scoring pipeline
  • ✅ Full descriptions stored for tailoring

Files Added/Modified

src/applypilot/discovery/greenhouse.py      # Core API implementation
tests/discovery/test_greenhouse.py          # Comprehensive test suite
src/applypilot/cli.py                        # CLI integration
src/applypilot/cli_greenhouse/__init__.py   # 5 CLI management tools
src/applypilot/config/greenhouse.yaml       # 129 employer configurations
CHANGELOG.md                                 # Documented changes
README.md                                    # Added to supported sources

New Dependencies

None. Uses existing project dependencies (httpx, pyyaml, typer).

Breaking Changes

None. This is a pure addition; all existing functionality remains unchanged.

Usage Example

# Run discovery including Greenhouse employers
applypilot run discover

# Validate your employer list
applypilot greenhouse validate

# Add a specific job you found
applypilot greenhouse add-job https://boards.greenhouse.io/scaleai/jobs/4413274005

# Continue with pipeline
applypilot run enrich score tailor

Notes

  • Greenhouse integration runs automatically during the discover stage
  • Jobs are deduplicated across all sources (Greenhouse, Workday, JobSpy, etc.)
  • All jobs feed into the same enrichment/scoring/tailoring pipeline
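The cross-source deduplication noted above is URL-based according to the commit messages. A minimal sketch of that idea follows; the function names, the trailing-slash normalization, and the `(new_jobs, existing_count)` return shape are assumptions for illustration, not the project's storage code.

```python
def url_key(url: str) -> str:
    """Normalize a job URL for comparison (assumed rule: trim and drop trailing slash)."""
    return url.strip().rstrip("/")


def dedupe_by_url(jobs, seen_urls=()):
    """Split jobs into new ones and a count of those whose URL was already seen."""
    seen = {url_key(u) for u in seen_urls}
    new_jobs, existing = [], 0
    for job in jobs:
        key = url_key(job["url"])
        if key in seen:
            existing += 1
            continue
        seen.add(key)
        new_jobs.append(job)
    return new_jobs, existing
```

The query string is deliberately left intact in this sketch, since some job URLs may carry the job identifier there.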

Add comprehensive Greenhouse ATS scraping to capture jobs from AI/ML startups:

New Module: src/applypilot/discovery/greenhouse.py
- HTML scraping with BeautifulSoup for job-boards.greenhouse.io
- Parallel execution with ThreadPoolExecutor
- Location filtering (remote detection, accept/reject patterns)
- Query matching for job title filtering
- Duplicate prevention via URL-based deduplication
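The location and query filters listed above could look roughly like the following. This is a sketch under assumptions: the remote keywords, the substring-based accept/reject matching, and the rule that reject patterns win over remote detection are illustrative choices, not necessarily what greenhouse.py does.

```python
import re

# Assumed keywords that mark a posting as remote.
REMOTE = re.compile(r"\b(remote|anywhere|work from home)\b", re.IGNORECASE)


def location_ok(location: str, accept=(), reject=()) -> bool:
    """Reject patterns win; otherwise allow remote roles or accepted places."""
    loc = location.lower()
    if any(p.lower() in loc for p in reject):
        return False
    if REMOTE.search(location):
        return True
    return any(p.lower() in loc for p in accept)


def title_matches(title: str, queries) -> bool:
    """True if every word of at least one query appears in the title."""
    if not queries:
        return True  # no queries configured means no title filtering
    t = title.lower()
    return any(all(word in t for word in q.lower().split()) for q in queries)
```

For example, `location_ok("Remote - India", reject=["india"])` is `False`, while `title_matches("Senior Machine Learning Engineer", ["machine learning"])` is `True`.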

New Config: src/applypilot/config/greenhouse.yaml
- 129 verified Greenhouse employers
- Organized by category: Core AI, Infrastructure, Fintech, Healthcare, etc.
- Companies: Scale AI, Stripe, Figma, Notion, MongoDB, Datadog, etc.

Pipeline Integration:
- Wired into _run_discover() alongside JobSpy, Workday, SmartExtract
- Stats tracking for new/existing jobs
- Error handling with graceful degradation

Testing:
- Comprehensive unit tests in tests/discovery/test_greenhouse.py
- Verified with Scale AI: found 32 jobs including ML Engineer roles
- All 129 employers load successfully
- Parallel search tested with 4 workers
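The parallel search with 4 workers can be pictured with the standard library's ThreadPoolExecutor, as in the sketch below. The function name `search_all` and the choice to map a failed employer to an empty list are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_all(tokens, fetch, max_workers=4):
    """Run fetch(token) for each employer in parallel; map token -> jobs."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, token): token for token in tokens}
        for future in as_completed(futures):
            token = futures[future]
            try:
                results[token] = future.result()
            except Exception:
                # Assumed policy: one failing board must not abort the whole run.
                results[token] = []
    return results
```

Here `fetch` is any callable that takes a board token and returns a list of jobs, so the same helper works with a real API client or a test double.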

Closes: Option A for expanding AI company coverage

Replace HTML scraping with Greenhouse Job Board API:
- Use boards-api.greenhouse.io/v1/boards/{token}/jobs endpoint
- Add full job descriptions from API (content=true parameter)
- Add new fields: job_id, updated_at, offices, description
- Add retry logic for rate limits (HTTP 429)
- Add user config override (~/.applypilot/greenhouse.yaml)
- Remove BeautifulSoup dependency for this module
- Update all tests for API-based implementation
- 25 tests passing

Integrate greenhouse CLI into main applypilot CLI:
- verify: Check if company slug is valid
- discover: Find slugs from company name or career URL
- validate: Check all companies in greenhouse.yaml
- list-employers: Display configured employers
- add-job: Add specific job from URL with structured display

Commands available via: applypilot greenhouse <command>

- Update CHANGELOG: reflect API-based approach (not HTML scraping)
- Document new CLI commands and user config override
- Update README: add 129 Greenhouse employers to supported sources
- Fix notion → notionhq slug in greenhouse.yaml