This tool collects detailed information about Y Combinator founders and structures it in a clean, searchable format. It helps users quickly find founder profiles, company associations, and industry backgrounds from one of the world’s most influential startup ecosystems.
Designed for accuracy and flexibility, this project enables filtered access to founder data for research, analysis, and business intelligence use cases.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project retrieves structured founder profiles associated with Y Combinator and outputs them as a clean dataset. It solves the problem of fragmented founder information by centralizing key identity, company, and industry details. It is ideal for researchers, investors, analysts, and startup ecosystem builders.
- Collects individual founder profiles with rich metadata
- Supports flexible filtering by name and company
- Outputs normalized, analysis-ready data
- Suitable for automation, research, and analytics workflows
| Feature | Description |
|---|---|
| Founder Filtering | Filter founders by first name, last name, or company name. |
| Limit Control | Restrict the number of returned founder profiles. |
| Rich Profiles | Includes names, companies, industries, batches, and tags. |
| Structured Output | Returns clean, consistent records ready for analysis. |
| Scalable Design | Handles small queries as well as large founder datasets. |
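
As a rough illustration of the filtering and limit features listed above, the sketch below applies first-name, last-name, and company filters to a list of founder records and stops once a limit is reached. The function name `filter_founders`, its parameters, the case-insensitive matching, and the choice to match companies against `allCompaniesText` are assumptions made for this example, not the project's actual interface. The second record is a made-up placeholder.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]

# Illustrative data only: the second record is a hypothetical placeholder.
SAMPLE: List[Record] = [
    {"firstName": "Sam", "lastName": "Altman", "fullName": "Sam Altman",
     "allCompaniesText": "Y Combinator, Loopt"},
    {"firstName": "Jane", "lastName": "Doe", "fullName": "Jane Doe",
     "allCompaniesText": "Example Co"},
]


def filter_founders(
    founders: Iterable[Record],
    first_name: Optional[str] = None,
    last_name: Optional[str] = None,
    company: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Record]:
    """Return records matching every filter that was supplied (hypothetical helper)."""
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")

    def matches(f: Record) -> bool:
        # Names: exact, case-insensitive comparison (an assumption).
        if first_name and (f.get("firstName") or "").lower() != first_name.lower():
            return False
        if last_name and (f.get("lastName") or "").lower() != last_name.lower():
            return False
        # Company: case-insensitive substring of allCompaniesText (an assumption).
        if company and company.lower() not in (f.get("allCompaniesText") or "").lower():
            return False
        return True

    result: List[Record] = []
    for f in founders:
        if limit is not None and len(result) >= limit:
            break  # stop early once the limit is reached
        if matches(f):
            result.append(f)
    return result


print([f["fullName"] for f in filter_founders(SAMPLE, company="loopt")])
# prints ['Sam Altman']
```

Filters that are left as `None` are ignored, so calling `filter_founders(SAMPLE, limit=1)` simply returns the first record.
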
| Field Name | Field Description |
|---|---|
| id | Unique identifier of the founder. |
| firstName | Founder’s first name. |
| lastName | Founder’s last name. |
| fullName | Combined full name of the founder. |
| currentCompany | Current company affiliation, if available. |
| hackerNewsId | Associated Hacker News username. |
| avatarThumb | URL to the founder’s profile image. |
| currentTitle | Current professional title, if available. |
| companySlug | Identifier for the founder’s primary company. |
| topCompany | Boolean flag indicating association with a top-ranked company. |
| allCompaniesText | Text list of all associated companies. |
| industries | List of industries the founder has worked in. |
| parentIndustries | High-level industry categories. |
| subIndustries | Detailed industry classifications (for example, Consumer -> Social). |
| currentRegion | Geographic region of activity. |
| titles | Historical or current titles held. |
| batches | Y Combinator batches the founder took part in (for example, S05). |
| tags | Metadata tags associated with the profile. |
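
One way to work with these fields in code is a typed record. The sketch below is a minimal, hypothetical `Founder` dataclass; the snake_case attribute names, the field types (inferred from the sample record in this README), and the `from_dict` helper are assumptions and may not match the project's `models/founder.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Founder:
    """Hypothetical typed view of one founder record."""
    id: int
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    full_name: Optional[str] = None
    current_company: Optional[str] = None
    hacker_news_id: Optional[str] = None
    avatar_thumb: Optional[str] = None
    current_title: Optional[str] = None
    company_slug: Optional[str] = None
    top_company: bool = False
    all_companies_text: Optional[str] = None
    industries: List[str] = field(default_factory=list)
    parent_industries: List[str] = field(default_factory=list)
    sub_industries: List[str] = field(default_factory=list)
    current_region: Optional[str] = None
    titles: List[str] = field(default_factory=list)
    batches: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Founder":
        """Map the camelCase JSON keys onto the attributes above."""
        def as_list(key: str) -> List[str]:
            # Missing or null list fields become empty lists.
            return list(raw.get(key) or [])

        return cls(
            id=raw["id"],
            first_name=raw.get("firstName"),
            last_name=raw.get("lastName"),
            full_name=raw.get("fullName"),
            current_company=raw.get("currentCompany"),
            hacker_news_id=raw.get("hackerNewsId"),
            avatar_thumb=raw.get("avatarThumb"),
            current_title=raw.get("currentTitle"),
            company_slug=raw.get("companySlug"),
            top_company=bool(raw.get("topCompany", False)),
            all_companies_text=raw.get("allCompaniesText"),
            industries=as_list("industries"),
            parent_industries=as_list("parentIndustries"),
            sub_industries=as_list("subIndustries"),
            current_region=raw.get("currentRegion"),
            titles=as_list("titles"),
            batches=as_list("batches"),
            tags=as_list("tags"),
        )
```

Only `id` is treated as required here; every other field falls back to `None`, `False`, or an empty list, since several fields are documented as "if available".
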
[
  {
    "id": 585,
    "firstName": "Sam",
    "lastName": "Altman",
    "fullName": "Sam Altman",
    "currentCompany": null,
    "hackerNewsId": "sama",
    "avatarThumb": "https://bookface-images.s3.amazonaws.com/avatars/53d3aaf413bcd879484bbe18904199aeddfbaa1d.jpg",
    "currentTitle": null,
    "companySlug": "loopt",
    "topCompany": false,
    "allCompaniesText": "Y Combinator, Loopt",
    "industries": ["Consumer", "Social"],
    "parentIndustries": ["Consumer"],
    "subIndustries": ["Consumer -> Social"],
    "currentRegion": "Unspecified",
    "titles": [],
    "batches": ["S05"],
    "tags": ["ycdc_public"]
  }
]
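
A short sketch of how a consumer might read output shaped like the sample above: it parses the JSON, tolerates `null` fields, and splits `subIndustries` labels into parent and child parts. The helper `split_sub_industry` is hypothetical, and treating ` -> ` as a parent/child separator is an assumption inferred from the sample record rather than a documented format. The embedded JSON is an abbreviated copy of that record.

```python
import json
from typing import Optional, Tuple

# Abbreviated copy of the sample record shown above.
RAW = """
[
  {
    "id": 585,
    "fullName": "Sam Altman",
    "currentCompany": null,
    "subIndustries": ["Consumer -> Social"],
    "batches": ["S05"]
  }
]
"""


def split_sub_industry(label: str) -> Tuple[str, Optional[str]]:
    """Split 'Parent -> Child' into its parts (separator is an assumption)."""
    parent, _, child = label.partition(" -> ")
    return parent, (child or None)


founders = json.loads(RAW)
for founder in founders:
    # JSON null becomes Python None, so provide a readable fallback.
    company = founder.get("currentCompany") or "(none listed)"
    pairs = [split_sub_industry(s) for s in founder.get("subIndustries") or []]
    print(founder["fullName"], company, pairs)
```
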
Y Combinator Founders/
├── src/
│   ├── main.py
│   ├── filters/
│   │   ├── name_filter.py
│   │   └── company_filter.py
│   ├── models/
│   │   └── founder.py
│   ├── utils/
│   │   └── validators.py
│   └── output/
│       └── dataset_writer.py
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── requirements.txt
└── README.md
- Startup investors use it to identify founders and analyze their industry backgrounds for deal sourcing.
- Market researchers use it to study trends across Y Combinator batches and sectors.
- Recruiters use it to discover experienced founders for advisory or leadership roles.
- Founders use it to research peers and understand ecosystem dynamics.
- Analysts use it to build structured founder datasets for reports and dashboards.
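
For the trend-analysis use cases above, a simple starting point is counting founders per batch or per industry. The sketch below assumes records shaped like the sample output; the helper `count_by` and the three example records are hypothetical and exist only to show the idea.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Hypothetical records for illustration; only the relevant fields are shown.
RECORDS = [
    {"fullName": "Founder A", "batches": ["S05"], "industries": ["Consumer", "Social"]},
    {"fullName": "Founder B", "batches": ["S05"], "industries": ["B2B"]},
    {"fullName": "Founder C", "batches": ["W12"], "industries": ["Consumer"]},
]


def count_by(founders: Iterable[Dict[str, Any]], key: str) -> Counter:
    """Count how often each value appears in a list-valued field."""
    counts: Counter = Counter()
    for founder in founders:
        # Missing or null fields contribute nothing.
        counts.update(founder.get(key) or [])
    return counts


batch_counts = count_by(RECORDS, "batches")
industry_counts = count_by(RECORDS, "industries")
print(batch_counts.most_common())
print(industry_counts.most_common(1))
```

The resulting `Counter` objects can be passed straight to a plotting or reporting tool, or converted with `dict(batch_counts)`.
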
How do I filter founders by name or company? You can provide filter values for founder names or company names, and the system will return only matching profiles.
Is there a limit to how many founders can be returned? Yes, you can define a numeric limit to control the maximum number of profiles returned in a single run.
What format is the output data in? All data is returned in a structured JSON format, making it easy to integrate with databases or analytics tools.
Can this be integrated into automated workflows? Yes, the project is designed to work well in automated data pipelines and scheduled jobs.
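
Because the output is plain JSON, it can be reshaped for other tools with the standard library alone. The sketch below flattens founder records into CSV text for a spreadsheet or database import; the function `founders_to_csv`, the column selection, and the `"; "` separator for list fields are assumptions chosen for this example.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def founders_to_csv(founders: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Render selected fields as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)  # header row
    for founder in founders:
        row = []
        for column in columns:
            value = founder.get(column)
            if isinstance(value, list):
                # Join list fields such as batches or tags into one cell.
                value = "; ".join(str(v) for v in value)
            elif value is None:
                value = ""  # null fields become empty cells
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


record = {"id": 585, "fullName": "Sam Altman", "currentCompany": None, "batches": ["S05"]}
print(founders_to_csv([record], ["id", "fullName", "currentCompany", "batches"]))
```

In a scheduled job, the returned text could be written to a file or uploaded to storage after each run.
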
Primary Metric: Processes hundreds of founder profiles per run with consistent response times.
Reliability Metric: Maintains a high success rate with stable filtering and data extraction.
Efficiency Metric: Optimized data handling ensures low memory usage even on larger datasets.
Quality Metric: Outputs complete, normalized founder records with minimal missing fields.
