Add startup data loader pipeline by TheMastermindNetwork · Pull Request #51 · TechX-Resources/startup-validator

TheMastermindNetwork · 2026-03-20T16:25:39Z

What this does:
Loads the Startup Success Prediction dataset from Kaggle, cleans and filters it, and writes the result to data/processed/startups.json, ready for the similarity tool to consume.

Files changed:

app/services/data_loader.py - main pipeline logic
data/raw/startup data.csv - raw Kaggle dataset
data/processed/startups.json - cleaned output (923 companies)

How to run:
python app/services/data_loader.py

Verify it works:
python -c "import json; data=json.load(open('data/processed/startups.json')); print(f'Loaded {len(data)} companies'); print('Sample:', data[0])"

Output:
Loaded 923 companies
Sample: {'company_name': 'Bandsintown', 'industry': 'music', 'description': 'Music company based in San Diego, CA. Status: acquired.', 'founded_year': 2007, 'status': 'acquired', 'funding_total_usd': 375000.0, 'funding_rounds': 3.0, 'has_VC': False, 'is_top500': False}
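The one-liner above only checks the record count and prints the first record. A slightly stricter check could validate every record against the field names and types shown in that sample. The sketch below is not part of this PR. Its type expectations are inferred from the single sample record, and two rules are assumptions: founded_year may be null, and company_name values must be unique. The names EXPECTED_TYPES, validate_records, and check_file are hypothetical.

```python
import json
from typing import Any

# Field names and types mirror the sample record printed above.
# ASSUMPTION: founded_year may be null for rows whose founding date is missing.
EXPECTED_TYPES: dict[str, tuple[type, ...]] = {
    "company_name": (str,),
    "industry": (str,),
    "description": (str,),
    "founded_year": (int, type(None)),
    "status": (str,),
    "funding_total_usd": (int, float),
    "funding_rounds": (int, float),
    "has_VC": (bool,),
    "is_top500": (bool,),
}


def _type_ok(value: Any, types: tuple[type, ...]) -> bool:
    # bool is a subclass of int, so reject booleans unless bool is explicitly allowed.
    if isinstance(value, bool) and bool not in types:
        return False
    return isinstance(value, types)


def validate_records(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems; an empty list means every record passed."""
    problems: list[str] = []
    seen_names: set[str] = set()
    for i, record in enumerate(records):
        missing = EXPECTED_TYPES.keys() - record.keys()
        if missing:
            problems.append(f"record {i}: missing fields {sorted(missing)}")
        for key, types in EXPECTED_TYPES.items():
            if key in record and not _type_ok(record[key], types):
                problems.append(f"record {i}: {key} has type {type(record[key]).__name__}")
        # ASSUMPTION: company names are expected to be unique in the output.
        name = record.get("company_name")
        if name is not None:
            if name in seen_names:
                problems.append(f"record {i}: duplicate company_name {name!r}")
            seen_names.add(name)
    return problems


def check_file(path: str = "data/processed/startups.json") -> list[str]:
    """Load the processed JSON, validate it, and print a one-line summary."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    problems = validate_records(records)
    print(f"Checked {len(records)} companies, {len(problems)} problem(s)")
    return problems
```

Run from the repository root, check_file() prints a summary line and returns the problem list. An empty list means every record matches the sample's schema.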

Commits:
a9db226 Add startup data loader pipeline

TheMastermindNetwork wants to merge 1 commit into main from data-pipeline-loader
