Add startup data loader pipeline #51

Open
TheMastermindNetwork wants to merge 1 commit into main from data-pipeline-loader

@TheMastermindNetwork
Collaborator

What this does:
Loads the Startup Success Prediction dataset from Kaggle, then cleans and filters it into data/processed/startups.json, ready for use by the similarity tool.
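For reviewers, here is a minimal sketch of what the load, clean, filter and write steps could look like. It is not the code in app/services/data_loader.py. The raw column names (name, category_code, city, state_code, founded_at, status, funding_total_usd, funding_rounds, has_VC, is_top500), the M/D/YYYY date format, and the rule "drop rows missing a required field" are all assumptions inferred from the sample output below. The helper names clean_row, load_records and write_json are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Iterable, List, Optional

# ASSUMPTION: raw Kaggle column names; check against the CSV header.
REQUIRED = ("name", "category_code", "city", "state_code", "founded_at", "status")
RAW_PATH = Path("data/raw/startup data.csv")
OUT_PATH = Path("data/processed/startups.json")


def clean_row(row: dict) -> Optional[dict]:
    """Map one raw CSV row to a startups.json record, or None to filter it out."""
    if any(not (row.get(col) or "").strip() for col in REQUIRED):
        return None  # filter: drop rows missing any required field (assumed rule)
    industry = row["category_code"].strip().lower()
    status = row["status"].strip().lower()
    city, state = row["city"].strip(), row["state_code"].strip()
    return {
        "company_name": row["name"].strip(),
        "industry": industry,
        "description": f"{industry.capitalize()} company based in {city}, {state}. Status: {status}.",
        # ASSUMPTION: founded_at looks like "1/1/2007"; keep only the year.
        "founded_year": int(row["founded_at"].strip().split("/")[-1]),
        "status": status,
        "funding_total_usd": float(row.get("funding_total_usd") or 0),
        "funding_rounds": float(row.get("funding_rounds") or 0),
        # ASSUMPTION: flags are stored as "0"/"1" in the raw CSV.
        "has_VC": (row.get("has_VC") or "0").strip() == "1",
        "is_top500": (row.get("is_top500") or "0").strip() == "1",
    }


def load_records(lines: Iterable[str]) -> List[dict]:
    """Parse CSV lines (a file object or list of strings) and keep only valid records."""
    return [rec for rec in (clean_row(r) for r in csv.DictReader(lines)) if rec is not None]


def write_json(records: List[dict], path: Path) -> None:
    """Write records as pretty-printed JSON, creating the output directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")


# Only run the full pipeline when the raw CSV is actually present.
if __name__ == "__main__" and RAW_PATH.exists():
    with RAW_PATH.open(newline="", encoding="utf-8") as f:
        records = load_records(f)
    write_json(records, OUT_PATH)
    print(f"Wrote {len(records)} companies to {OUT_PATH}")
```

Because load_records accepts any iterable of lines, it can be tested on a small in-memory list without touching the real dataset.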

Files changed:

  • app/services/data_loader.py - main pipeline logic
  • data/raw/startup data.csv - raw Kaggle dataset
  • data/processed/startups.json - cleaned output (923 companies)

How to run:
python app/services/data_loader.py

Verify it works:
python -c "import json; data=json.load(open('data/processed/startups.json')); print(f'Loaded {len(data)} companies'); print('Sample:', data[0])"

Output:
Loaded 923 companies
Sample: {'company_name': 'Bandsintown', 'industry': 'music', 'description': 'Music company based in San Diego, CA. Status: acquired.', 'founded_year': 2007, 'status': 'acquired', 'funding_total_usd': 375000.0, 'funding_rounds': 3.0, 'has_VC': False, 'is_top500': False}
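Beyond counting records, the sample above could serve as a schema reference. The following sketch checks that every record in startups.json has the same keys and value types as the Bandsintown sample. The type table is read off that one sample, so it is an assumption that all 923 records should match it. The name check_records is hypothetical.

```python
import json
from pathlib import Path

PROCESSED = Path("data/processed/startups.json")

# Expected field types, taken from the sample record (an assumption for all records).
EXPECTED_TYPES = {
    "company_name": str,
    "industry": str,
    "description": str,
    "founded_year": int,
    "status": str,
    "funding_total_usd": float,
    "funding_rounds": float,
    "has_VC": bool,
    "is_top500": bool,
}


def check_records(records):
    """Return human-readable schema problems; an empty list means every record matches."""
    problems = []
    for i, rec in enumerate(records):
        missing = EXPECTED_TYPES.keys() - rec.keys()
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        for key, typ in EXPECTED_TYPES.items():
            # Exact type match, so a bool is not accepted where an int is expected.
            if key in rec and type(rec[key]) is not typ:
                problems.append(
                    f"record {i}: {key} is {type(rec[key]).__name__}, expected {typ.__name__}"
                )
    return problems


# Only check the real output when it exists.
if __name__ == "__main__" and PROCESSED.exists():
    data = json.loads(PROCESSED.read_text(encoding="utf-8"))
    issues = check_records(data)
    print(f"Loaded {len(data)} companies, {len(issues)} schema problems")
    for line in issues[:10]:
        print(line)
```

Exact type checks are strict by design. For example, a funding_rounds value written as 3 instead of 3.0 would be flagged, which surfaces inconsistencies before the similarity tool consumes the file.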
