A Python-based pipeline to collect KOSPI and NASDAQ stock data, upload it to MongoDB Atlas, and convert it into FinMA-compatible training datasets in both regression and classification formats.
## 📁 Project Structure
project-directory/
├── multiclass/
│ └── \[Multiclass dataset generation notebooks]
├── regression/
│ └── \[Regression dataset generation notebooks]
├── (No DB) kospi_daily_jsonl.ipynb
├── (No DB) kospi_weekly_jsonl.ipynb
├── (No DB) kospi_monthly_jsonl.ipynb
├── (No DB) nasdaq_jsonl_parser.ipynb
├── (No DB) regression_kospi_daily_jsonl.ipynb
├── (No DB) regression_kospi_jsonl.ipynb
├── (No DB) regression_nasdaq_jsonl_parser.ipynb
├── KOSPI to Mongo.py
├── NASDAQ to Mongo.py
├── NASDAQ to Mongo use Ticker.py
├── kospi_daily_jsonl_5_days.py
├── kospi_daily_jsonl_10_days(with overlap).py
├── kospi_monthly_jsonl_10_months.py
├── kospi_weekly_jsonl_10_weeks.py
├── 0528_FastSHAP.ipynb
├── generate_masked_prompts.ipynb
├── .gitignore
├── requirements.txt
└── README.md
- ✅ Collect KOSPI & NASDAQ stock data (Top 50)
- ✅ Upload structured data to MongoDB Atlas
- ✅ Generate training data in `.jsonl` format for:
  - Multi-class classification
  - Regression prediction (e.g., % change)
- ✅ Support various time frames (daily, weekly, monthly)
- ✅ Implement SHAP-based feature importance analysis
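
The two target formats listed above differ only in how the next-period price move is encoded. The sketch below is illustrative and is not taken from the repository's notebooks: the regression target is the percentage change between two closing prices, and the multi-class target buckets that same number. The class names and the thresholds in `BINS` are assumptions chosen for the example.

```python
from typing import List, Tuple


def pct_change(prev_close: float, next_close: float) -> float:
    """Percentage change from prev_close to next_close (regression target)."""
    if prev_close == 0:
        raise ValueError("prev_close must be non-zero")
    return (next_close - prev_close) / prev_close * 100.0


# Hypothetical class boundaries (upper bound in percent, label).
# The bins actually used by the notebooks may differ.
BINS: List[Tuple[float, str]] = [
    (-2.0, "strong_down"),
    (-0.5, "down"),
    (0.5, "flat"),
    (2.0, "up"),
]


def to_class(change_pct: float) -> str:
    """Map a percentage change to a class label (multi-class target)."""
    for upper, label in BINS:
        if change_pct < upper:
            return label
    return "strong_up"  # anything at or above the last boundary
```

For example, `to_class(pct_change(100.0, 103.0))` falls in the top bucket under these assumed thresholds, while a move of 0% lands in `"flat"`.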
- `KOSPI to Mongo.py`: Fetch and upload KOSPI top 50 stock data
- `NASDAQ to Mongo.py`: Fetch NASDAQ data using hardcoded tickers
- `NASDAQ to Mongo use Ticker.py`: Fetch NASDAQ data using tickers from MongoDB
- `(No DB) *.ipynb`: Generate `.jsonl` data directly from raw FinanceDataReader or yfinance output (no MongoDB dependency)
- `kospi_daily_jsonl_5_days.py`: Create 5-day window JSONL files
- `kospi_daily_jsonl_10_days(with overlap).py`: Create overlapping 10-day JSONL dataset
- `kospi_monthly_jsonl_10_months.py`: Monthly overlapping dataset
- `kospi_weekly_jsonl_10_weeks.py`: Weekly overlapping dataset
- `0528_FastSHAP.ipynb`: FastSHAP implementation for regression feature importance
- `generate_masked_prompts.ipynb`: Prompt generation for language model training
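
The window-based scripts above differ mainly in window length and in whether consecutive windows overlap. A minimal sketch of that idea, using only the standard library, is shown here. It is not the repository's code: the function names and the record fields `ticker`, `prompt`, and `answer` are assumptions for illustration, and the exact prompt layout expected by FinMA may differ.

```python
import json
from typing import Iterator, List, Sequence, Tuple


def windows(
    values: Sequence[float], size: int, step: int
) -> Iterator[Tuple[List[float], float]]:
    """Yield (history, target) pairs: `size` values plus the value that follows.

    step == size gives non-overlapping windows; step < size gives overlap.
    """
    for start in range(0, len(values) - size, step):
        yield list(values[start:start + size]), values[start + size]


def to_jsonl_lines(
    ticker: str, closes: Sequence[float], size: int, step: int
) -> List[str]:
    """Build one JSON line per window with a regression-style answer."""
    lines = []
    for history, target in windows(closes, size, step):
        # Percentage change from the last close in the window to the target.
        change = (target - history[-1]) / history[-1] * 100.0
        record = {
            "ticker": ticker,  # hypothetical field name
            "prompt": (
                f"Closing prices for the last {size} periods: {history}. "
                "Predict the next % change."
            ),
            "answer": round(change, 2),
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines
```

Writing the result to disk is then a matter of joining the lines with `"\n"` and saving them to a `.jsonl` file. Passing `step=1` with `size=10` produces heavily overlapping windows, while `step=size` produces disjoint ones.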
- Clone the repository:
  `git clone https://github.com/yourusername/your-repo-name.git`
  `cd your-repo-name`
- Set up environment variables:
  Create a `.env` file with:
  `MONGO_PASSWORD=your_mongodb_password`
- Install dependencies:
  `pip install -r requirements.txt`

All dependencies are listed in `requirements.txt`. Major packages:
- `FinanceDataReader`
- `yfinance`
- `pandas`
- `pymongo`
- `tqdm`
- `python-dotenv`

(You can auto-generate the list using `pip freeze > requirements.txt`.)
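
The upload scripts need the `MONGO_PASSWORD` value from the setup step to build a MongoDB Atlas connection string. The following is a rough sketch of how that might be done, not the scripts' actual code: the helper names, the user name, and the cluster host are placeholders. The password is percent-encoded because characters such as `@` or `/` would otherwise break the URI.

```python
import os
from urllib.parse import quote_plus


def build_atlas_uri(user: str, password: str, host: str) -> str:
    """Build a mongodb+srv URI with the credentials percent-encoded."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


def uri_from_env(user: str, host: str) -> str:
    """Read MONGO_PASSWORD from the environment and build the URI.

    `user` and `host` are placeholders; use the values from your own
    Atlas cluster.
    """
    password = os.environ.get("MONGO_PASSWORD")
    if not password:
        raise RuntimeError(
            "MONGO_PASSWORD is not set; add it to .env or the environment"
        )
    return build_atlas_uri(user, password, host)
```

With `python-dotenv` installed, calling `load_dotenv()` (imported from `dotenv`) before `uri_from_env(...)` loads the `.env` file into the environment, and the resulting string can be passed to `pymongo.MongoClient`.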
For questions or contributions, please contact: [Your Name or GitHub handle here]