yashraj10/README.md

Hello, I'm Yashraj Jadhav 👋

💡 MS Business Analytics (STEM) @ USC Marshall · Data Scientist · ML Engineer

I build ML systems that go beyond notebooks, from causal inference pipelines informing multi-million-dollar decisions to production Transformer models deployed on the cloud. My work sits at the intersection of deep learning, agentic AI, and real-world deployment, with a focus on systems that are defensible end-to-end, not just metric-maximizing.


🧠 Field of Work

End-to-End ML Systems & Production Deployment: Built a 3-model ICU census forecasting system (Random Forest + Ridge + Survival Analysis) deployed as a FastAPI microservice on GCP Cloud Run with Docker, 6 endpoints, Pydantic v2 validation, and 12 integration tests. Real-time serving with p50/p95 latency instrumentation.
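
For readers unfamiliar with p50/p95 instrumentation, here is a minimal standard-library sketch of the idea. It is not the code from the ICU census service: `timed` and `latency_summary` are hypothetical names, and the in-memory sample list is an assumption made to keep the example self-contained.

```python
import statistics
import time


def timed(fn, samples_ms):
    """Wrap fn so each call appends its wall-clock latency (ms) to samples_ms."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the latency even if fn raises.
            samples_ms.append((time.perf_counter() - start) * 1000.0)
    return wrapper


def latency_summary(samples_ms):
    """Return the p50 and p95 of the recorded latencies (needs >= 2 samples)."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}
```

In a real service the samples would more likely go to a metrics backend or a bounded buffer than to an unbounded Python list.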

Deep Learning & Sequence Modeling: Designed a custom PyTorch Transformer encoder (4 layers, 4 heads, 128-dim) from scratch on 200K Dota 2 behavioral sequences for rage quit prediction. Achieved AUC-PR 0.269, beating XGBoost/LSTM/LR baselines. Attention weight extraction via forward hooks for interpretability.

Agentic AI & LLM Pipelines: Built a 5-node LangGraph agentic pipeline (AI Repo Co-Pilot) using GPT-4o-mini for automated code review, with 33/33 adversarial eval tests passing. Multi-step orchestration with structured output validation and self-healing fallback logic.

Causal Inference & Uplift Modeling: Designed causal inference pipelines at Capgemini supporting $4.2M pharmaceutical go/no-go decisions, covering treatment/control cohorts, stratified subgroup analysis, effect sizes with confidence intervals, and SHAP-based interpretability on XGBoost churn models with PSI drift detection.

Recommendation Systems & Distributed ML: Engineered a Neural Collaborative Filtering system on the MovieLens 32M dataset using PySpark feature engineering, TensorFlow NCF, Apache Airflow orchestration, MLflow experiment tracking, and AWS S3 as a 4-tier data lake. Drift-gated model promotion via PSI monitoring.
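
As a rough illustration of PSI-based drift gating, the sketch below computes the Population Stability Index between a baseline sample and a current sample, then gates promotion on a threshold. It is not the pipeline's actual code: the equal-width binning, the `eps` smoothing, the 0.2 threshold (a common rule of thumb), and the names `psi` and `should_promote` are all assumptions.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if it is constant.
    width = (hi - lo) / n_bins or 1.0

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each proportion at eps so the log ratio stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_promote(baseline, current, threshold=0.2):
    """Allow promotion only when drift stays below the (assumed) threshold."""
    return psi(baseline, current) < threshold
```

Quantile-based bins computed on the baseline are a common alternative to equal-width bins when the feature is heavily skewed.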

RAG & LLM-Grounded Decision Support: Built Decision Twin, a Gemini API + RAG architecture for retention strategy recommendations, comparing grounded (RAG) vs. ungrounded LLM outputs across accuracy, hallucination rate, and business coherence metrics.


💼 Professional Experience

Data Scientist @ Capgemini Technology Services (July 2022 – August 2024): Built production ML on 500K+ patient records, including readmission classifiers, K-Means diagnostic clustering on 2.5 TB, and SQL ETL pipelines processing 50K+ daily records. Deployed TensorFlow SavedModel on GCP Cloud Run; AUC-ROC 0.84, F1 0.63, 99.8% request success rate.

Data Science Intern @ Capgemini Technology Services (March 2022 – July 2022): Profiled 300K+ patient records and conducted hypothesis testing across 3 clinical units to shape feature selection for production models.


🛠️ Tech Stack

🔤 Languages

Python SQL R

🔧 Libraries and Frameworks

PyTorch TensorFlow Keras scikit-learn XGBoost HuggingFace Transformers LangGraph OpenAI Gemini FastAPI Streamlit pandas NumPy Matplotlib Seaborn PySpark

⚙️ Tools & Platforms

GCP AWS SageMaker Docker MLflow Airflow Spark Kafka REST-API Git Tableau Power BI

💻 Operating Systems

macOS Linux



📝 Writing

I write about ML systems, model interpretability, and production engineering on Medium.
→ Rage Quit Predictor: Building a Transformer from Scratch


🔗 Let's Connect

LinkedIn Portfolio GitHub Email


"Production ML isn't about the model. It's about what happens after the model."

Pinned

  1. rage-quit-predictor (Public)

    Custom PyTorch transformer predicting Dota 2 rage quits from behavioral event sequences. AUC-PR 0.269 · MLflow · Kafka · AWS S3 · PSI Drift Monitoring · Live demo included

    Python 1

  2. movie-recommender-pipeline (Public)

    Production-grade movie recommendation engine: PySpark (32M ratings) → TF Neural Collaborative Filtering → Apache Airflow 8-task DAG → MLflow registry + PSI drift monitoring → AWS S3 data lake

    Python

  3. ai-repo-copilot (Public)

    AI-powered CLI tool that analyzes any codebase and answers questions with exact file names and line numbers. Built with LangGraph + GPT-4o-mini. 33/33 eval score.

    Python

  4. churn-prediction-model (Public)

    Churn prediction system detecting behavioral disengagement via drift-based features (0.92 AUC) with IPW-adjusted uplift analysis. Includes Streamlit dashboard.

    Python

  5. ICU-Census-Prediction (Public)

    End-to-end ICU census prediction pipeline using arrival forecasting, length-of-stay (LOS) modeling, and discharge hazard analysis with real hospital data

    Python