MS Business Analytics (STEM) @ USC Marshall · Data Scientist · ML Engineer
I build ML systems that go beyond notebooks: from causal inference pipelines informing multi-million-dollar decisions to production Transformer models deployed on the cloud. My work sits at the intersection of deep learning, agentic AI, and real-world deployment, with a focus on systems that are defensible end to end, not just metric-maximizing.
**End-to-End ML Systems & Production Deployment**
Built a 3-model ICU census forecasting system (Random Forest + Ridge + Survival Analysis) deployed as a FastAPI microservice on GCP Cloud Run with Docker: 6 endpoints, Pydantic v2 validation, and 12 integration tests. Real-time serving with p50/p95 latency instrumentation.
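The p50/p95 latency instrumentation boils down to percentile summaries over observed request times. A minimal stdlib sketch (the function name and return shape are illustrative, not the service's actual API):

```python
import statistics

def latency_summary(samples_ms):
    """Summarize request latencies (milliseconds) as p50 and p95.

    statistics.quantiles(n=100) returns the 1st..99th percentile cut
    points, so index 49 is the median (p50) and index 94 is p95.
    """
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}
```

In a FastAPI service this would typically run over a rolling window of per-request timings recorded by middleware, so the tail latency is visible without exporting raw logs.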
**Deep Learning & Sequence Modeling**
Designed a custom PyTorch Transformer encoder (4 layers, 4 heads, 128-dim) from scratch, trained on 200K Dota 2 behavioral sequences for rage-quit prediction. Achieved AUC-PR 0.269, beating XGBoost, LSTM, and logistic regression baselines. Attention weights extracted via forward hooks for interpretability.
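The quantity those forward hooks capture is the row-stochastic attention matrix, i.e. the softmax of scaled query-key scores. A minimal NumPy sketch of that computation (illustrative shapes only, not the 4-layer model itself):

```python
import numpy as np

def attention_weights(q, k):
    """Scaled dot-product attention weights.

    q, k: (seq_len, d) arrays of query/key vectors. Returns a
    (seq_len, seq_len) matrix whose rows sum to 1 -- the object a
    forward hook on an attention layer records for interpretability.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)
```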
**Agentic AI & LLM Pipelines**
Built a 5-node LangGraph agentic pipeline (AI Repo Co-Pilot) using GPT-4o-mini for automated code review, with 33/33 adversarial eval tests passing. Multi-step orchestration with structured-output validation and self-healing fallback logic.
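The self-healing fallback pattern can be sketched without the LangGraph API itself: each node validates the model's structured output and substitutes a safe default when parsing fails. `call_model` below is a hypothetical stand-in for the LLM call, and the state/node shapes are illustrative, not the pipeline's real schema:

```python
import json

def call_model(prompt):
    # Hypothetical stand-in for an LLM call; should return JSON text
    # with a "verdict" field.
    return '{"verdict": "approve"}'

def review_node(state, model=call_model):
    """One node: call the model, validate structured output, and fall
    back to a safe default when validation fails (self-healing)."""
    raw = model(state["diff"])
    try:
        out = json.loads(raw)
        if "verdict" not in out:
            raise ValueError("missing field")
    except (json.JSONDecodeError, ValueError):
        out = {"verdict": "needs_human_review"}  # fallback path
    state["review"] = out
    return state

def run_pipeline(state, nodes):
    # Sequential orchestration: each node transforms shared state.
    for node in nodes:
        state = node(state)
    return state
```

The point of the fallback branch is that a malformed model response degrades to a human-review verdict instead of crashing the pipeline mid-run.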
**Causal Inference & Uplift Modeling**
Designed causal inference pipelines at Capgemini supporting $4.2M pharmaceutical go/no-go decisions: treatment/control cohorts, stratified subgroup analysis, effect sizes with confidence intervals, and SHAP-based interpretability on XGBoost churn models with PSI drift detection.
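The core of a treatment/control effect size with a confidence interval is a difference in means and its standard error. A minimal sketch (normal-approximation CI; the actual pipeline's estimator may differ):

```python
import math

def diff_in_means_ci(treat, control, z=1.96):
    """Estimated effect (treatment mean - control mean) with a
    normal-approximation 95% confidence interval."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
        return m, v
    mt, vt = mean_var(treat)
    mc, vc = mean_var(control)
    effect = mt - mc
    se = math.sqrt(vt / len(treat) + vc / len(control))
    return effect, (effect - z * se, effect + z * se)
```

Running the same computation inside each stratum gives the subgroup analysis: an effect estimate per cohort slice, each with its own interval.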
**Recommendation Systems & Distributed ML**
Engineered a Neural Collaborative Filtering system on the MovieLens 32M dataset using PySpark feature engineering, TensorFlow NCF, Apache Airflow orchestration, MLflow experiment tracking, and AWS S3 as a 4-tier data lake. Drift-gated model promotion via PSI monitoring.
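The drift gate compares a baseline and current binned distribution with the Population Stability Index. A minimal sketch (the threshold and function names are illustrative; common rules of thumb read PSI < 0.1 as stable and > 0.25 as significant drift):

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected/actual: lists of bin proportions (each summing to 1)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

def promote(psi_value, threshold=0.2):
    # Drift gate: only promote the candidate model when the score
    # distribution hasn't shifted past the threshold.
    return psi_value < threshold
```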
**RAG & LLM-Grounded Decision Support**
Built Decision Twin, a Gemini API + RAG architecture for retention strategy recommendations, comparing grounded (RAG) vs. ungrounded LLM outputs on accuracy, hallucination rate, and business-coherence metrics.
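One crude way to operationalize a hallucination-rate comparison is lexical grounding: flag answer sentences whose content words barely overlap the retrieved context. This is an illustrative heuristic only, not Decision Twin's actual metric:

```python
def hallucination_rate(answer_sentences, context, min_overlap=0.5):
    """Fraction of answer sentences that look ungrounded: fewer than
    `min_overlap` of their content words appear in the retrieved context."""
    ctx = set(context.lower().split())
    flagged = 0
    for sent in answer_sentences:
        words = [w for w in sent.lower().split() if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in ctx for w in words) / len(words)
        if overlap < min_overlap:
            flagged += 1
    return flagged / max(len(answer_sentences), 1)
```

Computed over the same questions for grounded and ungrounded runs, a metric like this makes the RAG-vs-no-RAG comparison quantitative rather than anecdotal.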
**Data Scientist @ Capgemini Technology Services** (July 2022 – August 2024)
Built production ML on 500K+ patient records: readmission classifiers, K-Means diagnostic clustering on 2.5 TB of data, and SQL ETL pipelines processing 50K+ records daily. Deployed a TensorFlow SavedModel on GCP Cloud Run; AUC-ROC 0.84, F1 0.63, 99.8% request success rate.
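An AUC-ROC figure like the 0.84 above has a simple rank reading: the probability that a random positive case is scored above a random negative one. A self-contained sketch of that Mann-Whitney formulation (fine for small evaluation sets; production code would use a vectorized library routine):

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of positive/negative pairs where the
    positive outscores the negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```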
**Data Science Intern @ Capgemini Technology Services** (March 2022 – July 2022)
Profiled 300K+ patient records and conducted hypothesis testing across 3 clinical units to shape feature selection for production models.
I write about ML systems, model interpretability, and production engineering on Medium.
– Rage Quit Predictor: Building a Transformer from Scratch
"Production ML isn't about the model. It's about what happens after the model."