mathachew7/README.md

👋 Hey, I'm Subash Yadav

🎯 AWS Certified Data Engineer | Data Analytics Specialist | ML & ETL Pipeline Builder
I design and optimize data pipelines, analytics platforms, and ML workflows that turn massive, messy datasets into insights that drive action — in healthcare, finance, and enterprise systems.


🌍 Mission

I believe data is only powerful when it drives measurable impact.
Whether it's cutting ETL latency by 65% or uncovering millions in savings through analytics,
I focus on building scalable, reliable, compliance-ready solutions to real-world problems.


⚙️ My Toolbox

Languages: Python • SQL • Bash • T-SQL • PL/SQL
Cloud & Platforms: AWS (S3, Redshift, Athena) • Docker • Git • Snowflake • PostgreSQL • MS SQL Server
Data/ML Stack: Airflow • Pandas • scikit-learn • Power BI • Tableau • Spark • Hadoop

🚀 Featured Projects

🏥 Healthcare Compliance & Analytics Dashboard

  • Built HIPAA-compliant ETL pipelines (Python + SQL) to clean, validate, and model 15M+ healthcare records.
  • Designed 12+ Power BI dashboards, cutting manual reporting time by 40% and improving operational decision-making.

🚆 Real-Time Public Transit Data Platform

  • Developed an AWS Lambda + Airflow streaming pipeline for GTFS feeds, reducing data refresh delays from 15 minutes to near real-time.
  • Enabled predictive maintenance insights for city transit systems.

📊 Financial Risk & Forecasting Engine

  • Created an XGBoost + time-series forecasting model to identify post-COVID revenue-drop risk for SMEs.
  • Delivered a self-service analytics dashboard that improved policy decision turnaround times.

🎓 University Career Performance Analytics

  • Automated ETL workflows processing 10M+ student records into Snowflake models for accreditation KPI tracking.
  • Enabled 100+ staff to self-serve analytics through Tableau, reducing prep time by 60%.

✍️ Insights & Publications

  • 📊 Building Scalable ETL Pipelines with AWS & Airflow — Case study based on production healthcare data workflows (In Progress)
  • 🛠 Power BI for Compliance & Operational Dashboards — Internal training materials and best practices (Upcoming)
  • 📈 Optimizing Real-Time Data Streams for Transit Systems — Technical blog series (Planned)

🧭 Current Focus

  • Building real-time, high-volume data pipelines for enterprise and public systems.
  • Publishing practical engineering & analytics content that blends ML, compliance, and business impact.

📫 Let’s Connect

📎 LinkedIn • 📬 Email • 💻 Portfolio (coming soon)


🔁 "Don’t just code — create outcomes."

Pinned

  1. MedAdhereAI (Public)

    AI-powered predictive pipeline to forecast medication adherence risk using real-world refill data from chronic disease patients. Built for research, explainability, and publication.