Contact Information:
- 📞 +1 857-350-7158
- 📧 kaustubh202500@gmail.com
- 📍 Bensalem, PA
- LinkedIn | GitHub | Medium | Tableau
Data Analyst and Engineer with 3+ years of experience in optimizing ETL pipelines, cloud infrastructure, and data visualization. Proficient in Python, SQL, Snowflake, Tableau, and AWS, with expertise in data modeling, statistical analysis, and workflow automation. Skilled in transforming datasets into actionable insights and driving cost-effective, data-driven decisions.
- Python (pandas, NumPy, Matplotlib, seaborn)
- R, SQL (MSSQL, T-SQL, PL/SQL), NoSQL (MongoDB)
- AWS (EC2, Lambda, S3, RDS, DynamoDB, Step Functions, Redshift, EventBridge, CloudWatch, SNS, SES)
- Snowflake, GCP, Azure
- ETL Pipelines, Data Cleaning, Schema Migration, Incremental Loads
- Dimensional Modeling, Data Warehousing
- Alteryx, SSIS, Docker, Git, Apache Airflow, Streamlit, Apache Kafka, ER Studio
- Tableau (LOD Expressions, Filters, Parameters, Heatmaps)
- Power BI (DAX, Slicers, Conditional Formatting)
- Time-Series Analysis, Sentiment Analysis, Trend Analysis, Predictive Modeling
Data Support Specialist
Feb 2025 – Present
- Built and launched CI/CD pipeline from scratch using GitHub Actions, automating validation, deployment, and synchronization for Python, SQL, HTML, JSON, and CSV scripts, improving development and production workflows
- Automated version control, validation, formatting, merging, production syncing, and code reviews with 3 YAML workflows in the CI/CD pipeline, integrating automated testing to ensure high-quality, production-ready scripts in an Agile environment
- Configured a self-hosted GitHub Actions runner on an Azure VM for seamless production code sync, reducing deployment errors by 30%
- Led pilot testing of 30+ scripts across multiple formats, refining automation before scaling to 300+ interdependent scripts with a dependency-aware system, optimizing CI/CD integration and deployment efficiency
- Created Python-based scripts for data extraction from DebtMaster, leveraging Flask to create web-based interfaces with HTML and CSS for interactive data visualization and automation
- Wrote SQL queries to meet stakeholder-specific data requirements, ensuring well-structured insights for business decision-making
Data Analyst
Feb 2024 – Jan 2025
- Developed a chatbot application with FastAPI, Python, and sBERT model to answer user queries and FAQs for website visitors, using 80% similarity threshold to provide relevant answers, streamlining internal communication and improving response times
- Engineered ML-powered API with FastAPI and Uvicorn, loading pre-trained model from pickle file to deliver real-time responses and log unanswered queries in a MySQL database for continuous improvement
- Analyzed 500+ queries with sentiment analysis and topic modeling to identify trends, emotions, and themes, enhancing accuracy by 15%
- Facilitated data handling by automating PII extraction workflows with NLTK, employing named entity recognition (NER) to detect and extract sensitive information, improving communication efficiency by 40%
Data Analyst Intern
Jun 2023 – Dec 2023
- Leveraged pandas profiling to generate a report on 28K+ records, identifying missing values, duplicates, and schema mismatches
- Designed a star schema dimensional model for MSSQL and used Alteryx ETL to clean and transform data, improving completeness by 40%
- Assessed donation data in Python to evaluate alumni engagement and trends in donation patterns using time series analysis
- Implemented Tableau dashboards with LOD expressions, calculated fields, and time-series analysis to analyze student performance and donations, enhancing forecasting accuracy ($37K vs. $28K manually) using Tableau trends
Data Analyst
Dec 2019 – Aug 2021
- Designed ETL pipelines in Airflow, utilizing DAGs for task scheduling and monitoring, and integrating data from 3 sources: student performance, workshops, and customer feedback
- Analyzed robotic training session data with SciPy and statsmodels, using feature engineering to boost final assessment scores by 20%
- Developed Tableau dashboards with filters, heatmaps, and KPI trends to assess workshop performance and resource allocation
- Led Python and robotics training for 30+ students, covering data analysis, JavaScript game design, app development, data visualization, and programming fundamentals
Northeastern University | Boston, MA
M.S. in Industrial Engineering
Jan 2022 – Dec 2023
GPA: 3.7/4.0
Certifications: AWS Solutions Architect Associate, AWS Cloud Practitioner
- Stock Market Analysis: GitHub
Tools: Lambda, Docker, EventBridge, DynamoDB, Python
- IPL Chatbot: GitHub
Tools: OpenAI, Snowflake, Streamlit, Python
- [LinkedIn](https://www.linkedin.com/in/kaustubh-khede