ShreyPatel4/README.md

Hi, I’m Shrey Patel 👋

🚀 Data Engineer | Generative AI Enthusiast | Full Stack Developer

Welcome to my GitHub profile! I am a dedicated Data Engineer with hands-on experience building scalable data pipelines, integrating generative AI into production workflows, and optimizing complex ML architectures. I love tackling real-world problems with data-driven insights, focusing on performance, reliability, and compliance.


🔧 Technologies & Tools

  • Languages & Programming: Python, R, SQL, Java, C++, Scala
  • Data Engineering & Distributed Systems: PySpark, Hive, Spark, Kafka, Airflow, Flink, Docker, Kubernetes, Jenkins, Git
  • Generative AI & ML: Diffusion Models, Large Language Models (LLMs), CNN-RNN Architectures, Self-Attention, TensorFlow, PyTorch
  • Cloud Services: AWS (Lambda, Rekognition, SageMaker, Glue), Azure Databricks, GCP (BigQuery, Vertex AI)
  • Databases & Data Warehouses: Snowflake, Redshift, BigQuery, MySQL, PostgreSQL, DynamoDB, HBase
  • Other Tools: Pentaho-ETL, Presto, Grafana, Datadog

🏆 Featured Projects

High-Performance Image Captioning System

  • Overview: Built a CNN-RNN-based captioning model enhanced with self-attention mechanisms to improve feature representation.
  • Performance: Leveraged mixed-precision (FP16) training and HPC resources, cutting memory usage by 40% and reducing training time by 50%.
  • Key Achievements:
    • +15% improvement in BLEU, CIDEr, METEOR, and ROUGE scores.
    • Real-time captioning deployed on Kubernetes with microservices architecture for high throughput and robust performance.
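For readers less familiar with the caption metrics named above: BLEU is built on clipped (modified) n-gram precision. The sketch below illustrates that one building block in plain Python. It is not the project's evaluation code; the function names are hypothetical, and it assumes a single reference caption, pre-tokenized input, and no brevity penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference, so repeating a word cannot inflate the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0  # candidate is shorter than n tokens
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


if __name__ == "__main__":
    candidate = "the the the the".split()
    reference = "the cat is on the mat".split()
    print(clipped_precision(candidate, reference, n=1))
```

Full BLEU combines these precisions for several values of n with a geometric mean and a brevity penalty; an established library is the safer choice for reported numbers.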

Latent Diffusion Model for Text-to-Image Synthesis

  • Overview: Implemented a Latent Diffusion Model (LDM) by integrating U-Net, VAE, and CLIP for high-fidelity image generation.
  • Performance: Fine-tuned CLIP for better text-image alignment, improving (lowering) FID by 20% and achieving sub-100ms latency via ONNX/TensorRT.
  • Key Achievements:
    • Reduced training compute costs by 50% with FP16, DDP, and gradient checkpointing.
    • Built an efficient data pipeline (DiffusionDB, Parquet) that reduced I/O bottlenecks by 60%.

💼 Professional Experience

Data Engineer @ Ridgeant Technologies (Jan 2022 – Aug 2023)

  • Generative AI Pipelines: Expanded a unified data lakehouse on Apache Iceberg and Snowflake for large-scale model training on text, audio, and image data.
  • Real-Time Processing: Built a real-time distributed data architecture with Apache Flink and Kafka, ensuring near-instant event processing and reducing latency by 60%.
  • AWS Rekognition Integration: Implemented real-time triggers with Lambda and Rekognition for image/video analysis, improving data verification accuracy by 25%.
  • ETL Optimization & Cloud Migration: Migrated legacy Pentaho workflows to Flink, cutting report generation time by 70% and maintaining 99.9% uptime.
  • Dynamic Pricing & Revenue Growth: Deployed a dynamic pricing model on SageMaker, boosting annual revenue by 35% and increasing customer satisfaction by 20%.
  • Governance & Compliance: Enforced HIPAA compliance in Snowflake/Iceberg, strengthening stakeholder trust and mitigating regulatory risks.

Software Data Engineer Intern @ ZF Friedrichshafen AG (Apr 2021 – Dec 2021)

  • Legacy Pipeline Modernization: Migrated a legacy SCD2 (Slowly Changing Dimension Type 2) pipeline to a modern tech stack with Firebase Authentication and the GCP Nearby-Search API.
  • API Performance: Improved API response times by 30%, supporting 10,000+ daily searches without compromising reliability.
  • BigQuery Integration: Orchestrated real-time patient data in BigQuery, ensuring high performance and data-protection compliance.
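As background on the SCD2 pattern mentioned above: a Type 2 slowly changing dimension keeps history by closing the current row and appending a new one whenever a tracked attribute changes. The following is a minimal in-memory sketch of that idea, not the pipeline described here; `DimRow`, `apply_scd2`, and the field names are hypothetical, and a real pipeline would do this in a warehouse with a merge statement.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    key: str                          # business key of the entity
    value: str                        # tracked attribute
    valid_from: date                  # date this version became current
    valid_to: Optional[date] = None   # None means "still current"


def apply_scd2(history: List[DimRow], key: str, new_value: str, as_of: date) -> List[DimRow]:
    """Return a new history with one SCD2 update applied.

    If the current row for `key` already holds `new_value`, nothing changes.
    Otherwise the current row (if any) is closed at `as_of` and a new
    current row is appended.
    """
    out = []
    needs_new_row = True
    for row in history:
        if row.key == key and row.valid_to is None:
            if row.value == new_value:
                needs_new_row = False          # unchanged: keep as-is
                out.append(row)
            else:
                out.append(replace(row, valid_to=as_of))  # close old version
        else:
            out.append(row)
    if needs_new_row:
        out.append(DimRow(key, new_value, as_of))
    return out


if __name__ == "__main__":
    history = apply_scd2([], "c1", "Boston", date(2021, 4, 1))
    history = apply_scd2(history, "c1", "Austin", date(2021, 9, 1))
    for row in history:
        print(row)
```

Returning a new list instead of mutating in place keeps each update easy to test and replay.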

🎓 Education

Northeastern University
M.S. in Computer Software Engineering (Expected May 2025)

  • Relevant Coursework: Generative AI, High Performance Parallel Compute with Deep Learning, Big Data and Indexing
  • Activities & Achievements: Co-founder at CareWallet (Healthcare AI Startup), Project Lead at Google Developer Student Club

Ganpat University
B.S. in Computer Science and Engineering, Major in Big Data Analytics (July 2018 – May 2022)

  • Relevant Coursework: Probability & Statistics, Advanced Cloud Computing, Advanced Big Data Analytics
  • Activities & Achievements: Project Lead at Google Cloud Study Jam, 2x GCP Quest Leader at Google Cloud

🧠 What I’m Currently Learning / Working On

  • MLOps & Kubernetes: Automating end-to-end ML pipelines, including model versioning, deployment, and monitoring at scale.
  • Generative AI: Further refining LLMs and latent diffusion models for text-to-image and text-to-audio synthesis.
  • Real-Time Analytics: Experimenting with Apache Flink SQL for streaming data transformations and analytics.

📫 How to Reach Me


🌱 Fun Fact

I love combining my passion for photography with data visualization, capturing moments both in real life and through interesting analytics projects.


Last Updated: February 2025

Pinned

  1. facebook/react Public

    The library for web and native user interfaces.

    JavaScript 233k 47.7k

  2. facebook/infer Public

    A static analyzer for Java, C, C++, and Objective-C

    OCaml 15.1k 2k

  3. Advanced-Data-Predictive-Analytics Public

    Advanced analytics used to make predictions about unknown test cases from test data. Predictive analytics uses many techniques from data mining, statistics, modeling, machine learning, and…

    Jupyter Notebook

  4. Advanced-MOOC-Result-Scraper- Public

    Advanced automated data-mining tool that scrapes MOOC results in one click.

    Python 1

  5. facebookarchive/react-360 Public archive

    Create amazing 360 and VR content using React

    JavaScript 8.7k 1.2k