AndyyyPhan/README.md

Andy Phan

ML Systems & Backend Engineer · Building intelligent, data-intensive systems that scale

I design and ship production ML pipelines, real-time backend services, and scalable data infrastructure. From fine-tuning deep learning models (the RoBERTa transformer, the ResNet-50 CNN) to deploying threat detection microservices that process 500K+ daily events, I build systems that deliver measurable impact.

📍 CS @ University of Virginia · Graduating May 2026 · Open to new-grad SWE roles

LinkedIn Email


🎯 Highlights

  • Fine-tuned RoBERTa transformer for hate speech detection on 25K tweets — achieved 96.9% accuracy and 0.811 macro-F1, outperforming classical ML baselines
  • Built a real-time threat detection microservice (Python/FastAPI) integrated into a Zero Trust security platform — reduced mean time to detect by 40%
  • Engineered a data ingestion pipeline processing 500K+ daily OSINT events via Kafka + PostgreSQL with sub-second entity resolution
  • Trained ResNet-50 CNN on 80K images with GPU-accelerated pipeline and Grad-CAM validation — 93% accuracy, 18% recall improvement
  • Designed a geohash-based proximity matching system with dynamic precision, achieving sub-second queries while minimizing Firestore read costs

🚀 Featured Projects

Proximity-based social matching platform — Co-Founder & Lead Engineer

  • Designed a geohash spatial indexing system with dynamic precision and bounded neighbor expansion for sub-second location queries
  • Implemented real-time messaging with atomic sequence counters and Firestore transactions, ensuring strict ordering despite concurrent writes
  • Architected for scale: minimized Firestore read amplification while maintaining consistent message delivery

Stack: TypeScript, React, Firestore, Geohash Indexing
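
The geohash indexing with dynamic precision described above can be illustrated with a minimal sketch. This is not the project's code (the project stack is TypeScript); it is a Python illustration under stated assumptions. The function names `geohash_encode` and `precision_for_radius`, the `max_precision` cap, and the rule "pick the finest precision whose cell is at least as large as the search radius" are all hypothetical choices made for this example. Neighbor expansion is not shown.

```python
import math

# Standard geohash base32 alphabet (digits plus letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def precision_for_radius(radius_km, lat, max_precision=9):
    """Assumed rule: finest precision whose cell is >= radius_km on both sides."""
    km_per_degree = 111.32  # approximate length of one degree of latitude
    best = 1
    for p in range(1, max_precision + 1):
        total_bits = 5 * p
        lon_bits = (total_bits + 1) // 2
        lat_bits = total_bits // 2
        height_km = 180.0 / (1 << lat_bits) * km_per_degree
        width_km = (360.0 / (1 << lon_bits)) * km_per_degree * math.cos(math.radians(lat))
        if min(height_km, width_km) >= radius_km:
            best = p
        else:
            break
    return best


if __name__ == "__main__":
    p = precision_for_radius(0.5, 38.03)
    print(p, geohash_encode(38.03, -78.51, p))
```

With a rule like this, a query only needs to read the user's own cell plus its adjacent cells, which is one way to keep the number of document reads bounded as the search radius changes.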


Transformer-based Twitter hate speech detection achieving 96.9% accuracy

  • Built an end-to-end NLP pipeline (25K tweets) with text normalization, stratified sampling, and TF-IDF + Logistic Regression baselines achieving 96.0% accuracy and 0.961 weighted F1
  • Fine-tuned RoBERTa-base with oversampling and focal loss, raising macro-F1 from 0.794 to 0.811 and outperforming classical baselines
  • Implemented robust evaluation with stratified cross-validation and class-weighted metrics for imbalanced data

Stack: Python, PyTorch, Hugging Face Transformers, RoBERTa, scikit-learn
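
For readers unfamiliar with the focal loss mentioned above, here is a minimal dependency-free sketch of the per-example multi-class form, loss = -alpha_t * (1 - p_t)^gamma * log(p_t). It is an illustration only, not the project's PyTorch implementation; the default `gamma=2.0` and the optional `alpha` class weights are assumptions, since the README does not state the values used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example.

    gamma=0 reduces to ordinary cross-entropy. Larger gamma shrinks the loss
    of confidently correct examples, so rare or hard classes weigh more.
    alpha is an optional list of per-class weights (assumed, not from the source).
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    logits = [3.0, 0.5, -1.0]
    print("easy example:", focal_loss(logits, target=0))
    print("hard example:", focal_loss(logits, target=2))
```

On imbalanced data the majority class is usually the "easy" one, so down-weighting easy examples tends to help macro-averaged metrics more than plain accuracy.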


ResNet-50 CNN classifying fresh vs. rotten fruit with 93% accuracy

  • Fine-tuned ResNet-50 on an 80K-image FruitVision dataset with targeted data augmentation and Grad-CAM validation, achieving 93% accuracy and F1 > 0.90 across five fruit types
  • Engineered a GPU-accelerated training pipeline with stratified splitting, corruption detection, and adaptive LR scheduling
  • Reduced overfitting and increased per-class recall by 18% compared to baseline through systematic hyperparameter tuning

Stack: Python, PyTorch, ResNet-50, Grad-CAM, NumPy, Matplotlib
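
The stratified splitting mentioned above can be sketched in a few lines of plain Python. This is an illustrative version, not the project's pipeline: the function name `stratified_split`, the 20% validation fraction, and the fixed seed are assumptions for the example. The idea is to split each class separately so the validation set keeps the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices) preserving per-class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Keep at least one validation example per class.
        n_val = max(1, round(len(idxs) * val_fraction))
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)


if __name__ == "__main__":
    labels = ["fresh"] * 80 + ["rotten"] * 20
    train_idx, val_idx = stratified_split(labels)
    print(len(train_idx), len(val_idx))
```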


🛠 Tech Stack

| Category | Technologies |
|---|---|
| Languages | Python, TypeScript, JavaScript (ES6+), Java, SQL |
| AI / ML | PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers, LangChain, RAG |
| ML Techniques | Fine-Tuning (LoRA), Transfer Learning, Reinforcement Learning (PPO, TD(λ)), Feature Engineering |
| Models | RoBERTa, ResNet-50, Prophet, ARIMA, LLMs |
| Data & Scientific | NumPy, Pandas, Matplotlib, Seaborn, Data Pipelines |
| Backend | Node.js, Express, FastAPI, Flask, Django, GraphQL, REST APIs |
| Data & Infra | Kafka, Redis, PostgreSQL, MongoDB, DynamoDB, Firestore |
| Cloud & MLOps | AWS (Lambda, EC2, S3, API Gateway), Docker, Kubernetes, GitHub Actions, CI/CD, GPU Acceleration (CUDA) |
| Frontend | React, Next.js, Tailwind CSS |

💼 Experience

Software Engineer Intern (ML) · UVA Biocomplexity Institute · Nov 2025 – Present

  • Engineered automated weekly forecast submissions with 99.9% uptime for CDC health data
  • Built a forecasting service delivering predictions with <500 ms latency across 52 jurisdictions

Software Engineer Intern (AI/ML Systems) · RIIG Technology · Aug 2025 – Dec 2025

  • Deployed real-time threat detection microservice, reducing mean time to detect by 40%
  • Built Kafka + PostgreSQL pipeline processing 500K+ daily OSINT events

Software Engineer Intern · Innova8 LLC · Jun 2025 – Aug 2025

  • Developed W3C Verifiable Credentials API for healthcare/education identity workflows
  • Built React/TypeScript dashboard reducing manual reporting overhead by 60%

Software Engineer Intern · SS Technology Consultants · Aug 2024 – Dec 2024

  • Shipped 3 production features improving task completion rates by 20%
  • Automated data migration workflows, saving 15+ hours/week

📚 Education

University of Virginia · B.S. Computer Science · May 2026
Minors in Data Science & Applied Mathematics · GPA: 3.76 · Dean's List ×6
AI Focal Path · Coursework: Algorithms, Software Engineering, Computer Systems, Databases

Teaching Assistant — Multivariable Calculus & Statistics (200+ students)


🔍 What I'm Looking For

Currently seeking new-grad Software Engineer or ML Engineer roles starting Summer 2026.

Interested in: ML/AI systems, backend infrastructure, data platforms, or full-stack product engineering at high-growth startups or established tech companies.


Check out my pinned repos below for code samples.

Pinned

  1. Common-Grounds (Public)

    A Flutter mobile app that connects students through proximity and shared interests. Find study partners, form study groups, and build campus connections with location-based matching, real-time chat…

    Dart

  2. ResNet50-Fruit-Freshness-Detection (Public)

    Forked from avery32/DS-4002-Project-3

    Jupyter Notebook

  3. RoBERTa-Hate-Speech-Classifier (Public)

    Forked from avery32/DS4002-Project-1

    Jupyter Notebook

  4. ChessOpeningTrainer (Public)

    Program for practicing chess openings.

    Java

  5. MuscleMap (Public)

    MuscleMap is a web-based interactive muscle exercise guide that allows users to explore different muscle groups on a human body diagram and receive recommended exercises. Users can create workout p…

    PHP

  6. Dining-Hall-Reporter (Public)

    CS 3240 project. Whistleblowing app using Django framework. Report UVA dining halls with the Dining Hall Reporter!

    HTML