YanCotta/README.md


YanCotta

Due to NDAs and corporate policies, the vast majority of my professional-grade code resides in private repositories. The projects showcased here are primarily my academic and personal side-projects where I experiment, build, and deploy end-to-end systems from scratch.


TL;DR - Achievements

• Architected a genomic data system at Embrapa, migrating from PostgreSQL to Neo4j to boost query performance by 87%.

• Won 1st place at the Reply Enterprise Challenge (FIAP NEXT 2025) by designing and building an end-to-end, production-grade multi-agent AI platform that achieved a 76% reduction in a key operational KPI.

• Trained SOTA models at Outlier using RLHF (collaborating with OpenAI, Meta & Anthropic), increasing model efficiency by 64%.

• Developed an award-winning National Resilience AI Platform (FIAP 2025 Global Solution Winner) from concept to deployment.

• Built a Full-Stack Invoice Automation System (React + RAG) in a ~15-day sprint, cutting manual work by >85%.

• Led 2 global SuperDataScience teams to deploy end-to-end AI systems, managing both project architecture and team execution.




About Me

My mission is to build and lead the high-impact teams that will architect the future. I am a strategist who uses AI to solve complex, global business challenges and deliver measurable, executive-level value.

My unique advantage is Systemic Thinking. My background isn't just in AI; it's in the complex, interconnected systems of Biology and Cognitive Science. This allows me to deconstruct multifaceted problems, see the connections others miss, and architect holistic, high-impact solutions—not just code.


Professional & Research Experience

R&D Intern (Data & Genomics) | Embrapa Gado de Leite | Juiz de Fora, MG | Sep 2025 - Present

  • Improved genomic query performance by 87% by migrating from PostgreSQL to Neo4j.
  • Architected a scalable MLOps pipeline for genomic analysis (Docker, Nextflow, FastAPI).
  • Optimized project presentations for stakeholders and executives responsible for laboratory budget and resources.

Key Areas: Genomics, Bioinformatics, Data Engineering, Applied ML, Neo4j, MLOps


Collaborative Researcher (VLM & Deep Learning) | FrameNet Brasil / UFJF | Remote | Sep 2025 - Present

Federal research grant, conducted simultaneously with Embrapa position

  • Developing Vision-Language Models (VLMs) and Deep Learning solutions to automate semantic annotation of large-scale multimodal (text and image) datasets.
  • Architecting a scalable pipeline to transform unstructured data into structured knowledge for computational analysis.

Key Areas: Deep Learning, Vision-Language Models, NLP, Semantic Annotation, Python


AI Trainer (LLM Systems via RLHF) | Outlier | Remote | Nov 2024 - Sep 2025

  • Developed technical content to align Large Language Models (OpenAI, Meta, Anthropic), increasing model efficiency by 64% via RLHF in collaboration with technical teams.

Key Areas: RLHF, Model Alignment, AI Safety, LLMs, Quality Assurance


Data Analyst (Ecological Impact) | Impaakt | Remote | Feb 2022 - Oct 2024

  • Delivered 500+ data-driven ecological impact reports that influenced ESG (Environmental, Social, and Governance) ratings used by investment firms.

Key Areas: Environmental Science, Sustainability Analysis, Data Analysis, Process Optimization, AI Integration, Impact Assessment


Research Assistant | Georgia State University | Atlanta, GA | Feb 2019 - Feb 2020

  • Increased research productivity by 84% by automating data collection and analysis workflows using Python.

Key Areas: Cognitive Sciences, Philosophy of Mind, Psychology, Behavioral Analysis, Research Methodology, Data Analysis, Data Science, Python


Academic Background

AI Systems & Machine Learning Technologist | FIAP | 2024 - 2026 (expected)

Key Areas: AI Systems Architecture, Machine Learning Engineering, MLOps, Edge AI, IoT Development, Software Engineering, Data Engineering, Cybersecurity, Cloud Operations

Academic Excellence: GPA 4.0


Bachelor of Biological Sciences | UniAcademia | 2022 - 2025 (in progress)

Key Areas: Molecular Biology, Genetics, Computational Biology, Research Methodology, Laboratory Management, Scientific Publishing

Academic Excellence: GPA 3.7 | Thesis: Epigenetics Antiaging Health Software Leveraging Machine Learning & Deep Learning Algorithms


Philosophy (Major) & Psychology (Minor) | Georgia State University | 2017 - 2020 (incomplete)

Key Areas: Cognitive Sciences, Philosophy of Mind, Psychology, Human Behavior, Research Methodology, Academic Leadership

Academic Excellence: GPA 3.8 | Thesis: Differentiating Factual Belief, Imagination & Religious Credence - A Systematic Theory of Cognitive Attitudes

Additional Recognition: Columnist for "The Signal" (GSU's award-winning newspaper), Atlanta Campus Scholarship recipient, Dean's List, Honor Society member


Professional Recommendations

View all recommendations on LinkedIn

I've been fortunate to work with exceptional professionals who have recognized my technical capabilities, problem-solving approach, and collaborative leadership style. These recommendations span my work in:

  • AI/ML Engineering & Research
  • Data Science & Analytics
  • Project Leadership & Team Collaboration
  • Academic Research & Scientific Methodology

Featured Projects

This portfolio showcases end-to-end AI systems I've architected to solve real-world challenges. Each project demonstrates business impact, technical excellence, and production-ready implementation.


Smart Maintenance SaaS

🏆 1st PLACE WINNER - Reply Enterprise Challenge @ FIAP NEXT 2025 🏆

An end-to-end, production-grade predictive maintenance platform I built from scratch (investing hundreds of hours since March) to win Reply's annual enterprise challenge. This system uses a 12-agent event-driven architecture (FastAPI, Redis) and 17 ML models (trained on 6 real-world datasets like NASA, AI4I, XJTU) to predict equipment failures before they happen.
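
The event-driven agent pattern described above can be illustrated with a minimal sketch. The actual code is under NDA, so everything here is a hypothetical stand-in: the `EventBus` class is an in-memory substitute for a broker such as Redis, and the agent names, topics, and threshold are invented for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    topic: str
    payload: dict


class EventBus:
    """In-memory stand-in for a message broker (hypothetical, not the real system)."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)  # topic -> list of handler callables
        self.log = []  # every published event, in publication order

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, event: Event) -> None:
        self.log.append(event)
        # Copy the handler list so handlers may subscribe while we iterate.
        for handler in list(self._handlers[event.topic]):
            handler(event)


def make_anomaly_agent(bus: EventBus, threshold: float) -> Callable[[Event], None]:
    """Agent that reacts to sensor readings and emits an anomaly event."""
    def handle(event: Event) -> None:
        if event.payload["vibration"] > threshold:
            bus.publish(Event("anomaly.detected", event.payload))
    return handle


def make_scheduler_agent(work_orders: list) -> Callable[[Event], None]:
    """Agent that reacts to anomaly events by queuing a work order."""
    def handle(event: Event) -> None:
        work_orders.append(f"inspect {event.payload['machine_id']}")
    return handle


bus = EventBus()
work_orders = []
bus.subscribe("sensor.reading", make_anomaly_agent(bus, threshold=0.8))
bus.subscribe("anomaly.detected", make_scheduler_agent(work_orders))

bus.publish(Event("sensor.reading", {"machine_id": "pump-1", "vibration": 0.95}))
bus.publish(Event("sensor.reading", {"machine_id": "pump-2", "vibration": 0.10}))
print(work_orders)  # ['inspect pump-1']
```

Agents never call each other directly; they only publish and subscribe to topics, which is what lets new agents be added without changing existing ones.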

  • Business Value: Proven to reduce unplanned downtime by 40% and save R$ 100-500k per prevented failure.
  • Performance: Validated at 103.8 RPS with 3ms P99 latency under load.
  • Database: Achieved 37% faster dashboard queries using TimescaleDB continuous aggregates.
  • Stack: Python, FastAPI, TimescaleDB, MLflow, Docker, AWS, Streamlit.
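
For readers unfamiliar with the load-test metrics quoted above, the sketch below shows one common way to compute requests per second and P99 latency from raw measurements. It uses the nearest-rank percentile definition, which is an assumption; the source does not say which tool or definition produced the reported figures, and the sample numbers are invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def requests_per_second(request_count, duration_s):
    """Average throughput over the whole test window."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return request_count / duration_s


# Invented sample: 100 request latencies in milliseconds.
latencies_ms = [1.0] * 95 + [2.0] * 4 + [9.0]

p50 = percentile(latencies_ms, 50)   # 1.0
p99 = percentile(latencies_ms, 99)   # 2.0 (the single 9.0 ms outlier sits above P99)
rps = requests_per_second(500, 5.0)  # 100.0
```

P99 is preferred over the mean for latency because it bounds the experience of all but the slowest 1% of requests, and a handful of outliers cannot hide inside an average.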

Code and repository are under NDA.


Invoice Automation System (Full-Stack & Multi-Agent)

Solo Development | AI-powered invoice processing automation

Business Goal: To eliminate the slow, error-prone manual process of invoice handling for small to medium businesses.

Solution & Impact: Built a full-stack system that automates the entire invoice processing pipeline. By mapping the user journey and applying RAG for intelligent error handling, the system reduced manual processing time by over 85%.
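
As a rough illustration of the retrieval step in RAG-based error handling, the sketch below looks up the most relevant known fix for a failed invoice and places it in a prompt. It is not the project's implementation: bag-of-words cosine similarity is swapped in for the embedding search a FAISS index would provide, and `KNOWN_FIXES`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a simple stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


# Hypothetical knowledge base of error resolutions.
KNOWN_FIXES = [
    "missing tax id: request the vendor tax id before approval",
    "total mismatch: recompute line items and flag for review",
    "duplicate invoice number: reject and notify the vendor",
]


def build_prompt(error, documents):
    """Augment the error description with retrieved context for an LLM call."""
    context = "\n".join(retrieve(error, documents, k=1))
    return f"Context:\n{context}\n\nError: {error}\nSuggest a resolution."


error = "invoice total mismatch with line items"
best = retrieve(error, KNOWN_FIXES)[0]
prompt = build_prompt(error, KNOWN_FIXES)
```

The generation step (sending `prompt` to a language model) is omitted here, since it depends on the model provider.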

Technologies: React.js, Next.js, TypeScript, FastAPI, LangChain, RAG, FAISS, Docker, AWS S3, PostgreSQL


🏆 Guardian System: National Resilience Platform (Award Winner)

Solo Development | My winning project for FIAP's 2025.1 Global Solution Challenge

Business Goal: To create a predictive system to manage and mitigate large-scale national crises like natural disasters.

Solution & Impact: I single-handedly architected and developed this award-winning multi-agent platform. It comprises five autonomous "Guardian" agents covering different threat domains and includes a fully functional MVP for fire risk prediction using real-time IoT sensor data.

Technologies: Agentic AI, Python, FastAPI, Docker, MicroPython, ESP32, IoT, Apache Spark


AI Platform for Anti-Aging (Thesis Project)

Solo Development | Personalized anti-aging recommendation system

Business Goal: To create a scalable HealthTech platform that provides personalized, data-driven health recommendations, moving beyond generic advice.

Solution & Impact: Developing an AI platform focused on Explainable AI (SHAP) and secure deployment (JWT). The system translates complex epigenetic data (BioPython) into actionable health insights, analyzing genetic predispositions (SNPs) and lifestyle habits to generate personalized risk assessments.

Technologies: PyTorch, Scikit-learn, BioPython, MLflow, SHAP, Docker, FastAPI, React


Community Projects & Leadership

As a Project Leader in the international SuperDataScience community, I led diverse teams of data scientists and ML engineers to deliver production-ready AI/ML platforms. I was responsible for aligning project priorities with stakeholders, defining KPIs, and managing deployment.

Leadership Experience: Project Lead for 2 projects | Project Member for 2 projects

GlucoTrack: Diabetes Risk Prediction Platform

Project Lead | Comprehensive diabetes risk assessment system using the CDC diabetes dataset

Led a diverse team of data scientists and ML engineers to deliver both beginner-friendly and advanced deep learning solutions.

Key Features: Built traditional ML models (Logistic Regression, Decision Trees) and advanced Feedforward Neural Networks with hyperparameter tuning. Includes model explainability tools and multiple deployment options.

Technologies: Python, Scikit-learn, Deep Learning, Streamlit, Model Explainability, Healthcare AI, Data Science

Live app: glucotrack.streamlit.app


MLPayGrade: ML Salary Prediction System

Project Lead | End-to-end salary prediction platform analyzing the 2024 machine learning job market

Coordinated a team of data scientists and ML engineers to build comprehensive solutions across multiple skill levels.

Key Features: Analyzes global salary trends and job feature impacts on compensation. Features both traditional ML pipelines and advanced deep learning on tabular data with embeddings and explainability.

Technologies: Python, Scikit-learn, Deep Learning, Tabular Data, Streamlit, Job Market Analytics, Data Science


EduSpend: Global Education Cost Prediction

Project Member | End-to-end machine learning platform to predict Total Cost of Attendance for international higher education

Key Features: Achieved a 96.44% R² score with an XGBoost Regressor, deployed via both a Streamlit web app and a FastAPI service, all containerized with Docker and automated with CI/CD.
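
For context on the metric above, R² (the coefficient of determination) measures the share of variance in the target that the model explains, so 96.44% corresponds to R² = 0.9644. The sketch below computes it from first principles; the function name and sample values are illustrative and are not taken from the EduSpend codebase.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Invented values for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, y_true)                  # 1.0
baseline = r2_score(y_true, [2.5] * 4)              # 0.0 (always predicting the mean)
close = r2_score(y_true, [1.5, 2.0, 3.0, 3.5])      # approximately 0.9
```

A model that always predicts the mean scores 0, which is why R² is read as improvement over that baseline.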

Technologies: Scikit-learn, XGBoost, MLflow, Streamlit, FastAPI, Docker, CI/CD, Data Science


Smart Leaf: Deep Learning for Crop Disease

Project Member | Deep learning solution that classifies 14 different crop diseases across four species

Key Features: A Convolutional Neural Network (CNN) trained on my local machine on over 13,000 images, using only modularized Python scripts (no notebooks), and deployed via a user-friendly Streamlit interface for real-time predictions. Covers corn, potato, rice, and wheat diseases.

Technologies: Deep Learning, Computer Vision, CNN, TensorFlow, PyTorch, Streamlit, Locally Trained Neural Network


Explore More Projects

My repositories contain more projects covering Data Science, Machine Learning, MLOps, LLMOps, IoT, AI engineering, bioinformatics, and more.

View All Repositories

Tech Stack & Tools

AI & Machine Learning

Agentic AI & LLMs

Architecture, Backend & APIs

Databases & Data Engineering

Cloud & MLOps

Frontend & Visualization

Testing & Code Quality

IoT & Edge AI


Certifications

View all certifications on LinkedIn

I maintain active certifications across AI/ML platforms, cloud infrastructure, and software development to ensure I stay current with industry-leading technologies and best practices.

Key Certifications Include:

  • Machine Learning & AI Engineering
  • Cloud Platform Expertise (AWS, Azure)
  • Data Science & Analytics
  • Software Development & DevOps
  • Specialized domain certifications in Bioinformatics and IoT

Publications

View all publications on LinkedIn

My research spans cognitive science, artificial intelligence, and computational biology, bridging theoretical frameworks with practical applications.

Research Areas:

  • Philosophy of Mind & Cognitive Attitudes
  • Machine Learning Applications in Health Sciences
  • Epigenetics & AI-Driven Personalized Medicine
  • Computational Biology & Genomics
  • AI Systems Architecture & Engineering

Global Communication


"The most flexible element is the one that controls the system."


Pinned

  1. anti-aging-epigenetics-ml-app (Public)

    A thesis MVP for a personalized anti-aging system that analyzes genetic SNPs and lifestyle habits using ML models (Random Forest and Neural Networks) to provide risk assessments and actionable reco…

    Jupyter Notebook 1 1

  2. SmartCrops-IoT-ML-System (Public)

    An IoT-ML project for smart agriculture: dual ESP32 nodes (sensor via ESP-NOW, gateway to MQTT/Ubidots) collect temperature, humidity, and soil moisture data. ML Model analyzes crop yield and real-time plant…

    Jupyter Notebook 2

  3. global_solution_1_fiap (Public)

    Winner of FIAP's Global Solution 2025.1 Challenge. This repository contains the architecture for a multi-agent system where five autonomous "Guardians" work in synergy to predict, manage, and respo…

    Python 2 1

  4. SDS-CP035-gluco-track (Public)

    Forked from SuperDataScience-Community-Projects/SDS-CP035-gluco-track

    GlucoTrack is a machine learning and deep learning project focused on predicting a person’s risk level of diabetes

    Jupyter Notebook 1

  5. agentic_invoice_system_final_version (Public)

    Technical test for Brim's AI Engineer role: implementation of a Multi-Agentic System for Invoice Automation. Due 02/28. Next.js frontend implementation.

    Python 1

  6. FarmTech_System (Public)

    Unified system for a large-scale smart, automated farm

    Jupyter Notebook