Devin White Dev1nW

👋 Hi, I'm Devin!

I'm a Machine Learning Researcher with the Army Educational Outreach Program (AEOP), with over three years of combined professional and academic research experience. I investigate the emergent capabilities of Large Language Models (LLMs), particularly in interactive environments (like Atari!), and I also work on Reinforcement Learning (RL) and Reinforcement Learning from Human Feedback (RLHF), specifically Rating-based RL (RbRL). My focus is on using human insight to improve how AI systems learn and align.

🧠 Skills

Programming & Libraries: Python, PyTorch, TensorFlow, Stable Baselines3, Apple MLX, NumPy, Pandas, Matplotlib, Gymnasium
APIs & Tools: OpenAI API, Google Gemini API, Hugging Face API, Git & GitHub
AI/ML Concepts: Reinforcement Learning (RL), Reinforcement Learning from Human Feedback (RLHF), Rating-based RL (RbRL), Large Language Models (LLMs), Natural Language Processing (NLP)

🔭 Research Interests

Reinforcement Learning from Human Feedback (RLHF) and Alignment: Specifically focusing on using ratings, as in Rating-Based Reinforcement Learning.
Large Language Models (LLMs): Exploring emergent capabilities, agentic behavior, and applications, including using LLMs to play Atari games, as in Atari-GPT.
Human-AI Interaction: Designing and studying systems where human input guides AI learning.

📚 Publications

RbRL2.0: Integrated Reward and Policy Learning for Rating-based Reinforcement Learning (Accepted, AAAI 2025 Bridge Program - Collaborative AI and Modeling of Humans) [ArXiv]
Performance Optimization of Ratings-Based Reinforcement Learning (Accepted, AAAI 2025 Bridge Program - Collaborative AI and Modeling of Humans) [ArXiv]
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games (Accepted, AAAI 2025 Workshop - Toward Knowledgeable Foundation Models) [ArXiv] [Code]
Rating-Based Reinforcement Learning (AAAI 2024) [Paper] [Code] (Also presented at ICML 2023 Workshop - Many Facets of Preference-Based Learning [Workshop Paper])
Deep Reinforcement Learning-based Optimal Time-constrained Intercept Guidance (AIAA GNC 2024) [Paper]
Reinforcement Learning From Human Ratings (Master's Thesis) [Thesis]

💻 Key Projects

ASCII Breakout (LLM Agents): Implementation allowing various LLMs (Gemini, GPT-4o, Llama 3.2) to play ASCII Breakout.
ASCII Breakout MLX: Llama 3.2 playing ASCII Breakout, optimized for Apple Silicon using MLX.
RL-Driven nanoGPT for ASCII Breakout: Training a nanoGPT model from scratch using PPO to play ASCII Breakout.
Simplified RbRL & PbRL Implementations: Clear Python code for Rating-based and Preference-based RL.
Gemini Research Reviewer: AI tool using the Gemini API for feedback on research papers.
(See Projects Page on my website for more, including archived work)
