Asap7772

Anikait Singh (Asap7772)

I am a PhD student in Computer Science at Stanford University. My research interests are in scaling up decision-making methods such as reinforcement learning.

72 followers · 21 following

Pinned

fewshot-preference-optimization Public

Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptation to user preferences with minimal labeled data, leveragin…

Python · 8 stars · 1 fork

understanding-rlhf Public

Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy samplin…

Python · 28 stars · 3 forks

PTR Public

This repository contains the implementation of the PTR algorithm described in the paper: Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning.

Python · 29 stars · 3 forks

OfflineRlWorkflow Public

This repository accompanies the following paper: A Workflow for Offline Model-Free Robotic RL

Python · 11 stars · 2 forks

DeepCriminalize Public

A project that uses GANs to generate a sketch-artist-style representation of a criminal. Winners of the Cal Hack Fellowship 2019.

Python · 2 stars · 1 fork

Cal-QL Public

A method that learns a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q…

Python
