- Stanford University
- California
- asap7772.github.io
- @Anikait_Singh_
- in/asap7772
- https://huggingface.co/Asap7772
Pinned
- fewshot-preference-optimization (Public): Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptation to user preferences with minimal labeled data, leveragin…
- understanding-rlhf (Public): Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy samplin…
- OfflineRlWorkflow (Public): This repository accompanies the paper "A Workflow for Offline Model-Free Robotic RL".
- DeepCriminalize (Public): A project that uses GANs to generate a sketch-artist-style representation of a criminal. Winners of the Cal Hack Fellowship 2019.
- Cal-QL (Public): A method that learns a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q…
Python