- I’m currently a third-year PhD candidate at Georgia Tech.
- I am working on safety alignment for large language models. In particular, I am interested in red-teaming attacks and defenses for LLMs.
I push myself to publish a high-quality paper roughly every three months. Here are the papers I wrote in 2024.
- [2024/2/2] Vaccine: Perturbation-aware alignment for large language models against harmful fine-tuning NeurIPS 2024 [paper] [code]
- [2024/5/28] Lazy safety alignment for large language models against harmful fine-tuning NeurIPS 2024 [paper] [code]
- [2024/8/18] Antidote: Post-fine-tuning safety alignment for large language models against harmful fine-tuning arXiv [paper]
- [2024/9/3] Booster: Tackling harmful fine-tuning for large language models via attenuating harmful perturbation arXiv [paper] [code] [OpenReview]
- [2024/9/26] Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey arXiv [paper] [repo]