respailab
Popular repositories
- agentic-llm-unlearning Public
[COLM 2025] Agents Are All You Need for LLM Unlearning
Python 3
- unlearning-or-concealment Public
We expose a significant vulnerability in diffusion model unlearning methods, where an attacker can reverse the supposed erasure of concepts during the inference process. Our approach leverages a no…
Python 1
Repositories
- Antidote Public
[AAAI 2026] AntiDote is a bi-level adversarial training method that hardens open-weight LLMs against malicious fine-tuning: a hypernetwork generates harmful LoRA attacks, and the model learns to neutralize them while preserving its capabilities.
- respailab.github.io Public
- unlearning-or-concealment Public
We expose a significant vulnerability in diffusion model unlearning methods, where an attacker can reverse the supposed erasure of concepts during the inference process. Our approach leverages a novel Partial Diffusion Attack that operates across all layers of the model.
- Machine-Unlearning-Workshop Public
- trl Public (forked from huggingface/trl)
Train transformer language models with reinforcement learning.