Hi there 👋 This is Zhi-Yi's GitHub Profile
I'm a researcher focused on AI safety, interpretability, and trustworthy machine learning.
Currently, I'm a visiting research fellow at the University of Oxford, working with Fazl Barez on scalable interpretability methods for LLM capability analysis and safety benchmarking. I'm also a research assistant at the @NYCU-RL-Bandits-Lab at National Yang Ming Chiao Tung University, working with Ping-Chun Hsieh on RL backdoor attack detection and post-hoc interpretation of text-to-image model misbehavior, in close collaboration with Pin-Yu Chen of IBM Research. Soon I'll be starting my PhD at the CISPA Helmholtz Center for Information Security, where I'll work with Mario Fritz on trustworthy AI systems. My research interests include:
- AI safety & red-teaming
- Trustworthy text-to-image generation
- Reinforcement learning security
- Interpretability & mechanistic understanding
📄 CV / 🐦 Twitter / 🐱 GitHub / 🎓 Google Scholar / 💼 LinkedIn / 📷 Instagram / 🧵 Threads / 📘 Facebook
In my free time, I enjoy 🏃running, 📚reading, and exploring 🧁dessert and ☕️coffee shops.
I'd love to connect if you share an interest in anything mentioned above (AI safety research, running, reading, dessert, coffee). Please feel free to reach out to me at joycenerd.cs09[AT]nycu.edu.tw