Efficient Trial and Error for LLMs
Using small sub-agents to help the LLM explore, while preserving the exploitation of the LLM's pretrained knowledge.
AI Safety and Alignment
Sample efficient agentic RL for LLMs, world models, and more.
Google Scholar · LinkedIn · GitHub · Email
I am a research associate and an incoming PhD student at Nanyang Technological University (NTU), advised by Prof. Dacheng Tao. My current work is on Agentic RL for LLMs. Before that, I received my M.S. from Tsinghua University (SIGS), advised by Prof. Xueqian Wang, and my B.Eng. from Xi'an Jiaotong University.
My recent work focuses on safety reasoning, lifelong safety alignment, and improving model behavior with synthetic feedback and experience.
Training and evaluation methods that make language models reliably follow safety principles under distribution shift.
Eliciting and strengthening internal safety reasoning to improve robustness against jailbreak and adversarial prompts.