AI Safety and Alignment

Haoyu Wang

Sample-efficient agentic RL for LLMs, world models, and more.

About

I am a research associate and an incoming PhD student at Nanyang Technological University (NTU), advised by Prof. Dacheng Tao. My current work is on agentic RL for LLMs. Before that, I received my M.S. from Tsinghua University (SIGS), advised by Prof. Xueqian Wang, and my B.Eng. from Xi'an Jiaotong University.

My recent work focuses on safety reasoning, lifelong safety alignment, and improving model behavior with synthetic feedback and experience.

Research Focus

Efficient Trial and Error for LLMs

Using small sub-agents to help the LLM explore, while preserving exploitation of the LLM's pretrained knowledge.

LLM Safety Alignment

Training and evaluation methods that make language models reliably follow safety principles under distribution shift.

Safety Reasoning

Eliciting and strengthening internal safety reasoning to improve robustness against jailbreak and adversarial prompts.

Selected Publications

Language-based Trial and Error Falls Behind in the Era of Experience

Preprint, 2026

Haoyu Wang, Guozheng Ma, Shugang Cui, Yilun Kong, Haotian Luo, Li Shen, Mengya Gao, Yichao Wu, Xiaogang Wang, Dacheng Tao

Lifelong Safety Alignment for Language Models

NeurIPS 2025

Haoyu Wang, Zeyu Qin, Yifei Zhao, Chao Du, Min Lin, Xueqian Wang, Tianyu Pang

Safety Reasoning with Guidelines

ICML 2025

Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Dacheng Tao, Minhao Cheng

News

Experience and Education