GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio

Cancan Chen; Huiying Xu; Jiaheng Zhang; Tianyi Wu; Wenjie Qu; Xinzhong Zhu; Yanpei Guo; Yibo Li; Yue Liu; Yufei He

arxiv: 2602.03328 · v2 · pith:PJ5C5SWJnew · submitted 2026-02-03 · 💻 cs.CR

Zhenhao Zhu, Yue Liu, Yanpei Guo, Wenjie Qu, Cancan Chen, Yufei He, Yibo Li, Yulin Chen, Tianyi Wu, Huiying Xu, Xinzhong Zhu, Jiaheng Zhang

classification 💻 cs.CR

keywords: guardrail, guardreasoner-omni, model, audio, image, reasoning, reasoning-based, text

We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, video, and audio data. First, we construct a comprehensive training corpus comprising 181k samples spanning these four modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL with a concise correctness reward to preserve accurate reasoning while suppressing redundant generation. We release a suite of models scaled at 3B and 7B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
cs.CV · 2026-05 · unverdicted · novelty 5.0

SafeLens presents a fast-and-slow video guardrail framework that filters the SafeWatch dataset to 2.4% and adds Chain-of-Thought traces to achieve state-of-the-art moderation performance at reduced inference cost.