pith. machine review for the scientific record.

arxiv: 2412.15251 · v3 · submitted 2024-12-15 · 💻 cs.CL · cs.AI

Recognition: unknown

IPS: In-Prompt Process Supervision for Short Video Content Moderation

keywords: content, MLLMs, supervision, ancillary, effective, in-prompt, moderation, multimodal

Multimodal large language models (MLLMs) are effective at capturing the semantics of short video content; however, they often fail to attend to the policy-specific details required for reliable content moderation. To address this limitation, we introduce IPS, a novel framework that integrates In-prompt Process Supervision into MLLMs by adding sequential reasoning over ancillary questions during fine-tuning. IPS consistently outperforms baseline MLLMs on public and proprietary benchmarks. Moreover, replacing human-annotated ancillary labels with MLLM-generated ones results in only marginal performance degradation, demonstrating robustness to noisy supervision and good scalability with model-generated annotations. These findings establish IPS as a scalable and effective solution for complex multimodal classification in large-scale industrial settings.
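The abstract does not give the prompt format, so the following is only an illustrative sketch of what "sequential reasoning over ancillary questions during fine-tuning" might look like as a supervised training example. Everything in it is an assumption made for illustration: the `AncillaryQA` type, the `build_training_example` helper, the `<video:...>` placeholder, the `Q1`/`A1`/`Final` layout, and the sample questions and labels. None of these come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AncillaryQA:
    """One hypothetical policy-specific sub-question and its label."""
    question: str
    answer: str  # could come from a human annotator or from an MLLM


def build_training_example(
    video_ref: str,
    ancillary: List[AncillaryQA],
    final_label: str,
) -> Tuple[str, str]:
    """Return (prompt, target) text for one fine-tuning example.

    Assumed layout: the target answers every ancillary question in order
    and only then states the final moderation label, so the intermediate
    steps are supervised in the same sequence as the final decision.
    """
    question_lines = [
        f"Q{i}: {qa.question}" for i, qa in enumerate(ancillary, start=1)
    ]
    prompt = "\n".join(
        [
            f"<video:{video_ref}>",  # stand-in for the real video input
            "Answer each question in order, then give the final decision.",
        ]
        + question_lines
        + ["Final: does the video violate the policy?"]
    )
    answer_lines = [
        f"A{i}: {qa.answer}" for i, qa in enumerate(ancillary, start=1)
    ]
    target = "\n".join(answer_lines + [f"Final: {final_label}"])
    return prompt, target


# Made-up example data, for illustration only.
example_prompt, example_target = build_training_example(
    "clip_0001",
    [
        AncillaryQA("Is a weapon visible?", "no"),
        AncillaryQA("Is there threatening speech?", "no"),
    ],
    "non-violating",
)
```

Under this assumed layout, swapping human-annotated ancillary labels for MLLM-generated ones only changes where the `answer` fields come from; the example structure stays the same.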

