" is generally preferable to

Sicong Leng, Jing Wang, Jiaxi Li, Hao Zhang, Zhiqiang Hu, Boqiang Zhang, Yuming Jiang, Hang Zhang, Xin Li, Lidong Bing, et al · 2025 · arXiv 2509.21268

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

cs.CL · 2026-05-19 · conditional · novelty 6.0

Staged post-training that first solidifies visual perception before visual and textual reasoning improves VLM accuracy and shortens reasoning traces on visual math and perception benchmarks.

Building a Precise Video Language with Human-AI Oversight

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

CHAI framework pairs AI pre-captions with expert human critiques to produce precise video descriptions, enabling open models to outperform closed ones like Gemini-3.1-Pro and improve fine-grained control in video generation models.

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

LongVT adds native video-cropping tool calling to LMMs for interleaved multimodal chain-of-tool-thought reasoning on long videos and releases VideoSIAH data for training and evaluation.

Boosting Reasoning in Large Multimodal Models via Activation Replay

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

Activation Replay boosts multimodal reasoning in post-trained LMMs by replaying low-entropy activations from base models to RLVR counterparts at test time via visual token manipulation.

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

cs.CV · 2026-04-30 · unverdicted · novelty 5.0

PRISM adds a distribution-alignment stage using black-box on-policy distillation against a perception-reasoning MoE discriminator, yielding +4.4 and +6.0 average accuracy gains over standard SFT-to-RLVR on Qwen3-VL 4B and 8B models.

citing papers explorer

Showing 6 of 6 citing papers.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning cs.LG · 2026-04-08 · unverdicted · none · ref 64
This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models cs.CL · 2026-05-19 · conditional · none · ref 12
Staged post-training that first solidifies visual perception before visual and textual reasoning improves VLM accuracy and shortens reasoning traces on visual math and perception benchmarks.
Building a Precise Video Language with Human-AI Oversight cs.CV · 2026-04-22 · unverdicted · none · ref 28
CHAI framework pairs AI pre-captions with expert human critiques to produce precise video descriptions, enabling open models to outperform closed ones like Gemini-3.1-Pro and improve fine-grained control in video generation models.
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling cs.CV · 2025-11-25 · unverdicted · none · ref 20
LongVT adds native video-cropping tool calling to LMMs for interleaved multimodal chain-of-tool-thought reasoning on long videos and releases VideoSIAH data for training and evaluation.
Boosting Reasoning in Large Multimodal Models via Activation Replay cs.CV · 2025-11-25 · unverdicted · none · ref 19
Activation Replay boosts multimodal reasoning in post-trained LMMs by replaying low-entropy activations from base models to RLVR counterparts at test time via visual token manipulation.
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL cs.CV · 2026-04-30 · unverdicted · none · ref 4
PRISM adds a distribution-alignment stage using black-box on-policy distillation against a perception-reasoning MoE discriminator, yielding +4.4 and +6.0 average accuracy gains over standard SFT-to-RLVR on Qwen3-VL 4B and 8B models.

" is generally preferable to

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer