arXiv preprint arXiv:2505.23558

Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li · 2025 · arXiv 2505.23558

7 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

DGAO uses reinforcement learning to optimize LLMs for both accuracy and order stability by balancing intra-group accuracy advantages and inter-group stability advantages.

Reflection Anchors for Propagation-Aware Visual Retention in Long-Chain Multimodal Reasoning

cs.CV · 2026-05-10 · unverdicted · novelty 7.0

RAPO uses an information-theoretic lower bound on visual gain to select high-entropy reflection anchors and optimizes a chain-masked KL surrogate, delivering gains over baselines on reasoning benchmarks across LVLM backbones.

Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Vision Inference Former adds a direct visual-to-output bridge that continuously injects visual semantics during MLLM decoding to sustain consistency and reduce modality imbalance.

PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

PDCR improves vision-language reasoning by computing separate normalized confidence advantages for perception steps and reasoning steps after unsupervised decomposition.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation

cs.CL · 2026-05-03

citing papers explorer

Showing 7 of 7 citing papers.

Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization · cs.LG · 2026-05-12 · unverdicted · none · ref 49
Reflection Anchors for Propagation-Aware Visual Retention in Long-Chain Multimodal Reasoning · cs.CV · 2026-05-10 · unverdicted · none · ref 14
Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models · cs.CV · 2026-05-18 · unverdicted · none · ref 6
PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning · cs.CL · 2026-05-13 · unverdicted · none · ref 4
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey · cs.AI · 2025-09-02 · accept · none · ref 249
A Survey of Reinforcement Learning for Large Reasoning Models · cs.CL · 2025-09-10 · accept · none · ref 92
VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation · cs.CL · 2026-05-03 · unreviewed · ref 91
