Scenealign: Aligning multimodal reasoning to scene graphs in complex visual scenes

Chuhan Wang, Xintong Li, Jennifer Yuntong Zhang, Junda Wu, Chengkai Huang, Lina Yao, Julian McAuley, Jingbo Shang · 2026 · arXiv 2601.05600

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MASS-DPO derives a Plackett-Luce-specific log-determinant Fisher information objective to select non-redundant negative samples, matching or exceeding multi-negative DPO performance with substantially fewer negatives across four benchmarks and three model families.

Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

cs.IR · 2026-04-30 · unverdicted · novelty 7.0

FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

MEG-RAG defines a new MEG metric based on Semantic Certainty Anchoring and trains a multimodal reranker to select evidence aligned with ground-truth semantic anchors, yielding higher accuracy and consistency on the M²RAG benchmark.

citing papers explorer

Showing 7 of 7 citing papers.

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking cs.LG · 2026-05-13 · unverdicted · none · ref 51
F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents cs.AI · 2026-05-11 · unverdicted · none · ref 58
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization cs.LG · 2026-05-11 · unverdicted · none · ref 61
MASS-DPO derives a Plackett-Luce-specific log-determinant Fisher information objective to select non-redundant negative samples, matching or exceeding multi-negative DPO performance with substantially fewer negatives across four benchmarks and three model families.
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck cs.LG · 2026-05-08 · unverdicted · none · ref 56
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG cs.IR · 2026-04-30 · unverdicted · none · ref 46
FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning cs.LG · 2026-04-08 · unverdicted · none · ref 116
This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.
MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG cs.CL · 2026-04-27 · unverdicted · none · ref 18
MEG-RAG defines a new MEG metric based on Semantic Certainty Anchoring and trains a multimodal reranker to select evidence aligned with ground-truth semantic anchors, yielding higher accuracy and consistency on the M²RAG benchmark.

Scenealign: Aligning multimodal reasoning to scene graphs in complex visual scenes

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer