
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

Hao, Y · 2025 · arXiv 2501.05444

Mixed citation behavior. Most common role is background (60%).

13 Pith papers citing it

Background 60% of classified citations


citation-role summary

background: 3 · dataset: 2

citation-polarity summary

background: 3 · use dataset: 2

representative citing papers

Reflection Anchors for Propagation-Aware Visual Retention in Long-Chain Multimodal Reasoning

cs.CV · 2026-05-10 · unverdicted · novelty 7.0

RAPO uses an information-theoretic lower bound on visual gain to select high-entropy reflection anchors and optimizes a chain-masked KL surrogate, delivering gains over baselines on reasoning benchmarks across LVLM backbones.

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.

Learning to Reason under Off-Policy Guidance

cs.LG · 2025-04-21 · unverdicted · novelty 6.0

LUFFY mixes off-policy reasoning traces into RLVR training via Mixed-Policy GRPO and regularized importance sampling, delivering over 6-point gains on math benchmarks and enabling training of weak models where on-policy RLVR fails.

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

cs.CV · 2025-03-21 · conditional · novelty 6.0

Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.

Reversing the Flow: Generation-to-Understanding Synergy in Large Multimodal Models

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

Generation-to-Understanding synergy lets multimodal models create self-generated visual edits as intermediate steps, improving performance on twelve benchmarks while revealing limits in task-aligned self-reflection.

Seed1.8 Model Card: Towards Generalized Real-World Agency

cs.AI · 2026-03-21 · unverdicted · novelty 5.0

Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

cs.AI · 2025-11-28 · unverdicted · novelty 5.0

AgroCoT is a new Chain-of-Thought VQA benchmark with 4759 samples to evaluate reasoning capabilities of vision-language models in agriculture.

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

cs.LG · 2025-09-16 · unverdicted · novelty 5.0

An 8B MLLM achieves state-of-the-art efficiency and performance among models under 30B parameters by combining a unified 3D resampler, joint document-text training, and hybrid RL for reasoning modes.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 2.0

Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning

cs.AI · 2026-05-13

Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving

cs.CL · 2026-04-22

citing papers explorer

Showing 13 of 13 citing papers.

Reflection Anchors for Propagation-Aware Visual Retention in Long-Chain Multimodal Reasoning · cs.CV · 2026-05-10 · unverdicted · none · ref 41
OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving · cs.CL · 2026-04-23 · unverdicted · none · ref 100
Learning to Reason under Off-Policy Guidance · cs.LG · 2025-04-21 · unverdicted · none · ref 56
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles · cs.CV · 2025-03-21 · conditional · none · ref 22
Reversing the Flow: Generation-to-Understanding Synergy in Large Multimodal Models · cs.CV · 2026-05-15 · unverdicted · none · ref 13
Seed1.8 Model Card: Towards Generalized Real-World Agency · cs.AI · 2026-03-21 · unverdicted · none · ref 26
AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture · cs.AI · 2025-11-28 · unverdicted · none · ref 17
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe · cs.LG · 2025-09-16 · unverdicted · none · ref 53
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models · cs.AI · 2025-03-12 · unverdicted · none · ref 241
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey · cs.CV · 2025-03-16 · unverdicted · none · ref 167
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning · cs.CL · 2025-02-05 · unverdicted · none · ref 59
Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning · cs.AI · 2026-05-13 · unreviewed · ref 94
Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving · cs.CL · 2026-04-22 · unreviewed · ref 84
