Ares: Multimodal adaptive reasoning via difficulty-aware token-level entropy shaping

Shuang Chen, Yue Guo, Yimeng Ye, Shijue Huang, Wenbo Hu, Haoxi Li, Manyuan Zhang, Jiayu Chen, Song Guo, Nanyun Peng · 2025 · arXiv 2510.08457

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.

Gen-Searcher: Reinforcing Agentic Search for Image Generation

cs.CV · 2026-03-30 · unverdicted · novelty 7.0 · 2 refs

Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning

cs.CV · 2026-01-30 · unverdicted · novelty 7.0

CamReasoner uses structured O-T-A reasoning and RL on 56k samples to lift camera movement classification from 73.8% to 78.4% and VQA from 60.9% to 74.5% on Qwen2.5-VL-7B.

OneThinker: All-in-one Reasoning Model for Image and Video

cs.CV · 2025-12-02 · unverdicted · novelty 5.0

OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.

Flow-OPD: On-Policy Distillation for Flow Matching Models

cs.CV · 2026-05-08 · 4 refs

citing papers explorer

Showing 5 of 5 citing papers.

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding cs.CV · 2026-04-14 · unverdicted · none · ref 10
Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.
Gen-Searcher: Reinforcing Agentic Search for Image Generation cs.CV · 2026-03-30 · unverdicted · none · ref 31 · 2 links
Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.
CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning cs.CV · 2026-01-30 · unverdicted · none · ref 5
CamReasoner uses structured O-T-A reasoning and RL on 56k samples to lift camera movement classification from 73.8% to 78.4% and VQA from 60.9% to 74.5% on Qwen2.5-VL-7B.
OneThinker: All-in-one Reasoning Model for Image and Video cs.CV · 2025-12-02 · unverdicted · none · ref 35
OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.
Flow-OPD: On-Policy Distillation for Flow Matching Models cs.CV · 2026-05-08 · unreviewed · ref 8 · 4 links

Ares: Multimodal adaptive reasoning via difficulty-aware token-level entropy shaping

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer