In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, H · 2024

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts

cs.CV · 2026-05-12 · unverdicted · novelty 8.0 · 2 refs

An MLLM-guided architecture with a mixture of frequency experts and relational alignment loss achieves state-of-the-art all-in-one image restoration, outperforming prior methods by up to 1.35 dB on the CDD11 dataset.

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.

SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA?

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

SurgCheck benchmark reveals that vision-language models for surgical VQA often depend on linguistic shortcuts rather than visual reasoning, shown by consistent performance drops on less-biased questions.

V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

V-tableR1 uses a critic VLM for dense step-level feedback and a new PGPO algorithm to shift multimodal table reasoning from pattern matching to verifiable logical steps, achieving SOTA accuracy with a 4B open-source model.

ReflectCAP: Detailed Image Captioning with Reflective Memory

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

ReflectCAP distills model-specific hallucination and oversight patterns into Structured Reflection Notes that steer LVLMs toward more factual and complete image captions, reaching the Pareto frontier on factuality-coverage trade-offs.

ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards

cs.CV · 2026-04-22 · unverdicted · novelty 5.0

A sandbox-trained multimodal search agent with process-oriented rewards transfers zero-shot to real Google Search and outperforms prior methods on FVQA, InfoSeek, and MMSearch.

citing papers explorer

Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts · cs.CV · 2026-05-12 · unverdicted · none · ref 28 · 2 links
Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation · cs.CL · 2026-04-14 · unverdicted · none · ref 26
SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA? · cs.CV · 2026-05-03 · unverdicted · none · ref 11
V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization · cs.AI · 2026-04-22 · unverdicted · none · ref 18
ReflectCAP: Detailed Image Captioning with Reflective Memory · cs.AI · 2026-04-14 · unverdicted · none · ref 20
ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards · cs.CV · 2026-04-22 · unverdicted · none · ref 20
