Title resolution pending

Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee · 2023

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph, and the title resolver will retry the DOI and OpenAlex lookups on its next pass.

representative citing papers

VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

VisPCO uses continuous relaxation, straight-through estimators, and budget-aware Pareto-frontier learning to automatically discover optimal visual token pruning configurations that approximate grid-search results across VLMs and benchmarks.

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

EgoEsportsQA is a new egocentric video QA benchmark from esports matches that shows state-of-the-art Video-LLMs reach only 71.58% accuracy and struggle more with tactical reasoning than basic perception.

V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models

cs.CL · 2025-09-18 · conditional · novelty 7.0

V-SEAM combines concept-level visual semantic editing with attention head modulation to identify positive and negative contributors across object, attribute, and relationship levels, then uses this to improve VLM performance on VQA benchmarks.

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

cs.AI · 2026-04-18 · unverdicted · novelty 6.0

Omni-modal LLMs exhibit visual preference that emerges in mid-to-late layers, enabling hallucination detection without task-specific training.

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.

Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation

cs.CV · 2026-04-11 · unverdicted · novelty 5.0

DaID mitigates MLLM hallucinations by attention-guided selection of dual layers that calibrate token generation using internal perceptual discrepancies.

citing papers explorer

Showing 6 of 6 citing papers.

VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models
cs.CV · 2026-04-16 · unverdicted · none · ref 17

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports
cs.CV · 2026-04-14 · unverdicted · none · ref 31

V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
cs.CL · 2025-09-18 · conditional · none · ref 27

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models
cs.AI · 2026-04-18 · unverdicted · none · ref 28

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models
cs.CL · 2026-05-12 · unverdicted · none · ref 15

Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation
cs.CV · 2026-04-11 · unverdicted · none · ref 23
