Analyzing and mitigating object hallucination in large vision-language models
16 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2).
citing papers explorer
-
Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights
TRIVIA+ is a new long-context RAG hallucination benchmark with four noisy-label variants; it shows that current detectors have substantial room for improvement and are hindered by label noise.
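To illustrate the noisy-label setup, here is a minimal sketch of how one variant of a labeled benchmark could be built by random label flipping. The function name, the uniform-flip assumption, and the flip rate are all hypothetical; TRIVIA+'s four actual variants are defined in the paper.

```python
import random

def make_noisy_variant(gold_labels, flip_rate, seed=0):
    """Flip each binary hallucination label with probability `flip_rate`.

    Illustrative only: TRIVIA+'s noisy-label variants need not be
    uniform random flips.
    """
    rng = random.Random(seed)
    return [1 - y if rng.random() < flip_rate else y for y in gold_labels]

# A detector scored against noisy labels can look better or worse than it
# really is, which is the failure mode the benchmark probes.
gold = [0, 1, 1, 0, 1, 0]
noisy = make_noisy_variant(gold, flip_rate=0.2)
```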
-
DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models
DO-Bench is a controlled benchmark that uses two diagnostic dimensions and accompanying metrics to attribute VLM object-hallucination errors to textual-prior pressure, perceptual limits, or their interaction.
-
Letting the neural code speak: Automated characterization of monkey visual neurons through human language
Natural-language descriptions generated and verified through generative models and digital twins capture the selectivity of most neurons in macaque V1 and V4.
-
Through the Lens of Character: Resolving Modality-Role Interference in Multimodal Role-Playing Agent
The CAVI framework uses character-guided token pruning, orthogonal feature modulation, and modality-adaptive role steering to resolve modality-role interference in multimodal role-playing agents (RPAs).
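A minimal sketch of one plausible reading of "orthogonal feature modulation": remove from a modality feature its component along a character/role direction so the two signals stop interfering. The function and the projection form are assumptions; CAVI's actual operator is specified in the paper.

```python
import numpy as np

def orthogonal_modulation(feat, role_dir, eps=1e-8):
    """Project `feat` onto the subspace orthogonal to `role_dir`,
    removing the role-aligned component of a modality feature.
    One plausible reading of 'orthogonal feature modulation';
    CAVI's actual operator may differ."""
    d = role_dir / (np.linalg.norm(role_dir) + eps)
    return feat - np.dot(feat, d) * d
```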
-
CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering
CAST identifies caption-sensitive attention heads and applies optimized steering directions to their outputs, reducing object hallucination in LVLMs by 6.03% on average across five models and five benchmarks with negligible added inference cost.
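A rough sketch of the steering step: add a fixed direction to the outputs of the heads identified as caption-sensitive during decoding. The tensor layout, the `steer_dirs` format, and the strength `alpha` are assumptions; CAST's head-identification and direction-optimization procedures are in the paper.

```python
import torch

def steer_heads(head_out, steer_dirs, alpha=1.0):
    """Add steering directions to selected attention heads' outputs.

    head_out:   (batch, n_heads, seq_len, d_head) per-head outputs
    steer_dirs: dict {head_index: (d_head,) direction tensor} for the
                caption-sensitive heads (assumed representation)
    alpha:      steering strength (hypothetical knob)
    """
    out = head_out.clone()
    for h, direction in steer_dirs.items():
        out[:, h] = out[:, h] + alpha * direction  # broadcast over seq
    return out
```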
-
Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time
LIME reduces hallucinations in multimodal LLMs by using layer-wise relevance propagation (LRP) to boost the perceptual modality's contribution through inference-time KV-cache updates.
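A sketch, under assumed shapes, of what a relevance-driven KV update might look like: scale the cached keys and values of visual tokens by their normalized LRP scores so the perceptual modality contributes more during decoding. LIME's exact update rule and where it hooks in are in the paper.

```python
import torch

def reweight_visual_kv(keys, values, relevance, beta=0.5):
    """Scale visual-token KV cache entries by normalized LRP relevance.

    keys, values: (n_visual_tokens, d) cached rows for visual tokens
    relevance:    (n_visual_tokens,) nonnegative LRP scores
    beta:         interpolation toward the relevance weighting
                  (hypothetical knob)
    """
    w = relevance / (relevance.mean() + 1e-8)       # mean weight ~ 1
    scale = (1.0 + beta * (w - 1.0)).unsqueeze(-1)  # soften the reweighting
    return keys * scale, values * scale
```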
-
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models at a fraction of their size.
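For reference, the Lorentzian machinery itself is standard hyperbolic geometry, sketched below; how HypEHR combines it with hierarchy-aware pretraining is specific to the paper.

```python
import numpy as np

def lorentz_inner(u, v):
    """Lorentzian inner product: <u, v>_L = -u0*v0 + sum_{i>0} ui*vi."""
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def lorentz_distance(u, v, k=1.0):
    """Geodesic distance between points on the hyperboloid
    {x : <x, x>_L = -k}: d(u, v) = sqrt(k) * arccosh(-<u, v>_L / k)."""
    inner = np.clip(-lorentz_inner(u, v) / k, 1.0, None)  # numeric safety
    return np.sqrt(k) * np.arccosh(inner)
```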
-
Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models
Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.
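One way to picture decoder-side rebalancing, as a sketch: if a single frame's tokens absorb more than a capped share of a query's attention over the video tokens, scale that frame down and renormalize. The cap and the dominance criterion are assumptions; DTR's actual rule is in the paper.

```python
import torch

def rebalance_frame_attention(attn, frame_ids, cap=0.5):
    """Cap the attention mass of the dominant frame and renormalize.

    attn:      (n_tokens,) attention weights over video tokens (sums to 1)
    frame_ids: (n_tokens,) frame index of each video token
    cap:       maximum share any one frame may take (hypothetical knob)
    """
    frame_ids = frame_ids.long()
    n_frames = int(frame_ids.max()) + 1
    mass = torch.zeros(n_frames).scatter_add_(0, frame_ids, attn)
    dom = int(mass.argmax())
    if mass[dom] > cap:
        attn = torch.where(frame_ids == dom, attn * (cap / mass[dom]), attn)
        attn = attn / attn.sum()  # redistribute mass to the other frames
    return attn
```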
-
ReflectCAP: Detailed Image Captioning with Reflective Memory
ReflectCAP distills model-specific hallucination and oversight patterns into Structured Reflection Notes that steer LVLMs toward more factual and complete image captions, reaching the Pareto frontier on factuality-coverage trade-offs.
-
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
UE-DPO quantifies epistemic uncertainty from grounding failures to direct more learning pressure on hard visual tokens in preferred samples while easing penalties on dispreferred ones.
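A loose sketch of how that asymmetric weighting could enter a DPO-style objective: scale the preferred response's per-token log-ratio terms up by uncertainty while scaling the dispreferred penalties down. The shapes, weighting forms, and uncertainty scores in [0, 1] are assumptions; UE-DPO's actual objective is in the paper.

```python
import torch
import torch.nn.functional as F

def ue_dpo_loss(logp_w, logp_l, ref_w, ref_l, u_w, u_l, beta=0.1):
    """Uncertainty-weighted DPO-style loss over one preference pair.

    logp_w/ref_w: (len_w,) per-token log-probs of the preferred response
                  under the policy / frozen reference model
    logp_l/ref_l: (len_l,) same for the dispreferred response
    u_w, u_l:     (len_*,) epistemic-uncertainty scores in [0, 1]
                  (assumed given by a grounding-failure estimator)
    """
    w_term = ((1.0 + u_w) * (logp_w - ref_w)).sum()        # press harder on hard tokens
    l_term = ((1.0 - 0.5 * u_l) * (logp_l - ref_l)).sum()  # ease penalties
    return -F.logsigmoid(beta * (w_term - l_term))
```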
-
Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
MPD reduces hallucinations in LVLMs by 23.4% while retaining 97.4% of general capability through semantic disentanglement and selective parameter updates.
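The selective-update half has a simple shape, sketched below: apply gradients only through precomputed 0/1 masks over parameters and freeze everything else. How MPD's semantic disentanglement chooses those masks is the paper's contribution and is taken as given here.

```python
import torch

def selective_update(params, grads, masks, lr=1e-5):
    """Update only masked parameter entries; freeze the rest.

    params, grads, masks: matching lists of tensors; each mask is a
    0/1 tensor marking entries selected as hallucination-related
    (assumed precomputed by some disentanglement step).
    """
    with torch.no_grad():
        for p, g, m in zip(params, grads, masks):
            p.add_(g * m, alpha=-lr)  # plain SGD step on selected entries
```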
-
VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing
VCE mitigates object hallucination in LVLMs by decomposing activation patterns from contrastive visual inputs via SVD to suppress hallucination subspaces through targeted parameter edits.
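A sketch of the editing idea under assumed shapes: SVD the activation differences between original and contrastive visual inputs, take the top right-singular directions as a candidate hallucination subspace, and project it out of a weight matrix's input space. The layer choice, rank, and contrastive inputs are assumptions; VCE's actual procedure is in the paper.

```python
import torch

def edit_out_subspace(W, acts_orig, acts_contrast, rank=4):
    """Project a low-rank 'hallucination' subspace out of W's input space.

    W:             (d_out, d_in) weight matrix to edit instead of training
    acts_orig:     (n, d_in) activations on original images
    acts_contrast: (n, d_in) activations on contrastive (e.g., perturbed) images
    """
    diffs = acts_orig - acts_contrast
    _, _, Vh = torch.linalg.svd(diffs, full_matrices=False)
    V = Vh[:rank].T                          # (d_in, rank), orthonormal columns
    proj = torch.eye(W.shape[1]) - V @ V.T   # projector onto the complement
    return W @ proj                          # edited weights, no training needed
```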
-
Hallucination of Multimodal Large Language Models: A Survey
This survey organizes the causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches and open questions.
-
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
POVID generates AI-created preference data to fine-tune vision-language models with DPO, reducing hallucinations and improving benchmark scores.
-
A Survey on Multimodal Large Language Models
This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.