Canonical reference

Videohallucer: Evaluating intrinsic and extrinsic hallucinations in large video-language models

Videohallucer: Evaluating intrinsic, extrinsic hallucinations in large video-language models , author= · 2024 · arXiv 2406.16338

Canonical reference. 80% of citing Pith papers cite this work as background.

12 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 12 citing papers

citation-role summary

background 4 dataset 1

citation-polarity summary

background 4 use dataset 1

representative citing papers

When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models

cs.CV · 2026-04-19 · unverdicted · novelty 8.0

VLMs hallucinate by prioritizing contradictory on-screen text over visual content, addressed via the VisualTextTrap benchmark with 6,057 human-validated samples and the VTHM-MoE dual-encoder framework using dimension-specific experts and adaptive routing.

MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

MoHallBench is a new benchmark evaluating motion hallucination in VideoLLMs from co-occurrence priors, sequential inference, and similarity confusion, revealing decoupling from action recognition performance.

No Place to Hide: Benchmarking Video Hallucination with Background-Controlled Pairs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces VidPair-Halluc benchmark of 1K background-controlled adversarial video pairs and 11K QA pairs generated via PairFlow pipeline to evaluate hallucination in LVMs.

TOC-Bench: A Temporal Object Consistency Benchmark for Video Large Language Models

cs.CV · 2026-05-11 · conditional · novelty 7.0 · 2 refs

TOC-Bench is a new diagnostic benchmark that reveals major weaknesses in temporal object consistency for Video-LLMs, including event counting, ordering, identity reasoning, and hallucination avoidance.

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

cs.CV · 2024-11-25 · unverdicted · novelty 7.0

VidHal is a new benchmark that evaluates VLLM temporal hallucinations through a caption ordering task on videos with varying hallucination levels.

OmniHalluc-L: Counterfactual Benchmarking and Modality-Perturbation Reliability Calibration for Long-Form Omni Hallucination

cs.MM · 2026-06-02 · unverdicted · novelty 6.0

OmniHalluc-L benchmark shows open-weight omni models at 32-41% strict-pair accuracy on long-form hallucination, raised to 36-51% by Modality-Perturbation Reliability Calibration that fuses audio-negative probe shifts with native confidence.

When Vision Speaks for Sound

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Video MLLMs show an audio-visual Clever Hans effect relying on visual-acoustic correlations rather than audio verification; Thud interventions diagnose it and a 10K-sample preference alignment improves intervention performance by 28 points.

From Priors to Perception: Grounding Video-LLMs in Physical Reality

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Video-LLMs fail physical reasoning due to semantic prior dominance rather than perception deficits; a new programmatic adversarial curriculum and visual-anchored reasoning chain enable substantial gains via standard LoRA fine-tuning.

Video-ToC: Video Tree-of-Cue Reasoning

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Video-ToC adds tree-guided cue localization, demand-based RL rewards, and automated datasets to video LLMs, reporting better results than prior methods on six understanding benchmarks plus a hallucination test.

Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

Raven automates Scratch program assessment by having instructors specify task-level video generation rules and using LLMs to analyze resulting videos for behavioral compliance, outperforming prior tools on real student submissions.

Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.

MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

MultiToP mitigates hallucinations in video multimodal models by training a Visual Token Patcher with information-guided rank calibration to selectively replace unreliable tokens, yielding 50.60% F1 gain on Vript-HAL and 18.58% accuracy gain on ActivityNet-QA.

citing papers explorer

Showing 12 of 12 citing papers.

When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models cs.CV · 2026-04-19 · unverdicted · none · ref 62
VLMs hallucinate by prioritizing contradictory on-screen text over visual content, addressed via the VisualTextTrap benchmark with 6,057 human-validated samples and the VTHM-MoE dual-encoder framework using dimension-specific experts and adaptive routing.
MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models cs.CV · 2026-07-01 · unverdicted · none · ref 38
MoHallBench is a new benchmark evaluating motion hallucination in VideoLLMs from co-occurrence priors, sequential inference, and similarity confusion, revealing decoupling from action recognition performance.
No Place to Hide: Benchmarking Video Hallucination with Background-Controlled Pairs cs.CV · 2026-06-30 · unverdicted · none · ref 71
Introduces VidPair-Halluc benchmark of 1K background-controlled adversarial video pairs and 11K QA pairs generated via PairFlow pipeline to evaluate hallucination in LVMs.
TOC-Bench: A Temporal Object Consistency Benchmark for Video Large Language Models cs.CV · 2026-05-11 · conditional · none · ref 40 · 2 links
TOC-Bench is a new diagnostic benchmark that reveals major weaknesses in temporal object consistency for Video-LLMs, including event counting, ordering, identity reasoning, and hallucination avoidance.
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs cs.CV · 2024-11-25 · unverdicted · none · ref 53
VidHal is a new benchmark that evaluates VLLM temporal hallucinations through a caption ordering task on videos with varying hallucination levels.
OmniHalluc-L: Counterfactual Benchmarking and Modality-Perturbation Reliability Calibration for Long-Form Omni Hallucination cs.MM · 2026-06-02 · unverdicted · none · ref 2
OmniHalluc-L benchmark shows open-weight omni models at 32-41% strict-pair accuracy on long-form hallucination, raised to 36-51% by Modality-Perturbation Reliability Calibration that fuses audio-negative probe shifts with native confidence.
When Vision Speaks for Sound cs.CV · 2026-05-13 · unverdicted · none · ref 64
Video MLLMs show an audio-visual Clever Hans effect relying on visual-acoustic correlations rather than audio verification; Thud interventions diagnose it and a 10K-sample preference alignment improves intervention performance by 28 points.
From Priors to Perception: Grounding Video-LLMs in Physical Reality cs.CV · 2026-05-06 · unverdicted · none · ref 38
Video-LLMs fail physical reasoning due to semantic prior dominance rather than perception deficits; a new programmatic adversarial curriculum and visual-anchored reasoning chain enable substantial gains via standard LoRA fine-tuning.
Video-ToC: Video Tree-of-Cue Reasoning cs.CV · 2026-04-22 · unverdicted · none · ref 15
Video-ToC adds tree-guided cue localization, demand-based RL rewards, and automated datasets to video LLMs, reporting better results than prior methods on six understanding benchmarks plus a hallucination test.
Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation cs.SE · 2026-04-20 · unverdicted · none · ref 63
Raven automates Scratch program assessment by having instructors specify task-level video generation rules and using LLMs to analyze resulting videos for behavioral compliance, outperforming prior tools on real student submissions.
Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models cs.CV · 2026-04-14 · unverdicted · none · ref 33
Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.
MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models cs.CV · 2026-06-10 · unverdicted · none · ref 17
MultiToP mitigates hallucinations in video multimodal models by training a Visual Token Patcher with information-guided rank calibration to selectively replace unreliable tokens, yielding 50.60% F1 gain on Vript-HAL and 18.58% accuracy gain on ActivityNet-QA.

Videohallucer: Evaluating intrinsic and extrinsic hallucinations in large video-language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer