Look twice before you answer: Memory-space visual retracing for hallucination mitigation in multimodal large language models

2024 · arXiv 2410.03577

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

cs.CV · 2025-12-14 · unverdicted · novelty 7.0

DMLR performs dynamic visual-textual interleaving in latent space using confidence-guided latent policy gradient optimization and a dynamic visual injection strategy, yielding improved multimodal reasoning on benchmarks.

ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

ADAPT reduces MLLM hallucinations 40-60% by aligning cross-attention dynamics via visual anchors, supervised inference, and preference tuning while preserving general capabilities.

Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

VGID constructs an intervention-induced teacher distribution via visual perturbation plus textual in-context unlearning and distills it into the student MLLM to achieve parameter-level forgetting.

Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

cs.MM · 2026-05-11 · unverdicted · novelty 6.0

LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.

Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

cs.CV · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.

STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.

Boosting Reasoning in Large Multimodal Models via Activation Replay

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

Activation Replay boosts multimodal reasoning in post-trained LMMs by replaying low-entropy activations from base models to RLVR counterparts at test time via visual token manipulation.

Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding

cs.CV · 2026-06-25 · unverdicted · novelty 5.0

Fox detects risky attention heads in LVLMs using visual attention entropy and severs hallucination shortcuts via numerical logit saturation and conflict-gated decoding, outperforming prior methods by 29.1%.

Consistency as Inductive Bias: Learning Cross-View Invariance for Robust Multimodal Reasoning

cs.CV · 2026-06-29 · unverdicted · novelty 4.0

ConsistRoll enforces cross-view consistency during RLVR training for MLLMs by joint rewards on grouped original and augmented views, yielding robustness gains on math, general, and hallucination benchmarks.

Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models

cs.CV · 2026-05-03 · unverdicted · novelty 4.0

Introduces self-captioning and a Multimodal Interaction Gate to amplify redundant multimodal interactions, reporting 38.3% reduction in visual-induced errors and 16.8% consistency improvement.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 2.0

Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

citing papers explorer

Showing 12 of 12 citing papers after filters.

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models · cs.LG · 2026-04-03 · unverdicted · none · ref 45
RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.
ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs · cs.CV · 2026-06-30 · unverdicted · none · ref 44
ADAPT reduces MLLM hallucinations 40-60% by aligning cross-attention dynamics via visual anchors, supervised inference, and preference tuning while preserving general capabilities.
Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning · cs.CV · 2026-05-26 · unverdicted · none · ref 10
VGID constructs an intervention-induced teacher distribution via visual perturbation plus textual in-context unlearning and distills it into the student MLLM to achieve parameter-level forgetting.
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination · cs.MM · 2026-05-11 · unverdicted · none · ref 54
LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.
Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time · cs.LG · 2026-05-03 · unverdicted · none · ref 40
LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs · cs.CV · 2026-05-01 · unverdicted · none · ref 98 · 2 links
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering · cs.AI · 2026-04-22 · unverdicted · none · ref 292
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models · cs.CV · 2026-04-14 · unverdicted · none · ref 43
Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.
STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models · cs.CV · 2026-04-03 · unverdicted · none · ref 55
STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.
Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding · cs.CV · 2026-06-25 · unverdicted · none · ref 89
Fox detects risky attention heads in LVLMs using visual attention entropy and severs hallucination shortcuts via numerical logit saturation and conflict-gated decoding, outperforming prior methods by 29.1%.
Consistency as Inductive Bias: Learning Cross-View Invariance for Robust Multimodal Reasoning · cs.CV · 2026-06-29 · unverdicted · none · ref 58
ConsistRoll enforces cross-view consistency during RLVR training for MLLMs by joint rewards on grouped original and augmented views, yielding robustness gains on math, general, and hallucination benchmarks.
Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models · cs.CV · 2026-05-03 · unverdicted · none · ref 24
Introduces self-captioning and a Multimodal Interaction Gate to amplify redundant multimodal interactions, reporting 38.3% reduction in visual-induced errors and 16.8% consistency improvement.
