hub

Look twice before you answer: Memory-space visual retracing for hallucination mitiga- tion in multimodal large language models

Zou, X · 2024 · arXiv 2410.03577

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

cs.CV · 2025-12-14 · unverdicted · novelty 7.0

DMLR performs dynamic visual-textual interleaving in latent space using confidence-guided latent policy gradient optimization and a dynamic visual injection strategy, yielding improved multimodal reasoning on benchmarks.

Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

cs.MM · 2026-05-11 · unverdicted · novelty 6.0

LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.

Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

cs.CV · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.

STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.

Boosting Reasoning in Large Multimodal Models via Activation Replay

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

Activation Replay boosts multimodal reasoning in post-trained LMMs by replaying low-entropy activations from base models to RLVR counterparts at test time via visual token manipulation.

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 2.0

Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.

Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models

cs.CV · 2026-05-03

citing papers explorer

Showing 11 of 11 citing papers.

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models cs.LG · 2026-04-03 · unverdicted · none · ref 45
RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.
Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space cs.CV · 2025-12-14 · unverdicted · none · ref 30
DMLR performs dynamic visual-textual interleaving in latent space using confidence-guided latent policy gradient optimization and a dynamic visual injection strategy, yielding improved multimodal reasoning on benchmarks.
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination cs.MM · 2026-05-11 · unverdicted · none · ref 54
LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.
Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time cs.LG · 2026-05-03 · unverdicted · none · ref 40
LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs cs.CV · 2026-05-01 · unverdicted · none · ref 98 · 2 links
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering cs.AI · 2026-04-22 · unverdicted · none · ref 292
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models cs.CV · 2026-04-14 · unverdicted · none · ref 43
Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.
STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models cs.CV · 2026-04-03 · unverdicted · none · ref 55
STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.
Boosting Reasoning in Large Multimodal Models via Activation Replay cs.CV · 2025-11-25 · unverdicted · none · ref 69
Activation Replay boosts multimodal reasoning in post-trained LMMs by replaying low-entropy activations from base models to RLVR counterparts at test time via visual token manipulation.
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning cs.CL · 2025-02-05 · unverdicted · none · ref 269
Position paper claims multimodal LLMs can significantly advance scientific reasoning and proposes a four-stage roadmap plus challenges and suggestions.
Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models cs.CV · 2026-05-03 · unreviewed · ref 24

Look twice before you answer: Memory-space visual retracing for hallucination mitiga- tion in multimodal large language models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer