{"total":22,"items":[{"citing_arxiv_id":"2605.21300","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens","primary_cat":"cs.CV","submitted_at":"2026-05-20T15:29:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reweighting training emphasis toward image-negative tokens and filtering hallucinated data reduces object hallucination in LVLMs across three model variants.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12485","ref_index":108,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Letting the neural code speak: Automated characterization of monkey visual neurons through human language","primary_cat":"q-bio.NC","submitted_at":"2026-05-12T17:58:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Natural language descriptions generated via a closed-loop pipeline with digital twins capture the selectivity of most neurons in macaque V1 and V4, with synthesized images driving 96% of V4 neurons into the top or bottom 5% of natural-image response distributions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11330","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights","primary_cat":"cs.AI","submitted_at":"2026-05-11T23:33:58+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TRIVIA+ is a new long-context RAG hallucination benchmark with four noisy label variants that shows current detectors have substantial room for improvement and are hindered by label noise.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09443","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Through the Lens of Character: Resolving Modality-Role Interference in Multimodal Role-Playing Agent","primary_cat":"cs.CV","submitted_at":"2026-05-10T09:40:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CAVI framework uses character-guided token pruning, orthogonal feature modulation, and modality-adaptive role steering to resolve modality-role interference in multimodal RPAs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intelligence, 2025. [31] Yufang Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, and Aimin Zhou. Investigating and mitigating object hallucinations in pretrained vision-language (clip) models. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024. [32] Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, and Huaxiu Yao. Analyzing and mitigating object hallucination in large vision-language models.arXiv preprint arXiv:2310.00754, 2023. [33] Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, and Enhong Chen. Woodpecker: Hallucination correction for multimodal large language models."},{"citing_arxiv_id":"2605.04874","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-06T13:08:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"UE-DPO quantifies epistemic uncertainty from grounding failures to direct more learning pressure on hard visual tokens in preferred samples while easing penalties on dispreferred ones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04641","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering","primary_cat":"cs.CV","submitted_at":"2026-05-06T08:32:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CAST reduces object hallucination in LVLMs by 6.03% on average across five models and five benchmarks by identifying caption-sensitive attention heads and applying optimized steering directions to their outputs, with negligible added inference cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01766","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time","primary_cat":"cs.LG","submitted_at":"2026-05-03T07:58:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21027","ref_index":255,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering","primary_cat":"cs.AI","submitted_at":"2026-04-22T19:18:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20366","ref_index":149,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation","primary_cat":"cs.CV","submitted_at":"2026-04-22T09:02:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MPD reduces hallucinations in LVLMs by 23.4% while retaining 97.4% of general capability through semantic disentanglement and selective parameter updates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19412","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing","primary_cat":"cs.CV","submitted_at":"2026-04-21T12:40:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"VCE mitigates object hallucination in LVLMs by decomposing activation patterns from contrastive visual inputs via SVD to suppress hallucination subspaces through targeted parameter edits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22822","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2026-04-18T06:54:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DO-Bench is a controlled benchmark that attributes VLM object hallucination errors to textual prior pressure, perceptual limits, or their interaction via two diagnostic dimensions and metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12582","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models","primary_cat":"cs.CV","submitted_at":"2026-04-14T11:05:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Decoder-side Temporal Rebalancing (DTR) reduces hallucinations in Video-LLMs by mitigating over-dominance of a single anchor frame during inference without training or auxiliary models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12357","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ReflectCAP: Detailed Image Captioning with Reflective Memory","primary_cat":"cs.AI","submitted_at":"2026-04-14T06:47:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ReflectCAP distills model-specific hallucination and oversight patterns into Structured Reflection Notes that steer LVLMs toward more factual and complete image captions, reaching the Pareto frontier on factuality-coverage trade-offs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.12455","ref_index":85,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mitigating Object Hallucinations via Sentence-Level Early Intervention","primary_cat":"cs.CV","submitted_at":"2025-07-16T17:55:43+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SENTINEL reduces MLLM object hallucinations by over 90% via sentence-level early intervention with detector-bootstrapped preference data and C-DPO loss, outperforming prior SOTA on hallucination and capability benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.09522","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding","primary_cat":"cs.CV","submitted_at":"2025-06-11T08:46:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ReVisiT refines LVLM output distributions during decoding by projecting selected vision tokens into text space via context-aware constrained divergence minimization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.21472","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration","primary_cat":"cs.CV","submitted_at":"2025-05-27T17:45:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CAAC mitigates hallucinations in LVLMs via Visual-Token Calibration and Adaptive Attention Re-Scaling guided by model confidence, showing gains on CHAIR, AMBER, and POPE especially in long-form generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.10185","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Detecting and Evaluating Medical Hallucinations in Large Vision Language Models","primary_cat":"cs.CV","submitted_at":"2024-06-14T17:14:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Presents Med-HallMark benchmark, MediHall Score metric, and MediHallDetector model for hallucination detection and evaluation in medical LVLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.18930","ref_index":232,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Hallucination of Multimodal Large Language Models: A Survey","primary_cat":"cs.CV","submitted_at":"2024-04-29T17:59:41+00:00","verdict":"ACCEPT","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2402.17177","ref_index":125,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models","primary_cat":"cs.CV","submitted_at":"2024-02-27T03:30:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Yacoob, and L. Wang, \"Mitigating hallucination in large multi-modal models via robust instruction tuning,\" 2023. [124] L. Wang, J. He, S. Li, N. Liu, and E.-P. Lim, \"Mitigating fine-grained hallucination by fine-tuning large vision-language models with caption rewrites,\" in International Conference on Multimedia Modeling, pp. 32-45, Springer, 2024. [125] Y . Zhou, C. Cui, J. Yoon, L. Zhang, Z. Deng, C. Finn, M. Bansal, and H. Yao, \"Analyzing and mitigating object hallucination in large vision-language models,\" arXiv preprint arXiv:2310.00754, 2023. [126] I. O. Gallegos, R. A. Rossi, J. Barrow, M. M. Tanjim, S. Kim, F. Dernoncourt, T. Yu, R. Zhang, and N. K. Ahmed, \"Bias and fairness in large language models: A survey,\" arXiv preprint"},{"citing_arxiv_id":"2402.11411","ref_index":185,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Aligning Modalities in Vision Large Language Models via Preference Fine-tuning","primary_cat":"cs.LG","submitted_at":"2024-02-18T00:56:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"POVID generates AI-created preference data to fine-tune vision-language models with DPO, reducing hallucinations and improving benchmark scores.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.03568","ref_index":189,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agent AI: Surveying the Horizons of Multimodal Interaction","primary_cat":"cs.AI","submitted_at":"2024-01-07T19:11:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.13549","ref_index":169,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Survey on Multimodal Large Language Models","primary_cat":"cs.CV","submitted_at":"2023-06-23T15:21:52+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"LLM, respectively. ditional supervised learning paradigms that learn implicit patterns from abundant data, the crux of ICL is to learn from analogy [168]. Specifically, in the ICL setting, LLMs learn from a few examples along with an optional instruction and extrapolate to new questions, thereby solving complex and unseen tasks in a few-shot manner [22], [169], [170]. (2) ICL is usually implemented in a training-free manner [168] and thus can be flexibly integrated into different frameworks at the inference stage. A closely related technique to ICL is instruction-tuning (see §3.2), which is shown empirically to enhance the ICL ability [19]. In the context of MLLM, ICL has been extended to more modalities, leading to Multimodal ICL (M-ICL)."}],"limit":50,"offset":0}