ITIScore evaluates MLLM image captions via image-to-text-to-image reconstruction consistency and aligns with human judgments on a new 40K-caption benchmark.
Image captioning evaluation in the age of multimodal llms: Challenges and future perspectives.arXiv preprint arXiv:2503.14604,
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
FPBench evaluates 20 MLLMs across 8 fingerprint tasks on 7 datasets and shows fine-tuning vision and language encoders improves performance by 7-39%.
DaID mitigates MLLM hallucinations by attention-guided selection of dual layers that calibrate token generation using internal perceptual discrepancies.
TCAP detects backdoor samples in MLLM fine-tuning via tri-component attention profiling, GMM-based head identification, and EM vote aggregation.
citing papers explorer
-
ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
ITIScore evaluates MLLM image captions via image-to-text-to-image reconstruction consistency and aligns with human judgments on a new 40K-caption benchmark.
-
FPBench: A Comprehensive Benchmark of Multimodal Large Language Models for Fingerprint Analysis
FPBench evaluates 20 MLLMs across 8 fingerprint tasks on 7 datasets and shows fine-tuning vision and language encoders improves performance by 7-39%.
-
Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation
DaID mitigates MLLM hallucinations by attention-guided selection of dual layers that calibrate token generation using internal perceptual discrepancies.
-
TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning
TCAP detects backdoor samples in MLLM fine-tuning via tri-component attention profiling, GMM-based head identification, and EM vote aggregation.