Multimodal LLMs see sentiment

2025 · cs.CV · arXiv 2508.16873

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Understanding how visual content conveys sentiment is increasingly important in a digital landscape dominated by imagery. However, sentiment perception depends on complex scene-level semantics, making this a challenging task for computational models. This paper examines how Multimodal Large Language Models (MLLMs) perform sentiment analysis in images through a systematic, evaluation-driven study encompassing three perspectives: (i) direct sentiment classification from images using MLLMs; (ii) sentiment analysis on MLLM-generated descriptions using pre-trained LLMs; and (iii) fine-tuning these LLMs on sentiment-labeled descriptions to assess performance and generalization. Experiments on a recent benchmark show that a two-stage MLLM description-mediated pipeline can substantially improve prediction accuracy under several evaluation settings, particularly when the LLM component is fine-tuned. Across different agreement thresholds and sentiment granularities, the strongest configurations of this pipeline outperform lexicon-, CNN-, and Transformer-based baselines in our benchmark by up to 30.9%, 64.8%, and 42.4%, respectively. In cross-dataset evaluation, the proposed pipeline - without training or fine-tuning on the target dataset - still surpasses the best in-domain baseline by over 8%. Overall, the study provides a comprehensive assessment of MLLM description-mediated sentiment analysis, clarifying the conditions under which it is effective, the scenarios in which it fails, and its comparison with traditional vision-based approaches, while also providing a reproducible benchmark resource for future research.
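The two-stage, description-mediated pipeline mentioned in the abstract can be sketched roughly as follows. This is only an illustration of the control flow (image to description, then description to sentiment label), not the authors' implementation: the names `stub_describer`, `stub_classifier`, and `two_stage_sentiment` are hypothetical, the canned descriptions and word lists are invented for the example, and the keyword-based classifier is a toy stand-in for the pre-trained or fine-tuned LLM used in the paper.

```python
from typing import Callable

# Hypothetical type aliases for the two stages (not from the paper).
Describer = Callable[[str], str]    # stage 1: image path -> text description (MLLM)
Classifier = Callable[[str], str]   # stage 2: description -> sentiment label (LLM)


def stub_describer(image_path: str) -> str:
    """Placeholder for an MLLM call; returns a canned, invented description."""
    canned = {
        "beach.jpg": "A smiling family enjoying a sunny day at the beach.",
        "storm.jpg": "A damaged house after a violent storm, debris everywhere.",
    }
    return canned.get(image_path, "An unremarkable scene.")


# Toy word lists, assumed for illustration only.
POSITIVE = {"smiling", "enjoying", "sunny"}
NEGATIVE = {"damaged", "violent", "debris"}


def stub_classifier(description: str) -> str:
    """Placeholder for an LLM classifier; counts toy keywords instead."""
    words = {w.strip(".,").lower() for w in description.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def two_stage_sentiment(
    image_path: str,
    describe: Describer = stub_describer,
    classify: Classifier = stub_classifier,
) -> str:
    """Run stage 1 (describe the image), then stage 2 (classify the text)."""
    return classify(describe(image_path))


if __name__ == "__main__":
    for path in ("beach.jpg", "storm.jpg", "street.jpg"):
        print(path, two_stage_sentiment(path))
```

Passing the two stages in as callables keeps the sketch independent of any particular model API; a real setup would swap the stubs for actual MLLM and LLM calls.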

representative citing papers

Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception

cs.CL · 2026-05-27 · unverdicted · novelty 4.0

Large-scale analysis of 59,808 annotations shows persona prompting produces convergent captions but systematically varying justifications tied to socioeconomic and political attributes in multimodal LLM urban perception outputs.

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception

cs.CL · 2026-04-30 · unverdicted · novelty 4.0 · 2 refs

Persona prompting in multimodal LLMs for urban sentiment yields high within-persona stability but limited cross-persona variation, with no-persona models often matching or exceeding persona-conditioned agreement to human labels.

citing papers explorer


Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception
cs.CL · 2026-05-27 · unverdicted · none · ref 7 · internal anchor
Large-scale analysis of 59,808 annotations shows persona prompting produces convergent captions but systematically varying justifications tied to socioeconomic and political attributes in multimodal LLM urban perception outputs.
Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception
cs.CL · 2026-04-30 · unverdicted · none · ref 17 · 2 links · internal anchor
Persona prompting in multimodal LLMs for urban sentiment yields high within-persona stability but limited cross-persona variation, with no-persona models often matching or exceeding persona-conditioned agreement to human labels.
