Multimodal LLMs See Sentiment

Bogdan T. Nassu; John Harrison; Myriam R. Delgado; Neemias B. da Silva; Rodrigo Minetto; Thiago H. Silva

arxiv: 2508.16873 · v3 · pith:LWTDSD6Snew · submitted 2025-08-23 · 💻 cs.CV · cs.SI

Neemias B. da Silva, John Harrison, Rodrigo Minetto, Myriam R. Delgado, Bogdan T. Nassu, Thiago H. Silva

keywords: sentiment · analysis · benchmark · llms · pipeline · description-mediated · descriptions · evaluation

Understanding how visual content conveys sentiment is increasingly important in a digital landscape dominated by imagery. However, sentiment perception depends on complex scene-level semantics, making this a challenging task for computational models. This paper examines how Multimodal Large Language Models (MLLMs) perform sentiment analysis in images through a systematic, evaluation-driven study encompassing three perspectives: (i) direct sentiment classification from images using MLLMs; (ii) sentiment analysis on MLLM-generated descriptions using pre-trained LLMs; and (iii) fine-tuning these LLMs on sentiment-labeled descriptions to assess performance and generalization. Experiments on a recent benchmark show that a two-stage MLLM description-mediated pipeline can substantially improve prediction accuracy under several evaluation settings, particularly when the LLM component is fine-tuned. Across different agreement thresholds and sentiment granularities, the strongest configurations of this pipeline outperform lexicon-, CNN-, and Transformer-based baselines in our benchmark by up to 30.9%, 64.8%, and 42.4%, respectively. In cross-dataset evaluation, the proposed pipeline, without training or fine-tuning on the target dataset, still surpasses the best in-domain baseline by over 8%. Overall, the study provides a comprehensive assessment of MLLM description-mediated sentiment analysis, clarifying the conditions under which it is effective, the scenarios in which it fails, and its comparison with traditional vision-based approaches, while also providing a reproducible benchmark resource for future research.
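The two-stage, description-mediated idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-label set, and both stub models are hypothetical stand-ins chosen for illustration. In practice the first stage would call an MLLM and the second a pre-trained or fine-tuned LLM.

```python
from typing import Callable

# Assumed label set; the paper evaluates several sentiment granularities.
LABELS = ("negative", "neutral", "positive")


def describe_then_classify(
    image_path: str,
    describe: Callable[[str], str],
    classify: Callable[[str], str],
) -> str:
    """Stage 1 turns the image into text; stage 2 labels that text."""
    description = describe(image_path)
    label = classify(description)
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


def stub_describe(image_path: str) -> str:
    # Hypothetical stand-in for an MLLM call; returns a fixed caption.
    return "A smiling family enjoying a sunny picnic in a clean park."


def stub_classify(description: str) -> str:
    # Hypothetical stand-in for the LLM classifier: a toy keyword count.
    text = description.lower()
    positive = sum(word in text for word in ("smiling", "sunny", "clean"))
    negative = sum(word in text for word in ("trash", "broken", "abandoned"))
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


print(describe_then_classify("example.jpg", stub_describe, stub_classify))
```

Passing the two stages as callables keeps the sketch independent of any particular model API, so either stage can be swapped without touching the pipeline function.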

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception
cs.CL 2026-04 conditional novelty 5.0

Persona prompting creates stable but minimally differentiated LLM behavior on urban sentiment tasks, with a no-persona baseline frequently matching or exceeding persona-conditioned agreement to human labels.

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception
cs.CL 2026-04 unverdicted novelty 4.0

Persona prompting in multimodal LLMs for urban sentiment yields high within-persona stability but limited cross-persona variation, with no-persona models often matching or exceeding persona-conditioned agreement to hu...