Building a large scale dataset for image emotion recognition: The fine print and the benchmark
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED (2)
citing papers explorer
- Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models
Proposes AGSR and the FAB-G supervised multi-agent framework that predicts attribute salience from human annotations to constrain MLLM emotion reasoning, yielding gains on EmoArt and cross-dataset tests.
- Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach
A zero-training VLM framework generates music from images via ABC notation, multi-modal RAG, and self-refinement while providing text and visual explanations for the outputs.