UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries

Lingsen Zhang; Liqiang Nie; Rui Shao; Tao Tan; Yijie Zhu; Zitong Yu

arxiv: 2507.23372 · v2 · pith:2RARSBWWnew · submitted 2025-07-31 · 💻 cs.CV

Yijie Zhu, Lingsen Zhang, Zitong Yu, Rui Shao, Tao Tan, Liqiang Nie

classification 💻 cs.CV

keywords emotional · understanding · generation · tasks · uniemo · enhance · expert · feedback

Emotional understanding and generation are often treated as separate tasks, yet they are inherently complementary and can mutually enhance each other. In this paper, we propose UniEmo, a unified framework that seamlessly integrates these two tasks. The key challenge lies in the abstract nature of emotions, which necessitates extracting visual representations that benefit both tasks. To address this, we propose a hierarchical emotional understanding chain with learnable expert queries that progressively extracts multi-scale emotional features, serving as a foundational step for unification. Simultaneously, we fuse these expert queries and emotional representations to guide the diffusion model in generating emotion-evoking images. To enhance the diversity and fidelity of the generated emotional images, we further introduce the emotional correlation coefficient and emotional condition loss into the fusion process. This step facilitates fusion and alignment for emotional generation guided by the understanding. In turn, we demonstrate that joint training allows the generation component to provide implicit feedback to the understanding part. Furthermore, we propose a novel data filtering algorithm to select high-quality and diverse emotional images generated by the well-trained model, which are explicitly fed back into the understanding part. Together, these generation-driven dual feedback processes enhance the model's understanding capacity. Extensive experiments show that UniEmo significantly outperforms state-of-the-art methods in both emotional understanding and generation tasks. The code for the proposed method is available at https://github.com/JiuTian-VL/UniEmo.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition
cs.LG 2026-05 unverdicted novelty 6.0

HyperEmo-RAG uses hierarchical hyperbolic embeddings and graph-based evidence injection to outperform prior methods in multimodal emotion recognition.

ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.

AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
cs.CV 2026-04 unverdicted novelty 5.0

AffectAgent deploys a query planner, evidence filter, and emotion generator as collaborative agents trained via MAPPO with shared reward, plus MB-MoE and RAAF modules, to achieve superior multimodal emotion recognitio...