Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects
Pith reviewed 2026-05-18 08:08 UTC · model grok-4.3
The pith
This survey supplies a unified map of artificial emotional intelligence spanning recognition from multiple inputs, cognitive processing for decisions, and generation of emotional outputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that artificial emotional intelligence rests on three linked parts: emotion understanding via multimodal data processing; affective cognition, which applies cognitive appraisal, emotion mapping, and adaptive modulation within decision-making, learning, and reasoning; and the synthesis of emotional expressions in text, speech, and facial channels. Reviewing these parts together, along with their challenges and the prospects opened by generative technologies, is said to supply the comprehensive picture that earlier reviews lacked.
What carries the argument
The holistic overview that joins multimodal emotion understanding, affective cognition with appraisal and modulation, and cross-modal emotional expression synthesis.
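To make the three-part structure concrete, here is a minimal sketch of how the stages could hand off to one another. It is an illustration only, not the survey's method: the names `Percept`, `understand`, `appraise`, `express`, and `respond`, the averaging fusion rule, and the 0.2 thresholds are all assumptions invented for this example.

```python
from dataclasses import dataclass


# Hypothetical toy types and rules; none of these names come from the survey.
@dataclass
class Percept:
    text_score: float    # valence estimate from text, in [-1, 1]
    speech_score: float  # valence estimate from speech, in [-1, 1]
    face_score: float    # valence estimate from the face, in [-1, 1]


def understand(p: Percept) -> float:
    """Emotion understanding: fuse per-modality estimates with a plain average."""
    return (p.text_score + p.speech_score + p.face_score) / 3


def appraise(valence: float, goal_relevance: float = 1.0) -> str:
    """Affective cognition: scale valence by relevance, then map it to a label."""
    signal = valence * goal_relevance
    if signal > 0.2:       # assumed threshold
        return "joy"
    if signal < -0.2:      # assumed threshold
        return "distress"
    return "neutral"


def express(emotion: str) -> dict:
    """Expression synthesis: choose an output for each channel."""
    templates = {
        "joy": {"text": "That is great to hear!", "speech": "bright", "face": "smile"},
        "distress": {"text": "I am sorry about that.", "speech": "soft", "face": "frown"},
        "neutral": {"text": "I see.", "speech": "even", "face": "neutral"},
    }
    return templates[emotion]


def respond(p: Percept, goal_relevance: float = 1.0) -> dict:
    """Chain the three stages: understand, then appraise, then express."""
    return express(appraise(understand(p), goal_relevance))
```

Calling `respond(Percept(0.8, 0.6, 0.7))` walks a clearly positive input through all three stages. A real agent would replace each toy function with a learned model, but the hand-off between stages is the point of the sketch.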
If this is right
- Developers gain a single reference point for combining multimodal input processing with cognitive modulation when building new agents.
- Catalogued challenges point to concrete next steps for improving current state-of-the-art methods in expression synthesis.
- Attention to generative technologies supplies a direct path for creating richer and more varied emotional outputs in future agents.
- The overall structure supports more effective integration of emotional capabilities into systems used across multiple sectors of society.
Where Pith is reading between the lines
- Testing the three-part structure in controlled user studies with real agents would reveal whether the described integration actually improves interaction quality.
- Extending the challenge analysis to include privacy and manipulation risks would connect the survey to emerging ethical questions in affective computing.
- Applying the same review lens to existing commercial chat systems could produce a practical scorecard of their current emotional intelligence levels.
- Linking the affective cognition section to established psychological models of human emotion could generate testable predictions for agent behavior.
Load-bearing premise
That earlier reviews left a large enough gap in covering emotion understanding, elicitation, and expression for one new survey to close it completely.
What would settle it
A check that finds several major recent papers on multimodal affective systems or generative emotion generation that receive no analysis or citation in the survey would show the overview falls short of being holistic.
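The check described above could be sketched roughly as follows, assuming one already has a list of candidate paper titles and the list of titles the survey cites. The helper names `normalize` and `missing_from_survey` are hypothetical, and exact title matching after normalization is a simplifying assumption; a real audit would also need to handle retitled preprints and DOIs.

```python
def normalize(title: str) -> str:
    """Lowercase and drop punctuation so small formatting differences are ignored."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return " ".join(kept.split())


def missing_from_survey(candidates: list[str], cited: list[str]) -> list[str]:
    """Return the candidate titles that have no match among the cited titles."""
    cited_keys = {normalize(t) for t in cited}
    return [t for t in candidates if normalize(t) not in cited_keys]


# Placeholder titles for illustration only; they are not real papers.
candidates = ["Paper A: Multimodal Fusion", "Paper B"]
cited = ["paper a - multimodal fusion"]
gaps = missing_from_survey(candidates, cited)
```

If `gaps` held several major recent papers, that would count as evidence against the claim of holistic coverage.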
Original abstract
The development of agents with emotional intelligence is becoming increasingly vital due to their significant role in human-computer interaction and the growing integration of computer systems across various sectors of society. Affective computing aims to design intelligent systems that can recognize, evoke, and express human emotions, thereby emulating human emotional intelligence. While previous reviews have focused on specific aspects of this field, there has been limited comprehensive research that encompasses emotion understanding, elicitation, and expression, along with the related challenges. This survey addresses this gap by providing a holistic overview of core components of artificial emotion intelligence. It covers emotion understanding through multimodal data processing, as well as affective cognition, which includes cognitive appraisal, emotion mapping, and adaptive modulation in decision-making, learning, and reasoning. Additionally, it addresses the synthesis of emotional expression across text, speech, and facial modalities to enhance human-agent interaction. This paper identifies and analyzes the key challenges and issues encountered in the development of affective systems, covering state-of-the-art methodologies designed to address them. Finally, we highlight promising future directions, with particular emphasis on the potential of generative technologies to advance affective computing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This survey paper claims to provide a holistic overview of artificial emotional intelligence in intelligent agents to address gaps in previous reviews. It covers core components including emotion understanding through multimodal data processing, affective cognition with cognitive appraisal, emotion mapping, and adaptive modulation for decision-making, learning, and reasoning, as well as the synthesis of emotional expressions in text, speech, and facial modalities. The paper also identifies challenges in developing affective systems and discusses future directions, emphasizing generative technologies.
Significance. If the literature synthesis is thorough and unbiased, the paper could be significant as a consolidating reference in affective computing and human-computer interaction. It integrates fragmented topics into a unified framework, potentially informing the design of emotionally intelligent agents and highlighting pathways for advancement through generative AI, which is timely given the rapid development in the field.
major comments (1)
- The claim that this survey addresses the gap by providing a 'holistic overview' of emotion understanding, elicitation, expression, challenges, and future directions is load-bearing but unsupported by any disclosed literature review methodology. There is no mention of search strategy, databases consulted, inclusion/exclusion criteria, or the total number of papers reviewed, which prevents verification that the coverage is comprehensive rather than selective.
minor comments (2)
- While the abstract outlines the structure clearly, it would benefit from indicating the approximate number of references or the time span of the literature covered to better convey the scope of the survey.
- Ensure consistent use of terminology for 'affective cognition' and 'emotional intelligence' to avoid potential confusion for readers new to the field.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We agree that transparency in the literature review process is important for a survey paper and will address this by adding explicit methodological details in the revised manuscript.
Referee: The claim that this survey addresses the gap by providing a 'holistic overview' of emotion understanding, elicitation, expression, challenges, and future directions is load-bearing but unsupported by any disclosed literature review methodology. There is no mention of search strategy, databases consulted, inclusion/exclusion criteria, or the total number of papers reviewed, which prevents verification that the coverage is comprehensive rather than selective.
Authors: We acknowledge this observation and agree that disclosing the review methodology strengthens the paper's credibility as a survey. In the revised version, we will insert a new subsection (likely in Section 1 or as a dedicated 'Review Methodology' section) that details the search strategy, databases consulted (including IEEE Xplore, ACM Digital Library, Google Scholar, and arXiv), key search terms and combinations, inclusion/exclusion criteria (e.g., peer-reviewed publications from 2015–2024 focusing on affective computing in agents), and the approximate number of papers initially retrieved and finally included. This addition will enable readers to evaluate the scope and potential selectivity of our synthesis while preserving the existing structure and contributions of the survey. revision: yes
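A screening step of the kind the rebuttal promises could look roughly like the sketch below. The year window and the peer-review requirement follow the example criteria quoted in the rebuttal. The `Record` type, the `TOPIC_TERMS` set, and the function names are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


# Hypothetical record type for one retrieved paper.
@dataclass
class Record:
    title: str
    year: int
    peer_reviewed: bool
    keywords: tuple


# Assumed topic terms; the actual search terms are not disclosed in the source.
TOPIC_TERMS = {"affective computing", "emotion recognition", "emotion generation"}


def include(r: Record, first_year: int = 2015, last_year: int = 2024) -> bool:
    """Apply the example criteria: peer-reviewed, inside the window, on topic."""
    in_window = first_year <= r.year <= last_year
    on_topic = bool(TOPIC_TERMS & {k.lower() for k in r.keywords})
    return r.peer_reviewed and in_window and on_topic


def screen(records: list) -> dict:
    """Report how many records were retrieved and how many survive screening."""
    kept = [r for r in records if include(r)]
    return {"retrieved": len(records), "included": len(kept), "kept": kept}


# Placeholder records for illustration only.
records = [
    Record("Paper A", 2019, True, ("Affective Computing",)),
    Record("Paper B", 2012, True, ("emotion recognition",)),   # outside the window
    Record("Paper C", 2023, False, ("emotion generation",)),   # not peer-reviewed
    Record("Paper D", 2021, True, ("computer graphics",)),     # off topic
]
summary = screen(records)
```

Reporting both the `retrieved` and `included` counts is what lets a reader judge how selective the synthesis was, which is the referee's core request.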
Circularity Check
No circularity: literature survey with no derivations or fitted predictions
This paper is a survey providing a holistic overview of affective computing components, challenges, and future directions. It contains no mathematical derivations, equations, empirical predictions, or parameter-fitting steps that could reduce to inputs by construction. The claim of addressing a gap in prior reviews is a direct statement of scope and contribution rather than a self-referential or fitted result. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in a manner that creates circularity. The absence of explicit search methodology is a transparency issue but does not equate to circular reasoning under the defined criteria.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear
  Section 2 Methodology: Systematic Literature Review Process... 298 studies... emotion understanding (32%), emotional expression synthesis (30%), affective cognition (25%)
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear
  3.2 Challenges... Data-related, Model-related, Problem Nature, Multi-Modal, Usage of LLMs/FMs
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion
  A multimodal fusion model using cognitive appraisal theory, transformers, and fuzzy logic predicts video-induced pleasure levels with 0.6624 accuracy by inferring appraisal variables.