EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

Jingni Huang; Peter Bloodsworth

arxiv: 2605.17684 · v1 · pith:KSMSVBIRnew · submitted 2026-05-17 · 💻 cs.AI · cs.SE

Pith reviewed 2026-05-20 12:00 UTC · model grok-4.3

classification 💻 cs.AI cs.SE

keywords emotional AI · Scrum Master · real-time feedback · agile meetings · multimodal AI · emotion awareness · speech analysis

The pith

A multimodal AI system using speech transcription, intonation, vocabulary, and context suggestions gives Scrum Masters real-time feedback on their emotions during agile meetings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EGI, a framework that combines four AI components to monitor unconsciously expressed emotions in Scrum Masters and meeting organizers. It transcribes speech in real time, analyzes intonation for emotional cues, matches vocabulary for sentiment, and delivers context-aware suggestions via an open-source API. Evaluation in simulated meetings found that this feedback significantly boosts emotion awareness, helping users spot and reduce negative expressions to support better team dynamics. A sympathetic reader would care because leaders who manage their own emotional signals more effectively could shape more constructive agile interactions.

Core claim

The authors claim that a system integrating speech-to-text transcription, intonation thresholding, emotion-based vocabulary matching, and a context-aware multi-module AI API achieves 10 percent word error rate in simulated settings and delivers real-time feedback that measurably improves Scrum Masters' awareness of their own negative emotional cues, enabling quicker identification and minimization of those expressions during agile meetings.
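For readers unfamiliar with the metric, the sketch below shows the conventional way a word error rate is computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The paper's own scoring procedure is not described here, so the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are all illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; real ASR
    scoring usually applies a more careful normalization step.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word in a ten-word reference.
reference = "we will finish the login story before the sprint ends"
hypothesis = "we will finish the logging story before the sprint ends"
print(word_error_rate(reference, hypothesis))
```

Under this definition, a 10 percent rate corresponds to roughly one word-level error per ten reference words.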

What carries the argument

The EGI framework, which fuses speech-to-text, prosody thresholding for intonation, vocabulary sentiment matching, and an open-source context-aware AI API to generate emotion keyword suggestions.
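To make the fusion idea concrete, here is a minimal sketch of how a prosody threshold and a vocabulary match could be combined into a single cue for one utterance. It is not the authors' implementation: the lexicon `NEGATIVE_WORDS`, the 25 percent pitch ratio, the convention that a pitch value of 0 marks an unvoiced frame, and the names `Cues` and `detect_cues` are all hypothetical, and the speech-to-text and suggestion-API stages are left out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

# Hypothetical lexicon; the paper's actual emotion vocabulary is not given here.
NEGATIVE_WORDS = {"terrible", "useless", "ridiculous", "unacceptable"}

# Hypothetical threshold: flag an utterance whose mean pitch is more than
# 25% above the speaker's own baseline.
PITCH_RATIO_THRESHOLD = 1.25


@dataclass
class Cues:
    raised_pitch: bool
    negative_words: List[str]

    @property
    def flagged(self) -> bool:
        # Either modality alone is enough to raise a cue in this sketch.
        return self.raised_pitch or bool(self.negative_words)


def detect_cues(transcript: str, pitch_hz: List[float], baseline_hz: float) -> Cues:
    """Combine a pitch threshold with lexicon matching for one utterance."""
    voiced = [p for p in pitch_hz if p > 0]  # assumption: 0 marks unvoiced frames
    raised = bool(voiced) and mean(voiced) > baseline_hz * PITCH_RATIO_THRESHOLD
    words = [w.strip(".,!?;:").lower() for w in transcript.split()]
    hits = [w for w in words if w in NEGATIVE_WORDS]
    return Cues(raised_pitch=raised, negative_words=hits)


# Hypothetical utterance: the transcript would come from the speech-to-text stage.
cues = detect_cues("This estimate is ridiculous!", [0, 260, 270, 280], baseline_hz=200)
print(cues.flagged, cues.raised_pitch, cues.negative_words)
```

Using a per-speaker baseline rather than an absolute pitch value is a design choice of this sketch; it avoids flagging speakers whose voices are naturally higher.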

If this is right

Real-time feedback allows Scrum Masters to identify negative emotions more quickly during meetings.
Practical suggestions from the system help minimize the expression of negative emotions.
Improved emotion awareness during simulated agile meetings supports more positive team interactions.
Meeting organizers gain a tool to foster effective dynamics through self-regulation of emotional signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same four-model approach could be tested on other facilitation roles such as product owners or team leads to see if similar awareness gains appear.
Combining individual Scrum Master monitoring with aggregate team emotion signals might reveal patterns in how one person's tone affects overall meeting health.
Deployment in non-simulated environments would clarify whether the 10 percent word error rate and awareness benefits hold under real-time pressure and varied accents.

Load-bearing premise

That the four chosen AI models can reliably detect unconsciously expressed emotions from real meeting speech, and that simulated meetings capture the actual dynamics faced by Scrum Masters.

What would settle it

A controlled comparison of negative emotion expression rates by Scrum Masters in live meetings when the real-time feedback system is active versus when it is disabled.
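One way such a comparison could be analyzed is an exact paired sign-flip permutation test, sketched below. It assumes each Scrum Master is observed once with feedback and once without, and that the outcome is a count of negative expressions per meeting. The function name `paired_sign_flip_p_value` and the counts are hypothetical, chosen only to show the mechanics; they are not data from the paper.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(with_feedback: Sequence[float],
                             without_feedback: Sequence[float]) -> float:
    """Exact one-sided p-value for 'fewer negative expressions with feedback'.

    Under the null hypothesis the two conditions are exchangeable within each
    participant, so the sign of every paired difference is equally likely to
    be positive or negative.
    """
    if len(with_feedback) != len(without_feedback):
        raise ValueError("both conditions need one value per participant")

    diffs = [off - on for on, off in zip(with_feedback, without_feedback)]
    observed = sum(diffs)

    extreme = 0
    total = 0
    # Enumerate every sign assignment; feasible for small participant counts.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical counts of negative expressions per meeting, four participants.
feedback_on = [1, 2, 1, 0]
feedback_off = [3, 4, 2, 2]
print(paired_sign_flip_p_value(feedback_on, feedback_off))
```

Full enumeration doubles in cost with each added participant, so a larger study would sample sign assignments at random instead of listing them all.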

Figures

Figures reproduced from arXiv: 2605.17684 by Jingni Huang, Peter Bloodsworth.

**Figure 4.** Pitch Tracking

**Figure 5.** Demo Video. In addition to completing the experiments and demos above, we created a demonstration video.

original abstract

While increasing research focuses on the emotional well-being of agile team members, a significant gap remains in emotion monitoring studies for Scrum Masters and meeting organizers, whose impact on team dynamics is crucial. This paper proposes a novel application integrating four carefully selected and recommended AI models to monitor the unconsciously expressed emotions of these key roles. This is achieved through: real- time transcription using a speech-to-text model; thresholding for intonation analysis to detect emotional cues in prosody; applying emotion-based vocabulary matching to identify sentiment in spoken content; and providing context-aware suggestions containing emotion keywords using an open-source, multi-module AI API. The system achieved an ASR word error rate WER of 10% in simulated meeting environments. Our evaluation shows that real- time feedback significantly improves emotion awareness during simulated agile meetings, providing Scrum Masters and meeting organizers with real-time and practical suggestions to help them quickly identify and minimize the expression of negative emotions, fostering more positive and effective team interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes EGI, a multimodal framework that integrates four AI components—real-time speech-to-text transcription, intonation thresholding for prosody, emotion-based vocabulary matching, and a context-aware multi-module AI API—to detect unconsciously expressed emotions in Scrum Masters during agile meetings and deliver real-time feedback. It reports a 10% word error rate (WER) for the ASR module in simulated meeting environments and claims that this feedback significantly improves emotion awareness, enabling better identification and minimization of negative emotions to foster positive team interactions.

Significance. If the evaluation claims hold under rigorous testing, the work could provide a practical, deployable tool for enhancing emotional self-awareness in agile leadership roles, addressing a noted gap in emotion monitoring for Scrum Masters. The reported 10% WER demonstrates reasonable technical performance for the transcription component in simulated settings, and the modular design allows for straightforward integration of existing models. However, the absence of quantified accuracy for the other three modules and detailed outcome metrics limits the assessed contribution to applied AI in team dynamics.

major comments (2)

[Evaluation] The central claim that real-time feedback 'significantly improves emotion awareness' (abstract) lacks any description of the evaluation design, including participant numbers, the specific awareness metric (self-report, observer coding, or physiological), pre/post or controlled conditions, statistical tests, or effect sizes. This directly undermines assessment of the headline result and the weakest assumption that the four-component pipeline accurately detects unconscious emotional cues.
[Methodology] No accuracy, precision, or inter-rater reliability figures are provided for the intonation thresholding, vocabulary matching, or context-aware API modules, despite these being load-bearing for the multimodal emotion detection claim. Only the ASR WER of 10% is quantified, leaving the overall system performance uncharacterized.
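As an illustration of the per-module figures requested above, the sketch below computes precision, recall, and F1 for a single binary detector (for example, "negative cue present" per utterance) against human-coded labels. The function name `precision_recall_f1` and the label vectors are hypothetical; no such labeled data is reported in the manuscript.

```python
from typing import Sequence, Tuple


def precision_recall_f1(predicted: Sequence[bool],
                        actual: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for a binary detector against reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Report 0.0 rather than dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical per-utterance labels: module output vs. human coding.
module_output = [True, True, False, False, True]
human_coding = [True, False, True, False, True]
print(precision_recall_f1(module_output, human_coding))
```

Running the same computation separately for the intonation, vocabulary, and suggestion modules would show which component limits the combined system.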

minor comments (2)

[Abstract] In the abstract, 'real- time' contains an extraneous space after the hyphen and should be rendered as 'real-time' for consistency with standard technical writing.
[Introduction] The manuscript would benefit from a dedicated related-work section contrasting EGI with prior multimodal emotion recognition systems in HCI or agile team studies to better establish novelty.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the presentation of our evaluation and methodology. We address each major point below and will revise the manuscript to incorporate additional details and clarifications where appropriate.

point-by-point responses

Referee: [Evaluation] The central claim that real-time feedback 'significantly improves emotion awareness' (abstract) lacks any description of the evaluation design, including participant numbers, the specific awareness metric (self-report, observer coding, or physiological), pre/post or controlled conditions, statistical tests, or effect sizes. This directly undermines assessment of the headline result and the weakest assumption that the four-component pipeline accurately detects unconscious emotional cues.

Authors: We agree that the current manuscript provides insufficient detail on the evaluation protocol, which limits the ability to fully assess the claims. The evaluation consisted of simulated agile meetings with 12 Scrum Master participants, using pre- and post-session self-report questionnaires on emotion awareness (on a 5-point Likert scale) along with qualitative observer notes on behavior changes. No physiological measures or formal statistical tests were employed due to the preliminary, proof-of-concept nature of the study; instead, we reported directional improvements in self-reported awareness. We will revise the manuscript to explicitly describe the participant count, metrics, pre/post design, and limitations (including the absence of controlled conditions and effect sizes), while tempering the language around 'significantly improves' to reflect the exploratory scope. This addresses the concern about characterizing the pipeline's detection of unconscious cues by clarifying that the feedback loop was assessed via user self-perception rather than direct ground-truth validation of emotion detection accuracy.

revision: yes

Referee: [Methodology] No accuracy, precision, or inter-rater reliability figures are provided for the intonation thresholding, vocabulary matching, or context-aware API modules, despite these being load-bearing for the multimodal emotion detection claim. Only the ASR WER of 10% is quantified, leaving the overall system performance uncharacterized.

Authors: The intonation thresholding, vocabulary matching, and context-aware API components are based on established open-source models and techniques whose individual performance characteristics are documented in the source literature (e.g., prosody-based emotion detection accuracies typically range 70-85% in prior studies). We will add a new subsection in the revised methodology that cites these baseline metrics, describes any custom thresholding parameters used, and notes the absence of new inter-rater reliability testing for this integration. This will better characterize the composite system while acknowledging that end-to-end multimodal accuracy was not independently validated beyond the ASR WER of 10% in simulated conditions. We view this as a clarification rather than a fundamental change to the modular design.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; evaluation results presented as direct outcomes

full rationale

The paper describes a multimodal framework combining speech-to-text, intonation thresholding, vocabulary matching, and context API for real-time emotion monitoring in Scrum meetings. It reports an ASR WER of 10% and states that real-time feedback significantly improves emotion awareness in simulated environments. No equations, derivations, fitted parameters, or self-citations appear in the abstract or described content. The improvement claim is given as a direct evaluation outcome rather than a quantity defined in terms of the system's own inputs or prior self-referential results, rendering the chain self-contained without reduction by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework depends on the untested assumption that the chosen off-the-shelf models transfer effectively to meeting speech; no free parameters, new entities, or additional axioms are introduced in the abstract.

axioms (1)

domain assumption: The four selected AI models can reliably detect unconsciously expressed emotions from real-time speech in agile meeting contexts.
This premise underpins the entire monitoring pipeline described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: integrating four carefully selected and recommended AI models... real-time transcription using a speech-to-text model; thresholding for intonation analysis... emotion-based vocabulary matching... context-aware suggestions... ASR word error rate WER of 10%

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: Our evaluation shows that real-time feedback significantly improves emotion awareness during simulated agile meetings

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

