pith. machine review for the scientific record.

arxiv: 2604.14204 · v1 · submitted 2026-04-03 · 💻 cs.SD · cs.AI · eess.AS

Recognition: 1 theorem link · Lean Theorem

Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:06 UTC · model grok-4.3

classification 💻 cs.SD · cs.AI · eess.AS
keywords conversational emotion recognition · multimodal fusion · feature disentanglement · graph neural networks · Fourier graph · hypergraph modeling · speaker interactions

The pith

A dual-branch graph framework disentangles shared and unique multimodal features to recognize emotions in conversation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to improve multimodal emotion recognition in conversations by separating features that stay the same across text, audio, and video from those that are unique to each modality. It uses a shared encoder plus modality-specific encoders to create these two spaces, then routes the invariant features through a Fourier graph network for global consistency and the specific features through a speaker-aware hypergraph for high-order interactions. A frequency contrastive loss sharpens the invariant branch while a speaker-consistency constraint keeps the specific branch coherent. The two branches are fused at the end for final utterance-level predictions. If the separation works, the method should reduce redundancy and alignment problems that hurt current systems.
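
To make the routing concrete, here is a minimal PyTorch sketch of the disentangle-then-route structure described above. Only the shared/specific split and the two-branch fusion come from the paper's description; the module choices, GRU encoders, dimensions, and the stand-in linear layers for the Fourier and hypergraph branches are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class DualBranchCER(nn.Module):
    """Sketch of the dual-space, dual-branch pipeline (illustrative only)."""
    def __init__(self, dims, d_model=256, num_classes=6):
        super().__init__()
        # Project each modality to a common width, then split into a shared
        # (modality-invariant) encoder and per-modality (specific) encoders.
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_model) for m, d in dims.items()})
        self.shared = nn.GRU(d_model, d_model, batch_first=True)
        self.specific = nn.ModuleDict(
            {m: nn.GRU(d_model, d_model, batch_first=True) for m in dims})
        self.fourier_branch = nn.Linear(d_model, d_model)  # stand-in for the FourierGNN
        self.hyper_branch = nn.Linear(d_model, d_model)    # stand-in for the hypergraph
        self.classifier = nn.Linear(2 * d_model, num_classes)

    def forward(self, feats):  # feats: {modality: (batch, utterances, dim)}
        inv, spec = [], []
        for m, x in feats.items():
            h = self.proj[m](x)
            inv.append(self.shared(h)[0])        # invariant stream
            spec.append(self.specific[m](h)[0])  # specific stream
        z_inv = self.fourier_branch(torch.stack(inv).mean(0))   # global consistency
        z_spec = self.hyper_branch(torch.stack(spec).mean(0))   # speaker interactions
        return self.classifier(torch.cat([z_inv, z_spec], dim=-1))

dims = {"text": 768, "audio": 128, "video": 512}
model = DualBranchCER(dims)
logits = model({m: torch.randn(2, 10, d) for m, d in dims.items()})  # (2, 10, 6)
```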

Core claim

The central claim is that dual-space feature disentanglement combined with dual-branch graph learning—Fourier graph neural network on modality-invariant representations plus speaker-aware hypergraph on modality-specific representations, with added contrastive and consistency objectives—captures complementary cross-modal patterns more effectively than prior approaches, leading to higher accuracy on standard conversation emotion datasets.
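
The review does not spell out the frequency contrastive objective, so the following is only one plausible reading: an InfoNCE-style loss computed on FFT magnitudes of the invariant stream, treating two modalities' views of the same utterance sequence as positives. The temperature value and the rFFT-magnitude representation are assumptions.

```python
import torch
import torch.nn.functional as F

def freq_contrastive_loss(z_a, z_b, temperature=0.1):
    """z_a, z_b: (N, T, d) invariant features from two modality views of the
    same N utterance sequences. Positives sit on the diagonal."""
    fa = torch.fft.rfft(z_a, dim=1).abs().flatten(1)  # frequency-domain view
    fb = torch.fft.rfft(z_b, dim=1).abs().flatten(1)
    fa, fb = F.normalize(fa, dim=-1), F.normalize(fb, dim=-1)
    logits = fa @ fb.t() / temperature                # (N, N) similarities
    targets = torch.arange(len(fa), device=fa.device)
    # Symmetric InfoNCE: each sequence should match its own cross-modal view.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```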

What carries the argument

Dual-branch graph learning with shared and modality-specific encoders that produce disentangled invariant and specific feature spaces, modeled respectively by a Fourier graph neural network and a speaker-aware hypergraph.
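
The FourierGNN operator itself is not reproduced in the review; the sketch below shows only the generic frequency-domain mixing such models are built on (FFT along the utterance axis, a learned complex filter, inverse FFT). The filter parameterization is an assumption.

```python
import torch
import torch.nn as nn

class FourierMix(nn.Module):
    """Minimal frequency-domain mixing layer in the spirit of Fourier graph
    networks (illustrative sketch, not the authors' exact operator)."""
    def __init__(self, seq_len, d_model):
        super().__init__()
        n_freq = seq_len // 2 + 1
        # Learned complex filter, stored as separate real/imag parameters.
        self.w_real = nn.Parameter(torch.randn(n_freq, d_model) * 0.02)
        self.w_imag = nn.Parameter(torch.randn(n_freq, d_model) * 0.02)

    def forward(self, x):                      # x: (B, T, d)
        xf = torch.fft.rfft(x, dim=1)          # (B, T//2+1, d) complex
        xf = xf * torch.complex(self.w_real, self.w_imag)  # per-frequency filter
        return torch.fft.irfft(xf, n=x.size(1), dim=1)     # back to (B, T, d)

y = FourierMix(seq_len=10, d_model=256)(torch.randn(2, 10, 256))
```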

Load-bearing premise

Separating modality-invariant and modality-specific representations through shared and specific encoders, together with Fourier modeling and speaker constraints, will reliably extract useful complementary patterns without creating alignment errors or discarding important cues.
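
As a concrete instance of the speaker side of this premise, one standard construction is a hypergraph with one hyperedge per speaker covering all of that speaker's utterances, propagated with the usual degree-normalized hypergraph convolution; everything past the phrase "speaker-aware hypergraph" in the sketch below is an assumption.

```python
import torch

def speaker_hypergraph_conv(x, speakers, weight):
    """x: (T, d) utterance features; speakers: length-T list of speaker ids;
    weight: (d, d_out) learnable matrix. One hyperedge per speaker."""
    ids = sorted(set(speakers))
    H = torch.zeros(len(speakers), len(ids))   # incidence: nodes x hyperedges
    for t, s in enumerate(speakers):
        H[t, ids.index(s)] = 1.0
    d_v = H.sum(1).clamp(min=1)                # node degrees
    d_e = H.sum(0).clamp(min=1)                # hyperedge degrees
    # Standard normalized propagation: X' = D_v^{-1} H D_e^{-1} H^T X W
    agg = (H / d_e) @ (H.t() @ x)
    return (agg / d_v.unsqueeze(1)) @ weight

out = speaker_hypergraph_conv(torch.randn(6, 16),
                              ["A", "B", "A", "B", "A", "C"],
                              torch.randn(16, 16))
```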

What would settle it

Training and testing the full model against an ablated version that removes the shared/specific encoder split or the Fourier branch on the IEMOCAP and MELD datasets; if accuracy does not drop measurably, the disentanglement step is not doing the claimed work.
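
A minimal sketch of that decisive experiment follows; build_model and train_and_evaluate are hypothetical placeholders standing in for training code the review does not include.

```python
# Decisive ablation: train matched variants with one component disabled and
# compare accuracy on identical splits and seeds. Builders are placeholders.
def build_model(shared_split=True, fourier=True):
    ...  # hypothetical: construct the full model or an ablated variant

def train_and_evaluate(model, dataset, seeds):
    ...  # hypothetical: return mean accuracy over the given seeds

VARIANTS = {
    "full":       dict(shared_split=True,  fourier=True),
    "no_split":   dict(shared_split=False, fourier=True),   # single encoder
    "no_fourier": dict(shared_split=True,  fourier=False),  # identity branch
}

for name, cfg in VARIANTS.items():
    for dataset in ("IEMOCAP", "MELD"):
        acc = train_and_evaluate(build_model(**cfg), dataset, seeds=range(5))
        print(f"{dataset:7s} {name:10s} acc={acc}")
```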

Figures

Figures reproduced from arXiv: 2604.14204 by Chengling Guo, Keqin Li, Tao Meng, Wei Ai, Yun Tan, Yuntao Shou.

Figure 1
Figure 1. Overall architecture of the proposed framework. view at source ↗
read the original abstract

Multimodal emotion recognition in conversations aims to infer utterance-level emotions by jointly modeling textual, acoustic, and visual cues within context. Despite recent progress, key challenges remain, including redundant cross-modal information, imperfect semantic alignment, and insufficient modeling of high-order speaker interactions. To address these issues, we propose a framework that combines dual-space feature disentanglement with dual-branch graph learning. A shared encoder and modality-specific encoders are used to separate modality-invariant and modality-specific representations. The invariant features are modeled by a Fourier graph neural network to capture global consistency and complementary patterns, with a frequency-domain contrastive objective to enhance discriminability. In parallel, a speaker-aware hypergraph is constructed over modality-specific features to model high-order interactions, along with a speaker-consistency constraint to maintain coherent semantics. Finally, the two branches are fused for utterance-level emotion prediction. Experiments on IEMOCAP and MELD demonstrate that the proposed method achieves superior performance over strong baselines, validating its effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a disentangled dual-branch graph learning framework for multimodal conversational emotion recognition. It uses a shared encoder together with modality-specific encoders to separate invariant and specific representations. Invariant features are modeled via a Fourier graph neural network with a frequency-domain contrastive objective, while specific features are processed by a speaker-aware hypergraph equipped with a speaker-consistency constraint. The two branches are fused to produce utterance-level emotion predictions. Experiments on IEMOCAP and MELD report superior performance relative to strong baselines.

Significance. If the reported gains prove robust, the work would offer a concrete architecture for mitigating cross-modal redundancy and modeling high-order speaker interactions in conversational emotion recognition. The explicit separation of invariant and specific streams combined with Fourier-domain graph processing and hypergraph speaker modeling constitutes a coherent technical contribution that could be adopted or extended in subsequent multimodal graph-learning studies.

major comments (2)
  1. [§4 Experiments] The superiority claims on IEMOCAP and MELD are presented without error bars, statistical significance tests, or explicit data-split descriptions; these omissions are load-bearing because they prevent verification that the observed margins are reliable rather than artifacts of a single run or a particular split.
  2. [§3.2–3.3] The frequency-domain contrastive loss and speaker-consistency constraint are introduced to preserve discriminability and coherence, yet no ablation isolates their individual contributions or quantifies whether disentanglement introduces alignment errors; this directly affects the central claim that the dual-branch design reliably captures complementary patterns.
minor comments (2)
  1. [§3] Notation for the shared encoder output and modality-specific outputs should be introduced once and used consistently in all equations and figures.
  2. [Figures] Figure captions should explicitly state the number of modalities and the exact fusion operation used at inference time.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We agree that the points raised are important for strengthening the empirical rigor and interpretability of our work. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [§4 Experiments] The superiority claims on IEMOCAP and MELD are presented without error bars, statistical significance tests, or explicit data-split descriptions; these omissions are load-bearing because they prevent verification that the observed margins are reliable rather than artifacts of a single run or a particular split.

    Authors: We fully agree that error bars, statistical significance testing, and explicit data-split descriptions are necessary to establish the reliability of the reported gains. In the revised manuscript we will report mean performance and standard deviations over multiple random seeds (e.g., 5 runs), include paired statistical tests (t-tests or Wilcoxon) with p-values against the strongest baselines, and provide a clear description of the train/validation/test splits used for both IEMOCAP and MELD, following the standard protocols in the literature. These additions will directly address the concern that the margins could be artifacts of a single run or split; a sketch of this analysis appears after these responses. revision: yes

  2. Referee: [§3.2–3.3] The frequency-domain contrastive loss and speaker-consistency constraint are introduced to preserve discriminability and coherence, yet no ablation isolates their individual contributions or quantifies whether disentanglement introduces alignment errors; this directly affects the central claim that the dual-branch design reliably captures complementary patterns.

    Authors: We acknowledge that the current manuscript lacks ablations isolating the frequency-domain contrastive loss and the speaker-consistency constraint, as well as any quantitative assessment of possible alignment errors introduced by disentanglement. In the revision we will add a dedicated ablation study that removes each component individually (and in combination) and reports the resulting performance drops on both datasets. We will also include an analysis of cross-modal alignment quality (e.g., via cosine similarity or mutual information between the invariant and specific streams) to examine whether disentanglement introduces measurable alignment degradation. These experiments will provide direct evidence for the contribution of each design choice and for the reliability of the dual-branch separation; see the alignment-probe sketch after these responses. revision: yes
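
A minimal sketch of the significance analysis promised in response 1, using SciPy's paired tests; the per-seed scores below are placeholders, not reported numbers.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# Placeholder per-seed accuracies for the proposed model and the strongest
# baseline, evaluated on the same splits and seeds (5 runs each).
ours = np.array([68.4, 68.9, 67.8, 68.6, 68.1])
baseline = np.array([67.2, 67.9, 67.0, 67.5, 67.3])

print(f"ours: {ours.mean():.2f} ± {ours.std(ddof=1):.2f}")
t_stat, p_t = ttest_rel(ours, baseline)     # paired t-test
w_stat, p_w = wilcoxon(ours, baseline)      # Wilcoxon signed-rank test
print(f"paired t-test p={p_t:.4f}, Wilcoxon p={p_w:.4f}")
```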
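And a sketch of the alignment probe promised in response 2: cosine similarity between paired invariant and specific embeddings, where low similarity is consistent with clean disentanglement. The tensor shapes and the choice of cosine as the sole probe are assumptions drawn from the rebuttal text.

```python
import torch
import torch.nn.functional as F

def disentanglement_probe(z_inv, z_spec):
    """z_inv, z_spec: (N, d) paired utterance embeddings from the invariant
    and specific spaces; near-zero cosine suggests good separation."""
    cos = F.cosine_similarity(z_inv, z_spec, dim=-1)   # (N,)
    return {"mean_cos": cos.mean().item(), "max_cos": cos.max().item()}

z_inv, z_spec = torch.randn(32, 256), torch.randn(32, 256)  # dummy features
print(disentanglement_probe(z_inv, z_spec))
```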

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper describes a dual-branch architecture using shared/specific encoders for disentanglement, Fourier GNN on invariant features with contrastive loss, and speaker hypergraph on specific features with consistency constraint, followed by fusion for prediction. All load-bearing steps are architectural choices directly addressing stated challenges (redundancy, alignment, high-order interactions) and are validated via external benchmarks (IEMOCAP, MELD) against baselines. No equations reduce outputs to inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness or ansatz is imported via self-citation. The experimental superiority claim rests on independent evaluation rather than internal redefinition.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim depends on standard assumptions from graph neural networks and contrastive learning plus newly introduced architectural choices whose effectiveness is demonstrated only empirically on two datasets.

free parameters (1)
  • hyperparameters for encoders and graph layers
    Typical deep learning tuning parameters required to achieve reported performance.
axioms (2)
  • domain assumption Modality-invariant and modality-specific features can be cleanly separated by shared and modality-specific encoders
    Invoked in the dual-space disentanglement step.
  • standard math Fourier graph neural networks capture global consistency and complementary patterns in frequency domain
    Used for the invariant branch modeling.
invented entities (1)
  • speaker-aware hypergraph no independent evidence
    purpose: To model high-order speaker interactions over modality-specific features
    Newly constructed component without external independent validation.

pith-pipeline@v0.9.0 · 5482 in / 1327 out tokens · 43352 ms · 2026-05-13T19:06:01.922563+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
