Missing-by-Design: Certifiable Modality Deletion for Revocable Multimodal Sentiment Analysis

Chunlei Meng; Hao Zhang; Jiaxuan Lu; Jiekai Wu; Kangan Qian; Rong Fu; Simon Fong; Ziming Wang

arxiv: 2602.16144 · v3 · submitted 2026-02-18 · 💻 cs.CL · cs.LG

Rong Fu, Ziming Wang, Chunlei Meng, Jiaxuan Lu, Jiekai Wu, Kangan Qian, Hao Zhang, Simon Fong

Pith reviewed 2026-05-15 21:50 UTC · model grok-4.3

classification: cs.CL · cs.LG

keywords: multimodal sentiment analysis · modality deletion · machine unlearning · privacy preservation · certifiable deletion · missing data reconstruction · representation learning

The pith

Missing-by-Design certifies deletion of specific modalities from multimodal sentiment models via targeted parameter updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Missing-by-Design (MBD), a framework that lets multimodal sentiment systems respond to requests for removing particular input types such as audio or video. It first learns embeddings that separate modality properties and trains a generator to reconstruct missing channels while keeping signals useful for the sentiment task. When a deletion request arrives, the method uses saliency to pick influential parameters and applies a calibrated Gaussian update that produces a machine-verifiable certificate of removal. This process supports accurate predictions even when inputs are incomplete. A sympathetic reader would care because it gives a concrete way to honor privacy demands without retraining the entire model each time.

Core claim

MBD establishes that property-aware representation learning combined with generator-based reconstruction and a saliency-driven Gaussian parameter update can produce a machine-verifiable Modality Deletion Certificate confirming removal of modality-specific information, while delivering competitive predictive performance on incomplete inputs and a practical privacy-utility trade-off as an efficient alternative to full retraining.

What carries the argument

The Modality Deletion Certificate generated by saliency-driven candidate selection followed by a calibrated Gaussian update on model parameters, which certifies removal of modality-specific information.
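
The two steps named here can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the saliency score |θ·g|, the fixed top-k cut-off, and the function names `select_candidates` and `gaussian_update` are assumptions, because this page does not give the paper's scoring rule or update equation. It shows only the general shape of the idea: rank parameters by a saliency score, then add Gaussian noise to the selected ones.

```python
import random


def select_candidates(params, grads, k):
    """Return the indices of the k parameters with the largest |param * grad| score.

    The |param * grad| score is an assumed stand-in for the paper's saliency measure.
    """
    scores = [abs(p * g) for p, g in zip(params, grads)]
    order = sorted(range(len(params)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def gaussian_update(params, selected, sigma, rng):
    """Add N(0, sigma^2) noise to the selected parameters and leave the rest unchanged."""
    updated = list(params)
    for i in selected:
        updated[i] += rng.gauss(0.0, sigma)
    return updated


rng = random.Random(0)
params = [0.5, -1.2, 0.05, 2.0, -0.3]
grads = [0.1, 0.9, 0.2, 0.01, -1.5]  # hypothetical gradients of a modality-specific loss
selected = select_candidates(params, grads, k=2)
updated = gaussian_update(params, selected, sigma=0.1, rng=rng)
print(selected)
```

With these toy values the selected indices are 1 and 4, and every other parameter is left untouched. How the noise scale is calibrated and what the certificate records are not specified on this page.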

If this is right

Multimodal models maintain strong predictive performance when one or more modalities are missing by using the generator-based reconstruction.
Deletion requests can be fulfilled through targeted parameter changes without requiring full model retraining.
The resulting certificate supplies machine-verifiable proof that modality-specific signals have been removed.
A practical privacy-utility balance is achieved on standard benchmark datasets for sentiment analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the certificate holds under scrutiny, similar surgical updates could be adopted for other multimodal tasks where selective removal of input types is required.
Regulatory frameworks might eventually mandate such certified deletion mechanisms for handling user requests in deployed multimodal systems.
Robustness tests against reconstruction attacks on the updated parameters could be run to check for any undetected residual modality information.

Load-bearing premise

That saliency-driven candidate selection followed by a calibrated Gaussian update produces a machine-verifiable certificate that actually removes all modality-specific information without hidden leakage.

What would settle it

An experiment that trains a separate recovery model on the updated parameters and measures whether it can still predict information from the deleted modality above chance level on held-out test data.
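
A minimal sketch of such a leakage probe follows, under loudly stated assumptions: the features are synthetic stand-ins for post-deletion representations, the probe is a simple nearest-centroid classifier rather than a trained recovery network, and the names `make_features` and `probe_accuracy` are hypothetical. It illustrates only the test logic: fit a probe on one split, then compare held-out accuracy with chance.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_probe(X, y):
    """Nearest-centroid probe: one centroid per value of the deleted-modality label."""
    return {c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)}


def predict(probe, x):
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(probe, key=lambda c: sqdist(probe[c], x))


def probe_accuracy(X_train, y_train, X_test, y_test):
    """Fit on the training split and report accuracy on the held-out split."""
    probe = fit_probe(X_train, y_train)
    hits = sum(predict(probe, x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def make_features(labels, leak, rng, dim=4):
    """Synthetic features; leak=0.0 means the label leaves no trace in them."""
    return [[leak * (2 * t - 1) + rng.gauss(0.0, 0.5) for _ in range(dim)]
            for t in labels]


rng = random.Random(0)
y_train = [rng.randint(0, 1) for _ in range(400)]
y_test = [rng.randint(0, 1) for _ in range(400)]

acc_leaky = probe_accuracy(make_features(y_train, 1.0, rng), y_train,
                           make_features(y_test, 1.0, rng), y_test)
acc_clean = probe_accuracy(make_features(y_train, 0.0, rng), y_train,
                           make_features(y_test, 0.0, rng), y_test)
print(acc_leaky, acc_clean)
```

On balanced binary labels, chance is 0.5. A probe that stays near 0.5 on held-out data is consistent with removal, though it does not prove it, because a stronger probe might still recover signal.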

Figures

Figures reproduced from arXiv: 2602.16144 by Chunlei Meng, Hao Zhang, Jiaxuan Lu, Jiekai Wu, Kangan Qian, Rong Fu, Simon Fong, Ziming Wang.

**Figure 2.** Privacy–utility trade-off after certified audio deletion. Plotted curves show binary accuracy (Acc2) together …

**Figure 3.** Training trajectories for the principal loss terms (averaged across three seeds).

**Figure 4.** t-SNE visualization of reconstructed embeddings (left: without property embedding pathway; right: with …

**Figure 5.** Cumulative privacy budget under sequential modality deletions. The solid curve shows the theoretical …

**Figure 6.** SwiftPrune proxy L̂q versus the true leave-one-out increment ΔLq. Each point corresponds to a candidate parameter q. The dashed line is y = x and the shaded band indicates ±8% around y = x. Spearman ρ = 0.87 and 98% of points fall inside the ±8% band, which visually confirms that the proxy tracks the true increments closely and does not systematically over-estimate them.
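
The two summary statistics quoted in the Figure 6 caption can be computed as in the sketch below. The data are made up for illustration, and the band is read as |proxy − true| ≤ 8% of |true|, which is an assumption about how the caption's "±8% around y = x" is defined. The helper names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))


def fraction_in_band(proxy, true, tol=0.08):
    """Share of points with |proxy - true| <= tol * |true|."""
    inside = sum(abs(p - t) <= tol * abs(t) for p, t in zip(proxy, true))
    return inside / len(true)


true_inc = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up true increments
proxy = [1.05, 1.9, 3.1, 4.5, 4.9]     # made-up proxy values
rho = spearman(proxy, true_inc)
frac = fraction_in_band(proxy, true_inc)
print(rho, frac)
```

In this toy example the ordering is preserved exactly, so ρ is 1, while one of the five points (4.5 against 4.0) falls outside the band. The two numbers measure different things: rank agreement versus pointwise closeness.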

Original abstract

As multimodal systems increasingly process sensitive personal data, the ability to selectively revoke specific data modalities has become a critical requirement for privacy compliance and user autonomy. We present Missing-by-Design (MBD), a unified framework for revocable multimodal sentiment analysis that combines structured representation learning with a certifiable parameter-modification pipeline. Revocability is critical in privacy-sensitive applications where users or regulators may request removal of modality-specific information. MBD learns property-aware embeddings and employs generator-based reconstruction to recover missing channels while preserving task-relevant signals. For deletion requests, the framework applies saliency-driven candidate selection and a calibrated Gaussian update to produce a machine-verifiable Modality Deletion Certificate. Experiments on benchmark datasets show that MBD achieves strong predictive performance under incomplete inputs and delivers a practical privacy-utility trade-off, positioning surgical unlearning as an efficient alternative to full retraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

MBD packages saliency selection with a calibrated Gaussian update into a deletion pipeline for multimodal sentiment, but the certificate claim rests on an unshown guarantee that cross-modal leakage is eliminated.

The main point is that this paper gives a concrete way to surgically remove one modality from a trained multimodal sentiment model and to attach a claimed machine-verifiable certificate to the change. It does this by first learning property-aware embeddings and using generators to reconstruct missing channels, then applying saliency to pick parameters and a calibrated Gaussian update to produce the Modality Deletion Certificate. The experiments report that predictive performance stays reasonable under incomplete inputs and that the privacy-utility trade-off looks workable compared with full retraining. That combination of components for this specific task is new enough to notice, and the efficiency argument is straightforward.

The soft spot is the certificate itself. The abstract claims that the update yields a machine-verifiable certificate, yet it supplies no derivation, no bound on mutual information, and no quantitative check that entangled visual-textual signals in sentiment data are actually excised. A saliency-driven Gaussian update may leave residual predictive power that the certificate does not detect, and the absence of such residue is exactly the load-bearing assumption.

The paper is aimed at people working on privacy mechanisms for multimodal systems rather than at the broader field. Readers who care about unlearning or compliance requirements will get something concrete from the pipeline description and the reported numbers. The idea is timely and the experimental framing is clear, so it deserves a serious referee even though the soundness of the certificate needs direct evidence.

Referee Report

3 major / 2 minor

Summary. The paper introduces Missing-by-Design (MBD), a unified framework for revocable multimodal sentiment analysis. It combines property-aware embeddings and generator-based reconstruction to handle missing modalities while preserving task signals, and for deletion requests applies saliency-driven candidate selection followed by a calibrated Gaussian parameter update to generate a machine-verifiable Modality Deletion Certificate. Experiments on benchmark datasets are claimed to show strong predictive performance under incomplete inputs together with a practical privacy-utility trade-off, positioning the method as an efficient surgical-unlearning alternative to full retraining.

Significance. If the Modality Deletion Certificate can be shown to eliminate modality-specific information without residual leakage, the framework would supply a concrete, efficient mechanism for user-driven modality revocation in privacy-sensitive multimodal systems. This would be a meaningful contribution to certifiable unlearning, especially for applications such as sentiment analysis that process entangled personal data. The reported efficiency gains over retraining would be a clear practical advantage once the soundness claims are substantiated.

major comments (3)

Abstract: The assertion that the calibrated Gaussian update produces a 'machine-verifiable Modality Deletion Certificate' is unsupported by any equations, proof sketch, or bound on residual mutual information; without such a derivation the certifiability claim cannot be evaluated.
Method section (saliency-driven candidate selection and Gaussian update): No argument is supplied showing that the update eliminates cross-modal correlations typical in sentiment analysis; the procedure may leave predictive information in the retained embedding space that the certificate does not detect.
Experiments section: No quantitative results on certificate soundness (e.g., post-deletion mutual-information estimates, modality-specific probe accuracy, or leakage metrics) are reported, leaving the central privacy guarantee unverified.
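
One of the leakage metrics the referee asks for, a mutual-information estimate, can be illustrated with a plug-in estimator over discrete samples. This is a sketch under assumptions: real model representations are continuous and would first need binning or a different estimator, and `mutual_information` is a hypothetical helper, not something taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


# X fully determines Y: one bit of shared information.
mi_dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# X and Y are independent in this sample: zero shared information.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_dependent, mi_independent)
```

The plug-in estimator is biased upward on small samples, so a post-deletion estimate is best compared against a shuffled-label baseline and not against zero.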

minor comments (2)

Abstract: The acronym 'MBD' is introduced without an explicit expansion on first use.
Notation: The term 'property-aware embeddings' is used without a formal definition or reference to the precise loss terms that enforce the property.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which help us improve the clarity and rigor of our work on Missing-by-Design. We address each major comment below, proposing revisions to substantiate the certifiability claims.

Referee: Abstract: The assertion that the calibrated Gaussian update produces a 'machine-verifiable Modality Deletion Certificate' is unsupported by any equations, proof sketch, or bound on residual mutual information; without such a derivation the certifiability claim cannot be evaluated.

Authors: We agree that the abstract's claim requires stronger theoretical support. In the revised manuscript, we will expand the Method section with a formal derivation of the Modality Deletion Certificate, including a proof sketch that the calibrated Gaussian update bounds the residual mutual information between the deleted modality and the model parameters to a negligible level, thereby making the certificate machine-verifiable through verification of the update parameters.

revision: yes

Referee: Method section (saliency-driven candidate selection and Gaussian update): No argument is supplied showing that the update eliminates cross-modal correlations typical in sentiment analysis; the procedure may leave predictive information in the retained embedding space that the certificate does not detect.

Authors: The saliency-driven candidate selection identifies parameters with high influence on modality-specific predictions, and the subsequent Gaussian update is designed to perturb these parameters in a way that disrupts cross-modal correlations. While we believe the procedure achieves this based on the property-aware embeddings, we acknowledge the lack of an explicit argument. We will add a theoretical analysis in the revision demonstrating that the update reduces cross-modal mutual information, with the certificate serving as verification of this reduction.

revision: partial

Referee: Experiments section: No quantitative results on certificate soundness (e.g., post-deletion mutual-information estimates, modality-specific probe accuracy, or leakage metrics) are reported, leaving the central privacy guarantee unverified.

Authors: We concur that empirical evidence for the certificate's soundness is essential. The current experiments focus on predictive performance and efficiency, but we will include additional results in the revised version, such as post-deletion mutual information estimates between modalities and probe classifier accuracies for modality-specific information, to quantify the leakage and validate the privacy guarantees.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in the MBD derivation chain

The paper describes a framework that learns property-aware embeddings, uses generator-based reconstruction for missing channels, and applies saliency-driven candidate selection plus calibrated Gaussian update to generate a Modality Deletion Certificate. No equations or steps in the provided description reduce the certificate or the claimed removal of modality-specific information to a quantity defined by the same fitted parameters or by self-citation chains that bear the central load. The privacy-utility claims rest on experimental results on benchmark datasets rather than self-referential definitions or imported uniqueness theorems, leaving the derivation self-contained against external validation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework assumes standard properties of Gaussian perturbations and saliency maps; no new physical constants or ad-hoc entities beyond the certificate itself are introduced in the abstract.

free parameters (1)

Gaussian calibration parameter
The abstract mentions a 'calibrated Gaussian update' whose scale must be chosen to balance deletion strength and task performance.
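
To make the role of this free parameter concrete, the sketch below uses the classical Gaussian-mechanism calibration from differential privacy, σ = Δ·sqrt(2·ln(1.25/δ))/ε, which holds for 0 < ε < 1. Treating the paper's update as an instance of this mechanism is an assumption made purely for illustration: this page does not state which calibration rule the authors use, and `gaussian_sigma` is a hypothetical helper.

```python
from math import log, sqrt


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism for (epsilon, delta)-DP.

    Assumed stand-in for the paper's calibration; valid only for 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("the classical bound requires 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * sqrt(2 * log(1.25 / delta)) / epsilon


sigma_strict = gaussian_sigma(sensitivity=1.0, epsilon=0.5, delta=1e-5)
sigma_loose = gaussian_sigma(sensitivity=1.0, epsilon=0.9, delta=1e-5)
print(sigma_strict, sigma_loose)
```

A smaller ε (a stricter guarantee) forces a larger σ, and therefore more perturbation of the selected parameters. That is the deletion-strength versus task-performance tension this ledger entry describes.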

axioms (1)

domain assumption: Saliency scores accurately identify parameters carrying modality-specific information.
Invoked when selecting candidates for the deletion update.

invented entities (1)

Modality Deletion Certificate (no independent evidence)
purpose: Machine-verifiable proof that a chosen modality has been removed
New ledger entry whose validity is asserted but not derived in the abstract.

pith-pipeline@v0.9.0 · 5467 in / 1274 out tokens · 13895 ms · 2026-05-15T21:50:06.789945+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer
cs.CL 2026-05 unverdicted novelty 5.0

EGAD adaptively distills LLM knowledge at the token level by using entropy to create a curriculum from low- to high-entropy tokens, adjust temperature, and switch between logits-only and feature-based branches.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper

[1]

A systematic literature review on incomplete multimodal learning: techniques and challenges.Systems Science & Control Engineering, 13(1): 2467083, 2025

Yifan Zhan, Rui Yang, Junxian You, Mengjie Huang, Weibo Liu, and Xiaohui Liu. A systematic literature review on incomplete multimodal learning: techniques and challenges.Systems Science & Control Engineering, 13(1): 2467083, 2025

work page 2025
[2]

Found in translation: Learning robust joint representations by cyclic translations between modalities

Hai Pham, Paul Pu Liang, Thomas Manzini, Louis-Philippe Morency, and Barnabás Póczos. Found in translation: Learning robust joint representations by cyclic translations between modalities. InProceedings of the AAAI conference on artificial intelligence, volume 33, pages 6892–6899, 2019

work page 2019
[3]

Multimodal and multi-view models for emotion recognition

Gustavo Aguilar, Viktor Rozgic, Weiran Wang, and Chao Wang. Multimodal and multi-view models for emotion recognition. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 991–1002, 2019

work page 2019
[4]

Enhancing sentence representation with visually-supervised multimodal pre-training

Zhe Li, Laurence T Yang, Xin Nie, BoCheng Ren, and Xianjun Deng. Enhancing sentence representation with visually-supervised multimodal pre-training. InProceedings of the 31st ACM International Conference on Multimedia, pages 5686–5695, 2023

work page 2023
[5]

Incomplete multimodality-diffused emotion recognition.Advances in Neural Information Processing Systems, 36:17117–17128, 2023

Yuanzhi Wang, Yong Li, and Zhen Cui. Incomplete multimodality-diffused emotion recognition.Advances in Neural Information Processing Systems, 36:17117–17128, 2023

work page 2023
[6]

Gcnet: Graph completion network for incomplete multimodal learning in conversation.IEEE Transactions on pattern analysis and machine intelligence, 45(7): 8419–8432, 2023

Zheng Lian, Lan Chen, Licai Sun, Bin Liu, and Jianhua Tao. Gcnet: Graph completion network for incomplete multimodal learning in conversation.IEEE Transactions on pattern analysis and machine intelligence, 45(7): 8419–8432, 2023

work page 2023
[7]

Grmi: Graph representation learning of multimodal data with incompleteness

Xian Xu, Xiao Xu, Xiang Li, and Guotong Xie. Grmi: Graph representation learning of multimodal data with incompleteness. InInternational Conference on Database Systems for Advanced Applications, pages 286–296. Springer, 2023

work page 2023
[8]

Ada2i: Enhancing modality balance for multimodal conversational emotion recognition

Cam-Van Thi Nguyen, The-Son Le, Anh-Tuan Mai, and Duc-Trong Le. Ada2i: Enhancing modality balance for multimodal conversational emotion recognition. InProceedings of the 32nd ACM International Conference on Multimedia, pages 9330–9339, 2024

work page 2024
[9]

Patient-centered and practical privacy to support ai for healthcare

Ruixuan Liu, Hong Kyu Lee, Sivasubramanium V Bhavani, Xiaoqian Jiang, Lucila Ohno-Machado, and Li Xiong. Patient-centered and practical privacy to support ai for healthcare. In2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA), pages 265–272. IEEE, 2024

work page 2024
[10]

A survey on security and privacy of large multimodal deep learning models: Teaching and learning perspective

Md Abdur Rahman, Lamyaa Alqahtani, Amna Albooq, and Alaa Ainousah. A survey on security and privacy of large multimodal deep learning models: Teaching and learning perspective. In2024 21st Learning and Technology Conference (L&T), pages 13–18. IEEE, 2024

work page 2024
[11]

Privacy protection in deep multi-modal retrieval

Peng-Fei Zhang, Yang Li, Zi Huang, and Hongzhi Yin. Privacy protection in deep multi-modal retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 634–643, 2021

work page 2021
[12]

Affective computing and emotional data: Challenges and implications in privacy regulations, the ai act, and ethics in large language models.arXiv preprint arXiv:2509.20153, 2025

Nicola Fabiano. Affective computing and emotional data: Challenges and implications in privacy regulations, the ai act, and ethics in large language models.arXiv preprint arXiv:2509.20153, 2025

work page arXiv 2025
[13]

Unimse: Towards unified multimodal sentiment analysis and emotion recognition.arXiv preprint arXiv:2211.11256, 2022

Guimin Hu, Ting-En Lin, Yi Zhao, Guangming Lu, Yuchuan Wu, and Yongbin Li. Unimse: Towards unified multimodal sentiment analysis and emotion recognition.arXiv preprint arXiv:2211.11256, 2022

work page arXiv 2022
[14]

Multi-label multimodal emotion recognition with transformer-based fusion and emotion-level representation learning.Ieee Access, 11: 14742–14751, 2023

Hoai-Duy Le, Guee-Sang Lee, Soo-Hyung Kim, Seungwon Kim, and Hyung-Jeong Yang. Multi-label multimodal emotion recognition with transformer-based fusion and emotion-level representation learning.Ieee Access, 11: 14742–14751, 2023

work page 2023
[15]

Memo- cmt: multimodal emotion recognition using cross-modal transformer-based feature fusion.Scientific reports, 15 (1):5473, 2025

Mustaqeem Khan, Phuong-Nam Tran, Nhat Truong Pham, Abdulmotaleb El Saddik, and Alice Othmani. Memo- cmt: multimodal emotion recognition using cross-modal transformer-based feature fusion.Scientific reports, 15 (1):5473, 2025. 13 Missing-by-Design

work page 2025
[16]

Cross-modal fine-grained alignment and fusion network for multimodal aspect-based sentiment analysis.Information Processing & Management, 60 (6):103508, 2023

Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Jie Zhou, and Liang He. Cross-modal fine-grained alignment and fusion network for multimodal aspect-based sentiment analysis.Information Processing & Management, 60 (6):103508, 2023

work page 2023
[17]

Pmr: Prototypical modal rebalance for multimodal learning

Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junxiao Wang, and Song Guo. Pmr: Prototypical modal rebalance for multimodal learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20029–20038, 2023

work page 2023
[18]

Enhanced experts with uncertainty- aware routing for multimodal sentiment analysis

Zixian Gao, Disen Hu, Xun Jiang, Huimin Lu, Heng Tao Shen, and Xing Xu. Enhanced experts with uncertainty- aware routing for multimodal sentiment analysis. InProceedings of the 32nd ACM International Conference on Multimedia, pages 9650–9659, 2024

work page 2024
[19]

Tmdc: A two-stage modality denoising and complementation framework for multimodal sentiment analysis with missing and noisy modalities.arXiv preprint arXiv:2511.10325, 2025

Yan Zhuang, Minhao Liu, Yanru Zhang, Jiawen Deng, and Fuji Ren. Tmdc: A two-stage modality denoising and complementation framework for multimodal sentiment analysis with missing and noisy modalities.arXiv preprint arXiv:2511.10325, 2025

work page arXiv 2025
[20]

Msaf-cf: A multimodal sentiment analysis framework based on feature enhancement and cross-fusion.IEEE Access, 2025

Zhongliang Wei, Ruofan Chen, and Jing Sun. Msaf-cf: A multimodal sentiment analysis framework based on feature enhancement and cross-fusion.IEEE Access, 2025

work page 2025
[21]

Meta-learning for incomplete multimodal sentiment analysis

Geng Tu, Tianhao Wu, Xuan Luo, Xi Zeng, Wenjie Li, and Ruifeng Xu. Meta-learning for incomplete multimodal sentiment analysis. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2911–2915, 2025

work page 2025
[22]

Proxy-driven robust multimodal sentiment analysis with incomplete data

Aoqiang Zhu, Min Hu, Xiaohua Wang, Jiaoyun Yang, Yiming Tang, and Ning An. Proxy-driven robust multimodal sentiment analysis with incomplete data. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22123–22138, 2025

work page 2025
[23]

A multimodal fusion network for student emotion recognition based on transformer and tensor product

Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, and Danqing Ma. A multimodal fusion network for student emotion recognition based on transformer and tensor product. In2024 IEEE 2nd International Conference on Sensors, Electronics and Computer Engineering (ICSECE), pages 1–4. IEEE, 2024

work page 2024
[24]

Learning from the global view: Supervised contrastive learning of multimodal representation.Information Fusion, 100:101920, 2023

Sijie Mai, Ying Zeng, and Haifeng Hu. Learning from the global view: Supervised contrastive learning of multimodal representation.Information Fusion, 100:101920, 2023

work page 2023
[25]

Confede: Contrastive feature decomposition for multimodal sentiment analysis

Jiuding Yang, Yakun Yu, Di Niu, Weidong Guo, and Yu Xu. Confede: Contrastive feature decomposition for multimodal sentiment analysis. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7617–7630, 2023

work page 2023
[26]

Disentanglement translation network for multimodal sentiment analysis.Information Fusion, 102:102031, 2024

Ying Zeng, Wenjun Yan, Sijie Mai, and Haifeng Hu. Disentanglement translation network for multimodal sentiment analysis.Information Fusion, 102:102031, 2024

work page 2024
[27]

Rui Liu, Haolin Zuo, Zheng Lian, Björn W Schuller, and Haizhou Li. Contrastive learning based modality-invariant feature acquisition for robust multimodal emotion recognition with missing modalities.IEEE Transactions on Affective Computing, 15(4):1856–1873, 2024

work page 2024
[28]

Multimodal sentiment analysis with unimodal label generation and modality decomposition.Information Fusion, 116:102787, 2025

Linan Zhu, Hongyan Zhao, Zhechao Zhu, Chenwei Zhang, and Xiangjie Kong. Multimodal sentiment analysis with unimodal label generation and modality decomposition.Information Fusion, 116:102787, 2025

work page 2025
[29]

Hessian-Free Online Certified Unlearn- ing, February 2025

Xinbao Qiao, Meng Zhang, Ming Tang, and Ermin Wei. Hessian-free online certified unlearning.arXiv preprint arXiv:2404.01712, 2024

work page arXiv 2024
[30]

Single image unlearning: Efficient machine unlearning in multimodal large language models.Advances in Neural Information Processing Systems, 37:35414–35453, 2024

Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi, and Fan Liu. Single image unlearning: Efficient machine unlearning in multimodal large language models.Advances in Neural Information Processing Systems, 37:35414–35453, 2024

work page 2024
[31]

Multidelete for multimodal machine unlearning

Jiali Cheng and Hadi Amiri. Multidelete for multimodal machine unlearning. InEuropean Conference on Computer Vision, pages 165–184. Springer, 2024

work page 2024
[32]

Certified minimax unlearning with generalization rates and deletion capacity.Advances in Neural Information Processing Systems, 36:62821–62852, 2023

Jiaqi Liu, Jian Lou, Zhan Qin, and Kui Ren. Certified minimax unlearning with generalization rates and deletion capacity.Advances in Neural Information Processing Systems, 36:62821–62852, 2023

work page 2023
[33]

Gaussian certified unlearning in high dimensions: A hypothesis testing approach.arXiv preprint arXiv:2510.13094, 2025

Aaradhya Pandey, Arnab Auddy, Haolin Zou, Arian Maleki, and Sanjeev Kulkarni. Gaussian certified unlearning in high dimensions: A hypothesis testing approach.arXiv preprint arXiv:2510.13094, 2025. 14 Missing-by-Design

work page arXiv 2025
[34]

Modality-aware neuron pruning for unlearning in multimodal large language models.arXiv preprint arXiv:2502.15910, 2025

Zheyuan Liu, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, and Meng Jiang. Modality-aware neuron pruning for unlearning in multimodal large language models.arXiv preprint arXiv:2502.15910, 2025

work page arXiv 2025
[35]

Protecting privacy in multimodal large language models with mllmu-bench

Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, and Meng Jiang. Protecting privacy in multimodal large language models with mllmu-bench. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4...

work page 2025
[36]

Practical membership inference attacks against large-scale multi-modal models: A pilot study

Myeongseob Ko, Ming Jin, Chenguang Wang, and Ruoxi Jia. Practical membership inference attacks against large-scale multi-modal models: A pilot study. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4871–4881, 2023

work page 2023
[37]

Black-box adversarial attack on vision language models for autonomous driving.arXiv preprint arXiv:2501.13563, 2025

Lu Wang, Tianyuan Zhang, Yang Qu, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu, and Dacheng Tao. Black-box adversarial attack on vision language models for autonomous driving.arXiv preprint arXiv:2501.13563, 2025

work page arXiv 2025
[38]

Can textual unlearning solve cross-modality safety alignment? InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 9830–9844, 2024

Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael B Abu-Ghazaleh, M Salman Asif, Yue Dong, Amit Roy-Chowdhury, and Chengyu Song. Can textual unlearning solve cross-modality safety alignment? InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 9830–9844, 2024

work page 2024
[39]

Towards benign memory forgetting for selective multimodal large language model unlearning.arXiv preprint arXiv:2511.20196, 2025

Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees GM Snoek, and Meng Wang. Towards benign memory forgetting for selective multimodal large language model unlearning.arXiv preprint arXiv:2511.20196, 2025


[1] Yifan Zhan, Rui Yang, Junxian You, Mengjie Huang, Weibo Liu, and Xiaohui Liu. A systematic literature review on incomplete multimodal learning: techniques and challenges. Systems Science & Control Engineering, 13(1):2467083, 2025.

[2] Hai Pham, Paul Pu Liang, Thomas Manzini, Louis-Philippe Morency, and Barnabás Póczos. Found in translation: Learning robust joint representations by cyclic translations between modalities. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 6892–6899, 2019.

[3] Gustavo Aguilar, Viktor Rozgic, Weiran Wang, and Chao Wang. Multimodal and multi-view models for emotion recognition. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 991–1002, 2019.

[4] Zhe Li, Laurence T Yang, Xin Nie, BoCheng Ren, and Xianjun Deng. Enhancing sentence representation with visually-supervised multimodal pre-training. In Proceedings of the 31st ACM International Conference on Multimedia, pages 5686–5695, 2023.

[5] Yuanzhi Wang, Yong Li, and Zhen Cui. Incomplete multimodality-diffused emotion recognition. Advances in Neural Information Processing Systems, 36:17117–17128, 2023.

[6] Zheng Lian, Lan Chen, Licai Sun, Bin Liu, and Jianhua Tao. Gcnet: Graph completion network for incomplete multimodal learning in conversation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7):8419–8432, 2023.

[7] Xian Xu, Xiao Xu, Xiang Li, and Guotong Xie. Grmi: Graph representation learning of multimodal data with incompleteness. In International Conference on Database Systems for Advanced Applications, pages 286–296. Springer, 2023.

[8] Cam-Van Thi Nguyen, The-Son Le, Anh-Tuan Mai, and Duc-Trong Le. Ada2i: Enhancing modality balance for multimodal conversational emotion recognition. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 9330–9339, 2024.

[9] Ruixuan Liu, Hong Kyu Lee, Sivasubramanium V Bhavani, Xiaoqian Jiang, Lucila Ohno-Machado, and Li Xiong. Patient-centered and practical privacy to support ai for healthcare. In 2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA), pages 265–272. IEEE, 2024.

[10] Md Abdur Rahman, Lamyaa Alqahtani, Amna Albooq, and Alaa Ainousah. A survey on security and privacy of large multimodal deep learning models: Teaching and learning perspective. In 2024 21st Learning and Technology Conference (L&T), pages 13–18. IEEE, 2024.

[11] Peng-Fei Zhang, Yang Li, Zi Huang, and Hongzhi Yin. Privacy protection in deep multi-modal retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 634–643, 2021.

[12] Nicola Fabiano. Affective computing and emotional data: Challenges and implications in privacy regulations, the ai act, and ethics in large language models. arXiv preprint arXiv:2509.20153, 2025.

[13] Guimin Hu, Ting-En Lin, Yi Zhao, Guangming Lu, Yuchuan Wu, and Yongbin Li. Unimse: Towards unified multimodal sentiment analysis and emotion recognition. arXiv preprint arXiv:2211.11256, 2022.

[14] Hoai-Duy Le, Guee-Sang Lee, Soo-Hyung Kim, Seungwon Kim, and Hyung-Jeong Yang. Multi-label multimodal emotion recognition with transformer-based fusion and emotion-level representation learning. IEEE Access, 11:14742–14751, 2023.

[15] Mustaqeem Khan, Phuong-Nam Tran, Nhat Truong Pham, Abdulmotaleb El Saddik, and Alice Othmani. Memocmt: multimodal emotion recognition using cross-modal transformer-based feature fusion. Scientific Reports, 15(1):5473, 2025.

[16] Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Jie Zhou, and Liang He. Cross-modal fine-grained alignment and fusion network for multimodal aspect-based sentiment analysis. Information Processing & Management, 60(6):103508, 2023.

[17] Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junxiao Wang, and Song Guo. Pmr: Prototypical modal rebalance for multimodal learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20029–20038, 2023.

[18] Zixian Gao, Disen Hu, Xun Jiang, Huimin Lu, Heng Tao Shen, and Xing Xu. Enhanced experts with uncertainty-aware routing for multimodal sentiment analysis. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 9650–9659, 2024.

[19] Yan Zhuang, Minhao Liu, Yanru Zhang, Jiawen Deng, and Fuji Ren. Tmdc: A two-stage modality denoising and complementation framework for multimodal sentiment analysis with missing and noisy modalities. arXiv preprint arXiv:2511.10325, 2025.

[20] Zhongliang Wei, Ruofan Chen, and Jing Sun. Msaf-cf: A multimodal sentiment analysis framework based on feature enhancement and cross-fusion. IEEE Access, 2025.

[21] Geng Tu, Tianhao Wu, Xuan Luo, Xi Zeng, Wenjie Li, and Ruifeng Xu. Meta-learning for incomplete multimodal sentiment analysis. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2911–2915, 2025.

[22] Aoqiang Zhu, Min Hu, Xiaohua Wang, Jiaoyun Yang, Yiming Tang, and Ning An. Proxy-driven robust multimodal sentiment analysis with incomplete data. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22123–22138, 2025.

[23] Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, and Danqing Ma. A multimodal fusion network for student emotion recognition based on transformer and tensor product. In 2024 IEEE 2nd International Conference on Sensors, Electronics and Computer Engineering (ICSECE), pages 1–4. IEEE, 2024.

[24] Sijie Mai, Ying Zeng, and Haifeng Hu. Learning from the global view: Supervised contrastive learning of multimodal representation. Information Fusion, 100:101920, 2023.

[25] Jiuding Yang, Yakun Yu, Di Niu, Weidong Guo, and Yu Xu. Confede: Contrastive feature decomposition for multimodal sentiment analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7617–7630, 2023.

[26] Ying Zeng, Wenjun Yan, Sijie Mai, and Haifeng Hu. Disentanglement translation network for multimodal sentiment analysis. Information Fusion, 102:102031, 2024.

[27] Rui Liu, Haolin Zuo, Zheng Lian, Björn W Schuller, and Haizhou Li. Contrastive learning based modality-invariant feature acquisition for robust multimodal emotion recognition with missing modalities. IEEE Transactions on Affective Computing, 15(4):1856–1873, 2024.

[28] Linan Zhu, Hongyan Zhao, Zhechao Zhu, Chenwei Zhang, and Xiangjie Kong. Multimodal sentiment analysis with unimodal label generation and modality decomposition. Information Fusion, 116:102787, 2025.

[29] Xinbao Qiao, Meng Zhang, Ming Tang, and Ermin Wei. Hessian-free online certified unlearning. arXiv preprint arXiv:2404.01712, 2024.

[30] Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi, and Fan Liu. Single image unlearning: Efficient machine unlearning in multimodal large language models. Advances in Neural Information Processing Systems, 37:35414–35453, 2024.

[31] Jiali Cheng and Hadi Amiri. Multidelete for multimodal machine unlearning. In European Conference on Computer Vision, pages 165–184. Springer, 2024.

[32] Jiaqi Liu, Jian Lou, Zhan Qin, and Kui Ren. Certified minimax unlearning with generalization rates and deletion capacity. Advances in Neural Information Processing Systems, 36:62821–62852, 2023.

[33] Aaradhya Pandey, Arnab Auddy, Haolin Zou, Arian Maleki, and Sanjeev Kulkarni. Gaussian certified unlearning in high dimensions: A hypothesis testing approach. arXiv preprint arXiv:2510.13094, 2025.

[34] Zheyuan Liu, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, and Meng Jiang. Modality-aware neuron pruning for unlearning in multimodal large language models. arXiv preprint arXiv:2502.15910, 2025.

[35] Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, and Meng Jiang. Protecting privacy in multimodal large language models with mllmu-bench. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4...

[36] Myeongseob Ko, Ming Jin, Chenguang Wang, and Ruoxi Jia. Practical membership inference attacks against large-scale multi-modal models: A pilot study. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4871–4881, 2023.

[37] Lu Wang, Tianyuan Zhang, Yang Qu, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu, and Dacheng Tao. Black-box adversarial attack on vision language models for autonomous driving. arXiv preprint arXiv:2501.13563, 2025.

[38] Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael B Abu-Ghazaleh, M Salman Asif, Yue Dong, Amit Roy-Chowdhury, and Chengyu Song. Can textual unlearning solve cross-modality safety alignment? In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9830–9844, 2024.

[39] Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees GM Snoek, and Meng Wang. Towards benign memory forgetting for selective multimodal large language model unlearning. arXiv preprint arXiv:2511.20196, 2025.

[40] François Hublet, David Basin, and Srđan Krstić. User-controlled privacy: Taint, track, and control. Proceedings on Privacy Enhancing Technologies, 2024.

[41] J Revathy and Karthiga M. Cross-modal privacy-preserving synthesis and mixture-of-experts ensemble for robust asd prediction. Frontiers in Neuroinformatics, 19:1679196, 2025.

[42] Honghui Xu, Wei Li, Daniel Takabi, Daehee Seo, and Zhipeng Cai. Privacy-preserving multimodal sentiment analysis. IEEE Internet of Things Journal, 2025.

[43] Amir Zadeh, Rowan Zellers, Eli Pincus, and Louis-Philippe Morency. Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages. IEEE Intelligent Systems, 31(6):82–88, 2016.

[44] Amir Zadeh, Paul Pu Liang, Navonil Mazumder, Soujanya Poria, Erik Cambria, and Louis-Philippe Morency. Memory fusion network for multi-view sequential learning. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.

[45] Sijie Mai, Ying Zeng, Shuangjia Zheng, and Haifeng Hu. Hybrid contrastive learning of tri-modal representation for multimodal sentiment analysis. IEEE Transactions on Affective Computing, 14(3):2276–2289, 2022.

[46] Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, and Liang Hu. Hydiscgan: A hybrid distributed cgan for audio-visual privacy preservation in multimodal sentiment analysis. arXiv preprint arXiv:2404.11938, 2024.

[47] Yang Yang, Xunde Dong, and Yupeng Qiang. Clgsi: a multimodal sentiment analysis framework based on contrastive learning guided by sentiment intensity. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 2099–2110, 2024.

[48] Pan Wang, Qiang Zhou, Yawen Wu, Tianlong Chen, and Jingtong Hu. Dlf: Disentangled-language-focused multimodal sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 21180–21188, 2025.

[49] Changqin Huang, Zhenheng Lin, Zhongmei Han, Qionghao Huang, Fan Jiang, and Xiaodi Huang. Pamoe-msa: polarity-aware mixture of experts network for multimodal sentiment analysis. International Journal of Multimedia Information Retrieval, 14(1):1–16, 2025.

[50] Xilin He, Haijian Liang, Boyi Peng, Weicheng Xie, Muhammad Haris Khan, Siyang Song, and Zitong Yu. Msamba: Exploring multimodal sentiment analysis with state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 1309–1317, 2025.

[51] Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. Iemocap: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335–359, 2008.

[52] Yuan Gao, Chenhui Chu, and Tatsuya Kawahara. Two-stage finetuning of wav2vec 2.0 for speech emotion recognition with asr and gender pretraining. In Proc. Interspeech, pages 3637–3641, 2023.

[53] Lei Kang, Lichao Zhang, and Dazhi Jiang. Learning robust self-attention features for speech emotion recognition with label-adaptive mixup. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.

[54] Leyuan Qu, Wei Wang, Cornelius Weber, Pengcheng Yue, Taihao Li, and Stefan Wermter. Improving speech emotion recognition with unsupervised speaking style transfer. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 10101–10105. IEEE, 2024.

[55] Wenxin Xu, Hexin Jiang, and Xuefeng Liang. Leveraging knowledge of modality experts for incomplete multimodal learning. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 438–446, 2024.

[56] Lili Guo, Jie Li, Shifei Ding, and Jianwu Dang. Apin: Amplitude-and phase-aware interaction network for speech emotion recognition. Speech Communication, 169:103201, 2025.

[57] Yuanbo Fang, Xiaofen Xing, Zhaojie Chu, Yifeng Du, and Xiangmin Xu. Individual-aware attention modulation for unseen speaker emotion recognition. IEEE Transactions on Affective Computing, 2024.

[58] Weixiang Xu, Zhongren Dong, Runming Wang, Xinzhou Xu, and Zixing Zhang. Gatem 2 former: Gated feature selection and expert modeling in multimodal emotion recognition. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.

[59] Qifei Li, Yingming Gao, Yuhua Wen, Ziping Zhao, Ya Li, and Björn W Schuller. Seenet: A soft emotion expert and data augmentation method to enhance speech emotion recognition. IEEE Transactions on Affective Computing, 2025.

[60] Haoyu Zhang, Wenbin Wang, and Tianshu Yu. Towards robust multimodal sentiment analysis with incomplete data. Advances in Neural Information Processing Systems, 37:55943–55974, 2024.

[61] Ramakrishna Vedantam, C Lawrence Zitnick, and Devi Parikh. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4566–4575, 2015.

A Proofs and calibration details

This appendix collects the full derivation of the DP-like indistinguishability bound used in the paper, supp...