Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
Pith reviewed 2026-05-08 12:16 UTC · model grok-4.3
The pith
A comprehensive benchmark reveals that recent specialized multimodal domain generalization methods offer only marginal improvements over standard training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce MMDG-Bench to standardize evaluation in multimodal domain generalization. Through extensive experiments involving 7,402 trained neural networks across 95 cross-domain tasks, they find that recent specialized methods offer only marginal gains over the ERM baseline, that no method wins consistently, that trimodal setups do not reliably beat bimodal ones, and that all methods struggle with corruptions and missing modalities while some harm trustworthiness.
What carries the argument
MMDG-Bench, a unified evaluation framework that standardizes datasets, modality combinations, methods, and tests for accuracy, corruption robustness, missing modalities, and detection of misclassifications or out-of-distribution samples.
Load-bearing premise
The nine selected methods and six datasets sufficiently represent the diversity of approaches and challenges in the multimodal domain generalization field.
What would settle it
A new method that achieves substantially higher accuracy than ERM across all six datasets, all modality combinations, and under corruption and missing-modality conditions would challenge the finding of only marginal progress.
Figures
Original abstract
Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly across datasets, modality configurations, and experimental settings. Furthermore, existing benchmarks focus predominantly on action recognition, often neglecting critical real-world challenges such as input corruptions, missing modalities, and model trustworthiness. This lack of standardization obscures a reliable assessment of the field's advancement. To address this issue, we introduce MMDG-Bench, the first unified and comprehensive benchmark for MMDG, which standardizes evaluation across six datasets spanning three diverse tasks: action recognition, mechanical fault diagnosis, and sentiment analysis. MMDG-Bench encompasses six modality combinations, nine representative methods, and multiple evaluation settings. Beyond standard accuracy, it systematically assesses corruption robustness, missing-modality generalization, misclassification detection, and out-of-distribution detection. With 7,402 neural networks trained in total across 95 unique cross-domain tasks, MMDG-Bench yields five key findings: (1) under fair comparisons, recent specialized MMDG methods offer only marginal improvements over the ERM baseline; (2) no single method consistently outperforms others across datasets or modality combinations; (3) a substantial gap to upper-bound performance persists, indicating that MMDG remains far from solved; (4) trimodal fusion does not consistently outperform the strongest bimodal configurations; and (5) all evaluated methods exhibit significant degradation under corruption and missing-modality scenarios, with some methods further compromising model trustworthiness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MMDG-Bench, the first unified benchmark for Multimodal Domain Generalization (MMDG). It standardizes evaluation across six datasets spanning action recognition, mechanical fault diagnosis, and sentiment analysis; six modality combinations; nine methods; and multiple settings including corruption robustness, missing-modality generalization, misclassification detection, and OOD detection. With 7,402 networks trained over 95 cross-domain tasks, the paper reports five findings: specialized MMDG methods yield only marginal gains over ERM, no method consistently outperforms others, a large gap to upper-bound performance remains, trimodal fusion does not reliably beat the best bimodal setups, and all methods degrade under corruptions/missing modalities with some harming trustworthiness.
Significance. If the empirical conclusions are robust, this benchmark study is significant for documenting limited algorithmic progress in MMDG beyond ERM and for supplying a standardized, multi-task, multi-metric evaluation framework that future work can build upon. The scale (7,402 models) and breadth (robustness + trustworthiness metrics) are genuine strengths that could help the community avoid fragmented, non-comparable results.
major comments (3)
- [Abstract and §3] Abstract and §3 (Benchmark Construction): The claim that the nine methods are 'representative' and the six datasets provide 'unified' coverage is load-bearing for findings (1) and (2) on marginal gains and lack of consistent winner. No explicit inclusion criteria, exhaustive literature survey, or ablation demonstrating that omitted methods/datasets would not change the ranking is supplied; this directly limits the generalizability of the 'no real progress' conclusion.
- [§4.3 and Table 2] §4.3 (Implementation Details) and Table 2: The manuscript reports aggregate accuracies but provides no description of hyperparameter search ranges, exact train/val/test splits per domain, number of random seeds, or statistical significance testing. Without these, it is impossible to verify whether the reported marginal improvements over ERM are stable or could be artifacts of implementation choices.
- [§5.1] §5.1 (Main Results): The upper-bound performance is referenced but its construction (e.g., whether it uses oracle domain labels or privileged information) is not detailed enough to interpret the size of the 'substantial gap' claimed in finding (3). This gap is central to the paper's narrative that MMDG remains far from solved.
minor comments (2)
- [Figure 3 and §4.2] Figure 3 and §4.2: The visualization of modality combinations could include error bars or per-run variance to make the 'no consistent winner' claim visually clearer.
- [Related Work] Related Work section: A short table comparing MMDG-Bench to prior single-task benchmarks (e.g., on action recognition only) would help readers quickly see the added coverage.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which highlights important areas for improving clarity and reproducibility. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core empirical findings.
Point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (Benchmark Construction): The claim that the nine methods are 'representative' and the six datasets provide 'unified' coverage is load-bearing for findings (1) and (2) on marginal gains and lack of consistent winner. No explicit inclusion criteria, exhaustive literature survey, or ablation demonstrating that omitted methods/datasets would not change the ranking is supplied; this directly limits the generalizability of the 'no real progress' conclusion.
Authors: We selected the nine methods to cover the primary algorithmic paradigms in recent MMDG literature (invariant feature learning, augmentation-based, meta-learning, and fusion strategies) from top venues with publicly available code. The six datasets were chosen to extend beyond action recognition to fault diagnosis and sentiment analysis while supporting the six modality combinations. We will add an explicit subsection in §3 listing inclusion criteria (publication year 2020+, multimodal applicability, reproducibility) and a short discussion of omissions (e.g., methods lacking code or not supporting trimodal inputs). While an exhaustive survey or full ablation on every omitted method is outside the scope of a benchmark paper, we will note that the marginal-gains finding is consistent across the evaluated representative set. This will be a partial revision.
-
Referee: [§4.3 and Table 2] §4.3 (Implementation Details) and Table 2: The manuscript reports aggregate accuracies but provides no description of hyperparameter search ranges, exact train/val/test splits per domain, number of random seeds, or statistical significance testing. Without these, it is impossible to verify whether the reported marginal improvements over ERM are stable or could be artifacts of implementation choices.
Authors: We agree these details are essential. In the revision we will expand §4.3 to specify: hyperparameter grids (learning rate 1e-4–1e-2, batch size 32–128, etc.), exact per-domain train/val/test splits for each of the six datasets, training with three random seeds, and paired t-test results confirming statistical significance of differences from ERM. Table 2 will be updated to report mean ± standard deviation. This is a full revision.
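The paired t-test the authors commit to can be sketched in a few lines. The `paired_t` helper and the per-task accuracy values below are hypothetical illustrations of the procedure, not numbers from the paper:

```python
import math

def paired_t(method_acc, erm_acc):
    """Paired t-statistic for per-task accuracy differences vs. ERM.

    method_acc, erm_acc: accuracies on the same cross-domain tasks,
    in the same order. Returns (mean difference, t statistic, dof).
    """
    assert len(method_acc) == len(erm_acc)
    diffs = [m - e for m, e in zip(method_acc, erm_acc)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences (Bessel's correction)
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return mean, t, n - 1

# Hypothetical accuracies for one method vs. ERM on five tasks
method = [71.2, 68.5, 74.0, 66.1, 70.3]
erm    = [70.8, 68.9, 73.1, 65.7, 70.0]
mean_gain, t_stat, dof = paired_t(method, erm)
```

A small mean gain with a modest t statistic (compared against the t distribution with `dof` degrees of freedom) is exactly the "marginal improvement" pattern the benchmark reports; in practice one would use `scipy.stats.ttest_rel` for the p-value.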
-
Referee: [§5.1] §5.1 (Main Results): The upper-bound performance is referenced but its construction (e.g., whether it uses oracle domain labels or privileged information) is not detailed enough to interpret the size of the 'substantial gap' claimed in finding (3). This gap is central to the paper's narrative that MMDG remains far from solved.
Authors: The upper bound is obtained by training the same architectures on the pooled labeled data from all source and target domains (i.e., no domain shift during training), providing an oracle ceiling without domain-generalization constraints. It uses only standard supervised labels and does not rely on additional privileged information. We will add a precise description and footnote in §5.1 clarifying this construction. This is a full revision.
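The split the rebuttal describes (leave-one-domain-out training for the DG setting versus pooled training for the oracle ceiling) can be illustrated with a minimal sketch; the `make_splits` helper and toy domain names are illustrative, not from the paper:

```python
def make_splits(domains, target):
    """Build training sets for the DG setting vs. the oracle upper bound.

    domains: dict mapping domain name -> list of labeled samples.
    The DG model never sees the target domain during training; the
    oracle trains on pooled data from all domains (no domain shift),
    matching the construction described in the rebuttal.
    """
    dg_train = [s for d, samples in domains.items() if d != target
                for s in samples]
    oracle_train = [s for samples in domains.values() for s in samples]
    test = list(domains[target])
    return dg_train, oracle_train, test

# Toy example: three domains, with "kitchen_3" as the held-out target
domains = {"kitchen_1": [1, 2], "kitchen_2": [3], "kitchen_3": [4, 5]}
dg_train, oracle_train, test = make_splits(domains, target="kitchen_3")
# dg_train     -> [1, 2, 3]        (target held out)
# oracle_train -> [1, 2, 3, 4, 5]  (pooled, includes target)
```

The "substantial gap" in finding (3) is then simply the accuracy difference between models trained on `oracle_train` and models trained on `dg_train`, both evaluated on `test`.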
Circularity Check
No circularity: purely empirical benchmark with measured results on held-out data
Full rationale
The paper performs a large-scale empirical comparison of nine MMDG methods across six datasets and multiple evaluation protocols. All reported findings (marginal gains over ERM, lack of consistent winner, performance gaps) are direct measurements on held-out target domains rather than quantities derived from fitted parameters or self-referential definitions inside the paper. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the paper's own inputs by construction. Selection of methods and datasets raises external-validity questions but does not create internal circularity per the enumerated patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The six selected datasets and three tasks adequately sample the space of multimodal domain shifts encountered in practice.
invented entities (1)
- MMDG-Bench (no independent evidence)
Reference graph
Works this paper leans on
- [1] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
- [2] Tadas Baltrušaitis, Peter Robinson, and Louis-Philippe Morency. OpenFace: an open source facial behavior analysis toolkit. In WACV, 2016.
- [3] Gilles Blanchard, Gyemin Lee, and Clayton Scott. Generalizing from several related classification tasks to a new unlabeled sample. In NeurIPS, 2011.
- [4] Xin Chen, Huanjie Tao, and Benran Li. Towards robust incomplete multimodal open-set domain generalization with uncertain missing modalities. Knowledge-Based Systems, page 115777, 2026.
- [5] MMAction2 Contributors. OpenMMLab's next generation video understanding toolbox and benchmark. https://github.com/open-mmlab/mmaction2, 2020.
- [6] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Scaling egocentric vision: The EPIC-KITCHENS dataset. In ECCV, 2018.
- [7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186, 2019.
- [8] Hao Dong, Eleni Chatzi, and Olga Fink. Towards multimodal open-set domain generalization and adaptation through self-supervision. In ECCV, 2024.
- [9] Hao Dong, Eleni Chatzi, and Olga Fink. Towards robust multimodal open-set test-time adaptation via adaptive entropy-aware optimization. In ICLR, 2025.
- [10] Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, and Olga Fink. Advances in multimodal adaptation and generalization: From traditional approaches to foundation models. arXiv preprint arXiv:2501.18592, 2025.
- [11] Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, and Olga Fink. SimMMDG: A simple and effective framework for multi-modal domain generalization. In NeurIPS, 2023.
- [12] Yunfeng Fan, Wenchao Xu, Haozhao Wang, and Song Guo. Cross-modal representation flattening for multi-modal domain generalization. In NeurIPS, 2024.
- [13] Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. SlowFast networks for video recognition. In ICCV, 2019.
- [14] Olga Fink, Ismail Nejjar, Vinay Sharma, Keivan Faghih Niresi, Han Sun, Hao Dong, Chenghao Xu, Amaury Wei, Arthur Bizzi, Raffael Theiler, et al. From physics to machine learning and back: Part II - learning and observational bias in prognostics and health management (PHM). Reliability Engineering & System Safety, page 112376, 2026.
- [15] Olga Fink, Vinay Sharma, Ismail Nejjar, Leandro von Krannichfeldt, Sergei Garmaev, Zepeng Zhang, Amaury Wei, Gaetan Frusque, Florent Forest, Mengjie Zhao, et al. From physics to machine learning and back: Part I - learning with inductive biases in prognostics and health management. Reliability Engineering & System Safety, page 112213, 2026.
- [16] Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In ICML, 2015.
- [17] Ishaan Gulrajani and David Lopez-Paz. In search of lost domain generalization. arXiv preprint arXiv:2007.01434, 2020.
- [18] Cagri Gungor and Adriana Kovashka. Integrating audio narrations to strengthen domain generalization in multimodal first-person action recognition. In ICASSP, 2025.
- [19] Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, and Yangyang Wu. Bridging the gap for test-time multimodal sentiment analysis. In AAAI, 2025.
- [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
- [21] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. In ICCV, 2021.
- [22] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019.
- [23] Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, and Zhou Zhao. Bridging domain generalization to multimodal domain generalization via unified representations. In ICCV, 2025.
- [24] Hyeonbin Ji, Juyeob Lee, and Eunil Park. Alignment and distillation: A robust framework for multimodal domain generalizable human action recognition. In WACV, 2026.
- [25] Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, et al. WILDS: A benchmark of in-the-wild distribution shifts. In ICML, 2021.
- [26] David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (REx). In ICML, 2021.
- [27] Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M Hospedales. Learning to generalize: Meta-learning for domain generalization. In AAAI, 2018.
- [28] Hongzhao Li, Hao Dong, Hualei Wan, Shupan Li, Mingliang Xu, and Muhammad Haris Khan. Towards multimodal domain generalization with few labels. arXiv preprint arXiv:2602.22917, 2026.
- [29] Hongzhao Li, Guohao Shen, Shupan Li, Mingliang Xu, and Muhammad Haris Khan. Balancing multimodal domain generalization via gradient modulation and projection. In AAAI, 2026.
- [30] Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu, and Muhammad Haris Khan. Towards robust multimodal domain generalization via modality-domain joint adversarial training. In Proceedings of the 33rd ACM International Conference on Multimedia, 2025.
- [31] Shawn Li, Huixian Gong, Hao Dong, Tiankai Yang, Zhengzhong Tu, and Yue Zhao. DPU: Dynamic prototype updating for multimodal out-of-distribution detection. arXiv preprint arXiv:2411.08227, 2024.
- [32] Moru Liu, Hao Dong, Olga Fink, and Mario Trapp. Adaptive confidence regularization for multimodal failure detection. arXiv preprint arXiv:2603.02200, 2026.
- [33] Moru Liu, Hao Dong, Jessica Kelly, Olga Fink, and Mario Trapp. Extremely simple multimodal outlier synthesis for out-of-distribution detection and segmentation. arXiv preprint arXiv:2505.16985, 2025.
- [34] Brian McFee, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, Oriol Nieto, et al. librosa: Audio and music signal analysis in Python. SciPy, 2015(18-24):7, 2015.
- [35] Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. Domain generalization via invariant feature representation. In ICML, 2013.
- [36] Jonathan Munro and Dima Damen. Multi-modal domain adaptation for fine-grained action recognition. In CVPR, 2020.
- [37] Mirco Planamente, Chiara Plizzari, Emanuele Alberti, and Barbara Caputo. Domain generalization through audio-visual relative norm alignment in first person action recognition. In WACV, 2022.
- [38] Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731, 2019.
- [39] Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In ECCV, 2016.
- [40] Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. In CVPR, 2011.
- [41] Vladimir N Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, 1999.
- [42] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
- [43] Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John C Duchi, Vittorio Murino, and Silvio Savarese. Generalizing to unseen domains via adversarial data augmentation. In NeurIPS, 2018.
- [44] Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and Philip S Yu. Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering, 35(8):8052–8072, 2022.
- [45] Xiaohan Wang, Zhangtao Cheng, Ting Zhong, Leiting Chen, and Fan Zhou. Modality-balanced collaborative distillation for multi-modal domain generalization. In AAAI, 2026.
- [46] Wenmeng Yu, Hua Xu, Fanyang Meng, Yilin Zhu, Yixiao Ma, Jiele Wu, Jiyun Zou, and Kaicheng Yang. CH-SIMS: A Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality. In ACL, 2020.
- [47] Amir Zadeh, Rowan Zellers, Eli Pincus, and Louis-Philippe Morency. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259, 2016.
- [48] AmirAli Bagher Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, and Louis-Philippe Morency. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In ACL, 2018.
- [49] Baoqiang Zhang, Kunze Huang, Luyao Luyao, Xiaotong Tu, and Xiaolu Li. Nonpolarized embedding learning in multimodal domain generalization. Neurocomputing, page 131754, 2025.
- [50] Xingxuan Zhang, Yue He, Renzhe Xu, Han Yu, Zheyan Shen, and Peng Cui. NICO++: Towards better benchmarking for domain generalization. In CVPR, 2023.
- [51] Chao Zhao, Enrico Zio, and Weiming Shen. Domain generalization for cross-domain fault diagnosis: An application-oriented perspective and a benchmark study. Reliability Engineering & System Safety, 245:109964, 2024.
- [52] Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4396–4415, 2022.