VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals
Pith reviewed 2026-05-20 22:08 UTC · model grok-4.3
The pith
VCR learns valid contextual representations for incomplete wearable signals by disentangling shared and modality-specific features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VCR employs an orthogonal tokenizer to enforce strict orthogonal disentanglement by rectifying latent manifolds and applying a geometric projection, separating each modality into shared semantics and modality-specific residuals. This design preserves complete information integrity while serving as a structural foundation for robust learning under modality missingness. The resulting tokens are processed by a missing-aware mixture-of-experts backbone that adapts to varying patterns of modality availability. By constraining the objective to reconstruct only the shared components of missing modalities, VCR effectively mitigates hallucinations of non-inferable modality-specific details and yields
What carries the argument
The orthogonal tokenizer, which rectifies latent manifolds and applies a geometric projection to separate each modality into shared semantics and modality-specific residuals.
If this is right
- VCR maintains strong performance when all sensor modalities are available.
- The method increases robustness when one or more modalities are missing.
- By reconstructing only shared components, the approach avoids generating unsupported details in absent signals.
- The missing-aware mixture-of-experts adapts training and inference to any observed pattern of sensor availability.
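The last two points rest on an objective that scores reconstruction only for the shared components of missing modalities. The following is a minimal sketch of such a loss, not the paper's implementation. The function name `shared_only_recon_loss`, the `(batch, modality, dim)` tensor layout, and the choice of mean squared error are all illustrative assumptions.

```python
import numpy as np


def shared_only_recon_loss(pred_shared, target_shared, missing_mask):
    """Mean squared error over shared tokens of missing modalities only.

    Hypothetical layout (an assumption, not taken from the paper):
      pred_shared, target_shared: float arrays of shape (batch, n_modalities, dim)
      missing_mask: bool array of shape (batch, n_modalities), True where the
                    modality was withheld from the model.
    Modality-specific residuals never enter this loss, and observed
    modalities contribute nothing.
    """
    # Per-sample, per-modality squared error averaged over the feature axis.
    sq_err = ((pred_shared - target_shared) ** 2).mean(axis=-1)
    n_missing = int(missing_mask.sum())
    if n_missing == 0:
        return 0.0
    # Average only over the (sample, modality) pairs that were missing.
    return float((sq_err * missing_mask).sum() / n_missing)
```

In a pretraining setup of this kind, the target shared tokens would presumably come from modalities that are present in the data but artificially masked, so that a ground-truth shared component exists to compare against.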
Where Pith is reading between the lines
- The same separation of inferable shared information from non-inferable specifics could apply to other multimodal sensor problems where inputs intermittently drop out.
- If the disentanglement holds, the shared representations might transfer more reliably across different wearable devices or health monitoring tasks.
- Testing the geometric projection step on datasets with varying missing rates would clarify how much the orthogonality contributes to the observed robustness gains.
Load-bearing premise
An orthogonal tokenizer can enforce strict separation of shared semantics from modality-specific residuals through manifold rectification and geometric projection without any loss of essential information.
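The "no loss" part of this premise has a simple geometric core, shown here as a worked identity. It assumes only that P is an orthogonal projector (P² = P and P⊤ = P) acting on a latent vector h, with s = Ph and p = (I − P)h as in the passage quoted further down this page.

```latex
\begin{align}
  h &= Ph + (I - P)h = s + p, \\
  s^{\top} p &= h^{\top} P^{\top} (I - P) h = h^{\top} (P - P^{2}) h = 0, \\
  \lVert h \rVert^{2} &= \lVert s \rVert^{2} + \lVert p \rVert^{2}.
\end{align}
```

The identity shows that h can always be recovered from the pair (s, p), so the split itself discards nothing. It does not show that s contains exactly the inferable shared semantics, which is the part of the premise that remains an empirical question.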
What would settle it
Measure whether accuracy on health outcome prediction drops sharply when multiple modalities are removed if the reconstruction objective is changed to include full modality-specific details instead of only shared components.
Original abstract
Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor incompleteness. While large-scale self-supervised pretraining reduces label dependence, most existing methods assume full modality availability. Current approaches for handling modality missingness often reconstruct entire absent signals, which can encourage hallucinating modality-specific details that are not inferable from the observed sensor signals and degrade robustness. We propose VCR, a self-supervised framework that learns to extract valid representations robust to modality missingness. VCR employs an orthogonal tokenizer to enforce strict orthogonal disentanglement by rectifying latent manifolds and applying a geometric projection, separating each modality into shared semantics and modality-specific residuals. This design preserves complete information integrity while serving as a structural foundation for robust learning under modality missingness. The resulting tokens are processed by a missing-aware mixture-of-experts backbone that adapts to varying patterns of modality availability. By constraining the objective to reconstruct only the shared components of missing modalities, VCR effectively mitigates hallucinations of non-inferable modality-specific details. Across multiple health monitoring tasks, VCR consistently improves performance and robustness under full, single-missing, and multiple-missing modality settings compared with strong supervised and self-supervised baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes VCR, a self-supervised framework for learning valid contextual representations from incomplete multimodal wearable signals in health monitoring. It introduces an orthogonal tokenizer that rectifies latent manifolds and applies a geometric projection to disentangle each modality into shared semantics and modality-specific residuals while preserving information integrity. These tokens feed a missing-aware mixture-of-experts backbone that adapts to availability patterns and reconstructs only shared components of missing modalities to avoid hallucinating non-inferable details. The central claim is that VCR yields consistent gains in performance and robustness versus strong supervised and self-supervised baselines under full, single-missing, and multiple-missing modality settings.
Significance. If the empirical claims hold, the work addresses a practical deployment barrier for continuous wearable health monitoring by providing a structural solution to modality missingness that avoids reconstruction hallucinations. The orthogonal disentanglement via geometric projection offers a principled alternative to standard imputation or masking strategies in multimodal self-supervised learning.
major comments (2)
- [Method section, orthogonal tokenizer] The claim that latent manifold rectification plus geometric projection produces strictly orthogonal shared semantics while preserving all inferable shared information is load-bearing for the missing-aware reconstruction objective and the reported robustness gains. No information-theoretic bound or ablation isolating information loss under nonlinear, time-varying correlations (typical of wearable signals) is provided, leaving open the possibility that recoverable shared content is discarded or non-inferable specifics leak into the shared tokens.
- [Results section] The abstract asserts consistent improvements across missingness settings, yet the provided text supplies no quantitative metrics, error bars, statistical tests, or ablations that isolate the orthogonal tokenizer's contribution from the mixture-of-experts backbone. This absence prevents verification that the claimed robustness stems from the proposed disentanglement rather than other factors.
minor comments (1)
- [Introduction] Clarify the precise definition of 'valid contextual representation' and how the geometric projection differs from standard orthogonalization techniques (e.g., Gram-Schmidt) already used in multimodal learning.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the orthogonal tokenizer and empirical validation. We address each major comment point by point below.
- Referee: Method section, orthogonal tokenizer: the claim that latent manifold rectification plus geometric projection produces strictly orthogonal shared semantics while preserving all inferable shared information is load-bearing for the missing-aware reconstruction objective and the reported robustness gains. No information-theoretic bound or ablation isolating information loss under nonlinear, time-varying correlations (typical of wearable signals) is provided, leaving open the possibility that recoverable shared content is discarded or non-inferable specifics leak into the shared tokens.
  Authors: We agree that the orthogonality claim is central. The geometric projection is designed to enforce orthogonality by construction after manifold rectification, mapping components to orthogonal subspaces. Preservation of inferable shared information is encouraged by the reconstruction objective targeting only shared components. We acknowledge the absence of a formal information-theoretic bound for nonlinear cases. In revision, we will add an ablation using synthetic signals with controlled nonlinear correlations to quantify retention and leakage in shared tokens.
  Revision: yes
- Referee: Results section: the abstract asserts consistent improvements across missingness settings, yet the provided text supplies no quantitative metrics, error bars, statistical tests, or ablations that isolate the orthogonal tokenizer's contribution from the mixture-of-experts backbone. This absence prevents verification that the claimed robustness stems from the proposed disentanglement rather than other factors.
  Authors: The full manuscript's Results section (Section 4) presents quantitative metrics with means and standard deviations as error bars, along with statistical tests and ablations comparing models with and without the orthogonal tokenizer. To better isolate its contribution, we will expand the revision with a dedicated table and analysis breaking down the tokenizer's impact across full and missing-modality settings.
  Revision: yes
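The synthetic ablation promised in the first response could take a shape like the following. This is a hypothetical sketch, not the authors' protocol: the generator `make_synthetic`, the probe `linear_r2`, and the use of `np.tanh(z)` as a stand-in for a tokenizer's shared output are all assumptions made for illustration. Retention is measured as how well a linear probe recovers the shared latent from the shared tokens, and leakage as how well it recovers the modality-specific latent.

```python
import numpy as np


def make_synthetic(n=2000, seed=0):
    """Draw independent shared (z) and modality-specific (u) latents."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 2))  # shared across modalities
    u = rng.standard_normal((n, 2))  # private to one modality
    return z, u


def linear_r2(features, target):
    """In-sample R^2 of a least-squares linear probe with intercept."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    ss_tot = ((target - target.mean(axis=0)) ** 2).sum()
    return float(1.0 - (resid ** 2).sum() / ss_tot)


z, u = make_synthetic()
# Placeholder for the shared tokens a real tokenizer would emit: here an
# idealized nonlinear function of the shared latent only (an assumption).
shared_tokens = np.tanh(z)

retention = linear_r2(shared_tokens, z)  # high if shared content survives
leakage = linear_r2(shared_tokens, u)    # near zero if specifics stay out
```

A linear probe only bounds recoverable information from below, so a real study would likely add nonlinear probes and time-varying mixing before drawing conclusions about retention or leakage.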
Circularity Check
No significant circularity: VCR framework introduces independent structural constraints
The paper presents VCR as a self-supervised framework relying on an orthogonal tokenizer (via latent manifold rectification and geometric projection) and a missing-aware mixture-of-experts backbone. These are introduced as novel design choices that enforce disentanglement and constrain reconstruction to shared components only. The performance claims rest on empirical comparisons under full, single-missing, and multi-missing settings rather than any derivation that reduces by construction to fitted parameters, self-citations, or renamed inputs. No load-bearing step equates a claimed prediction or uniqueness result to its own inputs; the derivation chain remains self-contained with independent content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Wearable multimodal signals contain separable shared semantics and modality-specific residuals that can be isolated via orthogonal projection without loss of integrity.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We employ P to project h onto the modal-shared subspace, yielding the shared representation: s = Ph. ... p = h − s = (I − P)h ... Cov(s, p) ≈ 0 ... I(s; p) ≈ 0"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "QR decomposition on W ... extract first r columns of Q to form basis B ... P = BB⊤ ... orthogonal projection operator"
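The two quoted passages together describe a concrete construction: take a QR decomposition of W, keep the first r columns of Q as a basis B, form P = BB⊤, and split h into s = Ph and p = (I − P)h. The sketch below illustrates that construction in NumPy under stated assumptions. The function name `shared_projector`, the random W, and the sizes d = 8 and r = 3 are placeholders, and nothing here reproduces the paper's manifold rectification step.

```python
import numpy as np


def shared_projector(W: np.ndarray, r: int) -> np.ndarray:
    """Return P = B B^T, where B holds the first r columns of Q from QR(W)."""
    Q, _ = np.linalg.qr(W)  # reduced QR: columns of Q are orthonormal
    B = Q[:, :r]            # basis of the (assumed) shared subspace
    return B @ B.T          # orthogonal projector onto span(B)


rng = np.random.default_rng(0)
d, r = 8, 3                          # illustrative sizes, not from the paper
W = rng.standard_normal((d, 5))      # stand-in for a learned weight matrix
P = shared_projector(W, r)

h = rng.standard_normal(d)           # one latent vector
s = P @ h                            # shared component, s = P h
p = h - s                            # residual, p = (I - P) h
```

By construction s and p are orthogonal for every individual h and sum back to h. The statistical statements in the quoted passage, Cov(s, p) ≈ 0 and I(s; p) ≈ 0, concern the distribution of h and do not follow from this geometry alone, which is presumably where the rectification step comes in.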
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.