BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities
Pith reviewed 2026-05-14 21:24 UTC · model grok-4.3
The pith
Block-wise aggregation of shared model components while keeping task-specific blocks private enables effective multimodal federated learning when clients observe different modality subsets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BLOSSOM enables block-wise aggregation in multimodal federated learning by selectively combining only the shared model components across clients while keeping task-specific blocks private to each client. This allows effective learning under conditions where modalities are shared but sparsely observed, leading to better performance than full model averaging in heterogeneous settings.
What carries the argument
Block-wise aggregation strategy that partitions the model into shared blocks aggregated across clients and task-specific blocks kept private.
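The partition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the block names (`shared_block`, `task_block`), the flat-list weight format, and the sample-size weighting (borrowed from standard federated averaging) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical weight format: block name -> flat list of parameters.
Weights = Dict[str, List[float]]


def aggregate_shared_blocks(
    client_weights: Sequence[Weights],
    client_sizes: Sequence[int],
    shared_blocks: Sequence[str],
) -> Weights:
    """Weighted-average the shared blocks only; private blocks are never read."""
    total = sum(client_sizes)
    aggregated: Weights = {}
    for name in shared_blocks:
        acc = [0.0] * len(client_weights[0][name])
        for weights, size in zip(client_weights, client_sizes):
            for i, value in enumerate(weights[name]):
                acc[i] += (size / total) * value
        aggregated[name] = acc
    return aggregated


def apply_global_update(local: Weights, aggregated: Weights) -> Weights:
    """Overwrite shared blocks with the aggregate; leave all other blocks as they are."""
    updated = {name: list(values) for name, values in local.items()}
    for name, values in aggregated.items():
        updated[name] = list(values)
    return updated


# Two hypothetical clients with one shared block and one private block each.
clients = [
    {"shared_block": [1.0, 3.0], "task_block": [10.0]},
    {"shared_block": [3.0, 5.0], "task_block": [20.0]},
]
global_shared = aggregate_shared_blocks(clients, [1, 3], ["shared_block"])
client0_next = apply_global_update(clients[0], global_shared)
```

In this sketch the server never sees `task_block`, which is what keeps the personalization partial: each client starts the next round from the common shared block and its own private block.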
If this is right
- Selective aggregation prevents negative transfer from clients missing entire modalities.
- Partial personalization allows each client to retain performance on its available modality subset.
- The approach scales to any number of modality combinations without requiring synchronized participation.
- Performance advantage grows as modality sparsity increases across the client population.
Where Pith is reading between the lines
- The same selective-aggregation logic could apply to other heterogeneous federated settings such as varying label distributions or sensor types.
- Dynamic identification of which blocks are truly shared versus private might further improve results when modality overlap is unknown in advance.
- Real-world deployments in healthcare could test whether the performance gains persist under stricter privacy constraints on model updates.
- Extending the partition to include modality-specific encoders might reduce communication cost while preserving the observed accuracy benefits.
Load-bearing premise
Model components can be cleanly partitioned into shared and task-specific blocks that remain effective when aggregated selectively.
What would settle it
A controlled experiment in which clients hold completely disjoint modalities and block-wise aggregation produces no accuracy gain or outright degradation relative to full-model aggregation.
Original abstract
Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often missing modalities. However, most existing FL approaches assume uniform modality availability, limiting their applicability in practice. We introduce BLOSSOM, a task-agnostic framework for multimodal FL designed to operate under shared and sparsely observed modality conditions. BLOSSOM supports clients with arbitrary modality subsets and enables flexible sharing of model components. To address client and task heterogeneity, we propose a block-wise aggregation strategy that selectively aggregates shared components while keeping task-specific blocks private, enabling partial personalization. We evaluate BLOSSOM on multiple diverse multimodal datasets and analyse the effects of missing modalities and personalization. Our results show that block-wise personalization significantly improves performance, particularly in settings with severe modality sparsity. In modality-incomplete scenarios, BLOSSOM achieves an average performance gain of 18.7% over full-model aggregation, while in modality-exclusive settings the gain increases to 37.7%, highlighting the importance of block-wise learning for practical multimodal FL systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BLOSSOM, a task-agnostic framework for multimodal federated learning under shared and sparsely observed modalities. It supports clients with arbitrary modality subsets via a block-wise aggregation strategy that selectively shares common model components while keeping task-specific blocks private, and reports average performance gains of 18.7% over full-model aggregation in modality-incomplete scenarios and 37.7% in modality-exclusive settings across multiple datasets.
Significance. If the empirical results prove robust, BLOSSOM would represent a meaningful advance for practical multimodal FL in heterogeneous real-world settings such as healthcare and autonomous systems. The block-wise personalization mechanism directly targets modality sparsity without requiring uniform client participation or additional coordination, addressing a limitation of prior FL methods that assume complete modality availability.
Major comments (2)
- [Abstract] Abstract: the specific average gains of 18.7% and 37.7% are stated without any information on experimental setup, number of runs, baselines, statistical tests, or dataset characteristics, preventing assessment of whether the data supports the claims.
- [Method] Method description: the central claim that block-wise aggregation enables effective partial personalization rests on the assumption that model components can be cleanly and consistently partitioned into shared versus task-specific blocks that remain effective under arbitrary modality subsets; the manuscript provides insufficient detail on how blocks are identified (fixed architecture split, learned masks, or per-modality) and how representations are aligned when modalities are missing.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify key aspects of the manuscript. We address each major point below and indicate planned revisions.
Point-by-point responses
- Referee: [Abstract] Abstract: the specific average gains of 18.7% and 37.7% are stated without any information on experimental setup, number of runs, baselines, statistical tests, or dataset characteristics, preventing assessment of whether the data supports the claims.
  Authors: We agree that the abstract would benefit from additional context. In the revised version we will add a concise clause specifying the evaluation across five multimodal datasets (including healthcare and autonomous driving benchmarks), results averaged over five independent runs with different random seeds, comparison against full-model aggregation and standard multimodal FL baselines, and that standard deviations are reported in the main text. This provides the necessary information without violating abstract length limits; full experimental protocols remain in Section 4. Revision: yes
- Referee: [Method] Method description: the central claim that block-wise aggregation enables effective partial personalization rests on the assumption that model components can be cleanly and consistently partitioned into shared versus task-specific blocks that remain effective under arbitrary modality subsets; the manuscript provides insufficient detail on how blocks are identified (fixed architecture split, learned masks, or per-modality) and how representations are aligned when modalities are missing.
  Authors: The manuscript already specifies a fixed architecture split in Section 3.1: modality-specific encoder blocks remain private while the shared fusion module and task head are aggregated only among clients possessing the corresponding modalities. Alignment for missing modalities is achieved via modality dropout during local training combined with zero-padding of absent modality inputs to maintain fixed block dimensions. To further address the referee's concern we will insert an explicit subsection with pseudocode illustrating the block-wise aggregation rule under arbitrary subsets and will add a sentence clarifying that no learned masks are used. Revision: partial
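The two mechanisms named in the rebuttal, aggregation restricted to clients that hold a block and zero-padding of absent modality inputs, can be sketched as follows. This is a hedged illustration rather than the paper's pseudocode: the function names, the block keys (`fusion_audio`, `fusion_video`), and the unweighted mean are assumptions, and modality dropout during local training is not shown.

```python
from typing import Dict, List, Sequence


def zero_pad_inputs(
    sample: Dict[str, List[float]],
    modality_dims: Dict[str, int],
) -> Dict[str, List[float]]:
    """Fill absent modalities with zero vectors so every input keeps a fixed shape."""
    return {
        modality: list(sample[modality]) if modality in sample else [0.0] * dim
        for modality, dim in modality_dims.items()
    }


def aggregate_among_holders(
    client_blocks: Sequence[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Average each block over only those clients that uploaded it."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for blocks in client_blocks:
        for name, values in blocks.items():
            if name not in sums:
                sums[name] = [0.0] * len(values)
                counts[name] = 0
            for i, value in enumerate(values):
                sums[name][i] += value
            counts[name] += 1
    return {name: [s / counts[name] for s in total] for name, total in sums.items()}


# Hypothetical uploads: client 0 holds audio only, client 1 both, client 2 video only.
uploads = [
    {"fusion_audio": [2.0, 4.0]},
    {"fusion_audio": [4.0, 8.0], "fusion_video": [1.0]},
    {"fusion_video": [3.0]},
]
merged = aggregate_among_holders(uploads)
padded = zero_pad_inputs({"audio": [0.5, 0.1]}, {"audio": 2, "video": 3})
```

A client that lacks a modality contributes nothing to the blocks tied to it, so its missing data cannot pull those parameters toward an uninformed value.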
Circularity Check
No circularity detected; claims rest on empirical evaluation of proposed block-wise method
The paper defines BLOSSOM as a new task-agnostic framework that partitions models into shared and task-specific blocks and applies selective aggregation. Performance gains (18.7% and 37.7%) are reported from experiments on multimodal datasets under modality-incomplete and modality-exclusive settings. No equations or derivations reduce the central claim to a self-definition, fitted parameter renamed as prediction, or self-citation chain; the block partitioning and aggregation rules are presented as design choices whose effectiveness is measured externally rather than presupposed by construction. The derivation chain is therefore self-contained against the reported benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: clients possess varying and often missing modalities in multimodal federated learning settings.