BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities

Ahmed M. Abdelmoniem; Arnab K. Paul; Jayant Chandwani; Pranav M R

arxiv: 2603.27552 · v2 · submitted 2026-03-29 · 💻 cs.LG · cs.DC

BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities

Pranav M R , Jayant Chandwani , Ahmed M. Abdelmoniem , Arnab K. Paul This is my paper

Pith reviewed 2026-05-14 21:24 UTC · model grok-4.3

classification 💻 cs.LG cs.DC

keywords multimodal federated learningblock-wise aggregationmodality sparsitypartial personalizationheterogeneous clientsshared modalitiesmissing modalities

0 comments

The pith

Block-wise aggregation of shared model components while keeping task-specific blocks private enables effective multimodal federated learning when clients observe different modality subsets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

BLOSSOM partitions multimodal models into shared blocks that aggregate across clients and private blocks that remain local to handle arbitrary missing modalities without extra coordination. This selective strategy addresses heterogeneity by enabling partial personalization instead of forcing full-model averaging. Experiments across multiple datasets demonstrate that the method delivers 18.7 percent average gains in modality-incomplete scenarios and 37.7 percent gains in modality-exclusive settings relative to standard aggregation. The framework is task-agnostic and supports flexible sharing of components based on observed data. These results indicate that clean separation of shared and private blocks is sufficient to mitigate the performance drop caused by sparse modality overlap.

Core claim

BLOSSOM enables block-wise aggregation in multimodal federated learning by selectively combining only the shared model components across clients while keeping task-specific blocks private to each client. This allows effective learning under conditions where modalities are shared but sparsely observed, leading to better performance than full model averaging in heterogeneous settings.

What carries the argument

Block-wise aggregation strategy that partitions the model into shared blocks aggregated across clients and task-specific blocks kept private.

If this is right

Selective aggregation prevents negative transfer from clients missing entire modalities.
Partial personalization allows each client to retain performance on its available modality subset.
The approach scales to any number of modality combinations without requiring synchronized participation.
Performance advantage grows as modality sparsity increases across the client population.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selective-aggregation logic could apply to other heterogeneous federated settings such as varying label distributions or sensor types.
Dynamic identification of which blocks are truly shared versus private might further improve results when modality overlap is unknown in advance.
Real-world deployments in healthcare could test whether the performance gains persist under stricter privacy constraints on model updates.
Extending the partition to include modality-specific encoders might reduce communication cost while preserving the observed accuracy benefits.

Load-bearing premise

Model components can be cleanly partitioned into shared and task-specific blocks that remain effective when aggregated selectively.

What would settle it

A controlled experiment in which clients hold completely disjoint modalities and block-wise aggregation produces no accuracy gain or outright degradation relative to full-model aggregation.

Figures

Figures reproduced from arXiv: 2603.27552 by Ahmed M. Abdelmoniem, Arnab K. Paul, Jayant Chandwani, Pranav M R.

**Figure 1.** Figure 1: Illustration of the BLOSSOM framework under the three block-wise aggregation modes. integrates high-level representations at a later stage, making it more robust to missing-modality scenarios [10]. These distinctions are critical in federated settings with heterogeneous modality availability across clients. Standard Federated Averaging (FedAvg) [1] assumes homogeneous model architectures and synchronized… view at source ↗

**Figure 2.** Figure 2: Modality-wise validation F1 across training rounds on PTBXL under the 3–3–4 NIID setting. Colours indicate client modality availability (unimodal vs. multimodal), and line styles denote aggregation modes (FM, PH, PHF). An important question in structurally sparse multimodal federated settings is whether modality-incomplete clients meaningfully contribute to global performance. We investigate this using… view at source ↗

read the original abstract

Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often missing modalities. However, most existing FL approaches assume uniform modality availability, limiting their applicability in practice. We introduce BLOSSOM, a task-agnostic framework for multimodal FL designed to operate under shared and sparsely observed modality conditions. BLOSSOM supports clients with arbitrary modality subsets and enables flexible sharing of model components. To address client and task heterogeneity, we propose a block-wise aggregation strategy that selectively aggregates shared components while keeping task-specific blocks private, enabling partial personalization. We evaluate BLOSSOM on multiple diverse multimodal datasets and analyse the effects of missing modalities and personalization. Our results show that block-wise personalization significantly improves performance, particularly in settings with severe modality sparsity. In modality-incomplete scenarios, BLOSSOM achieves an average performance gain of 18.7% over full-model aggregation, while in modality-exclusive settings the gain increases to 37.7%, highlighting the importance of block-wise learning for practical multimodal FL systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BLOSSOM's block-wise aggregation for partial personalization in sparse multimodal FL is a straightforward idea that targets a real gap, but the abstract gives almost no experimental details so the 18.7% and 37.7% gains are impossible to assess yet.

read the letter

The main point on this paper is that BLOSSOM tries to fix a practical problem in multimodal federated learning: clients often have different and incomplete sets of modalities, yet most methods assume everyone has the same data. It does this with a block-wise aggregation rule that shares only the common model parts across clients while keeping task-specific blocks local for personalization. That split is presented as task-agnostic and able to work with arbitrary modality subsets without extra coordination.

Referee Report

2 major / 0 minor

Summary. The paper introduces BLOSSOM, a task-agnostic framework for multimodal federated learning under shared and sparsely observed modalities. It supports clients with arbitrary modality subsets via a block-wise aggregation strategy that selectively shares common model components while keeping task-specific blocks private, and reports average performance gains of 18.7% over full-model aggregation in modality-incomplete scenarios and 37.7% in modality-exclusive settings across multiple datasets.

Significance. If the empirical results prove robust, BLOSSOM would represent a meaningful advance for practical multimodal FL in heterogeneous real-world settings such as healthcare and autonomous systems. The block-wise personalization mechanism directly targets modality sparsity without requiring uniform client participation or additional coordination, addressing a limitation of prior FL methods that assume complete modality availability.

major comments (2)

[Abstract] Abstract: the specific average gains of 18.7% and 37.7% are stated without any information on experimental setup, number of runs, baselines, statistical tests, or dataset characteristics, preventing assessment of whether the data supports the claims.
[Method] Method description: the central claim that block-wise aggregation enables effective partial personalization rests on the assumption that model components can be cleanly and consistently partitioned into shared versus task-specific blocks that remain effective under arbitrary modality subsets; the manuscript provides insufficient detail on how blocks are identified (fixed architecture split, learned masks, or per-modality) and how representations are aligned when modalities are missing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of the manuscript. We address each major point below and indicate planned revisions.

read point-by-point responses

Referee: [Abstract] Abstract: the specific average gains of 18.7% and 37.7% are stated without any information on experimental setup, number of runs, baselines, statistical tests, or dataset characteristics, preventing assessment of whether the data supports the claims.

Authors: We agree that the abstract would benefit from additional context. In the revised version we will add a concise clause specifying the evaluation across five multimodal datasets (including healthcare and autonomous driving benchmarks), results averaged over five independent runs with different random seeds, comparison against full-model aggregation and standard multimodal FL baselines, and that standard deviations are reported in the main text. This provides the necessary information without violating abstract length limits; full experimental protocols remain in Section 4. revision: yes
Referee: [Method] Method description: the central claim that block-wise aggregation enables effective partial personalization rests on the assumption that model components can be cleanly and consistently partitioned into shared versus task-specific blocks that remain effective under arbitrary modality subsets; the manuscript provides insufficient detail on how blocks are identified (fixed architecture split, learned masks, or per-modality) and how representations are aligned when modalities are missing.

Authors: The manuscript already specifies a fixed architecture split in Section 3.1: modality-specific encoder blocks remain private while the shared fusion module and task head are aggregated only among clients possessing the corresponding modalities. Alignment for missing modalities is achieved via modality dropout during local training combined with zero-padding of absent modality inputs to maintain fixed block dimensions. To further address the referee's concern we will insert an explicit subsection with pseudocode illustrating the block-wise aggregation rule under arbitrary subsets and will add a sentence clarifying that no learned masks are used. revision: partial

Circularity Check

0 steps flagged

No circularity detected; claims rest on empirical evaluation of proposed block-wise method

full rationale

The paper defines BLOSSOM as a new task-agnostic framework that partitions models into shared and task-specific blocks and applies selective aggregation. Performance gains (18.7% and 37.7%) are reported from experiments on multimodal datasets under modality-incomplete and modality-exclusive settings. No equations or derivations reduce the central claim to a self-definition, fitted parameter renamed as prediction, or self-citation chain; the block partitioning and aggregation rules are presented as design choices whose effectiveness is measured externally rather than presupposed by construction. The derivation chain is therefore self-contained against the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no free parameters, invented entities, or additional axioms beyond the stated domain assumption can be extracted.

axioms (1)

domain assumption Clients possess varying and often missing modalities in multimodal federated learning settings
Directly stated in the abstract as the core practical challenge addressed by the framework.

pith-pipeline@v0.9.0 · 5505 in / 1070 out tokens · 37830 ms · 2026-05-14T21:24:13.568032+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1]

H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,Communication-efficient learning of deep networks from decentralized data, 2023

work page 2023
[2]

D. C. Nguyen et al.,Federated learning for smart healthcare: A survey, 2021

work page 2021
[3]

Federated learning in robotic and autonomous systems,

Y . Xianjia, J. P. Queralta, J. Heikkonen, and T. Westerlund, “Federated learning in robotic and autonomous systems,”Procedia Computer Science, vol. 191, 2021

work page 2021
[4]

Federated learning in mobile edge networks: A comprehensive survey,

W. Y . B. Lim et al., “Federated learning in mobile edge networks: A comprehensive survey,”IEEE Communications Surveys and Tutorials, vol. 22, 2020

work page 2020
[5]

Multimodal machine learning: A survey and taxonomy,

T. Baltru ˇsaitis, C. Ahuja, and L.-P. Morency, “Multimodal machine learning: A survey and taxonomy,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, 2019

work page 2019
[6]

A survey of multimodal federated learning: Background, applications, and perspectives,

H. Pan, X. Zhao, L. He, Y . Shi, and X. Lin, “A survey of multimodal federated learning: Background, applications, and perspectives,”Mul- timedia Syst., vol. 30, 2024

work page 2024
[7]

Harmony: Heterogeneous multi-modal federated learning through disentangled model training,

X. Ouyang et al., “Harmony: Heterogeneous multi-modal federated learning through disentangled model training,” inProceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 2023

work page 2023
[8]

Q. Li, Y . Diao, Q. Chen, and B. He,Federated learning on non-iid data silos: An experimental study, 2021

work page 2021
[9]

Pillutla, K

K. Pillutla, K. Malik, A. Mohamed, M. Rabbat, M. Sanjabi, and L. Xiao,Federated learning with partial model personalization, 2022

work page 2022
[10]

Multi- modal federated learning: Concept, methods, applications and future directions,

W. Huang, D. Wang, X. Ouyang, J. Wan, J. Liu, and T. Li, “Multi- modal federated learning: Concept, methods, applications and future directions,”Information Fusion, vol. 112, 2024

work page 2024
[11]

Yuan, D.-J

L. Yuan, D.-J. Han, V . P. Chellapandi, S. H. ˙Zak, and C. G. Brinton, Fedmfs: Federated multimodal fusion learning with selective modality communication, 2024

work page 2024
[12]

Q. Yu, Y . Liu, Y . Wang, K. Xu, and J. Liu,Multimodal federated learning via contrastive representation ensemble, 2023

work page 2023
[13]

Compre- hensive semi-supervised multi-modal learning,

Y . Yang, K.-T. Wang, D.-C. Zhan, H. Xiong, and Y . Jiang, “Compre- hensive semi-supervised multi-modal learning,” inProceedings of the 28th International Joint Conference on Artificial Intelligence, 2019

work page 2019
[14]

Y . Zhao, P. Barnaghi, and H. Haddadi,Multimodal federated learning on iot data, 2022

work page 2022
[15]

K. Yue, R. Jin, C.-W. Wong, D. Baron, and H. Dai,Gradient obfus- cation gives a false sense of security in federated learning, 2022

work page 2022
[16]

Fedmsplit: Correlation-adaptive federated multi-task learning across multimodal split networks,

J. Chen and A. Zhang, “Fedmsplit: Correlation-adaptive federated multi-task learning across multimodal split networks,” inProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022

work page 2022
[17]

Feng et al.,Fedmultimodal: A benchmark for multimodal federated learning, 2023

T. Feng et al.,Fedmultimodal: A benchmark for multimodal federated learning, 2023

work page 2023
[18]

Learning to associate: Multimodal inference with fully missing modalities,

J. Geraghty, A. Hines, and F. Golpayegani, “Learning to associate: Multimodal inference with fully missing modalities,”ACM Trans. Intell. Syst. Technol., vol. 16, 2025

work page 2025
[19]

Reddi et al.,Adaptive federated optimization, 2021

S. Reddi et al.,Adaptive federated optimization, 2021

work page 2021
[20]

Ku-har: An open dataset for hetero- geneous human activity recognition,

N. Sikder and A.-A. Nahid, “Ku-har: An open dataset for hetero- geneous human activity recognition,”Pattern Recognition Letters, vol. 146, 2021

work page 2021
[21]

A public domain dataset for real-life human activity recognition using smartphone sensors,

D. Garcia-Gonzalez, D. Rivero, E. Fernandez-Blanco, and M. R. Lu- aces, “A public domain dataset for real-life human activity recognition using smartphone sensors,”Sensors, vol. 20, 2020

work page 2020
[22]

PTB-XL, a large publicly available electrocardiog- raphy dataset,

P. Wagner et al., “PTB-XL, a large publicly available electrocardiog- raphy dataset,”Sci. Data, vol. 7, 2020

work page 2020
[23]

Vielzeuf, A

V . Vielzeuf, A. Lechervy, S. Pateux, and F. Jurie,Centralnet: A multilayer approach for multimodal fusion, 2018

work page 2018
[24]

Poria, D

S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea,Meld: A multimodal multi-party dataset for emotion recognition in conversations, 2019

work page 2019
[25]

Iemocap: Interactive emotional dyadic motion capture database,

C. Busso et al., “Iemocap: Interactive emotional dyadic motion capture database,”Language Resources and Evaluation, vol. 42, 2008

work page 2008
[26]

D. J. Beutel et al.,Flower: A friendly federated learning research framework, 2020

work page 2020
[27]

Yadan,Hydra - a framework for elegantly configuring complex applications, 2019

O. Yadan,Hydra - a framework for elegantly configuring complex applications, 2019

work page 2019
[28]

S. Padi, S. O. Sadjadi, D. Manocha, and R. D. Sriram,Multimodal emotion recognition using transfer learning from speaker recognition and bert-based models, 2022

work page 2022

[1] [1]

H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,Communication-efficient learning of deep networks from decentralized data, 2023

work page 2023

[2] [2]

D. C. Nguyen et al.,Federated learning for smart healthcare: A survey, 2021

work page 2021

[3] [3]

Federated learning in robotic and autonomous systems,

Y . Xianjia, J. P. Queralta, J. Heikkonen, and T. Westerlund, “Federated learning in robotic and autonomous systems,”Procedia Computer Science, vol. 191, 2021

work page 2021

[4] [4]

Federated learning in mobile edge networks: A comprehensive survey,

W. Y . B. Lim et al., “Federated learning in mobile edge networks: A comprehensive survey,”IEEE Communications Surveys and Tutorials, vol. 22, 2020

work page 2020

[5] [5]

Multimodal machine learning: A survey and taxonomy,

T. Baltru ˇsaitis, C. Ahuja, and L.-P. Morency, “Multimodal machine learning: A survey and taxonomy,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, 2019

work page 2019

[6] [6]

A survey of multimodal federated learning: Background, applications, and perspectives,

H. Pan, X. Zhao, L. He, Y . Shi, and X. Lin, “A survey of multimodal federated learning: Background, applications, and perspectives,”Mul- timedia Syst., vol. 30, 2024

work page 2024

[7] [7]

Harmony: Heterogeneous multi-modal federated learning through disentangled model training,

X. Ouyang et al., “Harmony: Heterogeneous multi-modal federated learning through disentangled model training,” inProceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 2023

work page 2023

[8] [8]

Q. Li, Y . Diao, Q. Chen, and B. He,Federated learning on non-iid data silos: An experimental study, 2021

work page 2021

[9] [9]

Pillutla, K

K. Pillutla, K. Malik, A. Mohamed, M. Rabbat, M. Sanjabi, and L. Xiao,Federated learning with partial model personalization, 2022

work page 2022

[10] [10]

Multi- modal federated learning: Concept, methods, applications and future directions,

W. Huang, D. Wang, X. Ouyang, J. Wan, J. Liu, and T. Li, “Multi- modal federated learning: Concept, methods, applications and future directions,”Information Fusion, vol. 112, 2024

work page 2024

[11] [11]

Yuan, D.-J

L. Yuan, D.-J. Han, V . P. Chellapandi, S. H. ˙Zak, and C. G. Brinton, Fedmfs: Federated multimodal fusion learning with selective modality communication, 2024

work page 2024

[12] [12]

Q. Yu, Y . Liu, Y . Wang, K. Xu, and J. Liu,Multimodal federated learning via contrastive representation ensemble, 2023

work page 2023

[13] [13]

Compre- hensive semi-supervised multi-modal learning,

Y . Yang, K.-T. Wang, D.-C. Zhan, H. Xiong, and Y . Jiang, “Compre- hensive semi-supervised multi-modal learning,” inProceedings of the 28th International Joint Conference on Artificial Intelligence, 2019

work page 2019

[14] [14]

Y . Zhao, P. Barnaghi, and H. Haddadi,Multimodal federated learning on iot data, 2022

work page 2022

[15] [15]

K. Yue, R. Jin, C.-W. Wong, D. Baron, and H. Dai,Gradient obfus- cation gives a false sense of security in federated learning, 2022

work page 2022

[16] [16]

Fedmsplit: Correlation-adaptive federated multi-task learning across multimodal split networks,

J. Chen and A. Zhang, “Fedmsplit: Correlation-adaptive federated multi-task learning across multimodal split networks,” inProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022

work page 2022

[17] [17]

Feng et al.,Fedmultimodal: A benchmark for multimodal federated learning, 2023

T. Feng et al.,Fedmultimodal: A benchmark for multimodal federated learning, 2023

work page 2023

[18] [18]

Learning to associate: Multimodal inference with fully missing modalities,

J. Geraghty, A. Hines, and F. Golpayegani, “Learning to associate: Multimodal inference with fully missing modalities,”ACM Trans. Intell. Syst. Technol., vol. 16, 2025

work page 2025

[19] [19]

Reddi et al.,Adaptive federated optimization, 2021

S. Reddi et al.,Adaptive federated optimization, 2021

work page 2021

[20] [20]

Ku-har: An open dataset for hetero- geneous human activity recognition,

N. Sikder and A.-A. Nahid, “Ku-har: An open dataset for hetero- geneous human activity recognition,”Pattern Recognition Letters, vol. 146, 2021

work page 2021

[21] [21]

A public domain dataset for real-life human activity recognition using smartphone sensors,

D. Garcia-Gonzalez, D. Rivero, E. Fernandez-Blanco, and M. R. Lu- aces, “A public domain dataset for real-life human activity recognition using smartphone sensors,”Sensors, vol. 20, 2020

work page 2020

[22] [22]

PTB-XL, a large publicly available electrocardiog- raphy dataset,

P. Wagner et al., “PTB-XL, a large publicly available electrocardiog- raphy dataset,”Sci. Data, vol. 7, 2020

work page 2020

[23] [23]

Vielzeuf, A

V . Vielzeuf, A. Lechervy, S. Pateux, and F. Jurie,Centralnet: A multilayer approach for multimodal fusion, 2018

work page 2018

[24] [24]

Poria, D

S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea,Meld: A multimodal multi-party dataset for emotion recognition in conversations, 2019

work page 2019

[25] [25]

Iemocap: Interactive emotional dyadic motion capture database,

C. Busso et al., “Iemocap: Interactive emotional dyadic motion capture database,”Language Resources and Evaluation, vol. 42, 2008

work page 2008

[26] [26]

D. J. Beutel et al.,Flower: A friendly federated learning research framework, 2020

work page 2020

[27] [27]

Yadan,Hydra - a framework for elegantly configuring complex applications, 2019

O. Yadan,Hydra - a framework for elegantly configuring complex applications, 2019

work page 2019

[28] [28]

S. Padi, S. O. Sadjadi, D. Manocha, and R. D. Sriram,Multimodal emotion recognition using transfer learning from speaker recognition and bert-based models, 2022

work page 2022