pith. sign in

arxiv: 2605.16099 · v1 · pith:ZTZFZCNQnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI

Federated Imputation under Heterogeneous Feature Spaces

Pith reviewed 2026-05-20 20:03 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Federated LearningMissing Data ImputationHeterogeneous Feature SpacesFeature GraphsMessage PassingTabular DataDecentralized Training
0
0 comments X

The pith

A shared global feature graph enables indirect knowledge transfer for federated imputation even when clients observe disjoint feature sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses federated imputation when clients hold only partially overlapping feature subsets, a situation that defeats standard parameter-averaging methods because they cannot transfer information across non-shared features. It separates the structural absence of features from ordinary missing values and builds a single global feature graph that records statistical relations among all features across clients. Message passing over this graph then moves imputation information from observed features to related but unobserved ones, allowing indirect transfer without ever requiring joint observation on any client. The approach keeps ordinary federated communication rounds unchanged. Tests under simulated partial overlap show clear gains in root-mean-square error on two of the three evaluated datasets.

Core claim

FedHF-Impute constructs a shared global feature graph from the union of client feature spaces and applies message passing over the graph to propagate imputation estimates between statistically related features, thereby permitting knowledge transfer across clients whose local feature observations never intersect while preserving standard federated averaging.

What carries the argument

shared global feature graph encoding statistical relations among features, with message passing to propagate imputation estimates across non-overlapping client views

If this is right

  • Imputation can proceed collaboratively even when clients never share the same feature schema.
  • Standard federated communication protocols remain sufficient for the transfer.
  • Root-mean-square error improves by 26.9 percent on SECOM and 8.4 percent on AirQuality under partial feature overlap.
  • Performance stays within 0.3 percent of the best baseline on fully aligned data such as PhysioNET.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-propagation idea could support other federated tasks such as classification when feature spaces are heterogeneous.
  • Real deployments would still require a secure way to assemble the global graph without exposing raw client data.
  • The method might reduce the preprocessing burden of forcing feature alignment before federated training begins.

Load-bearing premise

The relations captured in the global feature graph are sufficiently strong and accurate to carry useful imputation information between features that no single client ever observes together.

What would settle it

Constructing a feature graph on data where connected features show no statistical dependence and observing that the resulting imputation error equals or exceeds that of non-graph federated baselines.

Figures

Figures reproduced from arXiv: 2605.16099 by Chaimaa Medjadji, Gregoire Danoy, Imane Hocine, Sylvain Kubler, Yves Le Traon.

Figure 1
Figure 1. Figure 1: FedHF-Impute overview: Federated imputation with heterogeneous feature spaces [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Imputation performance (RMSE ↓) of FedHF-Impute vs the other baselines on three datasets: AirQuality, PhysioNET and SECOM. port variability across repeated runs whenever available. This distinction reflects the different evaluation units of centralized local imputers and federated global imputers, but all methods are scored on identical corrupted entries. For centralized baselines, we report the RMSE and s… view at source ↗
Figure 3
Figure 3. Figure 3: FedHF-Impute convergence across federated rounds on AirQuality, PhysioNET, and SECOM. Each curve shows the validation RMSE ( [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
read the original abstract

Federated Learning (FL) enables collaborative training across decentralized clients, but most methods assume aligned feature schemas, an assumption that rarely holds in tabular settings where clients observe only partially overlapping feature subsets. In these heterogeneous feature spaces, parameter-averaging methods (e.g., FedAvg) transfer little information across weakly overlapping or disjoint feature groups, limiting their effectiveness for federated imputation. To overcome this, we propose \textbf{FedHF-Impute}, a federated imputation framework that separates structural feature unavailability from conventional missingness and uses a shared global feature graph to propagate information across statistically related features through message passing. This enables indirect cross-client knowledge transfer, even when features are never jointly observed locally, while preserving standard federated communication. Under simulated partial schema overlap on the SECOM and AirQuality datasets, FedHF-Impute improves imputation accuracy (RMSE) over FL baselines by 26.9\%, and 8.4\% respectively, while achieving comparable performance on PhysioNET, with only a 0.3\% difference relative to the best baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FedHF-Impute, a federated imputation framework for heterogeneous feature spaces where clients observe only partially overlapping feature subsets. It separates structural unavailability from conventional missingness and employs a shared global feature graph with message passing to enable indirect cross-client knowledge transfer even when features are never jointly observed locally, while preserving standard federated communication. The paper reports RMSE improvements of 26.9% on SECOM and 8.4% on AirQuality over FL baselines under simulated partial schema overlap, with comparable performance on PhysioNET.

Significance. If the central claims hold after clarification, the work could meaningfully advance federated learning for tabular data in domains with schema heterogeneity, such as healthcare or manufacturing, by providing a mechanism for indirect imputation transfer without requiring feature alignment or data centralization.

major comments (2)
  1. [Framework description / global feature graph construction] The construction of the shared global feature graph (described in the framework overview) is insufficiently specified. It remains unclear whether edge weights or adjacency are derived from server-side aggregation of per-feature statistics or from an external ontology; either route appears to conflict with the premise of never jointly observed features and the guarantee of standard federated communication without centralization.
  2. [Section 4] Section 4 (Experiments): The reported RMSE improvements (26.9% on SECOM, 8.4% on AirQuality) lack supporting details on the exact protocol for simulating partial schema overlap, baseline implementations, statistical significance tests, variance across runs, and ablations isolating the message-passing component over the global graph. Without these, the empirical support for the central claim cannot be verified.
minor comments (2)
  1. [Method overview] Clarify the precise communication protocol (e.g., what is exchanged in each round) to make the claim of 'preserving standard federated communication' more concrete.
  2. [Framework description] Add formal notation or equations for the message-passing update rule on the global feature graph to improve technical precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen our manuscript. We address each major comment point by point below, indicating where revisions will be made.

read point-by-point responses
  1. Referee: [Framework description / global feature graph construction] The construction of the shared global feature graph (described in the framework overview) is insufficiently specified. It remains unclear whether edge weights or adjacency are derived from server-side aggregation of per-feature statistics or from an external ontology; either route appears to conflict with the premise of never jointly observed features and the guarantee of standard federated communication without centralization.

    Authors: We agree that the original description of global feature graph construction is insufficiently precise and could lead to the ambiguities noted. The graph is constructed from server-side aggregation of locally computed per-client feature correlation statistics (via federated averaging of pairwise correlation estimates), without any external ontology or requirement for joint feature observation across clients. This is fully consistent with standard federated communication. In the revised manuscript we have added a dedicated subsection in Section 3 with the exact algorithm, including how local correlations are estimated on each client's partial schema and aggregated at the server while preserving heterogeneity and privacy constraints. revision: yes

  2. Referee: [Section 4] Section 4 (Experiments): The reported RMSE improvements (26.9% on SECOM, 8.4% on AirQuality) lack supporting details on the exact protocol for simulating partial schema overlap, baseline implementations, statistical significance tests, variance across runs, and ablations isolating the message-passing component over the global graph. Without these, the empirical support for the central claim cannot be verified.

    Authors: We acknowledge that the experimental section would benefit from additional transparency to allow full verification. The revised manuscript will include: (i) the precise protocol for simulating partial schema overlap (overlap ratios, feature partitioning method, and random seeds); (ii) hyperparameter settings and implementation details for all baselines; (iii) statistical significance tests (paired t-tests with p-values); (iv) mean and standard deviation of RMSE across five independent runs; and (v) an ablation study isolating the message-passing component. These additions will appear in the main Section 4 and an expanded appendix. revision: yes

Circularity Check

0 steps flagged

No circularity detected; framework derivation is self-contained

full rationale

The paper proposes FedHF-Impute as a novel federated imputation method that introduces a shared global feature graph for message passing to enable indirect knowledge transfer under heterogeneous feature spaces. The abstract and described approach present this as an explicit architectural choice with empirical results on SECOM, AirQuality, and PhysioNET datasets showing RMSE improvements. No load-bearing steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the central mechanism (graph-based propagation) is presented as an independent modeling decision rather than a renaming or statistical tautology of prior results. The derivation chain remains externally falsifiable via the reported experiments and does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the ability to construct a global feature graph whose edges reflect statistical relations usable for imputation via message passing, an assumption not independently verified in the abstract.

axioms (1)
  • domain assumption Features across clients possess statistical relations that can be represented in a single global graph for effective message passing.
    Invoked to justify indirect knowledge transfer when features are never jointly observed locally.
invented entities (1)
  • shared global feature graph no independent evidence
    purpose: To connect statistically related features so that message passing can propagate imputation information across clients with disjoint local observations.
    New structural element introduced to overcome the limitation of parameter averaging under heterogeneous schemas.

pith-pipeline@v0.9.0 · 5724 in / 1319 out tokens · 83730 ms · 2026-05-20T20:03:59.180218+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

  1. [1]

    Extracting and composing robust features with denoising autoencoders,

    P. Vincent, H. Larochelle, Y . Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” inProc. ICML, 2008, pp. 1096–1103

  2. [2]

    van Buuren,Flexible Imputation of Missing Data

    S. van Buuren,Flexible Imputation of Missing Data. Boca Raton, FL, USA: CRC Press, 2011

  3. [3]

    GAIN: Missing data imputation using generative adversarial nets,

    J. Yoon, J. Jordon, and M. van der Schaar, “GAIN: Missing data imputation using generative adversarial nets,” inProc. 35th Int. Conf. Machine Learning (ICML), Stockholm, Sweden, 2018, pp. 5689–5698

  4. [4]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017

  5. [5]

    Communication-efficient learning of deep networks from decentralized data,

    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” inProc. 20th Int. Conf. Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 2017, pp. 1273–1282

  6. [6]

    MIW AE: Deep generative modelling and imputation of incomplete data sets,

    P.-A. Mattei and J. Frellsen, “MIW AE: Deep generative modelling and imputation of incomplete data sets,” inProc. ICML, 2019

  7. [7]

    BRITS: Bidirectional recurrent imputation for time series,

    W. Caoet al., “BRITS: Bidirectional recurrent imputation for time series,” inAdvances in NeurIPS, 2018

  8. [8]

    SAITS: Self-attention-based imputation for time series,

    W. Du, D. Cote, and Y . Liu, “SAITS: Self-attention-based imputation for time series,”Expert Systems with Applications, vol. 219, 2023

  9. [9]

    CSDI: Conditional score-based diffusion models for probabilistic time series imputation,

    Y . Tashiroet al., “CSDI: Conditional score-based diffusion models for probabilistic time series imputation,” inAdvances in NeurIPS, 2021

  10. [10]

    TabTransformer: Tabular Data Modeling Using Contextual Embeddings

    X. Huanget al., “TabTransformer: Tabular data modeling using contex- tual embeddings,” arXiv:2012.06678, 2020

  11. [11]

    Filling the G ap s: Multivariate time series imputation by graph neural networks,

    A. Cini, I. Marisca, and C. Alippi, “Filling the G ap s: Multivariate time series imputation by graph neural networks,” inProc. ICLR, 2022

  12. [12]

    Data imputation with iterative graph reconstruction,

    J. Zhong, N. Gui, and W. Ye, “Data imputation with iterative graph reconstruction,” inProc. AAAI, 2023

  13. [13]

    EGG-GAE: Scalable graph autoen- coders for tabular data imputation,

    L. Telyatnikov and S. Scardapane, “EGG-GAE: Scalable graph autoen- coders for tabular data imputation,” inProc. AISTATS, 2023

  14. [14]

    Federated optimization in heterogeneous networks (Fed- Prox),

    T. Liet al., “Federated optimization in heterogeneous networks (Fed- Prox),” inProc. MLSys, 2020

  15. [15]

    SCAFFOLD: Stochastic controlled averaging for federated learning,

    S. Karimireddyet al., “SCAFFOLD: Stochastic controlled averaging for federated learning,” inProc. ICML, 2020

  16. [16]

    Model-contrastive federated learning,

    Q. Li, B. He, and D. Song, “Model-contrastive federated learning,” in Proc. CVPR, 2021

  17. [17]

    FedBN: Federated learning on non-IID features via local batch normalization,

    X. Liet al., “FedBN: Federated learning on non-IID features via local batch normalization,” inProc. ICLR, 2021

  18. [18]

    A survey on vertical federated learning: From a layered perspective,

    L. Yanget al., “A survey on vertical federated learning: From a layered perspective,” arXiv:2304.01829, 2023

  19. [19]

    Vertical federated learning: A structured literature review,

    A. Khan, M. ten Thij, and A. Wilbik, “Vertical federated learning: A structured literature review,”Knowl. Inf. Syst., 2025

  20. [20]

    Protecting model updates in privacy-preserving federated learning: Input privacy for vertical FL,

    D. Darais, J. Near, M. Durkee, and D. Buckley, “Protecting model updates in privacy-preserving federated learning: Input privacy for vertical FL,” NIST Cybersecurity Insights Blog, 2024

  21. [21]

    Federated private set intersection example,

    NVIDIA, “Federated private set intersection example,” NVFlare GitHub repository, 2024

  22. [22]

    SecureBoost: A lossless federated learning framework,

    K. Chenget al., “SecureBoost: A lossless federated learning framework,” inProc. FML Workshop @ IJCAI, 2019

  23. [23]

    SecureBoost+: Large-scale and high-performance vertical federated gradient boosting decision trees,

    T. Fanet al., “SecureBoost+: Large-scale and high-performance vertical federated gradient boosting decision trees,” inProc. PAKDD, 2024

  24. [24]

    Split learning for health: Distributed deep learning without sharing raw patient data

    P. Vepakommaet al., “Split learning for health: Distributed deep learning without sharing raw patient data,” arXiv:1812.00564, 2018

  25. [25]

    Fed-MIW AE: Federated imputation of incomplete data via deep generative models,

    I. Balelliet al., “Fed-MIW AE: Federated imputation of incomplete data via deep generative models,” arXiv:2304.08054, 2023

  26. [26]

    Cafe: Improved federated data imputation by leveraging missing data heterogeneity,

    S. Minet al., “Cafe: Improved federated data imputation by leveraging missing data heterogeneity,” preprint, 2024

  27. [27]

    Personalised federated learning on hetero- geneous feature spaces,

    A. Rakotomamonjyet al., “Personalised federated learning on hetero- geneous feature spaces,” arXiv:2301.11447, 2023

  28. [28]

    Clustered federated learning for heterogeneous feature spaces using Siamese GCN distance prediction,

    Y . Suzuki and F. Banaei-Kashani, “Clustered federated learning for heterogeneous feature spaces using Siamese GCN distance prediction,” OpenReview (FLSys), 2023

  29. [29]

    Towards heterogeneous federated learning: Analysis, solutions, and future directions,

    Y . Linet al., “Towards heterogeneous federated learning: Analysis, solutions, and future directions,” inProc. AIS&P, Springer, 2024

  30. [30]

    Harmony in federated learning: A compre- hensive review of techniques to tackle heterogeneity and non-IID data,

    M. Karami and A. Karami, “Harmony in federated learning: A compre- hensive review of techniques to tackle heterogeneity and non-IID data,” Cluster Computing, vol. 28, no. 570, 2025

  31. [31]

    Handling spatial–temporal data heterogeneity for federated continual learning via Tail Anchor,

    H. Yuet al., “Handling spatial–temporal data heterogeneity for federated continual learning via Tail Anchor,” inProc. CVPR, 2025

  32. [32]

    Heterogeneous federated learning: State-of-the-art and research challenges,

    M. Yeet al., “Heterogeneous federated learning: State-of-the-art and research challenges,”ACM Computing Surveys, 2023

  33. [33]

    Rethinking the diffusion models for missing data impu- tation: A gradient flow perspective,

    Z. Chenet al., “Rethinking the diffusion models for missing data impu- tation: A gradient flow perspective,” inAdvances in Neural Information Processing Systems (NeurIPS), 2024

  34. [34]

    Privacy-preserving vertical federated KNN feature imputation method,

    W. Du, Y . Wang, G. Meng, and Y . Guo, “Privacy-preserving vertical federated KNN feature imputation method,”Electronics, vol. 13, no. 2, 2024

  35. [35]

    Multimodal federated learning with missing modalities through feature imputation network,

    P. Poudel, A. Chhetri, P. Gyawali, G. Leontidis, and B. Bhattarai, “Multimodal federated learning with missing modalities through feature imputation network,” inMedical Image Understanding and Analysis (MIUA), LNCS, 2026

  36. [36]

    Decentralized noise handling in medical imaging: Encoder–decoder based federated imputation for robust training,

    Y . Chang, Y . Noh, S.-W. Lee, M. Lee, and W. Noh, “Decentralized noise handling in medical imaging: Encoder–decoder based federated imputation for robust training,” inProc. MICCAI, 2025

  37. [37]

    FedFA: Federated learning with feature anchors to align features and classifiers for heterogeneous data,

    T. Zhou, J. Zhang, and D. H. K. Tsang, “FedFA: Federated learning with feature anchors to align features and classifiers for heterogeneous data,” IEEE Transactions on Mobile Computing, vol. 23, no. 6, pp. 6731–6742, 2024

  38. [38]

    FedCOLA: Federated learning with heterogeneous feature concatenation and local acceleration for non-IID data,

    W.-C. Chung and C.-H. Peng, “FedCOLA: Federated learning with heterogeneous feature concatenation and local acceleration for non-IID data,”Future Generation Computer Systems, vol. 166, 2025

  39. [39]

    Reconstructing missing variables for multivariate time series forecasting via conditional generative flows,

    X. Hu, W. Fan, H. Chen, and Y . Fu, “Reconstructing missing variables for multivariate time series forecasting via conditional generative flows,” inProc. IJCAI, 2024

  40. [40]

    TIformer: A transformer-based framework for time-series forecasting with missing data,

    Z. Ding, Y . Chen, H. Wang, X. Wang, W. Zhang, and Y . Zhang, “TIformer: A transformer-based framework for time-series forecasting with missing data,” inDatabases Theory and Applications (ADC 2024), LNCS vol. 15449, Springer, 2024, pp. 71–84

  41. [41]

    Medjadji, Chaimaa, et al. ”FedSparQ: Adaptive Sparse Quantization with Error Feedback for Robust & Efficient Federated Learning.” 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA). IEEE, 2025

  42. [42]

    FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Con- trastive Regularization,

    M. D. Nguyen, T. T. Nguyen, H. H. Pham, T. N. Hoang, P. L. Nguyen, and T. T. Huynh, “FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Con- trastive Regularization,” inProc. 22nd Int. Symp. Network Com- puting and Applications (NCA), 2024, pp. 1–8. [Online]. Available: https://arxiv.org/abs/2410.03070

  43. [43]

    Learning from data with structured missingness,

    R. Mitra, S. F. McGough, T. Chakraborti, C. Holmes, R. Copping, N. Hagenbuch, S. Biedermann, J. Noonan, B. Lehmann, A. Shenvi, X. V . Doan, D. Leslie, G. Bianconi, R. Sanchez-Garcia, A. Davies, M. Mackintosh, E.-R. Andrinopoulou, A. Basiri, C. Harbron, and B. D. MacArthur, “Learning from data with structured missingness,” Nature Machine Intelligence, vol....

  44. [44]

    Vision paper: Metadata-Guided Diffusion and LLM-Orchestrated Quality Gov- ernance for Time Series Imputation,

    I. Hocine, A. Abboura, A. Kella, M. Hanini, and G. Danoy, “Vision paper: Metadata-Guided Diffusion and LLM-Orchestrated Quality Gov- ernance for Time Series Imputation,” In Proceedings of the QuaLLM-KG Workshop, co-located with EDBT/ICDT 2026, Tampere, Finland, 2026