pith. machine review for the scientific record. sign in

arxiv: 2605.02693 · v1 · submitted 2026-05-04 · 📊 stat.ML · cs.LG· stat.ME

Recognition: unknown

Random-Effects Algorithm for Random Objects in Metric Spaces

Authors on Pith no claims yet

Pith reviewed 2026-05-08 17:16 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME
keywords random effectsmetric spacesFréchet regressionM-estimationnon-Euclidean datarandom objectsconsistent estimationdigital health
0
0 comments X

The pith

A nonlinear Fréchet-based algorithm delivers consistent random-effects prediction for arbitrary objects in metric spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a general random-effects framework for data consisting of multiple observations on the same units when those observations are non-Euclidean random objects living in a metric space. Standard mixed-effects tools stop at Euclidean scalars or Hilbert-space functions, leaving no principled way to borrow strength across units for outcomes such as probability distributions or graphs. The authors introduce a nonlinear Fréchet-based algorithm and use M-estimation theory to prove that a metric-space prediction target is consistently recovered under a working random-effects model. Numerical checks on synthetic examples and digital-health datasets show that the method matches or exceeds the performance of existing Hilbert-space procedures even when those procedures can be applied.

Core claim

We propose a nonlinear Fréchet-based algorithm for random-effects modeling of arbitrary random objects defined on a metric space. Using M-estimation theory, we establish conditions under which the proposed metric-space prediction target is consistently estimated under a working random-effects formulation. We then evaluate the empirical performance of the proposed method using both synthetic data and digital health datasets that require practical tools for analyzing random objects in metric spaces, such as multivariate probability distributions and random graphs. We show that, although our method is developed beyond Hilbert spaces, it can outperform existing Hilbert space-based methods.

What carries the argument

Nonlinear Fréchet-based random-effects algorithm, which replaces Euclidean operations with Fréchet means and regressions defined directly in the metric space to produce a consistent prediction target.

If this is right

  • Random-effects borrowing of strength becomes available for repeated non-Euclidean observations such as random graphs or probability distributions.
  • Personalized prediction targets can be formed for metric-space outcomes with explicit consistency guarantees.
  • The same working-model strategy applies to any metric space once the Fréchet operations are well-defined.
  • Digital-health analyses that previously required Euclidean approximations can now use the native geometry of the data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The working random-effects formulation may still produce useful shrinkage even when the true data-generating process is not exactly a random-effects model.
  • The approach could be tested on longitudinal metric-space data where the metric itself changes over time.
  • Because the method works in general metric spaces, it offers a route to unify random-effects modeling across shape analysis, network data, and distributional data.
  • Extensions to time-to-event or censored observations in metric spaces would follow the same M-estimation template.

Load-bearing premise

The metric space and random objects must satisfy the technical conditions required for M-estimation to deliver consistent estimation of the prediction target under the working random-effects model.

What would settle it

A simulation in which the algorithm's estimated prediction target exhibits persistent bias or fails to converge to the true target as the number of units and observations per unit increase, despite the metric space satisfying all stated M-estimation regularity conditions.

Figures

Figures reproduced from arXiv: 2605.02693 by Marcos Matabuena, Mateo C\'amara.

Figure 1
Figure 1. Figure 1: L 2 Hilbert-space examples for physical activity and continuous glucose monitoring data. Top row: representative repeated daily curves for NHANES PAX (left) and CGMCR (right). Faint lines denote daily replicates, and solid lines denote within-individual means for two representative individuals in each dataset. Bottom row: held-out per-individual MSE for With RE and Without RE on NHANES PAX (left) and CGMCR… view at source ↗
Figure 2
Figure 2. Figure 2: CGM 3D distributions. Left: observed 95% probability ellipsoid of the distributional response for one held-out individual (blue, filled), together with the ellipsoid of the anchor selected by With RE (red, wire); axes are in the native CGM-derived units. Right: held-out per-individual W2 2 error under With RE and Without RE, summarizing the distribution of individual-level errors across the 415 individuals… view at source ↗
Figure 3
Figure 3. Figure 3: NHANES hourly correlation graphs. From left to right: observed Laplacian and corre￾sponding thresholded graph for one held-out individual-day pair; Laplacian and graph selected by the With RE strategy for the same case; and held-out per-individual squared Frobenius error under With RE and Without RE. 3.3 Laplacian graph example in NHANES Finally, we consider a graph-derived response constructed from the NH… view at source ↗
Figure 4
Figure 4. Figure 4: Summary of real-data prediction results across the four datasets analyzed. Panel (a) reports view at source ↗
read the original abstract

Across many scientific disciplines, multiple observations are collected from the same experimental units, and in modern datasets these observations often arise as non-Euclidean random objects. In such settings, the incorporation of random effects is a critical modeling step for efficient estimation and personalized prediction. Although mixed-effects models are well established for scalar outcomes and, more recently, for functional data in Hilbert spaces, general random-effects frameworks for objects in metric spaces remain underdeveloped. In this paper, we propose a nonlinear Fr\'echet-based algorithm for random-effects modeling of arbitrary random objects defined on a metric space. Using M-estimation theory, we establish conditions under which the proposed metric-space prediction target is consistently estimated under a working random-effects formulation. We then evaluate the empirical performance of the proposed method using both synthetic data and digital health datasets that require practical tools for analyzing random objects in metric spaces, such as multivariate probability distributions and random graphs. We show that, although our method is developed beyond Hilbert spaces, it can outperform existing Hilbert space-based methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a nonlinear Fréchet-based algorithm for random-effects modeling of arbitrary random objects defined on a metric space. It invokes M-estimation theory to establish conditions under which the metric-space prediction target is consistently estimated under a working random-effects formulation. The approach is evaluated empirically on synthetic data and digital health datasets involving multivariate probability distributions and random graphs, with claims that it can outperform existing Hilbert space-based methods.

Significance. If the consistency results hold under the stated technical conditions on the metric space and random objects, the work addresses an important gap by extending mixed-effects modeling beyond Euclidean and Hilbert spaces to general metric spaces. This could enable more efficient estimation and personalized prediction for complex non-Euclidean data types increasingly encountered in applications such as digital health. The conditional framing of the guarantees is appropriate, and the empirical outperformance claim, if substantiated with quantitative comparisons, would strengthen the practical contribution.

major comments (2)
  1. Abstract: The claim that M-estimation theory is used to establish consistency conditions is central to the theoretical contribution, yet the abstract supplies no explicit conditions, proof sketches, or error bounds. This makes it impossible to assess whether the technical assumptions on the metric space and random objects are verifiable or overly restrictive.
  2. Empirical section: Claims of outperformance over Hilbert space-based methods are stated without quantitative results, specific metrics, baseline details, or comparison tables. This undermines evaluation of the practical advantage for the central claim that the method works beyond Hilbert spaces.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major point below and indicate planned revisions.

read point-by-point responses
  1. Referee: Abstract: The claim that M-estimation theory is used to establish consistency conditions is central to the theoretical contribution, yet the abstract supplies no explicit conditions, proof sketches, or error bounds. This makes it impossible to assess whether the technical assumptions on the metric space and random objects are verifiable or overly restrictive.

    Authors: We agree the abstract would benefit from greater specificity on the theoretical contribution. We will revise it to briefly state the key conditions (metric space complete and separable, unique Fréchet mean, finite second moment) and note that consistency follows from standard M-estimation arguments. Full proofs and any quantitative bounds remain in the body of the paper. revision: yes

  2. Referee: Empirical section: Claims of outperformance over Hilbert space-based methods are stated without quantitative results, specific metrics, baseline details, or comparison tables. This undermines evaluation of the practical advantage for the central claim that the method works beyond Hilbert spaces.

    Authors: We acknowledge that the empirical claims require stronger quantitative support. In the revised manuscript we will add explicit performance metrics (e.g., prediction error or MSE), identify the exact Hilbert-space baselines used, and include comparison tables for both the synthetic and digital-health experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper proposes a nonlinear Fréchet-based random-effects algorithm for arbitrary random objects in metric spaces and derives consistency of the prediction target via standard M-estimation theory under explicitly stated technical conditions on the metric space and objects. No load-bearing steps reduce by construction to fitted inputs, self-definitions, or unverified self-citations; the consistency claims are conditional on external assumptions rather than tautological. The empirical evaluations on synthetic and digital health data further stand apart from the theoretical derivation. This is the normal case of an independent proposal with standard supporting theory.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; full text would be required to audit them.

pith-pipeline@v0.9.0 · 5476 in / 1035 out tokens · 86029 ms · 2026-05-08T17:16:50.129479+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

50 extracted references · 28 canonical work pages

  1. [1]

    Geodesic Mixed Effects Models for Repeatedly Ob- served/Longitudinal Random Objects

    Satarupa Bhattacharjee et al. “Geodesic Mixed Effects Models for Repeatedly Ob- served/Longitudinal Random Objects”. In:Journal of the American Statistical Association 120.551 (2025), pp. 1879–1892.DOI: 10.1080/01621459.2025.2474267. eprint: https: //doi.org/10.1080/01621459.2025.2474267 .URL: https://doi.org/10.1080/ 01621459.2025.2474267

  2. [2]

    Nonlinear Global Fréchet Regression for Random Objects via Weak Conditional Expectation

    Satarupa Bhattacharjee et al. “Nonlinear Global Fréchet Regression for Random Objects via Weak Conditional Expectation”. In:The Annals of Statistics53.1 (2025), pp. 117–143.DOI: 10.1214/24-AOS2457

  3. [3]

    Single Index Fréchet Regression

    Satarupa Bhattacharjee et al. “Single Index Fréchet Regression”. In:The Annals of Statistics 51.4 (2023), pp. 1770–1798.DOI:10.1214/23-AOS2307

  4. [4]

    Medoid Splits for Efficient Random Forests in Metric Spaces

    Matthieu Bulté et al. “Medoid Splits for Efficient Random Forests in Metric Spaces”. In: Computational Statistics & Data Analysis198 (2024), p. 107995.DOI: 10.1016/j.csda. 2024.107995

  5. [5]

    Sliced Wasserstein Regression

    Han Chen et al. “Sliced Wasserstein Regression”. In:arXiv preprint arXiv:2306.10601(2023). DOI:10.48550/arXiv.2306.10601

  6. [6]

    2022 , journal =

    Yaqing Chen et al. “Wasserstein Regression”. In:Journal of the American Statistical Associa- tion118.542 (2023), pp. 869–882.DOI:10.1080/01621459.2021.1956937

  7. [7]

    Crainiceanu et al.Functional Data Analysis with R

    C.M. Crainiceanu et al.Functional Data Analysis with R. Chapman and Hall/CRC, 2024

  8. [8]

    Bootstrap-based inference on the difference in the means of two correlated functional processes

    Ciprian M Crainiceanu et al. “Bootstrap-based inference on the difference in the means of two correlated functional processes”. In:Statistics in medicine31.26 (2012), pp. 3223–3240

  9. [9]

    Fast Univariate Inference for Longitudinal Functional Models

    Erjia Cui et al. “Fast Univariate Inference for Longitudinal Functional Models”. In:Journal of Computational and Graphical Statistics0.0 (2021), pp. 1–12.DOI: 10.1080/10618600. 2021.1950006 . eprint: https://doi.org/10.1080/10618600.2021.1950006 .URL: https://doi.org/10.1080/10618600.2021.1950006

  10. [10]

    Fast univariate inference for longitudinal functional models

    Erjia Cui et al. “Fast univariate inference for longitudinal functional models”. In:Journal of Computational and Graphical Statistics(2021), pp. 1–12

  11. [11]

    Fréchet analysis of variance for random objects

    Paromita Dubey et al. “Fréchet analysis of variance for random objects”. In:Biometrika106.4 (2019), pp. 803–821

  12. [12]

    2022 , journal =

    Paromita Dubey et al. “Modeling Time-Varying Random Objects and Dynamic Networks”. In:Journal of the American Statistical Association117.540 (2022), pp. 2252–2267.DOI: 10.1080/01621459.2021.1917416

  13. [13]

    Conditional Distribution Regression for Functional Responses

    Jianing Fan et al. “Conditional Distribution Regression for Functional Responses”. In:Scandi- navian Journal of Statistics49.2 (2022), pp. 502–524.DOI:10.1111/sjos.12525

  14. [14]

    and Shah N.B

    Alex Fout et al. “Fréchet Covariance and MANOV A Tests for Random Objects in Multiple Metric Spaces”. In:arXiv preprint arXiv:2306.12066(2023).DOI: 10.48550/arXiv.2306. 12066

  15. [15]

    Les éléments aléatoires de nature quelconque dans un espace distancié

    Maurice Fréchet. “Les éléments aléatoires de nature quelconque dans un espace distancié”. In: Annales de l’institut Henri Poincaré. V ol. 10. 4. 1948, pp. 215–310

  16. [16]

    Sara A Geer.Empirical Processes in M-estimation. V ol. 6. Cambridge university press, 2000

  17. [17]

    Cambridge university press, 2007

    Andrew Gelman et al.Data analysis using regression and multilevel/hierarchical models. Cambridge university press, 2007

  18. [18]

    Longitudinal scalar-on-functions regression with application to trac- tography data

    Jan Gertheiss et al. “Longitudinal scalar-on-functions regression with application to trac- tography data”. In:Biostatistics14.3 (Jan. 2013), pp. 447–461.ISSN: 1465-4644.DOI: 10. 1093/biostatistics/kxs051. eprint: https://academic.oup.com/biostatistics/ article-pdf/14/3/447/17738955/kxs051.pdf .URL: https://doi.org/10.1093/ biostatistics/kxs051

  19. [19]

    Predicting distributional profiles of physical activity in the NHANES database using a Partially Linear Single-Index Fr\’echet Regression model

    Aritra Ghosal et al. “Predicting distributional profiles of physical activity in the NHANES database using a Partially Linear Single-Index Fr\’echet Regression model”. In:arXiv preprint arXiv:2302.07692(2023)

  20. [20]

    Longitudinal functional principal component analysis

    Sonja Greven et al. “Longitudinal functional principal component analysis”. eng. In:Electronic journal of statistics4 (2010). 21743825[pmid], pp. 1022–1054.ISSN: 1935-7524.DOI: 10. 1214/10-EJS575.URL:https://pubmed.ncbi.nlm.nih.gov/21743825

  21. [21]

    Grinsztajn, E

    Léo Grinsztajn et al. “Why do tree-based models still outperform deep learning on tabular data?” In:arXiv preprint arXiv:2207.08815(2022).DOI: 10.48550/arXiv.2207.08815 . URL:https://doi.org/10.48550/arXiv.2207.08815. 11

  22. [22]

    Mixed-effects random forest for clustered data

    Ahlem Hajjem et al. “Mixed-effects random forest for clustered data”. In:Journal of Statistical Computation and Simulation84.6 (2014), pp. 1313–1328.DOI: 10.1080/00949655.2012. 741599

  23. [23]

    Universal Bayes Consistency in Metric Spaces

    Steve Hanneke et al. “Universal Bayes Consistency in Metric Spaces”. In:The Annals of Statistics49.4 (2021), pp. 2129–2155.DOI:10.1214/20-AOS2029

  24. [24]

    Locally Polynomial Hilbertian Additive Regression

    Jeong Min Jeon et al. “Locally Polynomial Hilbertian Additive Regression”. In:Bernoulli28.3 (2022), pp. 2034–2066.DOI:10.3150/21-BEJ1410

  25. [25]

    Usable and precise asymptotics for generalized linear mixed model analysis and design

    Jiming Jiang et al. “Usable and precise asymptotics for generalized linear mixed model analysis and design”. In:Journal of the Royal Statistical Society Series B: Statistical Methodology84.1 (2022), pp. 55–82

  26. [26]

    Model Averaging for Global Fréchet Regression

    Daisuke Kurisu et al. “Model Averaging for Global Fréchet Regression”. In:Journal of Multivariate Analysis207 (2025), p. 105416.DOI:10.1016/j.jmva.2025.105416

  27. [27]

    Random-effects models for longitudinal data

    Nan M Laird et al. “Random-effects models for longitudinal data”. In:Biometrics(1982), pp. 963–974

  28. [28]

    Conformal and knn predictive uncertainty quantification algorithms in metric spaces

    Gábor Lugosi et al. “Conformal and knn predictive uncertainty quantification algorithms in metric spaces”. In:arXiv preprint arXiv:2507.15741(2025)

  29. [29]

    Second Errata to “Distance Covariance in Metric Spaces

    Russell Lyons. “Second Errata to “Distance Covariance in Metric Spaces””. In:The Annals of Probability49.5 (2021), pp. 2668–2670.DOI:10.1214/20-AOP1504

  30. [30]

    Application of functional data analysis for the prediction of maxi- mum heart rate

    Marcos Matabuena et al. “Application of functional data analysis for the prediction of maxi- mum heart rate”. In:IEEE Access7 (2019), pp. 121841–121852

  31. [31]

    Beyond scalar metrics: functional data analysis of postprandial continuous glucose monitoring in the AEGIS study

    Marcos Matabuena et al. “Beyond scalar metrics: functional data analysis of postprandial continuous glucose monitoring in the AEGIS study”. In:BMC Medical Research Methodology (2026)

  32. [32]

    Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models

    Marcos Matabuena et al. “Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models”. In:The American Statisticianjust-accepted (2022), pp. 1–24

  33. [33]

    Glucodensities: a new representation of glucose profiles using distributional data analysis

    Marcos Matabuena et al. “Glucodensities: a new representation of glucose profiles using distributional data analysis”. In:Statistical Methods in Medical Research30.6 (2021), pp. 1445– 1464

  34. [34]

    A novel single-crystal & single-pass source for polarisation- and colour-entangled photon pairs,

    Marcos Matabuena et al. “Glucodensity functional profiles outperform traditional continuous glucose monitoring metrics”. In:Scientific Reports15.1 (Sept. 29, 2025). Article number: 33662, p. 33662.ISSN: 2045-2322.DOI: 10.1038/s41598- 025- 18119- 2.URL: https: //doi.org/10.1038/s41598-025-18119-2

  35. [35]

    Marcos Matabuena et al.Multilevel functional data analysis modeling of human glucose response to meal intake. 2024. arXiv: 2405.14690 [q-bio.QM] .URL: https://arxiv. org/abs/2405.14690

  36. [36]

    Multilevel functional distributional models with applications to continuous glucose monitoring in diabetes clinical trials

    Marcos Matabuena et al. “Multilevel functional distributional models with applications to continuous glucose monitoring in diabetes clinical trials”. In:The Annals of Applied Statistics 20.1 (2026), pp. 476–495

  37. [37]

    Personalized Imputation in Metric Spaces via Conformal Prediction: Applications in Predicting Diabetes Development with Continuous Glucose Monitoring Infor- mation

    Marcos Matabuena et al. “Personalized Imputation in Metric Spaces via Conformal Prediction: Applications in Predicting Diabetes Development with Continuous Glucose Monitoring Infor- mation”. In:arXiv preprint arXiv:2403.18069(2024).DOI:10.48550/arXiv.2403.18069

  38. [38]

    WassersteinF -tests and confidence bands for the Fréchet regression of density response curves

    Alexander Petersen et al. “WassersteinF -tests and confidence bands for the Fréchet regression of density response curves”. In:The Annals of Statistics49.1 (2021), pp. 590–611

  39. [39]

    Large sample confidence regions based on subsamples under minimal assumptions

    Dimitris N Politis et al. “Large sample confidence regions based on subsamples under minimal assumptions”. In:The Annals of Statistics(1994), pp. 2031–2050

  40. [40]

    CatBoost: unbiased boosting with categorical features

    Liudmila Prokhorenkova et al. “CatBoost: unbiased boosting with categorical features”. In: Advances in Neural Information Processing Systems 31. 2018, pp. 6639–6649.DOI: 10 . 5555/3327757.3327770 .URL: https://papers.nips.cc/paper/7898- catboost- unbiased-boosting-with-categorical-features

  41. [41]

    Semi-supervised Fréchet Regression

    Rui Qiu et al. “Semi-supervised Fréchet Regression”. In:arXiv preprint arXiv:2404.10444 (2024).DOI:10.48550/arXiv.2404.10444

  42. [42]

    Improving generalised estimating equations using quadratic inference func- tions

    Annie Qu et al. “Improving generalised estimating equations using quadratic inference func- tions”. In:Biometrika87.4 (2000), pp. 823–836

  43. [43]

    Sch\" o tz

    Christof Schötz. “Nonparametric Regression in Nonstandard Spaces”. In:Electronic Journal of Statistics16.2 (2022), pp. 4679–4741.DOI:10.1214/22-EJS2056. 12

  44. [44]

    Gaussian Process Boosting

    Fabio Sigrist. “Gaussian Process Boosting”. In:Journal of Machine Learning Research23.232 (2022), pp. 1–46.URL:http://jmlr.org/papers/v23/20-322.html

  45. [45]

    Continuous Glucose Monitoring and Intensive Treatment of Type 1 Diabetes

    W.V . Tamborlane et al. “Continuous Glucose Monitoring and Intensive Treatment of Type 1 Diabetes”. In:New England Journal of Medicine359.14 (2008), pp. 1464–1476

  46. [46]

    Journal of the American Sta- tistical Association56(293), 52–64 (1961).https://doi.org/10.1080/01621459

    Danielle C. Tucker et al. “Variable Selection for Global Fréchet Regression”. In:Journal of the American Statistical Association118 (2023), pp. 1023–1037.DOI: 10.1080/01621459. 2021.1969240

  47. [47]

    Journal of the American Statistical Association , author =

    Qi Zhang et al. “Dimension Reduction for Fréchet Regression”. In:Journal of the American Statistical Association119.548 (2024), pp. 2733–2747.DOI: 10.1080/01621459.2023. 2277406

  48. [48]

    Dynamic Network Regression

    Yidong Zhou et al. “Dynamic Network Regression”. In:arXiv preprint arXiv:2109.02981 (2021).DOI:10.48550/arXiv.2109.02981

  49. [49]

    Network regression with graph Laplacians

    Yidong Zhou et al. “Network regression with graph Laplacians”. In:Journal of Machine Learning Research23.320 (2022), pp. 1–41

  50. [50]

    Alain F Zuur et al.Mixed effects models and extensions in ecology with R. V ol. 574. Springer, 2009. 13