Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy

Ali Najar; Christoph H. Lampert; Nikita P. Kalinin; Valentin Roth

arxiv: 2601.22320 · v2 · submitted 2026-01-29 · 💻 cs.LG · stat.ML

Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy

Nikita P. Kalinin , Ali Najar , Valentin Roth , Christoph H. Lampert This is my paper

Pith reviewed 2026-05-16 09:40 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords continual mean estimationuser-level differential privacymatrix factorizationapproximate differential privacymean squared errorstreaming dataonline estimationprivacy mechanisms

0 comments

The pith

A mean-specific matrix factorization achieves asymptotically lower error in continual mean estimation under user-level approximate differential privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to track running means as data points arrive one after another while ensuring that each user's full set of contributions remains private. Earlier methods relied on pure differential privacy and produced estimates too noisy for most uses. The authors switch to approximate differential privacy and introduce a factorization of the noise matrix that is built expressly around the structure of mean estimation. This design keeps the method computationally light while delivering mean-squared error that improves with more data in the long run. A reader would care because it moves private streaming statistics from theoretically safe but practically unusable to something closer to deployable accuracy.

Core claim

The authors claim that a novel factorization tailored to mean estimation, when plugged into the Matrix Factorization mechanism under approximate differential privacy, simultaneously satisfies user-level privacy, runs efficiently, and attains asymptotically lower mean-squared error bounds than prior pure-differential-privacy approaches for the continual mean estimation task.

What carries the argument

A mean-estimation-specific factorization of the noise matrix inside the Matrix Factorization mechanism, which aligns the added noise with the linear structure of running averages to reduce error accumulation.

Load-bearing premise

The new factorization can be realized in code with efficient, privacy-preserving operations that add no hidden privacy loss or extra computation that would cancel the theoretical error improvement.

What would settle it

A concrete implementation and numerical test on long streams of synthetic or real user data that shows the observed mean-squared error failing to improve asymptotically or the runtime exceeding standard baselines would disprove the central claim.

read the original abstract

We study continual mean estimation, where data vectors arrive sequentially and the goal is to maintain accurate estimates of the running mean. We address this problem under user-level differential privacy, which protects each user's entire dataset even when they contribute multiple data points. Previous work on this problem has focused on pure differential privacy. While important, this approach limits applicability, as it leads to overly noisy estimates. In contrast, we analyze the problem under approximate differential privacy, adopting recent advances in the Matrix Factorization mechanism. We introduce a novel mean estimation specific factorization, which is both efficient and accurate, achieving asymptotically lower mean-squared error bounds in continual mean estimation under user-level differential privacy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper offers a mean-estimation-specific matrix factorization that claims lower asymptotic MSE under approximate user-level DP, but the gains likely require bounded per-user contributions.

read the letter

This paper introduces a mean-estimation-specific matrix factorization for continual mean estimation under approximate user-level differential privacy. It claims this delivers asymptotically lower mean-squared error than earlier pure-DP approaches that the authors correctly flag as too noisy for practical use. The shift to approximate DP plus the tailored factorization is the main step forward, and it positions the work as more deployable than generic matrix factorization mechanisms from prior continual-release papers. If the derivations hold, the construction looks efficient enough to be a useful primitive for streaming tasks. The abstract and positioning give credit to the relevant MF literature without overclaiming. The soft spot is the handling of user contribution counts. User-level DP protects an entire user history, so if the number of vectors per user is unbounded or unknown at design time, the row sensitivity of the query matrix grows and the noise scale may erase the claimed asymptotic advantage. The paper needs to state this regime explicitly and show whether the factorization is re-derived or if it simply assumes a fixed bound. Without that, the improvement is narrower than the abstract suggests. The math appears formally grounded once the details are checked, and there are no obvious circularities or invented entities. This is for people working on private streaming algorithms and DP mechanisms for online data. A reader focused on practical utility bounds would get concrete value from the construction and the error analysis, assuming the assumptions are clarified. It deserves a serious referee because the central claim is specific and targets a real limitation in the literature, even if revisions will be needed on the sensitivity analysis and validation.

Referee Report

1 major / 0 minor

Summary. The paper studies continual mean estimation under user-level approximate differential privacy. It adopts the matrix factorization mechanism and introduces a novel mean-estimation-specific factorization that is claimed to be both computationally efficient and to achieve asymptotically lower mean-squared error than prior pure-DP approaches.

Significance. If the asymptotic MSE improvement holds under the stated assumptions, the work would supply a practical tool for streaming mean estimation with better utility while respecting user-level privacy, addressing a limitation of pure DP mechanisms in continual-release settings.

major comments (1)

[Abstract] Abstract: the claimed asymptotic MSE improvement is stated without an explicit bound on the number of vectors contributed by any single user. Under user-level DP the row sensitivity of the query matrix scales with this count; if the count is unbounded or unknown at design time, the factorization must be re-derived and the noise scaled, which may eliminate the asymptotic gain. Please state the precise assumption on per-user contribution cardinality and derive the resulting sensitivity and MSE bound.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address the major comment below and will revise the paper to incorporate the requested clarifications.

read point-by-point responses

Referee: [Abstract] Abstract: the claimed asymptotic MSE improvement is stated without an explicit bound on the number of vectors contributed by any single user. Under user-level DP the row sensitivity of the query matrix scales with this count; if the count is unbounded or unknown at design time, the factorization must be re-derived and the noise scaled, which may eliminate the asymptotic gain. Please state the precise assumption on per-user contribution cardinality and derive the resulting sensitivity and MSE bound.

Authors: We agree that the assumption on per-user contribution cardinality must be stated explicitly. Our analysis assumes each user contributes at most a fixed, known constant k vectors (a standard modeling choice in the user-level DP literature to ensure finite sensitivity). Under this assumption the row sensitivity of the query matrix is bounded by k (for unit-norm vectors), and the noise calibration and resulting MSE bound follow directly from the matrix factorization framework. We will revise the abstract to state the assumption on k and add an explicit derivation of the sensitivity and asymptotic MSE (showing the improvement over pure-DP baselines is preserved, with constants depending on k) to Section 3. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained; novel factorization introduces independent content

full rationale

The abstract introduces a novel mean-estimation-specific factorization for the matrix factorization mechanism under approximate user-level DP and claims asymptotically lower MSE bounds. No equations or steps are shown that reduce the claimed improvement to a fitted parameter, self-definition, or self-citation chain. The central claim rests on the choice of factorization itself, which is presented as an original construction rather than a renaming or re-derivation of prior fitted quantities. Without load-bearing self-referential steps visible in the provided text, the derivation does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no free parameters, axioms, or invented entities are explicitly stated or derivable from the provided text.

pith-pipeline@v0.9.0 · 5413 in / 970 out tokens · 30876 ms · 2026-05-16T09:40:59.187427+00:00 · methodology

Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy

Core claim

What carries the argument

Load-bearing premise

What would settle it

discussion (0)