Graded Projection Recursion (GPR): Corrections, Obstructions, and Conservative Approximate Matrix Multiplication

Jeffrey Uhlmann

arxiv: 2511.11988 · v5 · submitted 2025-11-15 · 💻 cs.CC · cs.DS

Graded Projection Recursion (GPR): Corrections, Obstructions, and Conservative Approximate Matrix Multiplication

Jeffrey Uhlmann This is my paper

Pith reviewed 2026-05-17 22:39 UTC · model grok-4.3

classification 💻 cs.CC cs.DS

keywords approximate matrix multiplicationgraded projection recursionunbiased estimatorprotected queriesresidual error localizationconservative approximationmatrix product

0 comments

The pith

Unbiased estimator preserves protected matrix queries exactly

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper withdraws its prior claim of exact dense matrix multiplication via graded projection recursion due to a global obstruction in the recursive argument. It replaces that claim with a conservative approximate matrix multiplication framework. Protected left and right query subspaces are chosen so their low and marginal contributions to the product AB can be computed exactly. An unbiased AMM primitive is then applied only to the high/high residual. The construction yields an overall unbiased estimator that keeps every protected query exact in all realizations and confines stochastic error to the residual subspace.

Core claim

For chosen protected left and right query subspaces, the low/marginal part of AB is computed exactly and an unbiased AMM primitive is applied only to the high/high residual. The resulting estimator is unbiased, preserves protected queries exactly in every realization, localizes stochastic error to the residual subspace, and inherits residual output-norm or query-risk guarantees from the underlying estimator.

What carries the argument

The conservative approximate matrix multiplication framework, which separates the product into exactly computed low/marginal parts over protected subspaces and an approximated high/high residual using local graded projection recursion identities.

If this is right

The overall estimator remains unbiased.
Protected queries are preserved exactly in every realization.
Stochastic error is localized exclusively to the residual subspace.
Output-norm or query-risk guarantees from the base AMM primitive carry over directly to the residual.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separation of exact protected parts from approximated residuals could be applied to other recursive linear-algebra primitives.
The rank/capacity argument against separable active-state realizations suggests that exact dense multiplication would require non-separable state tracking.
The framework offers a route to hybrid exact-approximate methods whenever a subset of outputs must remain precise.

Load-bearing premise

An unbiased AMM primitive exists for the high/high residual and protected subspaces can be chosen so that exact computation of the low/marginal part introduces no bias.

What would settle it

Applying the exact low/marginal computation plus an unbiased AMM primitive to the residual on concrete matrix pairs and checking whether the overall expectation equals the true product would confirm or refute the unbiasedness claim.

read the original abstract

Earlier versions proposed Graded Projection Recursion (GPR) as a deterministic packed-recursion framework for model-honest near-quadratic dense matrix multiplication. This revised version withdraws the exact dense matrix multiplication theorem and the downstream consequences that depended on it with a conservative AMM framework. The local ingredients remain useful as local tools: the three-band packing identity, scaled middle-band extraction under certified gaps, centering and reconstruction identities, and row/column normalization bounds. The gap in the earlier argument is global: the proof relied on a bounded active-state realization that would remove first-mismatch terms through the recursion. For arbitrary dense inputs this would require an exact equality filter over the inner index. We formulate this obstruction as a target-slice/equality-filter problem and give a rank/capacity argument against the natural separable active-state realization. The positive replacement is a conservative approximate matrix multiplication framework. For chosen protected left and right query subspaces, the low/marginal part of AB is computed exactly and an unbiased AMM primitive is applied only to the high/high residual. The resulting estimator is unbiased, preserves protected queries exactly in every realization, localizes stochastic error to the residual subspace, and inherits residual output-norm or query-risk guarantees from the underlying estimator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is mainly a self-correction that drops the exact dense multiplication claim and swaps in a conservative AMM split, but the replacement still assumes an unbiased primitive for the residual without showing one exists.

read the letter

The core takeaway is that the author is retracting the earlier exact GPR theorem for dense matrix multiplication and offering a conservative approximate replacement instead. The paper is upfront about the gap: the old proof needed a bounded active-state realization that could filter mismatches through recursion, but arbitrary dense inputs require an exact equality filter over the inner index that a separable realization cannot provide. The rank/capacity argument against that realization is the clearest new piece here, and it looks logically consistent on its face even if the details are kept abstract.

Referee Report

2 major / 2 minor

Summary. The manuscript withdraws the exact dense matrix multiplication theorem from prior GPR work and replaces it with a conservative approximate matrix multiplication (AMM) framework. For chosen protected left and right query subspaces, the low/marginal part of AB is computed exactly while an unbiased AMM primitive is applied only to the high/high residual. The resulting estimator is claimed to be unbiased overall, to preserve protected queries exactly in every realization, and to localize stochastic error to the residual subspace while inheriting residual guarantees from the underlying AMM primitive. The paper also formulates an obstruction to exact multiplication as a target-slice/equality-filter problem and supplies a rank/capacity argument against the natural separable active-state realization.

Significance. If the central replacement framework can be instantiated with a suitable unbiased AMM primitive, the approach would offer a principled way to localize approximation error while guaranteeing exactness on designated query subspaces. The explicit identification of the gap in the earlier bounded active-state argument and the formulation of the obstruction via rank/capacity are constructive contributions. However, the significance is tempered by the absence of any concrete construction or existence argument for the required unbiased AMM primitive on arbitrary dense high/high residuals.

major comments (2)

[Abstract, §3] Abstract and §3 (obstruction and replacement framework): The central claim that the estimator remains unbiased and localizes stochastic error to the residual rests on the existence of an unbiased AMM primitive applicable to arbitrary high/high residuals. The manuscript supplies no explicit construction, reference to a known primitive, or existence proof for general dense inputs; the framework treats this as a black-box assumption. This is load-bearing for the unbiasedness and error-localization assertions.
[Abstract] Abstract: The rank/capacity argument against separable active-state realizations is presented as the obstruction to exact multiplication, yet the text provides no detailed derivation, explicit counter-example construction, or verification that the argument rules out all relevant realizations. Without this, the justification for withdrawing the exact theorem and moving to the conservative framework remains incomplete.

minor comments (2)

[Abstract] The abstract refers to 'three-band packing identity, scaled middle-band extraction under certified gaps, centering and reconstruction identities' as local tools; these should be given explicit equation numbers or subsection references in the main text for traceability.
Notation for the protected left/right query subspaces and the low/marginal versus high/high decomposition should be introduced with a single consistent diagram or table early in the paper to aid readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful and constructive report. The comments help clarify the presentation of the obstruction argument and the assumptions in the conservative framework. We address each major comment below and indicate the revisions we will make.

read point-by-point responses

Referee: [Abstract, §3] The central claim that the estimator remains unbiased and localizes stochastic error to the residual rests on the existence of an unbiased AMM primitive applicable to arbitrary high/high residuals. The manuscript supplies no explicit construction, reference to a known primitive, or existence proof for general dense inputs; the framework treats this as a black-box assumption. This is load-bearing for the unbiasedness and error-localization assertions.

Authors: We agree that the unbiasedness and error localization properties are conditional on the availability of a suitable unbiased AMM primitive for the high/high residual. The manuscript presents the framework in a modular fashion precisely to allow for the use of any such primitive that satisfies the unbiasedness condition. We do not claim to construct such a primitive here; instead, the contribution lies in showing how exact computation on protected subspaces can be combined with an AMM on the residual to achieve overall unbiased estimation with localized error. In the revised manuscript, we will add a dedicated subsection discussing the requirements for the AMM primitive, reference existing AMM literature that could be adapted (e.g., random sampling or sketching methods for residuals), and explicitly note that instantiating it for general dense cases remains an open direction. This addresses the load-bearing nature by making the assumption transparent. revision: partial
Referee: [Abstract] The rank/capacity argument against separable active-state realizations is presented as the obstruction to exact multiplication, yet the text provides no detailed derivation, explicit counter-example construction, or verification that the argument rules out all relevant realizations. Without this, the justification for withdrawing the exact theorem and moving to the conservative framework remains incomplete.

Authors: We acknowledge that the current presentation of the rank/capacity argument is concise and would benefit from expansion. The argument is based on showing that a separable active-state realization cannot implement the required equality filter for the target slice without exceeding capacity bounds derived from the rank of the projection operators. To strengthen this, in the revision we will include: (1) a detailed step-by-step derivation of the capacity bound, (2) an explicit small-scale counter-example demonstrating the obstruction for a low-dimensional case, and (3) an argument that the separable case is the natural candidate and that non-separable realizations would require additional structure not present in the GPR framework. This will provide a more complete justification for withdrawing the exact dense matrix multiplication theorem. revision: yes

standing simulated objections not resolved

A concrete construction or existence proof for an unbiased AMM primitive applicable to arbitrary dense high/high residuals.

Circularity Check

0 steps flagged

No significant circularity; conservative framework relies on external unbiased AMM primitives

full rationale

The paper explicitly withdraws its prior exact dense multiplication theorem after identifying a global obstruction in the bounded active-state realization and equality-filter problem. The replacement conservative estimator computes the low/marginal part exactly for protected subspaces and applies a standard unbiased AMM primitive only to the high/high residual. This structure treats the primitive as an external black-box assumption rather than deriving or fitting it internally. Local tools such as the three-band packing identity and normalization bounds are presented as independent ingredients. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the derivation; the central claims inherit properties from the assumed external primitive without circular reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard linear-algebraic identities for matrix multiplication and the existence of an unbiased AMM primitive; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption Existence of an unbiased approximate matrix multiplication primitive for the high/high residual subspace
Invoked when applying the AMM only to the residual after exact computation of protected parts

pith-pipeline@v0.9.0 · 5517 in / 1302 out tokens · 36091 ms · 2026-05-17T22:39:34.845222+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

[1]

On multiplication of 2 × 2 matrices,

S. Winograd, “On multiplication of 2 × 2 matrices,”Linear Algebra and its Applications, vol. 4:4, pp. 381–388, 1971

work page 1971
[2]

Landsberg,Geometry and Complexity Theory, Cambridge University Press, 2017

J.M. Landsberg,Geometry and Complexity Theory, Cambridge University Press, 2017

work page 2017
[3]

Gaussian elimination is not optimal,

V. Strassen, “Gaussian elimination is not optimal,”Numerische Mathematik, 13:354–356, 1969

work page 1969
[4]

Matrix multiplication via arithmetic progressions,

D. Coppersmith and S. Winograd, “Matrix multiplication via arithmetic progressions,”Journal of Symbolic Computation, 9(3):251–280, 1990

work page 1990
[5]

Powers of tensors and fast matrix multiplication,

F. Le Gall, “Powers of tensors and fast matrix multiplication,” InISSAC 2014, 296–303. ACM, 2014

work page 2014
[6]

A refined laser method and faster matrix multiplication,

J. Alman and V. Vassilevska Williams, “A refined laser method and faster matrix multiplication,” InSODA 2021, 522–539. SIAM, 2021

work page 2021
[7]

Surpassing the information theoretic bound with fusion trees,

M.L. Fredman and D.E. Willard, “Surpassing the information theoretic bound with fusion trees,” Journal of Computer and System Sciences, 47(3):424–436, 1993

work page 1993
[8]

Higham,Accuracy and Stability of Numerical Algorithms, SIAM, 2nd ed., 2002

N.J. Higham,Accuracy and Stability of Numerical Algorithms, SIAM, 2nd ed., 2002

work page 2002
[9]

IEEE Standard for Floating-Point Arithmetic,

“IEEE Standard for Floating-Point Arithmetic,”IEEE Std 754-2019, 2019. 63

work page 2019
[10]

von zur Gathen and J

J. von zur Gathen and J. Gerhard,Modern Computer Algebra, Cambridge University Press, 3rd ed., 2013

work page 2013
[11]

Cormen, C.E

T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein,Introduction to Algorithms, MIT Press, 3rd ed., 2009

work page 2009
[12]

General context-free recognition in less than cubic time,

L.G. Valiant, “General context-free recognition in less than cubic time,”Journal of Computer and System Sciences, 10(2):308–315, 1975

work page 1975
[13]

All pairs shortest paths using bridging sets and rectangular matrix multiplication,

U. Zwick, “All pairs shortest paths using bridging sets and rectangular matrix multiplication,” Journal of the ACM, 49(3):289–317, 2002

work page 2002
[14]

Finding a minimum circuit in a graph,

A. Itai and M. Rodeh, “Finding a minimum circuit in a graph,”SIAM Journal on Computing, 7(4):413–423, 1978

work page 1978
[15]

Multiplication of many-digital numbers by automatic computers,

A. Karatsuba and Y. Ofman, “Multiplication of many-digital numbers by automatic computers,” Proceedings of the USSR Academy of Sciences, 145:293–294, 1962

work page 1962
[16]

Schonhage and V

A. Schonhage and V. Strassen,Schnelle Multiplikation grosser Zahlen,Computing, 7:281–292, 1971

work page 1971
[17]

Matrix-by-matrix multiplication algorithm with O(N 2 log2 N) computational complexity for variable precision arithmetic,

M. Paszynski, “Matrix-by-matrix multiplication algorithm with O(N 2 log2 N) computational complexity for variable precision arithmetic,”arXiv:2410.21050, 2024

work page arXiv 2024
[18]

O(n2.7799) complexity for n×n approximate matrix multiplication,

D. Bini, M. Capovani, G. Lotti, and F. Romani, “ O(n2.7799) complexity for n×n approximate matrix multiplication,”Information Processing Letters, 8(5):234–235, 1979

work page 1979
[19]

Relations between exact and approximate bilinear algorithms. Applications,

D. Bini, “Relations between exact and approximate bilinear algorithms. Applications,”Calcolo, 17(1):87–97, 1980

work page 1980
[20]

Approximate Solutions for the Bilinear Form Computational Problem

D. Bini, G. Lotti, and F. Romani, “Approximate Solutions for the Bilinear Form Computational Problem.”SIAM Journal on Computing, 9(4):692–697, 1980

work page 1980
[21]

Field extension and trilinear aggregating, uniting and canceling for the acceleration of matrix multiplications

V. Ya. Pan, “Field extension and trilinear aggregating, uniting and canceling for the acceleration of matrix multiplications.”20th Annual Symposium on Foundations of Computer Science(FOCS), pp. 28–38, 1979

work page 1979
[22]

Faster polynomial multiplication via multipoint Kronecker substitution

D. Harvey, “Faster polynomial multiplication via multipoint Kronecker substitution,” arXiv:0712.4046, 2007

work page internal anchor Pith review Pith/arXiv arXiv 2007
[23]

Implementing High-Performance Complex Matrix Multiplication via the 1M Method,

F.G. Van Zee and R.A. van de Geijn. “Implementing High-Performance Complex Matrix Multiplication via the 1M Method,”SIAM Journal on Scientific Computing, 2019

work page 2019
[24]

Further Remarks on Reducing Truncation Errors,

W. Kahan, “Further Remarks on Reducing Truncation Errors,”Communications of the ACM, 8:1, 1965. 64 A Parallel bit-complexity and communication overhead This appendix records a simple but useful consequence of the depth-uniform representation bounds proved in the main text: once a GPR kernel has eliminated depth-dependent growth of packed intermediates, th...

work page 1965
[25]

Ife≥0, thenD 2zis an integer and D2z =N·2 e

work page
[26]

Ife <0, lets:=−e >0 and define D2z := RoundPow2(N, s) via Definition B.2. And then define: 1 D2 D2( z) := 1 D2 D2z ∈D −2Z⊂ L D.(244) Lemma B.5(Correctness of the implementation).If 1 D2 D2( ·)is implemented as in Definition B.4, then for every z∈ L D it equals the mathematical definition 1 D2 D2( z) = D−2 D2z exactly. Proof. This is immediate from the rep...

work page
[27]

Return a failure flag (instance leaves the declared success set), or

work page
[28]

On the success set, the check passes and extraction is exact

Increaseβand retry (if using an adaptive base policy outside the constant-βregime). On the success set, the check passes and extraction is exact. B.6 Notes on efficiency The primitives above are constant-time per scalar under fixed-width machine arithmetic: •exponent updates are constant time, •RoundPow2uses shifts/masks on a fixed-width integer type, • 1...

work page

[1] [1]

On multiplication of 2 × 2 matrices,

S. Winograd, “On multiplication of 2 × 2 matrices,”Linear Algebra and its Applications, vol. 4:4, pp. 381–388, 1971

work page 1971

[2] [2]

Landsberg,Geometry and Complexity Theory, Cambridge University Press, 2017

J.M. Landsberg,Geometry and Complexity Theory, Cambridge University Press, 2017

work page 2017

[3] [3]

Gaussian elimination is not optimal,

V. Strassen, “Gaussian elimination is not optimal,”Numerische Mathematik, 13:354–356, 1969

work page 1969

[4] [4]

Matrix multiplication via arithmetic progressions,

D. Coppersmith and S. Winograd, “Matrix multiplication via arithmetic progressions,”Journal of Symbolic Computation, 9(3):251–280, 1990

work page 1990

[5] [5]

Powers of tensors and fast matrix multiplication,

F. Le Gall, “Powers of tensors and fast matrix multiplication,” InISSAC 2014, 296–303. ACM, 2014

work page 2014

[6] [6]

A refined laser method and faster matrix multiplication,

J. Alman and V. Vassilevska Williams, “A refined laser method and faster matrix multiplication,” InSODA 2021, 522–539. SIAM, 2021

work page 2021

[7] [7]

Surpassing the information theoretic bound with fusion trees,

M.L. Fredman and D.E. Willard, “Surpassing the information theoretic bound with fusion trees,” Journal of Computer and System Sciences, 47(3):424–436, 1993

work page 1993

[8] [8]

Higham,Accuracy and Stability of Numerical Algorithms, SIAM, 2nd ed., 2002

N.J. Higham,Accuracy and Stability of Numerical Algorithms, SIAM, 2nd ed., 2002

work page 2002

[9] [9]

IEEE Standard for Floating-Point Arithmetic,

“IEEE Standard for Floating-Point Arithmetic,”IEEE Std 754-2019, 2019. 63

work page 2019

[10] [10]

von zur Gathen and J

J. von zur Gathen and J. Gerhard,Modern Computer Algebra, Cambridge University Press, 3rd ed., 2013

work page 2013

[11] [11]

Cormen, C.E

T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein,Introduction to Algorithms, MIT Press, 3rd ed., 2009

work page 2009

[12] [12]

General context-free recognition in less than cubic time,

L.G. Valiant, “General context-free recognition in less than cubic time,”Journal of Computer and System Sciences, 10(2):308–315, 1975

work page 1975

[13] [13]

All pairs shortest paths using bridging sets and rectangular matrix multiplication,

U. Zwick, “All pairs shortest paths using bridging sets and rectangular matrix multiplication,” Journal of the ACM, 49(3):289–317, 2002

work page 2002

[14] [14]

Finding a minimum circuit in a graph,

A. Itai and M. Rodeh, “Finding a minimum circuit in a graph,”SIAM Journal on Computing, 7(4):413–423, 1978

work page 1978

[15] [15]

Multiplication of many-digital numbers by automatic computers,

A. Karatsuba and Y. Ofman, “Multiplication of many-digital numbers by automatic computers,” Proceedings of the USSR Academy of Sciences, 145:293–294, 1962

work page 1962

[16] [16]

Schonhage and V

A. Schonhage and V. Strassen,Schnelle Multiplikation grosser Zahlen,Computing, 7:281–292, 1971

work page 1971

[17] [17]

Matrix-by-matrix multiplication algorithm with O(N 2 log2 N) computational complexity for variable precision arithmetic,

M. Paszynski, “Matrix-by-matrix multiplication algorithm with O(N 2 log2 N) computational complexity for variable precision arithmetic,”arXiv:2410.21050, 2024

work page arXiv 2024

[18] [18]

O(n2.7799) complexity for n×n approximate matrix multiplication,

D. Bini, M. Capovani, G. Lotti, and F. Romani, “ O(n2.7799) complexity for n×n approximate matrix multiplication,”Information Processing Letters, 8(5):234–235, 1979

work page 1979

[19] [19]

Relations between exact and approximate bilinear algorithms. Applications,

D. Bini, “Relations between exact and approximate bilinear algorithms. Applications,”Calcolo, 17(1):87–97, 1980

work page 1980

[20] [20]

Approximate Solutions for the Bilinear Form Computational Problem

D. Bini, G. Lotti, and F. Romani, “Approximate Solutions for the Bilinear Form Computational Problem.”SIAM Journal on Computing, 9(4):692–697, 1980

work page 1980

[21] [21]

Field extension and trilinear aggregating, uniting and canceling for the acceleration of matrix multiplications

V. Ya. Pan, “Field extension and trilinear aggregating, uniting and canceling for the acceleration of matrix multiplications.”20th Annual Symposium on Foundations of Computer Science(FOCS), pp. 28–38, 1979

work page 1979

[22] [22]

Faster polynomial multiplication via multipoint Kronecker substitution

D. Harvey, “Faster polynomial multiplication via multipoint Kronecker substitution,” arXiv:0712.4046, 2007

work page internal anchor Pith review Pith/arXiv arXiv 2007

[23] [23]

Implementing High-Performance Complex Matrix Multiplication via the 1M Method,

F.G. Van Zee and R.A. van de Geijn. “Implementing High-Performance Complex Matrix Multiplication via the 1M Method,”SIAM Journal on Scientific Computing, 2019

work page 2019

[24] [24]

Further Remarks on Reducing Truncation Errors,

W. Kahan, “Further Remarks on Reducing Truncation Errors,”Communications of the ACM, 8:1, 1965. 64 A Parallel bit-complexity and communication overhead This appendix records a simple but useful consequence of the depth-uniform representation bounds proved in the main text: once a GPR kernel has eliminated depth-dependent growth of packed intermediates, th...

work page 1965

[25] [25]

Ife≥0, thenD 2zis an integer and D2z =N·2 e

work page

[26] [26]

Ife <0, lets:=−e >0 and define D2z := RoundPow2(N, s) via Definition B.2. And then define: 1 D2 D2( z) := 1 D2 D2z ∈D −2Z⊂ L D.(244) Lemma B.5(Correctness of the implementation).If 1 D2 D2( ·)is implemented as in Definition B.4, then for every z∈ L D it equals the mathematical definition 1 D2 D2( z) = D−2 D2z exactly. Proof. This is immediate from the rep...

work page

[27] [27]

Return a failure flag (instance leaves the declared success set), or

work page

[28] [28]

On the success set, the check passes and extraction is exact

Increaseβand retry (if using an adaptive base policy outside the constant-βregime). On the success set, the check passes and extraction is exact. B.6 Notes on efficiency The primitives above are constant-time per scalar under fixed-width machine arithmetic: •exponent updates are constant time, •RoundPow2uses shifts/masks on a fixed-width integer type, • 1...

work page