Random Khatri-Rao-Product Codes for Numerically-Stable Distributed Matrix Multiplication

Adarsh M. Subramaniam; Anoosheh Heidarzadeh; Krishna R. Narayanan

REVIEW 2 major objections 2 minor 14 references

Random Khatri-Rao-Product codes achieve maximum distance separability with probability 1 and decode distributed matrix multiplications with substantially lower numerical error than polynomial codes.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric. the ladder, T0–T4 →

Challenge this review Re-run · record.json Download PDF Read on arXiv ↗

T0 review · grok-4.3

2026-05-24 22:00 UTC pith:TQVRHOXZ

load-bearing objection RKRP codes give a stability edge over polynomial and OrthoPoly constructions but the MDS claim needs explicit verification that the Khatri-Rao structure does not force any determinant to be identically zero. the 2 major comments →

arxiv 1907.05965 v1 pith:TQVRHOXZ submitted 2019-07-12 cs.IT math.IT

Random Khatri-Rao-Product Codes for Numerically-Stable Distributed Matrix Multiplication

Adarsh M. Subramaniam , Anoosheh Heidarzadeh , Krishna R. Narayanan This is my paper

classification cs.IT math.IT

keywords distributed matrix multiplicationstraggler mitigationcoding for computationnumerical stabilityKhatri-Rao productMDS codesrandom coding

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces random Khatri-Rao-Product codes to manage stragglers during distributed matrix multiplication. It establishes that these codes meet the maximum distance separable property with probability 1 and that their decoding stays numerically stable far better than Polynomial codes or OrthoPoly codes. This stability matters because finite-precision arithmetic in large systems can otherwise produce large recovery errors. The codes require the same communication volume and encoding work as the earlier schemes yet need less average decoding effort than OrthoPoly codes. Experiments report markedly smaller average relative L2-norm reconstruction error.

Core claim

We propose a class of codes called random Khatri-Rao-Product (RKRP) codes for distributed matrix multiplication in the presence of stragglers. The main advantage of the proposed codes is that decoding of RKRP codes is highly numerically stable in comparison to decoding of Polynomial codes and decoding of the recently proposed OrthoPoly codes. We show that RKRP codes are maximum distance separable with probability 1. The communication cost and encoding complexity for RKRP codes are identical to that of OrthoPoly codes and Polynomial codes and the average decoding complexity of RKRP codes is lower than that of OrthoPoly codes. Numerical results show that the average relative L2-norm of the 0re

What carries the argument

The random Khatri-Rao product construction of the encoding matrices, which produces full-rank submatrices with probability 1.

Load-bearing premise

Random choices inside the Khatri-Rao product construction always yield full-rank submatrices with probability 1 and the observed numerical stability generalizes beyond the tested matrix sizes and arithmetic settings.

What would settle it

A concrete random parameter choice that produces a rank-deficient submatrix, or a numerical experiment in which the relative L2-norm error of RKRP decoding is not lower than that of OrthoPoly decoding.

Watch this falsifier — get emailed when new claim-graph text bears on it.

If this is right

Straggler recovery in distributed matrix multiplication becomes reliable under finite-precision arithmetic.
Communication volume stays identical to that of Polynomial codes.
Average decoding effort drops below that required by OrthoPoly codes.
Reconstruction error remains small across the tested range of worker counts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same random-product idea could be tested on other linear distributed tasks such as convolution or tensor operations.
Stability gains might be measured directly on hardware with limited floating-point precision.
Explicit bounds on the probability of rank deficiency for finite random draws could be derived.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

RKRP codes give a stability edge over polynomial and OrthoPoly constructions but the MDS claim needs explicit verification that the Khatri-Rao structure does not force any determinant to be identically zero.

read the letter

The paper's main contribution is a random Khatri-Rao-product construction for straggler codes in matrix multiplication. It keeps the same upload cost as existing schemes but reports substantially lower reconstruction error in the numerical tests and lower average decoding complexity than OrthoPoly codes. The authors also state that the codes are MDS with probability 1. That combination addresses a practical pain point—numerical blow-up during decoding—that polynomial codes are known for. The empirical gap shown in the L2 error plots is the clearest takeaway if the numbers hold under broader testing. The construction is simple enough that it could be implemented quickly for further checks. The soft spot is the MDS argument. The stress-test note is right to flag that Khatri-Rao products can create algebraic dependencies not present in fully random matrices. The paper must show that every relevant square submatrix has a determinant polynomial that is not the zero polynomial; otherwise the probability-1 claim does not follow from standard random-matrix arguments. Without seeing that step spelled out, the theoretical guarantee is not yet solid. The numerical comparisons also lack enough detail on field size, matrix dimensions, and run count to judge how general the stability advantage is. This paper is for people already working on coded distributed computation who need codes that behave well in floating point. It has a distinct construction and a concrete practical claim, so it is worth sending to referees who can verify the determinant polynomials and ask for more experimental controls.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes random Khatri-Rao-product (RKRP) codes for distributed matrix multiplication in the presence of stragglers. It claims that RKRP codes are maximum distance separable (MDS) with probability 1, that their decoding is numerically more stable than Polynomial codes and OrthoPoly codes, that communication cost and encoding complexity are identical to those baselines, and that average decoding complexity is lower. Numerical experiments are reported to demonstrate substantially lower average relative L2-norm reconstruction error.

Significance. If the MDS property holds for the Khatri-Rao construction and the reported numerical stability advantage is robust, the codes could provide a practical option for coded distributed linear algebra where finite-precision effects matter. Matching the communication cost of existing schemes is a concrete implementation advantage.

major comments (2)

[§3] §3 (MDS property): the argument that RKRP codes are MDS with probability 1 requires showing that the determinant of every square submatrix of the generator matrix is a non-zero polynomial in the random variables. The Khatri-Rao product structure may introduce algebraic dependencies that make some such determinants identically zero; the manuscript must explicitly rule this out rather than relying on generic random-matrix results. This is the load-bearing theoretical step.
[Numerical results section] Numerical results section / Table 1: the reported error comparisons lack the finite-field size, matrix dimensions, number of Monte-Carlo trials, and any variance or error-bar information. Without these details the claimed substantial improvement in average relative L2-norm error cannot be assessed for robustness or generalizability.

minor comments (2)

[§2] The definition and notation for the Khatri-Rao product should be recalled explicitly in the construction section for readers who may not have the reference at hand.
[§2] A short remark on how the random matrices are sampled (e.g., over which field or ring) would clarify the probability-1 statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on the MDS proof and numerical results. We address each major comment below.

read point-by-point responses

Referee: [§3] §3 (MDS property): the argument that RKRP codes are MDS with probability 1 requires showing that the determinant of every square submatrix of the generator matrix is a non-zero polynomial in the random variables. The Khatri-Rao product structure may introduce algebraic dependencies that make some such determinants identically zero; the manuscript must explicitly rule this out rather than relying on generic random-matrix results. This is the load-bearing theoretical step.

Authors: We agree that an explicit argument ruling out identically zero determinants is needed to fully address potential algebraic dependencies arising from the Khatri-Rao structure. The current proof in §3 invokes generic results on random matrices after establishing the generator matrix entries, but does not separately verify the absence of structural dependencies. In the revision we will add a short lemma showing that, for the specific random Khatri-Rao construction used, every square submatrix has a determinant that is a non-zero polynomial in the random variables; the proof will proceed by induction on the number of factors and by examining the leading monomial in each determinant. This addition will be placed immediately after the existing argument in §3. revision: yes
Referee: [Numerical results section] Numerical results section / Table 1: the reported error comparisons lack the finite-field size, matrix dimensions, number of Monte-Carlo trials, and any variance or error-bar information. Without these details the claimed substantial improvement in average relative L2-norm error cannot be assessed for robustness or generalizability.

Authors: The referee correctly identifies missing experimental parameters. In the revised manuscript we will augment the numerical results section and Table 1 with: (i) the finite-field size, (ii) the exact matrix dimensions (m, n, p) used for each experiment, (iii) the number of Monte-Carlo trials (currently 1000), and (iv) standard-deviation error bars on the reported average relative L2-norm errors. These additions will allow readers to assess both statistical significance and generalizability. revision: yes

Circularity Check

0 steps flagged

No circularity: MDS claim rests on probabilistic determinant argument; stability on external numerical comparison.

full rationale

The paper asserts that RKRP codes are MDS with probability 1 via a probabilistic argument over random Khatri-Rao products and reports numerical stability via direct simulation against baselines. No equations reduce a claimed prediction to a fitted parameter by construction, no load-bearing self-citation chain is invoked for the core uniqueness or ansatz, and the generator-matrix construction is not shown to be definitionally equivalent to the MDS property. The derivation chain is therefore self-contained against the stated probabilistic and empirical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities can be extracted. The MDS claim implicitly relies on standard random-matrix rank properties over finite fields.

axioms (1)

domain assumption Random matrices generate full-rank Khatri-Rao products with probability 1
Invoked to establish the MDS property with probability 1.

pith-pipeline@v0.9.0 · 5672 in / 1232 out tokens · 20529 ms · 2026-05-24T22:00:34.882390+00:00 · methodology

0 comments

read the original abstract

Figures

Figures reproduced from arXiv: 1907.05965 by Adarsh M. Subramaniam, Anoosheh Heidarzadeh, Krishna R. Narayanan.

**Figure 2.** Figure 2: Plot of average relative error versus fraction of str [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

**Figure 3.** Figure 3: Plot of average relative error versus number of strag [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Plot of E[log(condition number)] of inverted matrix [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

discussion (0)

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 6 internal anchors

[1]

Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication

Q. Y u, M. A. Maddah-Ali, and A. S. Avestimehr, “Polynomia l codes: an optimal design for high-dimensional coded matri x multiplication,” arXiv:1705.10464, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[2]

Numerically stable polynomial ly coded computing,

M. Fahim and V . Cadambe, “Numerically stable polynomial ly coded computing,” in to appear in proceedings of the International Symposium on Information Theory , 2019

work page 2019
[3]

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding,

Q. Y u, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundame ntal limits and optimal coding,” arXiv:1801.07487, 2018

work page arXiv 2018
[4]

A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication

S. Dutta, Z. Bai, H. Jeong, T. M. Low, and P . Grover, “A uniﬁ ed coded deep neural network training strategy based on generalized polydot codes for matrix multiplication,” arXiv:1811.10751, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[5]

Universally Decodable Matrices for Distributed Matrix-Vector Multiplication

A. Ramamoorthy, L. Tang, and P . O. V ontobel, “Universally decodable matrices for distributed matrix-vector multip lication,” arXiv:1901.10674, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[6]

Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

Q. Y u, N. Raviv, J. So, and A. S. Avestimehr, “Lagrange cod ed computing: Optimal design for resiliency, security and privacy,” arXiv:1806.00939, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

Speeding Up Distributed Machine Learning Using Codes

K. Lee, M. Lam, R. Pedarsani, D. S. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” arXiv:1512.02673, 2015. July 16, 2019 DRAFT 16

work page internal anchor Pith review Pith/arXiv arXiv 2015
[8]

"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

S. Dutta, V . R. Cadambe, and P . Grover, “"short-dot": Com puting large linear transforms distributedly using coded s hort dot products,” arXiv:1704.05181, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[9]

A uniﬁed co ding framework for distributed computing with straggling servers,

S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “A uniﬁed co ding framework for distributed computing with straggling servers,” in 2016 IEEE Globecom W orkshops , Dec 2016, pp. 1–6

work page 2016
[10]

High-dimensional c oded matrix multiplication,

K. Lee, C. Suh, and K. Ramchandran, “High-dimensional c oded matrix multiplication,” in 2017 IEEE International Symposium on Information Theory (ISIT) , June 2017, pp. 2418–2422

work page 2017
[11]

Coded dis tributed computing: Straggling servers and multistage dat aﬂows,

S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded dis tributed computing: Straggling servers and multistage dat aﬂows,” in 2016 54th Annual Allerton Conference on Communication, Con trol, and Computing (Allerton) , Sep. 2016, pp. 164–171

work page 2016
[12]

Coded distributed compu ting for inverse problems,

Y . Yang, P . Grover, and S. Kar, “Coded distributed compu ting for inverse problems,” in Advances in Neural Information Processing Systems 30 , 2017, pp. 709–719

work page 2017
[13]

Almos t-sure identiﬁability of multidimensional harmonic retri eval,

T. Jiang, N. D. Sidiropoulos, and J. M. ten Berge, “Almos t-sure identiﬁability of multidimensional harmonic retri eval,” IEEE Transactions on Signal Processing , vol. 49, no. 9, pp. 1849–1859, 2001

work page 2001
[14]

Eigenvalues and condition numbers of rand om matrices,

A. Edelman, “Eigenvalues and condition numbers of rand om matrices,” SIAM Journal on Matrix Analysis and Applications , vol. 9, no. 4, pp. 543–560, 1988. July 16, 2019 DRAFT

work page 1988

[1] [1]

Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication

Q. Y u, M. A. Maddah-Ali, and A. S. Avestimehr, “Polynomia l codes: an optimal design for high-dimensional coded matri x multiplication,” arXiv:1705.10464, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[2] [2]

Numerically stable polynomial ly coded computing,

M. Fahim and V . Cadambe, “Numerically stable polynomial ly coded computing,” in to appear in proceedings of the International Symposium on Information Theory , 2019

work page 2019

[3] [3]

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding,

Q. Y u, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundame ntal limits and optimal coding,” arXiv:1801.07487, 2018

work page arXiv 2018

[4] [4]

A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication

S. Dutta, Z. Bai, H. Jeong, T. M. Low, and P . Grover, “A uniﬁ ed coded deep neural network training strategy based on generalized polydot codes for matrix multiplication,” arXiv:1811.10751, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[5] [5]

Universally Decodable Matrices for Distributed Matrix-Vector Multiplication

A. Ramamoorthy, L. Tang, and P . O. V ontobel, “Universally decodable matrices for distributed matrix-vector multip lication,” arXiv:1901.10674, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[6] [6]

Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

Q. Y u, N. Raviv, J. So, and A. S. Avestimehr, “Lagrange cod ed computing: Optimal design for resiliency, security and privacy,” arXiv:1806.00939, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

Speeding Up Distributed Machine Learning Using Codes

K. Lee, M. Lam, R. Pedarsani, D. S. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” arXiv:1512.02673, 2015. July 16, 2019 DRAFT 16

work page internal anchor Pith review Pith/arXiv arXiv 2015

[8] [8]

"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

S. Dutta, V . R. Cadambe, and P . Grover, “"short-dot": Com puting large linear transforms distributedly using coded s hort dot products,” arXiv:1704.05181, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[9] [9]

A uniﬁed co ding framework for distributed computing with straggling servers,

S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “A uniﬁed co ding framework for distributed computing with straggling servers,” in 2016 IEEE Globecom W orkshops , Dec 2016, pp. 1–6

work page 2016

[10] [10]

High-dimensional c oded matrix multiplication,

K. Lee, C. Suh, and K. Ramchandran, “High-dimensional c oded matrix multiplication,” in 2017 IEEE International Symposium on Information Theory (ISIT) , June 2017, pp. 2418–2422

work page 2017

[11] [11]

Coded dis tributed computing: Straggling servers and multistage dat aﬂows,

S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded dis tributed computing: Straggling servers and multistage dat aﬂows,” in 2016 54th Annual Allerton Conference on Communication, Con trol, and Computing (Allerton) , Sep. 2016, pp. 164–171

work page 2016

[12] [12]

Coded distributed compu ting for inverse problems,

Y . Yang, P . Grover, and S. Kar, “Coded distributed compu ting for inverse problems,” in Advances in Neural Information Processing Systems 30 , 2017, pp. 709–719

work page 2017

[13] [13]

Almos t-sure identiﬁability of multidimensional harmonic retri eval,

T. Jiang, N. D. Sidiropoulos, and J. M. ten Berge, “Almos t-sure identiﬁability of multidimensional harmonic retri eval,” IEEE Transactions on Signal Processing , vol. 49, no. 9, pp. 1849–1859, 2001

work page 2001

[14] [14]

Eigenvalues and condition numbers of rand om matrices,

A. Edelman, “Eigenvalues and condition numbers of rand om matrices,” SIAM Journal on Matrix Analysis and Applications , vol. 9, no. 4, pp. 543–560, 1988. July 16, 2019 DRAFT

work page 1988