Gaussian elimination is not optimal
4 Pith papers cite this work.
citing papers explorer
- Asymptotic Rank Speedup Theorems, Revisited
  General asymptotic rank speedup theorems are established via Strassen calculus, proving the asymptotic rank of cw_2 is below 3.931 and yielding an upper bound below d^{2ω/3} for any d×d×d tensor.
- Minimizing the Arithmetic and Communication Complexity of Jacobi's Method for Eigenvalues and Singular Values: Part One -- Serial Algorithms
  Blocked Jacobi attains the communication lower bound for classical O(n^3) matrix multiplication while a recursive version reaches near-optimal arithmetic and communication cost using fast Strassen-like multiplication; analogous bounds hold for one-sided Jacobi SVD.
- FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
  FalconGEMM delivers a framework with deployment, group-parallel execution, and analytical decision modules that makes lower-complexity matrix multiplication practical, beating cuBLAS and similar libraries by 7.59-17.85% on LLM tasks.
- Efficient approximations of matrix multiplication using truncated decompositions
  Approximates large matrix multiplication via truncated SVD and circulant decompositions with O(n^2 log n) complexity and ~1% relative error, including LLM operation demonstrations.