Title resolution pending

Statistical mechanics of learning , author= · 2001

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

citing papers explorer

Showing 2 of 2 citing papers.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent stat.ML · 2026-05-18 · unverdicted · none · ref 19 · 2 links
Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions stat.ML · 2026-05-07 · unverdicted · none · ref 209
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer