ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Training-time batch normalization increases expected local affine-region density in ReLU and piecewise-affine networks by acting as a batch-conditional recentering mechanism on switching hyperplanes.
SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.
citing papers explorer
-
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
-
Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
Training-time batch normalization increases expected local affine-region density in ReLU and piecewise-affine networks by acting as a batch-conditional recentering mechanism on switching hyperplanes.
-
Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.