Title resolution pending

Rectifier nonlinearities improve neural network acoustic models · 2013

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ABGD parametrizes piecewise linear functions as a difference of max-affine functions and converges linearly to an epsilon-accurate solution using O(d max(sigma/epsilon, 1)^2) samples under sub-Gaussian noise, a sample complexity that is minimax optimal up to logarithmic factors.

Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks

cs.LG · 2026-05-06 · unverdicted · novelty 7.0 · 2 refs

Training-time batch normalization increases expected local affine-region density in ReLU and piecewise-affine networks by acting as a batch-conditional recentering mechanism on switching hyperplanes.

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.

citing papers explorer

Showing 3 of 3 citing papers.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
stat.ML · 2026-05-07 · unverdicted · none · ref 185

Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
cs.LG · 2026-05-06 · unverdicted · none · ref 48 · 2 links

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
cs.LG · 2026-05-13 · unverdicted · none · ref 8
