Estimating or propagating gradients through stochastic neurons for conditional computation

Yoshua Bengio, Nicholas Léonard, Aaron Courville · 2013

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Are Compact Rationales Free? Measuring Tile Selection Headroom in Frozen WSI-MIL

eess.IV · 2026-05-12 · unverdicted · novelty 7.0

FOCI adds a post-hoc readout to frozen WSI-MIL models to find compact output-consistent tile subsets and measures selection headroom with SHI, showing transformer-based models allow smaller rationales than attention-pooling baselines.

Fitting Multilinear Polynomials for Logic Gate Networks

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Fitting logic gates as 4D multilinear polynomials with covariance Jacobian selection matches or beats 16D softmax baselines on seven datasets and remains stable at 12-layer depth where the baseline drops 37 points on CIFAR-10.

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

cs.CL · 2020-06-30 · unverdicted · novelty 6.0

GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Quant.npu provides a fully static quantization pipeline for on-device LLMs on NPUs by combining rotation matrices, bit-width-aware initialization, two-stage selective optimization, and adaptive mixed precision.

citing papers explorer

Are Compact Rationales Free? Measuring Tile Selection Headroom in Frozen WSI-MIL · eess.IV · 2026-05-12 · unverdicted · none · ref 36
Fitting Multilinear Polynomials for Logic Gate Networks · cs.LG · 2026-05-09 · unverdicted · none · ref 14
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding · cs.CL · 2020-06-30 · unverdicted · none · ref 46
Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization · cs.LG · 2026-05-19 · unverdicted · none · ref 3
