Estimating or propagating gradients through stochastic neurons for conditional computation
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4
citing papers explorer
- Are Compact Rationales Free? Measuring Tile Selection Headroom in Frozen WSI-MIL
  FOCI adds a post-hoc readout to frozen WSI-MIL models to find compact output-consistent tile subsets and measures selection headroom with SHI, showing transformer-based models allow smaller rationales than attention-pooling baselines.
- Fitting Multilinear Polynomials for Logic Gate Networks
  Fitting logic gates as 4D multilinear polynomials with covariance Jacobian selection matches or beats 16D softmax baselines on seven datasets and remains stable at 12-layer depth where the baseline drops 37 points on CIFAR-10.
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
  GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.
- Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
  Quant.npu provides a fully static quantization pipeline for on-device LLMs on NPUs by combining rotation matrices, bit-width-aware initialization, two-stage selective optimization, and adaptive mixed precision.