Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (4)
years: 2019 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- MIPaaL: Mixed Integer Program as a Layer
  MIPaaL differentiates through mixed integer programs via cutting planes to enable decision-focused learning for general MIPs, outperforming two-stage prediction-plus-optimization and LP-relaxation baselines on real-world domains.
- Augmenting Self-attention with Persistent Memory
  Augmenting self-attention with persistent memory vectors allows removal of feed-forward layers from Transformers without degrading performance on character and word level language modeling benchmarks.
- Generalizing from a few environments in safety-critical reinforcement learning
  RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.
- Confidence Calibration for Convolutional Neural Networks Using Structured Dropout
  Structured dropout improves confidence calibration in CNNs by promoting ensemble diversity, with empirical support on SVHN, CIFAR-10, CIFAR-100 and in Bayesian active learning.