Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, Yoram Singer · 2011

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Augmenting Self-attention with Persistent Memory

cs.LG · 2019-07-02 · unverdicted · novelty 7.0

Augmenting self-attention with persistent memory vectors allows the feed-forward layers to be removed from Transformers without degrading performance on character-level and word-level language modeling benchmarks.

Learning Effective Loss Functions Efficiently

cs.LG · 2019-06-28 · unverdicted · novelty 6.0

An anytime algorithm for learning loss functions that is asymptotically optimal in the worst case and experimentally faster than prior methods for hyperparameter tuning.

citing papers explorer

Augmenting Self-attention with Persistent Memory · cs.LG · 2019-07-02 · unverdicted · none · ref 11
Learning Effective Loss Functions Efficiently · cs.LG · 2019-06-28 · unverdicted · none · ref 5
