9th International Conference on Learning Representations

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner · 2021

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Navigating Potholes with Geometry-Aware Sharpness Minimization

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.

MiVE: Multiscale Vision-language features for reference-guided video Editing

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

MiVE repurposes VLMs as multiscale feature extractors integrated into a unified self-attention Diffusion Transformer, achieving top human preference in reference-guided video editing.

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

cs.LG · 2024-02-27 · unverdicted · novelty 7.0

HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% in NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.

HORST: Composing Optimizer Geometries for Sparse Transformer Training

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.

ERPPO: Entropy Regularization-based Proximal Policy Optimization

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

Skipping the Zeros in Diffusion Models for Sparse Data Generation

cs.LG · 2026-05-03 · unverdicted · novelty 5.0

SED modifies diffusion models to generate only non-zero values in sparse data, preserving sparsity patterns, cutting computation, and matching or beating standard DM performance on benchmarks.

citing papers explorer

Navigating Potholes with Geometry-Aware Sharpness Minimization cs.LG · 2026-05-15 · unverdicted · none · ref 29
MiVE: Multiscale Vision-language features for reference-guided video Editing cs.CV · 2026-05-14 · unverdicted · none · ref 31
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations cs.LG · 2024-02-27 · unverdicted · none · ref 43
HORST: Composing Optimizer Geometries for Sparse Transformer Training cs.LG · 2026-05-20 · unverdicted · none · ref 245
ERPPO: Entropy Regularization-based Proximal Policy Optimization cs.LG · 2026-05-13 · unverdicted · none · ref 269
Skipping the Zeros in Diffusion Models for Sparse Data Generation cs.LG · 2026-05-03 · unverdicted · none · ref 35
