On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265

Liu, L · 2019 · arXiv 1908.03265

20 Pith papers cite this work.

citation-role summary

background: 1 · method: 1

citation-polarity summary

unclear: 1 · use method: 1

representative citing papers

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Building Deep Graph Predictors with Graph Imitation Learning

cs.CV · 2026-01-21 · unverdicted · novelty 7.0

GRAIL trains graph predictors via imitation learning by modeling generation as sequential decisions on partial graph embeddings, matching or exceeding prior methods on 18 benchmarks.

On the Convergence of Muon and Beyond

cs.LG · 2025-09-19 · unverdicted · novelty 7.0

Muon-MVR2 attains the optimal anytime convergence rate of Õ(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

cs.CV · 2023-10-06 · unverdicted · novelty 7.0

Latent Consistency Models, distilled from pre-trained LDMs, enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space.

Anon: Extrapolating Adaptivity Beyond SGD and Adam

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

The Anon optimizer uses tunable adaptivity and an incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

Particle transformers for identifying Lorentz-boosted Higgs bosons decaying to a pair of W bosons

hep-ex · 2026-04-10 · unverdicted · novelty 6.0

PaRT achieves >50% tagging efficiency for boosted H->WW jets at 1% background efficiency, decorrelated from jet mass, with data-to-simulation scale factors of 0.9-1.0 on 138 fb^{-1} of 13 TeV collisions.

Delve into the Applicability of Advanced Optimizers for Multi-Task Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

APT augments multi-task learning by adapting advanced optimizers via momentum balancing and light direction preservation, delivering performance gains on four standard MTL datasets.

Improved Techniques for Training Consistency Models

cs.LG · 2023-10-22 · accept · novelty 6.0

Improved consistency training techniques achieve FID scores of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in one sampling step, outperforming prior consistency training and distillation methods.

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

cs.LG · 2023-06-24 · unverdicted · novelty 6.0

H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequently co-occurring tokens to reduce memory while preserving accuracy.

Deep neural networks with Fisher vector encoding for medical image classification

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

Fisher vector encoding integrated into CNN-ViT hybrids outperforms baseline methods on MedMNIST datasets and matches results reported in the literature on other medical image datasets.

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

cs.LG · 2026-05-01 · unverdicted · novelty 5.0

AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

Characterizing the Instrumental Profile of LAMOST

astro-ph.IM · 2026-03-10 · unverdicted · novelty 5.0

A neural network derives LAMOST instrumental profiles from arc lamps and reduces radial-velocity (RV) dispersion by ~3 km/s.

Rapid training of Hamiltonian graph networks using random features

cs.LG · 2025-06-06 · unverdicted · novelty 5.0

Hamiltonian Graph Networks achieve 150-600x faster training via random feature parameter construction while retaining comparable accuracy and physical invariances on N-body systems up to 10,000 particles.

Neural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimation

eess.SY · 2026-05-12 · unverdicted · novelty 4.0

A neural network fuses wheel and motor speed signals to cut wheel-speed estimation error by up to 85% versus the production sensor on real Volkswagen ID.7 data.

Video-guided Machine Translation with Global Video Context

cs.CV · 2026-04-08 · unverdicted · novelty 4.0

A globally video-guided multimodal translation framework retrieves semantically related video segments with a vector database and applies attention mechanisms to improve subtitle translation accuracy in long videos.

Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

cs.CV · 2026-04-21 · unverdicted · novelty 3.0

DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning, improving convergence and generalization while reducing knowledge forgetting.

Scalable Reinforcement Learning via Adaptive Batch Scaling

stat.ML · 2026-05-20

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

cs.MA · 2026-02-02

From Next Token Prediction to (STRIPS) World Models

cs.AI · 2025-09-16

citing papers explorer

Showing 20 of 20 citing papers.

Consistency Models cs.LG · 2023-03-02 · conditional · none · ref 36
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior cs.LG · 2026-05-06 · unverdicted · none · ref 297
Building Deep Graph Predictors with Graph Imitation Learning cs.CV · 2026-01-21 · unverdicted · none · ref 39
On the Convergence of Muon and Beyond cs.LG · 2025-09-19 · unverdicted · none · ref 33
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference cs.CV · 2023-10-06 · unverdicted · none · ref 66
Anon: Extrapolating Adaptivity Beyond SGD and Adam cs.AI · 2026-05-04 · unverdicted · none · ref 8
Particle transformers for identifying Lorentz-boosted Higgs bosons decaying to a pair of W bosons hep-ex · 2026-04-10 · unverdicted · none · ref 87
Delve into the Applicability of Advanced Optimizers for Multi-Task Learning cs.LG · 2026-04-10 · unverdicted · none · ref 4
Improved Techniques for Training Consistency Models cs.LG · 2023-10-22 · accept · none · ref 7
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models cs.LG · 2023-06-24 · unverdicted · none · ref 82
Deep neural networks with Fisher vector encoding for medical image classification cs.CV · 2026-05-03 · unverdicted · none · ref 45
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments cs.LG · 2026-05-01 · unverdicted · none · ref 41
Characterizing the Instrumental Profile of LAMOST astro-ph.IM · 2026-03-10 · unverdicted · none · ref 22
Rapid training of Hamiltonian graph networks using random features cs.LG · 2025-06-06 · unverdicted · none · ref 53
Neural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimation eess.SY · 2026-05-12 · unverdicted · none · ref 16
Video-guided Machine Translation with Global Video Context cs.CV · 2026-04-08 · unverdicted · none · ref 34
Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning cs.CV · 2026-04-21 · unverdicted · none · ref 69
Scalable Reinforcement Learning via Adaptive Batch Scaling stat.ML · 2026-05-20 · unreviewed · ref 12
TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning cs.MA · 2026-02-02 · unreviewed · ref 11
From Next Token Prediction to (STRIPS) World Models cs.AI · 2025-09-16 · unreviewed · ref 11