Decoupled weight decay regularization

Ilya Loshchilov, Frank Hutter · 2019

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 1

citation-polarity summary

use method: 1

representative citing papers

MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings

cs.LG · 2025-11-24 · unverdicted · novelty 7.0

MapFormers learn cognitive maps via input-dependent Lie-algebra positional encodings and achieve near-perfect OOD generalization on cognitive tasks where standard transformers fail.

Mantis: A Foundation Model for Mechanistic Disease Forecasting

cs.AI · 2025-08-17 · unverdicted · novelty 7.0

A foundation model trained only on disease simulations achieves top-ranked forecasting accuracy across 16 diseases and beats all CDC COVID-19 hub models on early unseen pandemic data.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization

math.OC · 2026-05-13 · unverdicted · novelty 6.0

Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update and a computable trace-ratio stepsize rule.

Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning

cs.CV · 2026-05-09 · conditional · novelty 6.0

Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.

NPN: Non-Linear Projections of the Null-Space for Imaging Inverse Problems

cs.CV · 2025-10-02 · unverdicted · novelty 6.0

NPN introduces a neural-network-based regularization that promotes reconstructions lying in a low-dimensional projection of the sensing operator's null-space, with claimed theoretical guarantees and improved empirical performance across compressive sensing, deblurring, super-resolution, CT, and MRI.

WhisperRT -- Turning Whisper into a Causal Streaming Model

cs.CL · 2025-08-17 · conditional · novelty 6.0

WhisperRT converts Whisper to a causal streaming ASR model via encoder causality, decoder synchronization on partial states, and fine-tuning, achieving better performance than non-fine-tuned streaming methods on sub-300ms chunks with lower complexity.

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

cs.CL · 2025-05-30 · conditional · novelty 6.0

Prolonged RL training with KL control and reference policy resetting enables LLMs to develop novel reasoning strategies inaccessible to base models even under extensive sampling.

Nomic Embed: Training a Reproducible Long Context Text Embedder

cs.CL · 2024-02-02 · conditional · novelty 6.0

Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.

Unsupervised Dense Information Retrieval with Contrastive Learning

cs.IR · 2021-12-16 · unverdicted · novelty 6.0

Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.

Polyformer: a generative framework for thermodynamic modeling of polymeric molecules

q-bio.BM · 2026-04-15 · unverdicted · novelty 5.0

Polyformer generates sequence- and temperature-dependent conformational ensembles for proteins that agree with molecular dynamics simulations.

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

cs.LG · 2026-04-08 · unverdicted · novelty 5.0

Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.

Deep Learning of Solver-Aware Turbulence Closures from Nudged LES Dynamics

physics.flu-dyn · 2026-04-26

citing papers explorer

Showing 13 of 13 citing papers.

MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings
cs.LG · 2025-11-24 · unverdicted · none · ref 34
MapFormers learn cognitive maps via input-dependent Lie-algebra positional encodings and achieve near-perfect OOD generalization on cognitive tasks where standard transformers fail.

Mantis: A Foundation Model for Mechanistic Disease Forecasting
cs.AI · 2025-08-17 · unverdicted · none · ref 20
A foundation model trained only on disease simulations achieves top-ranked forecasting accuracy across 16 diseases and beats all CDC COVID-19 hub models on early unseen pandemic data.

LoRA: Low-Rank Adaptation of Large Language Models
cs.CL · 2021-06-17 · accept · none · ref 37
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization
math.OC · 2026-05-13 · unverdicted · none · ref 29
Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update and a computable trace-ratio stepsize rule.

Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning
cs.CV · 2026-05-09 · conditional · none · ref 43
Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.

NPN: Non-Linear Projections of the Null-Space for Imaging Inverse Problems
cs.CV · 2025-10-02 · unverdicted · none · ref 36
NPN introduces a neural-network-based regularization that promotes reconstructions lying in a low-dimensional projection of the sensing operator's null-space, with claimed theoretical guarantees and improved empirical performance across compressive sensing, deblurring, super-resolution, CT, and MRI.

WhisperRT -- Turning Whisper into a Causal Streaming Model
cs.CL · 2025-08-17 · conditional · none · ref 21
WhisperRT converts Whisper to a causal streaming ASR model via encoder causality, decoder synchronization on partial states, and fine-tuning, achieving better performance than non-fine-tuned streaming methods on sub-300ms chunks with lower complexity.

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
cs.CL · 2025-05-30 · conditional · none · ref 20
Prolonged RL training with KL control and reference policy resetting enables LLMs to develop novel reasoning strategies inaccessible to base models even under extensive sampling.

Nomic Embed: Training a Reproducible Long Context Text Embedder
cs.CL · 2024-02-02 · conditional · none · ref 34
Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.

Unsupervised Dense Information Retrieval with Contrastive Learning
cs.IR · 2021-12-16 · unverdicted · none · ref 155
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.

Polyformer: a generative framework for thermodynamic modeling of polymeric molecules
q-bio.BM · 2026-04-15 · unverdicted · none · ref 20
Polyformer generates sequence- and temperature-dependent conformational ensembles for proteins that agree with molecular dynamics simulations.

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
cs.LG · 2026-04-08 · unverdicted · none · ref 29
Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.

Deep Learning of Solver-Aware Turbulence Closures from Nudged LES Dynamics
physics.flu-dyn · 2026-04-26 · unreviewed · ref 23
