Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter · 2017 · cs.LG · arXiv 1711.05101

Mixed citation behavior. Most common role is method (58%).

952 Pith papers citing it

abstract

L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is not the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it "weight decay" in what may be misleading due to the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at https://github.com/loshchil/AdamW-and-SGDW
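
The inequivalence described in the abstract can be seen in a single update step. The following is a minimal Python sketch, not the authors' reference code: the function name `adam_step`, its parameters, and the toy values are illustrative assumptions, and scaling the decoupled decay by the learning rate follows a common library convention rather than anything stated in the abstract. With a zero loss gradient, the L$_2$ variant passes the decay term through Adam's normalization, so the step is roughly the learning rate whatever the decay factor, whereas the decoupled variant shrinks the weight in proportion to the decay factor.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One Adam update on a single scalar weight (illustrative sketch).

    decoupled=False: L2 regularization, the decay term is added to the
                     gradient and is therefore rescaled by Adam.
    decoupled=True:  the decay is applied directly to the weight,
                     outside the adaptive gradient step.
    """
    if not decoupled:
        grad = grad + weight_decay * w  # L2 penalty folded into the gradient

    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)

    if decoupled:
        # Assumption: decay scaled by lr, as in common library implementations.
        w_new -= lr * weight_decay * w

    return w_new, m, v


# Toy comparison: weight 1.0, zero loss gradient, first step (t=1).
results = {}
for wd in (0.1, 0.5):
    w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1,
                           weight_decay=wd, decoupled=False)
    w_dec, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1,
                            weight_decay=wd, decoupled=True)
    results[wd] = (w_l2, w_dec)
    print(f"weight_decay={wd}: L2 -> {w_l2:.6f}, decoupled -> {w_dec:.6f}")
```

In this toy setting the L$_2$ variant lands near 0.99 for both decay factors, while the decoupled variant gives 0.999 and 0.995, which illustrates why the two forms of regularization are not interchangeable under Adam.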

citation-role summary

method: 102 · background: 64 · dataset: 5 · baseline: 3 · other: 3

citation-polarity summary

use method: 102 · background: 57 · unclear: 9 · use dataset: 5 · baseline: 3 · support: 1

representative citing papers

Every9D-21M: Large-Scale Real-World 9D Canonicalization of Everyday Objects

cs.CV · 2026-05-27 · conditional · novelty 8.0

Every9D-21M supplies 21.8M real-world 9D pose annotations for 700 everyday categories by propagating manual canonical poses through cross-instance alignment in object-centric videos and verifying them multiview.

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

cs.CV · 2026-05-13 · unverdicted · novelty 8.0

AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.

Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation

cs.GR · 2026-05-13 · unverdicted · novelty 8.0

Rigel3D jointly generates rigged 3D meshes with geometry, skeleton topology, joint positions, and skinning weights using coupled surface and skeleton latent representations for image-conditioned animation-ready asset synthesis.

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

cs.CV · 2026-05-12 · unverdicted · novelty 8.0

TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.

Dissecting Jet-Tagger Through Mechanistic Interpretability

hep-ph · 2026-05-11 · accept · novelty 8.0

A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.

LLM Translation of Compiler Intermediate Representation

cs.PL · 2026-05-07 · unverdicted · novelty 8.0

IRIS-14B is the first LLM trained explicitly for GIMPLE-to-LLVM IR translation and outperforms much larger models by up to 44 percentage points on real-world C code.

Bringing Order to Asynchronous SGD: Towards Optimality under Data-Dependent Delays with Momentum

cs.LG · 2026-05-03 · unverdicted · novelty 8.0 · 2 refs

Momentum-based async SGD achieves optimal convergence rates for data-dependent delays without biasing updates toward simpler samples.

CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models

cs.CV · 2026-05-03 · unverdicted · novelty 8.0

CADFS supplies a large real-world CAD dataset and FeatureScript representation that, after VLM fine-tuning, produces more accurate and feature-rich designs than prior generative CAD systems.

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

cs.LG · 2026-04-14 · unverdicted · novelty 8.0

CLAD is the first deep learning framework for log anomaly detection that operates directly on compressed byte streams using a dilated convolutional encoder, hybrid Transformer-mLSTM, and two-stage training, achieving 0.9909 average F1-score across five datasets.

Rotation Equivariant Mamba for Vision Tasks

cs.CV · 2026-03-10 · unverdicted · novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.

A document is worth a structured record: Principled inductive bias design for document recognition

cs.CV · 2025-07-11 · unverdicted · novelty 8.0

Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

VMamba: Visual State Space Model

cs.CV · 2024-01-18 · conditional · novelty 8.0

VMamba introduces a state-space vision backbone using 2D selective scanning across four routes to achieve linear complexity and strong performance on image tasks.

Progress measures for grokking via mechanistic interpretability

cs.LG · 2023-01-12 · accept · novelty 8.0

Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Decision Transformer: Reinforcement Learning via Sequence Modeling

cs.LG · 2021-06-02 · accept · novelty 8.0

Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.

RoFormer: Enhanced Transformer with Rotary Position Embedding

cs.CL · 2021-04-20 · accept · novelty 8.0

RoFormer introduces rotary position embeddings that encode absolute positions via rotation matrices and relative dependencies in attention, outperforming prior position methods on long text classification tasks.

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Morphing into Hybrid Attention Models

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

FlashMorph formulates hybrid layer selection as budget-constrained optimization, trains per-layer gates on synthetic retrieval data with linearization regularization, then discretizes and distills to produce efficient hybrid architectures.

Learning from Acquisition: Metadata-driven Multimodal Pre-training for Cardiac MRI

cs.CV · 2026-06-27 · unverdicted · novelty 7.0

MetaCLIP-CMR applies CLIP-style contrastive learning to cardiac MRI by treating acquisition metadata as text labels, delivering 86.8% modality and 86.5% view accuracy plus top Dice scores on ACDC/M&Ms segmentation with far less pre-training data than recent large-scale CMR models.

Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

PRA approximates sequential rollout training in parallel for pixel-space AR models via intermediate states and a pixel decoder, achieving FID 2.58 (135M params) and 1.94 (511M params) on ImageNet-1K 256x256, new SOTA among pixel-space AR models.

citing papers explorer

Showing 50 of 952 citing papers.

Every9D-21M: Large-Scale Real-World 9D Canonicalization of Everyday Objects cs.CV · 2026-05-27 · conditional · none · ref 31 · internal anchor
Every9D-21M supplies 21.8M real-world 9D pose annotations for 700 everyday categories by propagating manual canonical poses through cross-instance alignment in object-centric videos and verifying them multiview.
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation cs.CV · 2026-05-13 · unverdicted · none · ref 46 · internal anchor
AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.
Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation cs.GR · 2026-05-13 · unverdicted · none · ref 6 · internal anchor
Rigel3D jointly generates rigged 3D meshes with geometry, skeleton topology, joint positions, and skinning weights using coupled surface and skeleton latent representations for image-conditioned animation-ready asset synthesis.
TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking cs.CV · 2026-05-12 · unverdicted · none · ref 50 · internal anchor
TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
Dissecting Jet-Tagger Through Mechanistic Interpretability hep-ph · 2026-05-11 · accept · none · ref 43 · internal anchor
A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.
LLM Translation of Compiler Intermediate Representation cs.PL · 2026-05-07 · unverdicted · none · ref 25 · internal anchor
IRIS-14B is the first LLM trained explicitly for GIMPLE-to-LLVM IR translation and outperforms much larger models by up to 44 percentage points on real-world C code.
Bringing Order to Asynchronous SGD: Towards Optimality under Data-Dependent Delays with Momentum cs.LG · 2026-05-03 · unverdicted · none · ref 9 · 2 links · internal anchor
Momentum-based async SGD achieves optimal convergence rates for data-dependent delays without biasing updates toward simpler samples.
CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models cs.CV · 2026-05-03 · unverdicted · none · ref 25 · internal anchor
CADFS supplies a large real-world CAD dataset and FeatureScript representation that, after VLM fine-tuning, produces more accurate and feature-rich designs than prior generative CAD systems.
Stability and Generalization in Looped Transformers cs.LG · 2026-04-16 · unverdicted · none · ref 15 · internal anchor
Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations cs.LG · 2026-04-14 · unverdicted · none · ref 14 · internal anchor
CLAD is the first deep learning framework for log anomaly detection that operates directly on compressed byte streams using a dilated convolutional encoder, hybrid Transformer-mLSTM, and two-stage training, achieving 0.9909 average F1-score across five datasets.
Rotation Equivariant Mamba for Vision Tasks cs.CV · 2026-03-10 · unverdicted · none · ref 68 · internal anchor
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
A document is worth a structured record: Principled inductive bias design for document recognition cs.CV · 2025-07-11 · unverdicted · none · ref 61 · internal anchor
Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.
Large Language Diffusion Models cs.CL · 2025-02-14 · unverdicted · none · ref 30 · internal anchor
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
ORPO: Monolithic Preference Optimization without Reference Model cs.CL · 2024-03-12 · conditional · none · ref 35 · internal anchor
ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.
VMamba: Visual State Space Model cs.CV · 2024-01-18 · conditional · none · ref 38 · internal anchor
VMamba introduces a state-space vision backbone using 2D selective scanning across four routes to achieve linear complexity and strong performance on image tasks.
Progress measures for grokking via mechanistic interpretability cs.LG · 2023-01-12 · accept · none · ref 39 · internal anchor
Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.
Discovering Latent Knowledge in Language Models Without Supervision cs.CL · 2022-12-07 · conditional · none · ref 19 · internal anchor
An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow cs.LG · 2022-09-07 · unverdicted · none · ref 45 · internal anchor
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
Decision Transformer: Reinforcement Learning via Sequence Modeling cs.LG · 2021-06-02 · accept · none · ref 73 · internal anchor
Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
RoFormer: Enhanced Transformer with Rotary Position Embedding cs.CL · 2021-04-20 · accept · none · ref 13 · internal anchor
RoFormer introduces rotary position embeddings that encode absolute positions via rotation matrices and relative dependencies in attention, outperforming prior position methods on long text classification tasks.
Language Models are Few-Shot Learners cs.CL · 2020-05-28 · accept · none · ref 38 · internal anchor
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
Morphing into Hybrid Attention Models cs.CL · 2026-06-29 · unverdicted · none · ref 39 · internal anchor
FlashMorph formulates hybrid layer selection as budget-constrained optimization, trains per-layer gates on synthetic retrieval data with linearization regularization, then discretizes and distills to produce efficient hybrid architectures.
Learning from Acquisition: Metadata-driven Multimodal Pre-training for Cardiac MRI cs.CV · 2026-06-27 · unverdicted · none · ref 8 · internal anchor
MetaCLIP-CMR applies CLIP-style contrastive learning to cardiac MRI by treating acquisition metadata as text labels, delivering 86.8% modality and 86.5% view accuracy plus top Dice scores on ACDC/M&Ms segmentation with far less pre-training data than recent large-scale CMR models.
Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation cs.CV · 2026-06-26 · unverdicted · none · ref 7 · internal anchor
PRA approximates sequential rollout training in parallel for pixel-space AR models via intermediate states and a pixel decoder, achieving FID 2.58 (135M params) and 1.94 (511M params) on ImageNet-1K 256x256, new SOTA among pixel-space AR models.
ParticleTransformer is all you need for reconstructing hadronic tau leptons hep-ex · 2026-06-16 · unverdicted · none · ref 48 · internal anchor
First fully machine-learned hadronic tau reconstruction at FCC-ee using ParticleTransformer achieves high performance on simulated data for identification, decay mode, charge, and kinematics.
Breaking the Cascade: Compact Nonlinear Optical Computing with Single-Layer Encoder-Decoder Co-Localization physics.optics · 2026-05-31 · unverdicted · none · ref 14 · internal anchor
A single diffractive layer with encoder-decoder co-localization achieves universal approximation of band-limited nonlinear functions via coherent interference under coherent illumination.
MMDG-Bench: A Benchmark for Multimodal Domain Generalization cs.CV · 2026-05-30 · unverdicted · none · ref 26 · internal anchor
MMDG-Bench provides unified protocols and ten baselines for multimodal domain generalization, showing structured DG-MML combinations often outperform prior methods with insights on framework choice and backbone effects.
FlowOVD: Learning Generative Latent Flows for Zero-shot Open-vocabulary Detection cs.CV · 2026-05-30 · unverdicted · none · ref 19 · internal anchor
FlowOVD applies rectified flow to generate continuous latent query dynamics for text-conditioned open-vocabulary detection, reporting 49.5 AP on COCO and 31.5 AP on LVIS.
FiSeR: Fine-Grained Source Representations for Cross-Domain AI Image Detection cs.CV · 2026-05-30 · unverdicted · none · ref 5 · internal anchor
FiSeR uses coarse contrastive separation of natural vs synthetic images plus fine contrastive grouping by generator identity to improve cross-domain AUROC by +10.22 over DIRE baseline on multiple test sets.
Learning Global Motion with Compact Gaussians for Feed-Forward 4D Reconstruction cs.CV · 2026-05-29 · unverdicted · none · ref 62 · internal anchor
C4G introduces compact timestamp-conditioned Gaussian query tokens that aggregate full temporal context to decode 3D Gaussians with timestamp-modulated positions for feed-forward 4D reconstruction from monocular video, plus a diffusion-based rendering module and extension to 4D feature fields.
Physics-Informed Coarsening for Multigrid Graph Neural Surrogates cs.LG · 2026-05-29 · unverdicted · none · ref 13 · internal anchor
Proposes residual-based physics-informed coarsening in multigrid GNNs to allocate capacity to high-activity regions for more stable solid mechanics surrogates.
ElasticMem: Latent Memory as a Learnable Resource for LLM Agents cs.CL · 2026-05-29 · unverdicted · none · ref 27 · internal anchor
ElasticMem enables LLM agents to learn adaptive latent memory retrieval and elastic budget allocation, improving QA accuracy by 24-26% and ALFWorld success by 27-66% over baselines with lower token cost.
OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction cs.LG · 2026-05-28 · unverdicted · none · ref 31 · internal anchor
OOD-GraphLLM is a graphLLM framework that jointly optimizes molecular graph representations and biomedical semantic language representations for out-of-distribution drug synergy prediction.
SeeGroup: Multi-Layer Depth Estimation of Transparent Surfaces via Self-Determined Grouping cs.CV · 2026-05-27 · unverdicted · none · ref 28 · internal anchor
SeeGroup formulates per-pixel multi-layer depth as a point process with permutation-invariant likelihood to support arbitrary groupings, raising quadruplet relative depth accuracy from 61.34% to 70.09% on the LayeredDepth benchmark.
Symbolic Regression via Latent Iterative Refinement cs.LG · 2026-05-26 · unverdicted · none · ref 13 · internal anchor
LEE performs iterative amortized inference in a functionally grounded latent space to produce 2-10x simpler symbolic expressions than strong baselines on SRBench.
Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty cs.LG · 2026-05-25 · unverdicted · none · ref 1 · internal anchor
GoBOED combines amortized variational inference with a differentiable convex decision layer to optimize experimental designs for a decision objective instead of parameter information gain, with a theoretical result on gradient insensitivity to irrelevant parameters.
World Models as Group Actions cs.CV · 2026-05-23 · unverdicted · none · ref 19 · internal anchor
Formalizes video world models as group actions on states and uses latent regularization with synthesized supervision to enforce consistency, introducing GAC and GAR metrics that improve structural correctness in SOTA models.
Learning Laplacian Eigenspace with Mass-Aware Neural Operators on Point Clouds cs.LG · 2026-05-23 · unverdicted · none · ref 19 · internal anchor
NEO is a mass-aware neural operator that learns the invariant low-frequency eigenspace of the LBO on point clouds for fast spectral geometry.
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction cs.CV · 2026-05-22 · unverdicted · none · ref 69 · internal anchor
GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
Point Tracking Improves World Action Models cs.RO · 2026-05-22 · unverdicted · none · ref 84 · internal anchor
JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.
The physics of AI weather models physics.ao-ph · 2026-05-22 · unverdicted · none · ref 28 · internal anchor
AI weather models may simulate the atmosphere via particle positions in latent space whose updates follow gradient flow on a learned free energy functional rather than conventional physical equations.
Valid and Expressive Copulas for Irregular Multivariate Time Series cs.LG · 2026-05-22 · unverdicted · none · ref 15 · internal anchor
CopFITi is the first marginalization-consistent copula for irregular multivariate time series, using normalizing flows for marginals and a Gaussian mixture copula for dependencies to reach new state-of-the-art joint density modeling.
VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset cs.CV · 2026-05-22 · unverdicted · none · ref 30 · internal anchor
VINS-120K supplies the first large-scale set of instruction-image-edited-image triplets at ultra-high resolution together with an adaptation strategy that improves detail synthesis.
Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion cs.LG · 2026-05-22 · unverdicted · none · ref 49 · internal anchor
CDM amortizes SMC inference for reward-tilted discrete diffusion by training a parameterized twist function on contrastive samples with closed-form kernels.
Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning quant-ph · 2026-05-22 · unverdicted · none · ref 63 · internal anchor
CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.
GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving cs.LG · 2026-05-21 · unverdicted · none · ref 13 · internal anchor
GraphFlow uses a unified wGraph to dynamically instantiate workflows and manage KV caches for LLM agents, reporting 4.95 pp average gains and 4x memory reduction on five benchmarks.
No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos cs.CV · 2026-05-21 · unverdicted · none · ref 41 · internal anchor
NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models cs.CV · 2026-05-20 · unverdicted · none · ref 51 · internal anchor
Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
ShapeBench: A Scalable Benchmark and Diagnostic Suite for Standardized Evaluation in Aerodynamic Shape Optimization cs.LG · 2026-05-20 · unverdicted · none · ref 28 · 2 links · internal anchor
ShapeBench is a new benchmark suite for aerodynamic shape optimization with 103 tasks showing high variance in optimizer rankings across categories.
End-to-End Unmixing with Material Prompts for Hyperspectral Object Tracking cs.CV · 2026-05-20 · unverdicted · none · ref 53 · internal anchor
Introduces a joint optimization framework coupling deep spectral unmixing with target localization via material prompts and a weighted unmixing loss for hyperspectral object tracking.
