Decoupled weight decay regularization

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · dataset 1

citation-polarity summary

background 1 · use dataset 1

representative citing papers

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

cs.DC · 2026-05-15 · unverdicted · novelty 6.0

Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.

A Semi-Supervised Framework for Speech Confidence Detection using Whisper

cs.SD · 2026-05-12 · unverdicted · novelty 6.0

A hybrid semi-supervised framework fusing Whisper embeddings with acoustic and prosodic features achieves 0.751 Macro-F1 for speaker confidence detection and outperforms baselines including WavLM, HuBERT, and Wav2Vec 2.0.

ShardTensor: Domain Parallelism for Scientific Machine Learning

cs.DC · 2026-05-11 · unverdicted · novelty 6.0

ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.

Quantum Injection Pathways for Implicit Graph Neural Networks

quant-ph · 2026-05-09 · unverdicted · novelty 6.0

Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.

Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

cs.NE · 2026-04-07 · unverdicted · novelty 6.0

CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.

SHIELD: Scalable Optimal Control with Certification using Duality and Convexity

cs.RO · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

SHIELD derives safe certificates from Lagrangian duality to reduce decision variables and constraints in convex programs, accelerated by a transformer network, delivering order-of-magnitude speedups in stochastic MPC for multi-modal traffic with preserved feasibility and safety.

At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts

cs.AR · 2026-04-28 · unverdicted · novelty 3.0

A systolic-array CNN accelerator on the Lattice iCE40UP5K FPGA achieves 98% validation accuracy for SCG feature classification while using 8.55 mW, 95.5 ms inference time, 2,861 LUTs, and 7 DSP blocks.

citing papers explorer

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

cs.DC · 2026-05-15 · unverdicted · none · ref 1

Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.

A Semi-Supervised Framework for Speech Confidence Detection using Whisper

cs.SD · 2026-05-12 · unverdicted · none · ref 45

A hybrid semi-supervised framework fusing Whisper embeddings with acoustic and prosodic features achieves 0.751 Macro-F1 for speaker confidence detection and outperforms baselines including WavLM, HuBERT, and Wav2Vec 2.0.

ShardTensor: Domain Parallelism for Scientific Machine Learning

cs.DC · 2026-05-11 · unverdicted · none · ref 35

ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.

Quantum Injection Pathways for Implicit Graph Neural Networks

quant-ph · 2026-05-09 · unverdicted · none · ref 50

Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.

Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

cs.NE · 2026-04-07 · unverdicted · none · ref 30

CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.

SHIELD: Scalable Optimal Control with Certification using Duality and Convexity

cs.RO · 2026-05-09 · unverdicted · none · ref 30 · 2 links

SHIELD derives safe certificates from Lagrangian duality to reduce decision variables and constraints in convex programs, accelerated by a transformer network, delivering order-of-magnitude speedups in stochastic MPC for multi-modal traffic with preserved feasibility and safety.

At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts

cs.AR · 2026-04-28 · unverdicted · none · ref 26

A systolic-array CNN accelerator on the Lattice iCE40UP5K FPGA achieves 98% validation accuracy for SCG feature classification while using 8.55 mW, 95.5 ms inference time, 2,861 LUTs, and 7 DSP blocks.
