Decoupled weight decay regularization
7 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (7)
verdicts: UNVERDICTED (7)
citing papers explorer
- Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training
  Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.
- A Semi-Supervised Framework for Speech Confidence Detection using Whisper
  A hybrid semi-supervised framework fusing Whisper embeddings with acoustic and prosodic features achieves 0.751 Macro-F1 for speaker confidence detection and outperforms baselines including WavLM, HuBERT, and Wav2Vec 2.0.
- ShardTensor: Domain Parallelism for Scientific Machine Learning
  ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.
- Quantum Injection Pathways for Implicit Graph Neural Networks
  Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.
- Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems
  CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.
- SHIELD: Scalable Optimal Control with Certification using Duality and Convexity
  SHIELD derives safe certificates from Lagrangian duality to reduce decision variables and constraints in convex programs, accelerated by a transformer network, delivering order-of-magnitude speedups in stochastic MPC for multi-modal traffic with preserved feasibility and safety.
- At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts
  A systolic-array CNN accelerator on the Lattice iCE40UP5K FPGA achieves 98% validation accuracy for SCG feature classification while using 8.55 mW, 95.5 ms inference time, 2,861 LUTs, and 7 DSP blocks.