Decoupled weight decay regularization
12 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
-
Representation Fréchet Loss for Visual Generation
Optimizing Fréchet Distance as a loss (FD-loss) in representation space, with population size decoupled from batch size, improves generator quality, enables one-step generation from multi-step models, and motivates a multi-representation metric, FDr^k.
-
Neural Statistical Functions
Neural statistical functions use prefix statistics to unify and directly predict statistical quantities over continuous ranges from pre-trained single-sample models without repeated sampling.
-
WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images
WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.
-
SBBTS: A Unified Schrödinger-Bass Framework for Synthetic Financial Time Series
SBBTS creates a diffusion process that jointly models drift and stochastic volatility in financial time series via a tractable decomposition into conditional transport problems, recovering parameters missed by prior Schrödinger bridge methods and improving downstream ML performance on S&P 500 data.
-
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.
-
Stitched Value Model for Diffusion Alignment
StitchVM stitches clean-image reward models with diffusion backbones to enable efficient value estimation for noisy latents, speeding up diffusion alignment methods like DPS by 3.2x and halving memory.
-
Federated Learning of Spiking Neural Networks under Heterogeneous Temporal Resolutions
A federated learning framework for SNNs adapts to heterogeneous temporal resolutions via neuron parameter integration, recovering accuracy on SHD and DVS-Gesture under varied mismatch scenarios.
-
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
UniSD unifies self-distillation components for autoregressive LLMs, and its fully integrated version improves on base models by 5.4 points and on baselines by 2.8 points across six benchmarks.
-
Nucleus-Image: Sparse MoE for Image Generation
A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.
-
Spectral Condition for μP under Width-Depth Scaling
A unified spectral condition for μP under width-depth scaling reveals a transition at k=1 vs k≥2 transformations per residual block and enables stable feature learning for practical architectures like Transformers.
-
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
Machine unlearning in LLMs is often reversible via fine-tuning, indicating suppression not deletion, and a new representation-level framework identifies four forgetting regimes based on reversibility and catastrophicity.
-
VGGT-Occ: Geometry-Grounded and Density-Aware Gated Fusion for 3D Occupancy Prediction
VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.