Fréchet Distance optimized as FD-loss in representation space by decoupling population size from batch size improves generator quality, enables one-step generation from multi-step models, and motivates a multi-representation metric FDr^k.
hub
Decoupled weight decay regularization
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Neural statistical functions use prefix statistics to unify and directly predict statistical quantities over continuous ranges from pre-trained single-sample models without repeated sampling.
WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.
SBBTS creates a diffusion process that jointly models drift and stochastic volatility in financial time series via a tractable decomposition into conditional transport problems, recovering parameters missed by prior Schrödinger bridge methods and improving downstream ML performance on S&P 500 data.
DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.
StitchVM stitches clean-image reward models with diffusion backbones to enable efficient value estimation for noisy latents, speeding up diffusion alignment methods like DPS by 3.2x and halving memory.
Federated learning framework for SNNs that adapts to heterogeneous temporal resolutions via neuron parameter integration, recovering accuracy on SHD and DVS-Gesture under varied mismatch scenarios.
UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.
A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.
A unified spectral condition for μP under width-depth scaling reveals a transition at k=1 vs k≥2 transformations per residual block and enables stable feature learning for practical architectures like Transformers.
Machine unlearning in LLMs is often reversible via fine-tuning, indicating suppression not deletion, and a new representation-level framework identifies four forgetting regimes based on reversibility and catastrophicity.
VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.
citing papers explorer
-
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
Machine unlearning in LLMs is often reversible via fine-tuning, indicating suppression not deletion, and a new representation-level framework identifies four forgetting regimes based on reversibility and catastrophicity.