Decoupled weight decay regularization

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1, method: 1

years

2026: 11, 2025: 1

representative citing papers

Representation Fréchet Loss for Visual Generation

cs.CV · 2026-04-30 · unverdicted · novelty 8.0

Optimizing Fréchet Distance as a loss (FD-loss) in representation space, with population size decoupled from batch size, improves generator quality, enables one-step generation from multi-step models, and motivates a multi-representation metric, FDr^k.

Neural Statistical Functions

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Neural statistical functions use prefix statistics to unify and directly predict statistical quantities over continuous ranges from pre-trained single-sample models without repeated sampling.

SBBTS: A Unified Schrödinger-Bass Framework for Synthetic Financial Time Series

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

SBBTS creates a diffusion process that jointly models drift and stochastic volatility in financial time series via a tractable decomposition into conditional transport problems, recovering parameters missed by prior Schrödinger bridge methods and improving downstream ML performance on S&P 500 data.

Stitched Value Model for Diffusion Alignment

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

StitchVM stitches clean-image reward models with diffusion backbones to enable efficient value estimation for noisy latents, speeding up diffusion alignment methods like DPS by 3.2x and halving memory.

Nucleus-Image: Sparse MoE for Image Generation

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.

Spectral Condition for μP under Width-Depth Scaling

cs.LG · 2026-02-28 · unverdicted · novelty 6.0

A unified spectral condition for μP under width-depth scaling reveals a transition at k=1 vs k≥2 transformations per residual block and enables stable feature learning for practical architectures like Transformers.
