ImageNet: A large-scale hierarchical image database
5 Pith papers cite this work.
citing papers explorer
- Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models
CEDAR learns an invertible rotation of vision-language embeddings to concentrate semantics into sparse, axis-aligned coordinates for improved interpretability.
- Provable Robustness against Backdoor Attacks via the Primal-Dual Perspective on Differential Privacy
A new framework is introduced for end-to-end provable robustness against backdoor attacks by composing randomized smoothing with differentially private training via privacy profiles.
- Does Engram Do Memory Retrieval in Autoregressive Image Generation?
Engram in AR image generation saves backbone FLOPs but trails pure AR baselines in FID and behaves as a gated side-pathway rather than a content-addressed retriever.
- SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning
SSL4RL reformulates self-supervised learning objectives into dense, verifiable reward signals for RL-based fine-tuning of vision-language models, yielding performance gains on reasoning benchmarks.
- Improved Mean Flows: On the Challenges of Fastforward Generative Models
Improved MeanFlow (iMF) reaches 1.72 FID on ImageNet 256x256 with one function evaluation by reformulating the training objective as a regression on instantaneous velocity and treating guidance as flexible conditioning variables.