ViPS learns a universal, controllable pose space for auto-rigged meshes by transferring motion priors from video diffusion models, matching SOTA performance on plausibility and diversity while enabling zero-shot generalization.
Journal of machine learning research21(140), 1–67 (2020)
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 7roles
method 1polarities
use method 1representative citing papers
A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.
RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.
MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
SHIFT learns and applies steering vectors to selected layers and timesteps in DiT models to suppress concepts, shift styles, or bias objects while keeping image quality and prompt adherence intact.
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.
citing papers explorer
-
ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes
ViPS learns a universal, controllable pose space for auto-rigged meshes by transferring motion priors from video diffusion models, matching SOTA performance on plausibility and diversity while enabling zero-shot generalization.
-
Self-supervised pretraining for an iterative image size agnostic vision transformer
A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.
-
DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.
-
MAGE: Modality-Agnostic Music Generation and Editing
MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
-
SHIFT: Steering Hidden Intermediates in Flow Transformers
SHIFT learns and applies steering vectors to selected layers and timesteps in DiT models to suppress concepts, shift styles, or bias objects while keeping image quality and prompt adherence intact.
-
SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
-
Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.