Denoising diffusion probabilistic models
7 Pith papers cite this work.
Verdicts: unverdicted (7)
Citation roles: method (1)
Citation polarities: use method (1)
Citing papers
- DrawMotion: Generating 3D Human Motions by Freehand Drawing
  DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.
- Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection
  Noise2Map repurposes the denoising process of a diffusion model as a direct predictor for semantic segmentation and change detection in remote sensing, achieving top average ranks across benchmark datasets.
- Unified Reward Model for Multimodal Understanding and Generation
  UnifiedReward is presented as the first unified reward model that jointly assesses multimodal understanding and generation, providing better preference signals for aligning vision models via DPO (Direct Preference Optimization).
- DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning
  DiffVC applies diffusion models to non-autoregressive video captioning. On standard benchmarks it outperforms prior non-autoregressive methods and matches autoregressive ones in caption quality while generating captions faster.
- SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
  SD-ReID trains a ViT to extract identity and view conditions, fine-tunes Stable Diffusion to generate view-mimicking features, adds a View-Refined Decoder, and combines the identity and all-view features for retrieval on aerial-ground re-identification benchmarks.
- GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation
  GCDance is a text-and-music-conditioned diffusion framework that generates genre-consistent 3D dance sequences and reports better results than prior methods on FineDance and AIST++.
- SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
  SOW uses multimodal large language models (MLLMs) and attention to selectively control unidirectional diffusion, targeting pixel-level fidelity and contextual coherence in text-vision-to-image tasks.