ControlNeXt: Powerful and efficient control for image and video generation

Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia · 2024 · arXiv 2408.06070

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Remote Sensing Image Super-Resolution for Imbalanced Textures: A Texture-Aware Diffusion Framework

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

TexADiff integrates a Relative Texture Density Map into diffusion-based super-resolution to address imbalanced textures in remote sensing images, yielding better high-frequency details and downstream task gains.

Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

Immune2V immunizes images against dual-stream I2V generation by enforcing temporally balanced latent divergence and aligning generative features to a precomputed collapse trajectory, yielding stronger persistent degradation than image-level baselines.

GT-SVJ: Generative-Transformer-Based Self-Supervised Video Judge For Efficient Video Reward Modeling

cs.CV · 2026-02-05 · unverdicted · novelty 7.0

GT-SVJ turns video generative models into self-supervised reward judges via EBM reformulation and contrastive training on controlled synthetic degradations, claiming SOTA on GenAI-Bench and MonteBench with 30K annotations.

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

One-to-All Animation enables alignment-free character animation and image pose transfer via self-supervised outpainting reformulation, reference extraction, hybrid fusion attention, identity-robust pose control, and token replacement for long videos.

Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

cs.CV · 2025-09-27 · unverdicted · novelty 7.0

Vid-Freeze immunizes images by adding perturbations that target attention dynamics in I2V models to enforce temporal freezing and suppress motion synthesis.

PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

cs.CV · 2025-05-28 · unverdicted · novelty 7.0

PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

SignVerse-2M provides a 2-million-clip multilingual pose-native dataset for sign language derived from public videos via DWPose preprocessing to enable robust modeling in real-world conditions.

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

cs.CV · 2026-03-31 · unverdicted · novelty 6.0

HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.

VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification

cs.CV · 2025-12-10 · unverdicted · novelty 6.0

VHOI densifies sparse trajectories into color-encoded HOI mask sequences and conditions a fine-tuned video diffusion model on them to produce controllable human-object interaction videos, including full navigation sequences.

EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

EasyVFX decouples VFX generation via frequency-aware Mixture-of-Experts and test-time training to achieve realistic effects with limited resources.

DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

DepthPilot generates physically consistent and clinically interpretable colonoscopy videos by injecting depth priors into diffusion models through parameter-efficient fine-tuning and replacing linear denoising weights with adaptive splines.

ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation

cs.RO · 2025-09-23 · unverdicted · novelty 5.0

ROPA augments bimanual imitation learning datasets by generating synthetic RGB-D observations and actions via fine-tuned diffusion models with physical consistency constraints.

Open-Sora Plan: Open-Source Large Video Generation Model

cs.CV · 2024-11-28 · unverdicted · novelty 4.0

Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.

citing papers explorer

Remote Sensing Image Super-Resolution for Imbalanced Textures: A Texture-Aware Diffusion Framework · cs.CV · 2026-04-15 · unverdicted · none · ref 20
Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation · cs.CV · 2026-04-12 · unverdicted · none · ref 33
GT-SVJ: Generative-Transformer-Based Self-Supervised Video Judge For Efficient Video Reward Modeling · cs.CV · 2026-02-05 · unverdicted · none · ref 29
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer · cs.CV · 2025-11-28 · unverdicted · none · ref 34
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing · cs.CV · 2025-09-27 · unverdicted · none · ref 11
PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models · cs.CV · 2025-05-28 · unverdicted · none · ref 58
SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages · cs.CV · 2026-05-03 · unverdicted · none · ref 14
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis · cs.CV · 2026-03-31 · unverdicted · none · ref 52
VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification · cs.CV · 2025-12-10 · unverdicted · none · ref 58
EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation · cs.CV · 2026-05-21 · unverdicted · none · ref 39
DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation · cs.CV · 2026-04-29 · unverdicted · none · ref 21
ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation · cs.RO · 2025-09-23 · unverdicted · none · ref 68
Open-Sora Plan: Open-Source Large Video Generation Model · cs.CV · 2024-11-28 · unverdicted · none · ref 14
