hub Canonical reference

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani · 2024 · arXiv 2407.17470

Canonical reference. 80% of citing Pith papers cite this work as background.

23 Pith papers citing it

citation-role summary

background: 4 · method: 1

citation-polarity summary

background: 4 · use method: 1

representative citing papers

DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

DEVIS-GRPO applies online policy gradients with an accumulative small-to-large view sampling strategy and multi-level rewards to improve trajectory-controlled extreme view video generation, reporting gains on Kubric-4D, iPhone, and DL3DV datasets.

Probing into Camera Control of Video Models

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

A training-free method reformulates camera control as geometric displacement fields applied via differentiable latent resampling, enabling control and bias probing in video diffusion models.

DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos by reducing over 70% redundant computation while maintaining quality.

Action Images: End-to-End Policy Learning via Multiview Video Generation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.

OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.

SimWorlds: A Multi-Agent System for Dynamic 3D Scene Creation

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

SimWorlds presents a multi-agent system with planner-coder-reviewer workflow, layered scene protocol, and runtime inspection tools to create dynamic 4D scenes from text, plus the 4DBuildBench benchmark showing outperformance over baselines.

HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

HAT-4D presents an agentic VLM-plus-human-in-the-loop pipeline for monocular 4D multi-object interaction reconstruction and releases the MVOIK-4D benchmark.

TriMotion: Modality-Agnostic Camera Control for Video Generation

cs.CV · 2026-06-18 · unverdicted · novelty 6.0

TriMotion is a modality-agnostic framework that maps video, pose, and text descriptions of the same camera trajectory into a shared motion embedding space, trained with a new triplet dataset and latent consistency objective, to produce videos that follow the target trajectory.

Streaming Video Generation with Streaming Force Control

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

StreamForce presents a unified causal model for force-controllable streaming video generation using a new force representation and distillation pipeline, claiming SOTA force adherence and 16.6 FPS performance.

E³C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

E³C is a video diffusion model that disentangles persistent 3D scene structure via point-cloud memory from human dynamics via ego-exo pose controls for improved egocentric video generation on the Nymeria dataset.

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

A training-free Spatio-Temporal Attention Chain framework accelerates 4D mesh generation 13x, improves quality, scales to 16x longer videos, and supports downstream tracking and camera estimation.

Velox: Learning Representations of 4D Geometry and Appearance

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.

Embody4D: A Generalist Data Engine for Embodied 4D World Modeling

cs.CV · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

Embody4D generates novel-view videos from monocular robot videos via a 3D-aware synthesis pipeline, confidence-aware expert modulation, and interaction-aware attention for embodied 4D world modeling.

Vista4D: Video Reshooting with 4D Point Clouds

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Vista4D re-synthesizes dynamic videos from new viewpoints by grounding them in a 4D point cloud built with static segmentation and multiview training.

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models

cs.CV · 2025-11-01 · unverdicted · novelty 6.0

A feed-forward video latent transformer that predicts time-varying 3D Gaussian primitives from one image to produce controllable 4D scenes with appearance, geometry, and motion.

Motion-2-To-3: Leveraging 2D Motion Data for 3D Motion Generations

cs.CV · 2024-12-17 · unverdicted · novelty 6.0

A framework disentangles local joint motion from global movement, trains a 2D local generator on text-2D pairs, then fine-tunes on 3D data to output view-consistent 3D motions.

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

cs.RO · 2026-06-16 · unverdicted · novelty 5.0

PAIWorld adds explicit geometric cross-view mechanisms and 3D distillation to DiT world models to achieve multi-view 3D consistency in robotic manipulation benchmarks.

Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

cs.CV · 2026-06-11 · unverdicted · novelty 5.0

A multi-view video diffusion model conditioned on relative camera poses via extended RoPE generates dense synchronized views from sparse inputs for 4D Gaussian splatting reconstruction, claiming SOTA results on human datasets and generalization to animals.

CP4D: Compositional Physics-aware 4D Scene Generation

cs.CV · 2026-06-08 · unverdicted · novelty 5.0

CP4D generates physically consistent 4D scenes via compositional integration of pre-trained 3D models, hybrid simulator-diffusion motion synthesis, and automated scene composition.

SkelMo: Universal Skeletal Motion Generation for 3D Rigged Shapes

cs.CV · 2026-06-01 · unverdicted · novelty 5.0 · 2 refs

SkelMo introduces a category-agnostic diffusion framework for skeletal motion generation from 2D videos, trained on a new dataset of ~20,000 rigged 3D animations with a structural-semantic injection mechanism.

Efficient 3D Content Reconstruction and Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.

Geometry-aware 4D Video Generation for Robot Manipulation

cs.CV · 2025-07-01 · unverdicted · novelty 5.0

A geometry-aware 4D video generation model trained with cross-view pointmap alignment to produce spatio-temporally consistent future videos from novel viewpoints for robot manipulation.

QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

cs.GR · 2026-05-16 · unverdicted · novelty 4.0

QuadLink generates anisotropic quad-dominant meshes from point clouds via autoregressive anchor prediction and centroid-conditioned linking, with a Tri-to-Quad data converter and quad-first assembly.

citing papers explorer

Showing 20 of 20 citing papers after filters.

DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis · cs.CV · 2026-05-16 · unverdicted · none · ref 61
DEVIS-GRPO applies online policy gradients with an accumulative small-to-large view sampling strategy and multi-level rewards to improve trajectory-controlled extreme view video generation, reporting gains on Kubric-4D, iPhone, and DL3DV datasets.
Probing into Camera Control of Video Models · cs.CV · 2026-05-14 · unverdicted · none · ref 50
A training-free method reformulates camera control as geometric displacement fields applied via differentiable latent resampling, enabling control and bias probing in video diffusion models.
DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos · cs.CV · 2026-04-14 · unverdicted · none · ref 39
DreamStereo uses GAPW, PBDP, and SASI to enable real-time stereo video inpainting at 25 FPS for HD videos by reducing over 70% redundant computation while maintaining quality.
Action Images: End-to-End Policy Learning via Multiview Video Generation · cs.CV · 2026-04-07 · unverdicted · none · ref 62
Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control · cs.CV · 2026-04-07 · unverdicted · none · ref 35
OmniCamera disentangles video content and camera motion for multi-task generation with arbitrary camera control via the OmniCAM hybrid dataset and Dual-level Curriculum Co-Training.
SimWorlds: A Multi-Agent System for Dynamic 3D Scene Creation · cs.AI · 2026-07-02 · unverdicted · none · ref 48
SimWorlds presents a multi-agent system with planner-coder-reviewer workflow, layered scene protocol, and runtime inspection tools to create dynamic 4D scenes from text, plus the 4DBuildBench benchmark showing outperformance over baselines.
HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration · cs.CV · 2026-06-26 · unverdicted · none · ref 47
HAT-4D presents an agentic VLM-plus-human-in-the-loop pipeline for monocular 4D multi-object interaction reconstruction and releases the MVOIK-4D benchmark.
TriMotion: Modality-Agnostic Camera Control for Video Generation · cs.CV · 2026-06-18 · unverdicted · none · ref 50
TriMotion is a modality-agnostic framework that maps video, pose, and text descriptions of the same camera trajectory into a shared motion embedding space, trained with a new triplet dataset and latent consistency objective, to produce videos that follow the target trajectory.
Streaming Video Generation with Streaming Force Control · cs.CV · 2026-06-05 · unverdicted · none · ref 68
StreamForce presents a unified causal model for force-controllable streaming video generation using a new force representation and distillation pipeline, claiming SOTA force adherence and 16.6 FPS performance.
E³C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control · cs.CV · 2026-05-25 · unverdicted · none · ref 65
E³C is a video diffusion model that disentangles persistent 3D scene structure via point-cloud memory from human dynamics via ego-exo pose controls for improved egocentric video generation on the Nymeria dataset.
Fast 4D Mesh Generation by Spatio-Temporal Attention Chains · cs.CV · 2026-05-19 · unverdicted · none · ref 85
A training-free Spatio-Temporal Attention Chain framework accelerates 4D mesh generation 13x, improves quality, scales to 16x longer videos, and supports downstream tracking and camera estimation.
Velox: Learning Representations of 4D Geometry and Appearance · cs.CV · 2026-05-06 · unverdicted · none · ref 105
Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.
Embody4D: A Generalist Data Engine for Embodied 4D World Modeling · cs.CV · 2026-05-03 · unverdicted · none · ref 56 · 2 links
Embody4D generates novel-view videos from monocular robot videos via a 3D-aware synthesis pipeline, confidence-aware expert modulation, and interaction-aware attention for embodied 4D world modeling.
Vista4D: Video Reshooting with 4D Point Clouds · cs.CV · 2026-04-23 · unverdicted · none · ref 30
Vista4D re-synthesizes dynamic videos from new viewpoints by grounding them in a 4D point cloud built with static segmentation and multiview training.
PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation · cs.RO · 2026-06-16 · unverdicted · none · ref 39
PAIWorld adds explicit geometric cross-view mechanisms and 3D distillation to DiT world models to achieve multi-view 3D consistency in robotic manipulation benchmarks.
Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction · cs.CV · 2026-06-11 · unverdicted · none · ref 40
A multi-view video diffusion model conditioned on relative camera poses via extended RoPE generates dense synchronized views from sparse inputs for 4D Gaussian splatting reconstruction, claiming SOTA results on human datasets and generalization to animals.
CP4D: Compositional Physics-aware 4D Scene Generation · cs.CV · 2026-06-08 · unverdicted · none · ref 34
CP4D generates physically consistent 4D scenes via compositional integration of pre-trained 3D models, hybrid simulator-diffusion motion synthesis, and automated scene composition.
SkelMo: Universal Skeletal Motion Generation for 3D Rigged Shapes · cs.CV · 2026-06-01 · unverdicted · none · ref 32 · 2 links
SkelMo introduces a category-agnostic diffusion framework for skeletal motion generation from 2D videos, trained on a new dataset of ~20,000 rigged 3D animations with a structural-semantic injection mechanism.
Efficient 3D Content Reconstruction and Generation · cs.CV · 2026-05-18 · unverdicted · none · ref 286
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning · cs.GR · 2026-05-16 · unverdicted · none · ref 111
QuadLink generates anisotropic quad-dominant meshes from point clouds via autoregressive anchor prediction and centroid-conditioned linking, with a Tri-to-Quad data converter and quad-first assembly.
