4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Chen, Z · 2025 · arXiv 2508.13154

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1 · dataset 1

citation-polarity summary

background 3 · baseline 1

representative citing papers

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

cs.CV · 2026-05-03 · unverdicted · novelty 7.0 · 2 refs

M²-REPA decouples modality-specific features from diffusion intermediates and aligns them to complementary expert foundation models via a multi-modal alignment loss and modality-specific decoupling regularization for improved multimodal video generation.

HiReFF: High-Resolution Feedforward Human Reconstruction from Uncalibrated Sparse-View Video

cs.CV · 2026-06-28 · unverdicted · novelty 6.0

HiReFF presents a feed-forward framework for 2K human video reconstruction from uncalibrated sparse-view videos via scale-synchronized calibration, Gaussian masking, and high-resolution side-tuning.

PointAction: 3D Points as Universal Action Representations for Robot Control

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

PointAction uses predicted dynamic 3D pointmaps from fine-tuned video models as an embodiment-agnostic action representation to map video predictions to executable robot actions.

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models

cs.CV · 2025-11-01 · unverdicted · novelty 6.0

A feed-forward video latent transformer that predicts time-varying 3D Gaussian primitives from one image to produce controllable 4D scenes with appearance, geometry, and motion.

ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.

Syn4D: A Multiview Synthetic 4D Dataset

cs.CV · 2026-05-06 · unverdicted · novelty 5.0

Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

cs.CV · 2026-04-27 · unverdicted · novelty 4.0 · 3 refs

World-R1 applies reinforcement learning via Flow-GRPO and a text dataset to align text-to-video models with 3D constraints from pre-trained foundation models, improving consistency while keeping original visual quality.

citing papers explorer

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models · cs.CV · 2026-05-03 · unverdicted · none · ref 6 · 2 links
HiReFF: High-Resolution Feedforward Human Reconstruction from Uncalibrated Sparse-View Video · cs.CV · 2026-06-28 · unverdicted · none · ref 6
PointAction: 3D Points as Universal Action Representations for Robot Control · cs.RO · 2026-06-02 · unverdicted · none · ref 17
Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models · cs.CV · 2025-11-01 · unverdicted · none · ref 11
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation · cs.CV · 2026-05-08 · unverdicted · none · ref 60
Syn4D: A Multiview Synthetic 4D Dataset · cs.CV · 2026-05-06 · unverdicted · none · ref 21
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation · cs.CV · 2026-04-27 · unverdicted · none · ref 13 · 3 links
