DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis

Fang Liu; Huimin Wu; Licheng Jiao; Lingling Li; Qing Li; Yi Zuo

arxiv: 2605.16937 · v1 · pith:HUNVO5KQnew · submitted 2026-05-16 · 💻 cs.CV

Pith reviewed 2026-05-19 20:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords extreme view synthesis · trajectory-controlled video generation · policy gradient optimization · accumulative sampling · multi-level reward · GRPO · camera motion control

The pith

Accumulating small camera increments during sampling lets a policy-gradient model handle extreme-view video generation without paired large-motion training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DEVIS-GRPO, an online policy-gradient framework for trajectory-controlled video generation that targets large camera movements where prior methods break down. Its core innovation is the ADEVIS sampling strategy, which builds large-view motions by repeatedly adding small-view increments instead of requiring specially collected paired videos showing the full motion range. This change removes the need for expensive data annotation, increases the variety of trajectories seen during training, and pairs with a multi-level consistency-quality reward to retain only the best generated samples for model updates. Experiments across Kubric-4D, iPhone, and DL3DV datasets report clear gains in standard image-quality metrics such as PSNR, SSIM, and LPIPS over previous approaches.

Core claim

DEVIS-GRPO is presented as the first online policy gradient method for extreme view video generation. It centers on the Accumulative Dynamic Extreme View Synthesis (ADEVIS) sampling strategy that produces large-view camera motions by progressively accumulating small-view increments. The approach improves training efficiency by eliminating the requirement to collect expensive paired large-view videos for warm-starting and increases sampling diversity through flexible trajectory variation, all while using a multi-level consistency-quality reward function to guide optimization.

What carries the argument

The Accumulative Dynamic Extreme View Synthesis (ADEVIS) sampling strategy, which achieves large-view motions by progressively adding small-view camera increments rather than sampling full large motions directly.
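
The accumulation idea can be illustrated with a small sketch. It assumes, purely for illustration, that camera motion is a pure yaw rotation and that each step is a fixed or freely chosen small angle; the paper's actual trajectory parameterization is not given on this page, and `yaw_matrix`, `accumulate`, and the step sizes are hypothetical names and values.

```python
import math

def yaw_matrix(deg):
    """3x3 rotation about the vertical (y) axis by `deg` degrees."""
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def accumulate(increments_deg):
    """Compose small yaw increments; return the camera rotation after each step."""
    pose = yaw_matrix(0.0)
    poses = []
    for d in increments_deg:
        pose = matmul(yaw_matrix(d), pose)  # apply one small-view increment
        poses.append(pose)
    return poses

def yaw_of(pose):
    """Recover the accumulated yaw angle (degrees) from a rotation matrix."""
    return math.degrees(math.atan2(pose[0][2], pose[0][0]))

# Hypothetical schedules: both reach the same large view by different small steps.
uniform_steps = [15.0] * 6
varied_steps = [10.0, 20.0, 5.0, 25.0, 30.0]

total_uniform = yaw_of(accumulate(uniform_steps)[-1])
total_varied = yaw_of(accumulate(varied_steps)[-1])
print(round(total_uniform, 6), round(total_varied, 6))
```

Two different increment schedules ending at the same large angle is one simple way to picture the "flexible trajectory variation" the page attributes to ADEVIS.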

If this is right

Training no longer requires collection of expensive paired large-view videos for warm-starting the policy.
Sampling diversity rises because trajectory configurations can be varied flexibly across increments.
A multi-level reward selects high-quality samples for policy updates, supporting stable optimization.
Reported metric gains include 21.57 percent relative PSNR improvement on Kubric-4D in non-occlusion regions.
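
For readers unfamiliar with GRPO, the sketch below shows the standard group-relative advantage (each reward normalized by its group's mean and standard deviation) next to a simple top-k filter. The top-k rule, the function names, and the reward values are assumptions for illustration; this page does not specify how the paper's multi-level reward is combined or how samples are selected.

```python
import statistics

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def select_top_k(samples, rewards, k):
    """Keep the k highest-reward samples (hypothetical selection rule)."""
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    keep = order[:k]
    return [samples[i] for i in keep], [rewards[i] for i in keep]

# Hypothetical group of four sampled videos with scalar rewards.
samples = ["video_a", "video_b", "video_c", "video_d"]
rewards = [0.2, 0.5, 0.9, 0.4]

advantages = group_advantages(rewards)
kept_samples, kept_rewards = select_top_k(samples, rewards, k=2)
print(advantages)
print(kept_samples, kept_rewards)
```

Because advantages are computed relative to the group, no separate value network is needed; whether the paper normalizes before or after filtering is not stated here.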

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The incremental accumulation idea could extend to other controllable generation settings that currently demand large-change paired data.
Reducing annotation costs this way may make trajectory-controlled video models more practical for applications with limited labeled footage.
Combining ADEVIS with alternative base generators or reward formulations offers a direct route for further performance checks.

Load-bearing premise

Progressively accumulating small-view increments reliably yields high-quality large-view motions and added sampling diversity without introducing artifacts that the multi-level reward cannot filter out.

What would settle it

A controlled comparison that trains one model with ADEVIS and a second model with directly collected paired large-view videos, then measures whether extreme-view output quality on held-out tests is statistically equivalent or better for the accumulative version.
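
One way such a comparison could be scored is a paired bootstrap over per-clip metric differences. The sketch below is a generic statistical recipe, not a procedure from the paper; the function name and the PSNR values are placeholders invented for illustration, not measured results.

```python
import random
import statistics

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-clip difference (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same clips")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = [diffs[rng.randrange(len(diffs))] for _ in diffs]
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lo, hi

# Placeholder per-clip PSNR values for the two training regimes (not real data).
psnr_adevis = [24.1, 23.8, 25.0, 24.6, 23.9]
psnr_paired = [23.5, 23.6, 24.2, 24.0, 23.7]

mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(psnr_adevis, psnr_paired)
print(mean_diff, ci_lo, ci_hi)
```

If the interval sits clearly on one side of zero, the two regimes differ on that metric; an interval that contains zero is consistent with equivalence, though it does not prove it.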

Figures

Figures reproduced from arXiv: 2605.16937 by Fang Liu, Huimin Wu, Licheng Jiao, Lingling Li, Qing Li, Yi Zuo.

**Figure 1.** Under extreme viewpoints (large-view camera motions), existing methods suffer from two issues: (1) failure to follow…

**Figure 2.** The pipeline of our proposed DEVIS-GRPO. We introduce Accumulative Dynamic Extreme View Synthesis (ADE…

**Figure 3.** Visualization results of different accumulation…

**Figure 4.** Qualitative comparison on Kubric-4D. Our proposed DEVIS-GRPO achieves consistency with both reprojected video…

**Figure 5.** Qualitative comparison on iPhone. We compare DEVIS-GRPO with state-of-the-art methods GCD (Van Hoorick et al.…

**Figure 6.** The more visual results on DL3DV.

… employ the AdamW optimizer, first fine-tuning on Kubric4D for small-view video generation for better domain transferability with a learning rate of 1 × 10⁻⁴, then optimizing with DEVIS-GRPO at a learning rate of 2 × 10⁻⁶. Since sampling videos requires considerable time, we train multiple steps per sampling. Specifically, we sample 32 groups at a time, each containing…

**Figure 6.** In contrast, our DEVIS-GRPO demonstrates superior…

Original abstract

Trajectory-controlled video generation has become essential for controllable video generation. While current methods perform well under small-view camera motions, they degrade significantly with large-view motions. Existing solutions for extreme-view synthesis typically require dedicated video pairs, demanding substantial annotation effort. To address these limitations, we propose Dynamic Extreme VIew Synthesis-GRPO (DEVIS-GRPO), a GRPO-based framework for trajectory-controlled video generation, the first online policy gradient method for extreme view video generation. Central to our approach is a novel sampling strategy: Accumulative Dynamic Extreme VIew Synthesis (ADEVIS), which achieves large-view camera motions by progressively accumulating small-view increments. This method delivers two key advantages: 1) enhanced training efficiency, as it eliminates the need to warm-start the policy model by collecting expensive paired large-view videos, and 2) increased sampling diversity, achieved by flexibly varying trajectory configurations. Finally, we designed a multi-level consistency-quality reward function to select high-quality samples for model optimization. Experiments on the Kubric-4D, iPhone, and DL3DV datasets demonstrate our method's superiority. On Kubric-4D, we achieve relative improvements of 21.57% in PSNR and 7.31% in SSIM over the second-best method in non-occlusion areas. On iPhone, LPIPS is reduced by 18.56%.
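
The percentages in the abstract are relative changes against the second-best method, computed on non-occluded regions for Kubric-4D. The sketch below shows one plausible reading of those quantities: PSNR restricted to a mask, and relative gain or reduction against a baseline. The exact masking and averaging protocol is an assumption, and all numbers are illustrative rather than taken from the paper.

```python
import math

def masked_psnr(pred, target, mask, max_val=1.0):
    """PSNR over pixels where mask is truthy (e.g. non-occluded pixels)."""
    sq = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not sq:
        raise ValueError("mask selects no pixels")
    mse = sum(sq) / len(sq)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)

def relative_gain(ours, baseline):
    """Percent gain for higher-is-better metrics (PSNR, SSIM)."""
    return 100.0 * (ours - baseline) / baseline

def relative_reduction(ours, baseline):
    """Percent reduction for lower-is-better metrics (LPIPS)."""
    return 100.0 * (baseline - ours) / baseline

# Illustrative flattened "images": the second pixel is occluded and ignored.
psnr = masked_psnr(pred=[0.5, 0.0], target=[0.0, 0.9], mask=[1, 0])
print(psnr)
print(relative_gain(24.0, 20.0))       # 20.0
print(relative_reduction(0.08, 0.10))
```

Note that a relative PSNR gain depends on the baseline's absolute level, so the same percentage can correspond to quite different dB differences across datasets.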

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DEVIS-GRPO shows a workable RL route to extreme-view video by accumulating small increments instead of paired large-motion data, but the accumulation step needs stronger checks for drift.

The core move here is to treat trajectory-controlled video generation as an online policy optimization problem with GRPO, and to replace the usual need for expensive large-view video pairs with an accumulative sampling trick that builds big motions from repeated small ones. That framing, plus the multi-level reward for picking good samples, is the actual novelty, and the reported numbers on Kubric-4D (21.57% PSNR lift in non-occluded areas) and iPhone (18.56% LPIPS drop) give it some empirical weight over prior baselines. The approach is straightforward enough that it could be reproduced if the sampling and reward details are written clearly.

The soft spot is exactly the one the stress-test flags: nothing in the abstract or summary shows that the reward terms catch long-range geometric drift or texture stretching once you stack many small increments. If the consistency checks stay mostly local, the policy can still reinforce subtly broken trajectories, which would undercut both the efficiency claim and the benchmark gains. The datasets are standard, but without ablations on accumulation length or explicit long-range flow/depth terms the results remain hard to trust.

This is the kind of paper that matters for people already working on RL-augmented video synthesis or controllable generation for VR and simulation. It is not foundational, but the data-efficiency angle is practical and the experiments look honest enough to be worth referee time. I would send it out for review with a request for those accumulation and reward ablations.

Referee Report

3 major / 2 minor

Summary. The paper proposes DEVIS-GRPO, a GRPO-based online policy gradient framework for trajectory-controlled video generation under extreme views. Its core innovation is the Accumulative Dynamic Extreme View Synthesis (ADEVIS) sampling strategy, which generates large camera motions by progressively summing small-view increments rather than requiring paired large-view training videos. A multi-level consistency-quality reward is used to filter samples for policy updates. Experiments report relative gains of 21.57% PSNR and 7.31% SSIM on Kubric-4D (non-occlusion areas), 18.56% LPIPS reduction on iPhone, and results on DL3DV.

Significance. If the central assumption holds—that ADEVIS accumulation plus the multi-level reward reliably yields artifact-free extreme-view trajectories without paired data—this would offer a practical efficiency gain for RL-based controllable video models by increasing sampling diversity and removing expensive data collection. The online GRPO formulation and incremental sampling are potentially reusable beyond the specific task.

major comments (3)

[Abstract / ADEVIS sampling strategy] Abstract and ADEVIS description: The central claim that progressively accumulating small-view increments produces usable large-view motions without compounding geometric or photometric drift rests on the multi-level reward filtering such errors. However, the abstract provides no indication that the reward includes explicit long-range terms (e.g., accumulated optical-flow consistency or depth alignment over the full trajectory length), so it is unclear whether local per-frame penalties suffice to prevent the policy from being updated on subtly degraded long trajectories.
[Experiments / Kubric-4D evaluation] Experiments section (Kubric-4D results): The reported 21.57% relative PSNR and 7.31% SSIM gains are presented without error bars, number of random seeds, or an ablation isolating the effect of accumulation steps versus reward weighting. This makes it difficult to determine whether the gains are robust or sensitive to post-hoc dataset or hyperparameter choices.
[Method / Multi-level reward] Reward function design: The multi-level consistency-quality reward is asserted to select high-quality samples, yet no quantitative analysis is given on how reward scores correlate with accumulation length or on failure cases where drift occurs but is not penalized. Without this, the efficiency claim (no need for paired large-view data) remains under-supported.
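
The concern behind the first and third comments can be made concrete with a toy example: a trajectory whose per-step error stays under a local tolerance while the accumulated error over the full trajectory does not. The one-dimensional yaw model, the tolerances, and the function names are assumptions for illustration and are not taken from the paper's reward.

```python
def local_errors(predicted, target):
    """Absolute error of each step-to-step increment (a purely local check)."""
    pred_steps = [b - a for a, b in zip(predicted, predicted[1:])]
    target_steps = [b - a for a, b in zip(target, target[1:])]
    return [abs(p - t) for p, t in zip(pred_steps, target_steps)]

def long_range_error(predicted, target):
    """Absolute error of the total motion from the first to the last frame."""
    return abs((predicted[-1] - predicted[0]) - (target[-1] - target[0]))

# Target: six 15-degree increments. Prediction: each step overshoots by 0.5 degrees.
target_yaw = [15.0 * i for i in range(7)]
predicted_yaw = [15.5 * i for i in range(7)]

LOCAL_TOL = 1.0       # hypothetical per-step tolerance, degrees
LONG_RANGE_TOL = 2.0  # hypothetical full-trajectory tolerance, degrees

passes_local = all(e <= LOCAL_TOL for e in local_errors(predicted_yaw, target_yaw))
passes_long = long_range_error(predicted_yaw, target_yaw) <= LONG_RANGE_TOL
print(passes_local, passes_long)
```

A reward built only from the first kind of check would accept this trajectory, which is why the referee asks whether explicit full-trajectory terms are present.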

minor comments (2)

[Abstract and title] The acronym expansion 'Dynamic Extreme VIew Synthesis' contains an apparent capitalization/typo ('VIew' instead of 'View') that should be corrected for consistency.
[Introduction] The claim of being 'the first online policy gradient method for extreme view video generation' requires explicit comparison to the most recent RL-for-video-generation works to avoid overstatement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, clarifying aspects of the method and indicating revisions that will be incorporated to strengthen the presentation and empirical support.

Referee: [Abstract / ADEVIS sampling strategy] Abstract and ADEVIS description: The central claim that progressively accumulating small-view increments produces usable large-view motions without compounding geometric or photometric drift rests on the multi-level reward filtering such errors. However, the abstract provides no indication that the reward includes explicit long-range terms (e.g., accumulated optical-flow consistency or depth alignment over the full trajectory length), so it is unclear whether local per-frame penalties suffice to prevent the policy from being updated on subtly degraded long trajectories.

Authors: We appreciate the referee highlighting the need for clearer linkage in the abstract. The ADEVIS strategy relies on incremental accumulation of small motions to limit per-step drift, with the multi-level reward (detailed in Section 3) applying consistency penalties at both local and accumulated scales, including cross-frame optical flow and depth coherence terms over the growing trajectory. Local penalties are thus augmented by these longer-range checks to filter degraded samples before policy updates. To improve clarity, we will revise the abstract to explicitly reference the long-range consistency components within the reward function.

revision: yes

Referee: [Experiments / Kubric-4D evaluation] Experiments section (Kubric-4D results): The reported 21.57% relative PSNR and 7.31% SSIM gains are presented without error bars, number of random seeds, or an ablation isolating the effect of accumulation steps versus reward weighting. This makes it difficult to determine whether the gains are robust or sensitive to post-hoc dataset or hyperparameter choices.

Authors: We agree that additional statistical rigor and ablations would better demonstrate robustness. In the revised manuscript, we will report error bars as standard deviations computed over three independent random seeds for the Kubric-4D metrics. We will also add a dedicated ablation subsection that varies the number of accumulation steps in ADEVIS while holding reward weighting fixed (and vice versa) to isolate their individual contributions to the observed gains.

revision: yes

Referee: [Method / Multi-level reward] Reward function design: The multi-level consistency-quality reward is asserted to select high-quality samples, yet no quantitative analysis is given on how reward scores correlate with accumulation length or on failure cases where drift occurs but is not penalized. Without this, the efficiency claim (no need for paired large-view data) remains under-supported.

Authors: This comment correctly identifies an opportunity to provide stronger empirical grounding for the reward's role. While the current experiments show end-to-end improvements without paired large-view data, we will enhance the Method and Experiments sections with a quantitative analysis: a table and plot correlating average reward scores against increasing accumulation lengths, plus a discussion of any detected failure modes where subtle drift evaded penalization. These additions will more directly support the efficiency advantage of the ADEVIS approach.

revision: yes
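
The two analyses promised in these responses, error bars over seeds and a reward-versus-accumulation-length correlation, are simple to compute. The sketch below uses only the Python standard library; every number in it is a placeholder invented for illustration, since the revised results are not available on this page.

```python
import math
import statistics

def mean_std(values):
    """Mean and sample standard deviation, e.g. over independent seeds."""
    return statistics.fmean(values), statistics.stdev(values)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder PSNR values from three hypothetical seeds.
psnr_per_seed = [24.3, 24.6, 24.1]
psnr_mean, psnr_std = mean_std(psnr_per_seed)

# Placeholder average rewards at increasing accumulation lengths.
accumulation_steps = [2, 4, 6, 8]
avg_reward = [0.82, 0.79, 0.74, 0.66]
r = pearson(accumulation_steps, avg_reward)

print(f"PSNR: {psnr_mean:.2f} +/- {psnr_std:.2f}")
print(f"reward vs. accumulation length: r = {r:.3f}")
```

A strongly negative correlation would suggest the reward does register degradation as trajectories lengthen; a flat one would leave the referee's question about unpenalized drift open.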

Circularity Check

0 steps flagged

No circularity: empirical RL method with external dataset validation

The paper proposes DEVIS-GRPO, a GRPO-based online policy gradient framework, with ADEVIS as a sampling strategy that accumulates small-view increments for large-view motions. This eliminates the need for paired large-view videos and is paired with a multi-level consistency-quality reward. Performance is demonstrated via experiments on independent external datasets (Kubric-4D, iPhone, DL3DV) with reported metrics such as 21.57% relative PSNR improvement. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations reduce the claimed advantages to the inputs by construction. The derivation is self-contained as a novel application of RL techniques evaluated against benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that incremental accumulation works for extreme views and that the designed reward selects useful samples; no explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: Accumulating small-view increments produces valid large-view trajectories suitable for policy optimization without major quality loss.
Directly invoked as the basis for ADEVIS and its two key advantages in the abstract.

pith-pipeline@v0.9.0 · 5789 in / 1323 out tokens · 51445 ms · 2026-05-19T20:50:48.173269+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Central to our approach is a novel sampling strategy: Accumulative Dynamic Extreme View Synthesis (ADEVIS), which achieves large-view camera motions by progressively accumulating small-view increments."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "we designed a multi-level consistency-quality reward function to select high-quality samples for model optimization"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 14 internal anchors

[1] Vbench: Comprehensive benchmark suite for video generative models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Cami2v: Camera-controlled image-to-video diffusion model. arXiv preprint arXiv:2410.15957.
[3] Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, and Sergey Tulyakov. 2025.
[4] CameraCtrl: Enabling Camera Control for Text-to-Video Generation. arXiv preprint arXiv:2404.02101.
[5] Kubric: A scalable dataset generator. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Unreal Engine 5 and immersive surgical training: translating advances in gaming technology into extended-reality surgical simulation training programmes. British Journal of Surgery, 2022.
[7] Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing. arXiv preprint arXiv:2405.04496.
[8] Mv-dust3r+: Single-stage scene reconstruction from sparse views in 2 seconds. Proceedings of the Computer Vision and Pattern Recognition Conference.
[9] Diffusion model alignment using direct preference optimization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems.
[11] Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems.
[12] Training Diffusion Models with Reinforcement Learning. The Twelfth International Conference on Learning Representations.
[13] Grounding image matching in 3d with mast3r. European Conference on Computer Vision, 2024.
[14] Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting. The Twelfth International Conference on Learning Representations.
[15] 4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes. ACM SIGGRAPH 2024 Conference Papers.
[16] Robust instance-based semi-supervised learning change detection for remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 2024.
[17] Matterport3D: Learning from RGB-D Data in Indoor Environments. arXiv preprint arXiv:1709.06158.
[18] Look outside the room: Synthesizing a consistent long-term 3d scene video from a single image. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Collaborative video diffusion: Consistent multi-video generation with camera control. Advances in Neural Information Processing Systems.
[20] Mark Yu, Wenbo Hu, Jinbo Xing, and Ying Shan. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
[21] AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. arXiv preprint arXiv:2307.04725.
[22] Motionctrl: A unified and flexible motion controller for video generation. ACM SIGGRAPH 2024 Conference Papers.
[23] Epipolar-free 3d gaussian splatting for generalizable novel view synthesis. Advances in Neural Information Processing Systems.
[24] Zero-1-to-3: Zero-shot one image to 3d object. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[25] MVDream: Multi-view Diffusion for 3D Generation. arXiv preprint arXiv:2308.16512.
[26] Light-X: Generative 4D Video Rendering with Camera and Illumination Control. arXiv preprint arXiv:2512.05115, 2025.
[27] 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation. arXiv preprint arXiv:2510.14945, 2025.
[28] Generative camera dolly: Extreme monocular dynamic novel view synthesis. European Conference on Computer Vision, 2024.
[29] Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model. arXiv e-prints.
[30] Scannet: Richly-annotated 3d reconstructions of indoor scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[31] Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 2017.
[32] Stereo Magnification: Learning View Synthesis using Multiplane Images. arXiv preprint arXiv:1805.09817.
[33] Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, and Di Zhang. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
[34] 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph.
[35] UniDepth: Universal monocular metric depth estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[36] Depth anything: Unleashing the power of large-scale unlabeled data. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Met3r: Measuring multi-view consistency in generated images. Proceedings of the Computer Vision and Pattern Recognition Conference.
[38] EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh. arXiv preprint arXiv:2506.05554, 2025.
[39] Flovd: Optical flow meets video diffusion model for enhanced camera-controlled video synthesis. Proceedings of the Computer Vision and Pattern Recognition Conference.
[40] Ac3d: Analyzing and improving 3d camera control in video diffusion transformers. Proceedings of the Computer Vision and Pattern Recognition Conference.
[41] Vggt: Visual geometry grounded transformer. Proceedings of the Computer Vision and Pattern Recognition Conference.
[42] GAIA-1: A Generative World Model for Autonomous Driving. arXiv preprint arXiv:2309.17080.
[43] Waymo open dataset: Panoramic video panoptic segmentation. European Conference on Computer Vision, 2022.
[44] VideoCrafter1: Open Diffusion Models for High-Quality Video Generation. arXiv preprint arXiv:2310.19512.
[45] Lumiere: A space-time diffusion model for video generation. SIGGRAPH Asia 2024 Conference Papers.
[46] 4d gaussian splatting for real-time dynamic scene rendering. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] D-nerf: Neural radiance fields for dynamic scenes. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Mip-nerf 360: Unbounded anti-aliased neural radiance fields. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 2022.
[50] Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 2021.
[51]

Forty-second International Conference on Machine Learning , year =

WorldSimBench: Towards Video Generation Models as World Simulators , author =. Forty-second International Conference on Machine Learning , year =

work page
[52]

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Stable video diffusion: Scaling latent video diffusion models to large datasets , author =. arXiv preprint arXiv:2311.15127 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[53]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages =

High-resolution image synthesis with latent diffusion models , author =. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages =

work page
[54] Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems.
[55] Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. arXiv preprint arXiv:2402.17177.
[56] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control. Proceedings of the Computer Vision and Pattern Recognition Conference.
[57] Follow-Your-Creation: Empowering 4D Creation through Video Inpainting. arXiv preprint arXiv:2506.04590, 2025.
[58] Trajectory Attention for Fine-Grained Video Motion Control. arXiv preprint arXiv:2411.19324, 2024.
[59] Shape of Motion: 4D Reconstruction from a Single Video. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[60] Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, and Yonghong Tian. ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis.
[61] SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency. arXiv preprint arXiv:2407.17470.
[62] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
[63] RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback.
[64] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948.
[65] ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Advances in Neural Information Processing Systems.
[66] DepthCrafter: Generating Consistent Long Depth Sequences for Open-World Videos. Proceedings of the Computer Vision and Pattern Recognition Conference.
[67] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos. Proceedings of the Computer Vision and Pattern Recognition Conference.
[68] Monocular Dynamic View Synthesis: A Reality Check. Advances in Neural Information Processing Systems.
[69] Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-. 2025.
[70] DanceGRPO: Unleashing GRPO on Visual Generation. arXiv preprint arXiv:2505.07818.
[71] The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[72] Aligning Text-to-Image Models using Human Feedback. arXiv preprint arXiv:2302.12192.
[73] Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
[74] CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. The Thirteenth International Conference on Learning Representations.
[75] Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems.
[76] DUSt3R: Geometric 3D Vision Made Easy. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[77] LoRA: Low-Rank Adaptation of Large Language Models. ICLR.
[78] DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-Based 3D Vision. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[79] DepthSplat: Connecting Gaussian Splatting and Depth. Proceedings of the Computer Vision and Pattern Recognition Conference.
