Quantitative Video World Model Evaluation for Geometric-Consistency

Jiaxin Wu; Xueyan Zou; Yihao Pi; Yinling Zhang; Yuheng Li

arxiv: 2605.15185 · v1 · pith:HDPXO5DKnew · submitted 2026-05-14 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-15 03:06 UTC · model grok-4.3

keywords video generation · geometric consistency · world models · 3D reconstruction · evaluation metrics · perspective distortion · motion consistency · structural rigidity

The pith

PDI-Bench quantifies geometric coherence in generated videos by measuring projective residuals from 3D lifts of tracked points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PDI-Bench as a quantitative framework to audit whether generative video models produce consistent 3D structure and motion. It segments objects, tracks points across frames, lifts them to world-space coordinates via monocular reconstruction, and computes residuals that capture scale-depth alignment, motion consistency, and structural rigidity. These signals expose specific geometric failures across current models that perceptual quality metrics overlook. A sympathetic reader would care because the method supplies an objective diagnostic for treating video generators as physical world models rather than just image synthesizers. The accompanying PDI-Dataset stresses these constraints across varied scenarios to enable systematic comparison.

Core claim

Given a generated video clip, object-centric observations are obtained via segmentation and point tracking, then lifted to 3D world-space coordinates via monocular reconstruction; a set of projective-geometry residuals is computed to quantify three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. Across state-of-the-art generators this index reveals consistent geometry-specific failure modes invisible to common perceptual metrics and supplies a diagnostic signal for progress toward physically grounded video generation.
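As one way to make the structural-rigidity dimension concrete, the sketch below checks whether pairwise 3D distances between tracked points on a single object stay constant over time, which is what rigid motion requires. The input layout `(T, N, 3)`, the function name `rigidity_residual`, and the median-based normalisation are assumptions for illustration; the text above does not give the paper's actual residual definition.

```python
import numpy as np

def rigidity_residual(points_3d: np.ndarray) -> float:
    """Mean relative deviation of pairwise distances from their temporal median.

    points_3d: hypothetical array of shape (T, N, 3) holding N lifted points
    on one object across T frames. Returns 0 for perfectly rigid motion.
    """
    # Pairwise distances inside every frame: shape (T, N, N).
    diffs = points_3d[:, :, None, :] - points_3d[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep each unordered pair once (upper triangle without the diagonal).
    iu, ju = np.triu_indices(points_3d.shape[1], k=1)
    pair_d = dists[:, iu, ju]                 # (T, P)
    ref = np.median(pair_d, axis=0)           # (P,) reference length per pair
    rel_dev = np.abs(pair_d - ref) / np.maximum(ref, 1e-9)
    return float(rel_dev.mean())

# Usage: a point cloud that rotates and translates rigidly, then a stretched copy.
rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))
frames = []
for t in range(10):
    a = 0.1 * t
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    frames.append(pts @ R.T + np.array([0.0, 0.0, 0.5 * t]))
rigid = np.stack(frames)
stretched = rigid * np.linspace(1.0, 1.5, 10)[:, None, None]
```

Rotation and translation leave the residual at floating-point zero, while the progressively stretched copy yields a clearly positive value. A distance-based check was chosen here because it needs no explicit pose estimate.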

What carries the argument

The Perspective Distortion Index (PDI), which aggregates projective-geometry residuals computed on 3D world coordinates lifted from segmented and tracked points to measure scale-depth alignment, motion consistency, and structural rigidity.
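The scale-depth dimension can be illustrated with the pinhole relation quoted further down this page, h_t · Z_t = f · H = constant. Reading h_t as an object's projected height in pixels, Z_t as its depth, f as the focal length, and H as its physical height is an assumption consistent with the standard pinhole model. The function name and the use of a coefficient of variation are likewise illustrative choices, not the paper's definition.

```python
import numpy as np

def scale_depth_residual(h_px, z) -> float:
    """Coefficient of variation of h_t * Z_t across frames.

    Under a pinhole camera with a rigid object of fixed physical height,
    the product h_t * Z_t equals f * H in every frame, so this returns 0.
    """
    h_px = np.asarray(h_px, dtype=float)
    z = np.asarray(z, dtype=float)
    product = h_px * z
    return float(product.std() / max(abs(product.mean()), 1e-12))

# Usage: an object receding from 2 m to 10 m (assumed f = 800 px, H = 1.7 m).
f, H = 800.0, 1.7
z = np.linspace(2.0, 10.0, 20)
h_consistent = f * H / z            # shrinks correctly with depth
h_frozen = np.full_like(z, 100.0)   # fails to shrink as it recedes
```

The consistent clip gives a residual at floating-point zero; the clip whose object keeps the same pixel height while receding gives a large one. Normalising by the mean makes the value independent of the unknown global scale of a monocular reconstruction.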

If this is right

Video generators can be ranked and improved by targeting measurable failures in scale consistency, motion trajectories, and rigidity instead of relying solely on visual appeal.
Training loops could gain an objective signal for enforcing projective constraints that current perceptual losses do not provide.
Evaluation of implicit world models shifts from subjective human ratings to repeatable 3D residual measurements across controlled datasets.
Models that reduce PDI scores on the benchmark are expected to produce outputs more suitable for downstream tasks requiring spatial reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If PDI scores improve over time while perceptual metrics plateau, the field may be making genuine progress on physical plausibility even when human raters notice little change.
PDI could be extended to multi-view or stereo video inputs to cross-validate the monocular reconstruction step and reduce its influence on the final score.
Combining PDI with existing 2D metrics might yield a composite benchmark that better predicts performance in robotics simulation or planning applications.

Load-bearing premise

Monocular 3D reconstruction from the generated video produces accurate enough world-space coordinates to reveal the generator's own geometric errors rather than injecting reconstruction artifacts.

What would settle it

Generate videos with deliberately perfect 3D geometry using known camera paths and rigid objects, run the full PDI pipeline including monocular lift, and verify whether the index scores remain near zero; persistently high scores on perfect inputs would falsify the claim that PDI isolates generator errors.
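The logic of this test can be sketched on synthetic data. In the hypothetical setup below, a rigid vertical segment recedes from a pinhole camera, its pixel height is computed exactly, and the depth estimate is corrupted by multiplicative noise that stands in for monocular reconstruction error. All names and parameter values are assumptions, and the noise model is not a model of any real reconstructor.

```python
import numpy as np

def project(points_cam: np.ndarray, f: float) -> np.ndarray:
    """Pinhole projection of (..., 3) camera-frame points to (..., 2) pixels."""
    return f * points_cam[..., :2] / points_cam[..., 2:3]

def null_test(depth_noise_std: float, f: float = 800.0, height: float = 1.7,
              n_frames: int = 30, seed: int = 0) -> float:
    """Scale-depth residual on a geometrically perfect clip.

    depth_noise_std emulates relative error in the estimated depth; with 0
    the residual should vanish, so any nonzero value is pipeline-induced.
    """
    rng = np.random.default_rng(seed)
    z_true = np.linspace(3.0, 12.0, n_frames)
    bottom = np.stack([np.zeros(n_frames), np.zeros(n_frames), z_true], axis=-1)
    top = bottom + np.array([0.0, height, 0.0])
    # Exact projected height of the segment in each frame.
    h_px = np.abs(project(top, f)[:, 1] - project(bottom, f)[:, 1])
    # Stand-in for the monocular lift: true depth with relative noise.
    z_est = z_true * (1.0 + depth_noise_std * rng.standard_normal(n_frames))
    product = h_px * z_est
    return float(product.std() / product.mean())

clean_score = null_test(0.0)
noisy_score = null_test(0.05)
```

With exact depth the score is at floating-point zero. With 5% depth noise it becomes clearly positive even though the underlying video is geometrically perfect, which is the confound the proposed experiment would expose.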

Original abstract

Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and weakly diagnostic for geometric failures. We introduce PDI-Bench (Perspective Distortion Index), a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, we obtain object-centric observations via segmentation and point tracking (e.g., SAM 2, MegaSaM, and CoTracker3), lift them to 3D world-space coordinates via monocular reconstruction, and compute a set of projective-geometry residuals capturing three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. To support systematic evaluation, we build PDI-Dataset, covering diverse scenarios designed to stress these geometric constraints. Across state-of-the-art video generators, PDI reveals consistent geometry-specific failure modes that are not captured by common perceptual metrics, and provides a diagnostic signal for progress toward physically grounded video generation and physical world model. Our code and dataset can be found at https://pdi-bench.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PDI-Bench adds a concrete geometric diagnostic for video generators via 3D projective residuals, but its signal may be entangled with errors from the monocular reconstructor.

The main point is that this paper defines PDI-Bench as a set of residuals computed after lifting tracked points from generated videos into 3D using monocular methods like MegaSaM. It targets three specific issues—scale-depth alignment, motion consistency, and structural rigidity—and pairs the metric with a new dataset of stressing scenarios. They show that current generators produce consistent failures on these checks that standard perceptual scores miss, and they release the code and data.

Referee Report

2 major / 1 minor

Summary. The paper introduces PDI-Bench, a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, it applies segmentation and point tracking (SAM 2, MegaSaM, CoTracker3), lifts observations to 3D world-space coordinates via monocular reconstruction, and computes projective-geometry residuals across three dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. It also releases PDI-Dataset covering diverse scenarios and reports that, across state-of-the-art video generators, PDI exposes geometry-specific failure modes not captured by common perceptual metrics, offering a diagnostic signal for physically grounded video generation.

Significance. If the residuals can be shown to be dominated by generator-induced geometric errors rather than upstream reconstruction artifacts, PDI-Bench would supply an objective, geometry-specific complement to existing perceptual and human-judgment metrics, directly supporting evaluation of video models as implicit world models.

major comments (2)

[Abstract] Abstract and evaluation description: the central claim that PDI residuals diagnose generator failures requires evidence that monocular lifting (MegaSaM) produces sufficiently accurate 3D coordinates on generated video; no quantitative validation (e.g., reconstruction error on synthetic ground-truth video, ablation swapping the reconstructor, or correlation with known geometric perturbations) is supplied, leaving open the possibility that residuals are confounded by reconstruction priors on inconsistent lighting, texture, or motion patterns typical of generated content.
[Methods] Methods / PDI-Dataset construction: the three residual definitions (scale-depth alignment, 3D motion consistency, 3D structural rigidity) are derived directly from projective geometry applied to tracked points; without an explicit isolation experiment or ground-truth comparison, it remains unclear whether the reported failure modes are attributable to the generator or are artifacts of the monocular pipeline.

minor comments (1)

[Abstract] The abstract states that code and dataset are available at https://pdi-bench.github.io/; the manuscript should include a brief reproducibility checklist (exact versions of SAM 2, MegaSaM, CoTracker3, and any post-processing steps) to allow independent verification of the residual computations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need to validate the monocular reconstruction step and isolate generator effects in PDI-Bench. We address each major comment below and will incorporate additional experiments and clarifications in the revised manuscript.

Referee: [Abstract] Abstract and evaluation description: the central claim that PDI residuals diagnose generator failures requires evidence that monocular lifting (MegaSaM) produces sufficiently accurate 3D coordinates on generated video; no quantitative validation (e.g., reconstruction error on synthetic ground-truth video, ablation swapping the reconstructor, or correlation with known geometric perturbations) is supplied, leaving open the possibility that residuals are confounded by reconstruction priors on inconsistent lighting, texture, or motion patterns typical of generated content.

Authors: We agree that explicit validation of the monocular lifting on generated content is essential to support the central claim. While the manuscript employs established state-of-the-art methods (MegaSaM for reconstruction alongside SAM 2 and CoTracker3), we acknowledge the absence of dedicated quantitative checks in the current version. In the revision we will add a new validation subsection that (i) measures reconstruction error on synthetic ground-truth videos with known 3D geometry, (ii) performs an ablation by swapping the reconstructor, and (iii) correlates PDI residuals against controlled geometric perturbations injected into otherwise consistent clips. These experiments are designed to test whether the reported residuals are dominated by generator-induced inconsistencies rather than upstream reconstruction artifacts. revision: yes
Referee: [Methods] Methods / PDI-Dataset construction: the three residual definitions (scale-depth alignment, 3D motion consistency, 3D structural rigidity) are derived directly from projective geometry applied to tracked points; without an explicit isolation experiment or ground-truth comparison, it remains unclear whether the reported failure modes are attributable to the generator or are artifacts of the monocular pipeline.

Authors: The three residual definitions follow directly from projective geometry and are therefore independent of any particular reconstruction implementation. Nevertheless, we recognize the value of explicit isolation. In the revised manuscript we will include ground-truth comparison experiments using rendered videos that provide perfect 3D structure and motion; PDI scores will be computed both on the original renders and on versions with controlled generator-like perturbations. We will also report results across multiple reconstructors and trackers to confirm that the observed failure modes persist and are attributable to the video generators rather than the analysis pipeline. revision: yes
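The perturbation experiment promised in the responses above could take roughly the following shape. This is a minimal sketch under stated assumptions: the clip is synthetic, the perturbation is a linear drift in apparent object height, the residual is a coefficient of variation of h_t · Z_t, and the rank correlation is computed with plain NumPy. None of these choices come from the paper.

```python
import numpy as np

def cv(x: np.ndarray) -> float:
    """Coefficient of variation."""
    return float(np.std(x) / np.mean(x))

def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman rank correlation for inputs without ties."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Geometrically consistent base clip (assumed f = 800 px, H = 1.7 m).
f, H = 800.0, 1.7
z = np.linspace(3.0, 12.0, 30)
h_clean = f * H / z

# Inject perturbations of increasing magnitude and record the residual.
magnitudes = np.linspace(0.0, 0.5, 11)
residuals = []
for m in magnitudes:
    # Apparent height drifts by up to a fraction m over the clip.
    drift = 1.0 + m * np.linspace(0.0, 1.0, z.size)
    residuals.append(cv(h_clean * drift * z))
residuals = np.array(residuals)

rho = spearman(magnitudes, residuals)
```

On this idealised clip the residual grows monotonically with the injected drift, so the rank correlation is essentially 1. On real pipelines the interesting quantity is how far the correlation falls once segmentation, tracking, and reconstruction noise are in the loop.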

Circularity Check

0 steps flagged

No significant circularity in PDI derivation

The paper defines PDI-Bench by segmenting and tracking points in generated video (SAM 2, CoTracker3), lifting them to 3D with an external monocular reconstructor (MegaSaM), and computing projective-geometry residuals for scale-depth alignment, 3D motion consistency, and structural rigidity. These residuals follow from standard projective constraints applied to the lifted coordinates; no equations, parameters, or self-citations reduce the reported values to quantities fitted on the same evaluation videos. The central claim therefore rests on an independent geometric calculation rather than a tautological re-expression of inputs. This is the expected non-circular outcome for a metric constructed from first-principles geometry.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; the framework assumes that off-the-shelf segmentation (SAM 2), tracking (CoTracker3), and monocular reconstruction (MegaSaM) produce 3D coordinates accurate enough to expose generator failures. No free parameters or invented entities are mentioned.

axioms (1)

domain assumption: Monocular depth and point tracking tools yield sufficiently accurate 3D world coordinates for the purpose of measuring geometric inconsistency.
Invoked when lifting 2D observations to 3D and computing projective residuals.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Paper passage: we lift them to 3D world-space coordinates via monocular reconstruction, and compute a set of projective-geometry residuals capturing three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: ℎₜ ⋅ 𝑍ₜ = 𝑓 ⋅ 𝐻 = Constant

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Wittbrodt and J. M. Pearce, "The effects of PLA color on material properties of 3-D printed components," Additive Manufacturing, vol. 8, pp. 110-116, 2015, https://doi.org/10.1016/j.addma.2015.09.006 [13] O. A. Mohamed, S. H. Masood, and J. L. Bhowmik, "Optimization of fused deposition modeling process parameters: a review of current research and future prospects," Advances in Manufacturing, vol. 3, no. 1, pp. 42-53, Mar. 2015, https://doi.org/10.1007/s40436-014-0097-7 [14] S. K. Everton, M. Hirsch, P. Stravroulakis, R. K. Leach and A. T. Clare, "Review of in-situ process monitoring and in-situ metrology for metal additive manufacturing," Materials and Design, vol. 95, pp. 431-445, 2016, https://doi.org/10.1016/j.matdes.2016.01.099 [15] P. K. Rao, J. P. Liu, D. Roberson, Z. J. Kong, and C. Williams,"Online Real-Time Quality Monitoring in Additive Manufacturing Processes Using Heterogeneous Sensors," Journal of Manufacturing Science and Engineering, vol. 137, no. 6, p.061007, Sep. 2015, https://doi.org/10.1115/1.4029823 [16] J. Pellegrino, T. Makila, S. McQueen, and E. Taylor, "Measurement science roadmap for polymer-based additive manufacturing," Gaithersburg, MD, Dec. 2016. https://doi.org/10.6028/NIST.AMS.100-5 [17] T. R. Kramer, F. M. Proctor, and E. Messina, "The NIST RS274NGC Interpreter -Version 3," Gaithersburg, Maryland, 2000. https://doi.org/10.6028/NIST.IR.6556 [18] B. N. Turner, R. Strong, and S. A. Gold, "A review of melt extrusion additive manufacturing processes: I. Process design and modeling," Rapid Prototyping Journal, vol. 20, no. 3, pp.192-204, Apr. 2014, https://doi.org/10.1108/RPJ-01-2013-0012 [19] Conrad Electronic, "Renkforce RF1000 3D Drucker," 2016. https://www.conrad.de/de/renkforce-rf1000-3d-drucker-single-extruder-inkl-software-franzis-designcad-v24-3d-printrenkforce-edition-1007508.html (accessed Sep. 20, 2016). [20] G. Hodgson, A. Ranellucci, and J. Moe, "Slic3r Manual - Flow Math," 2016. 
http://manual.slic3r.org/advanced/flow-math (accessed Jun. 21, 2016). [21] Repetier, "Repetier-Firmware Documentation." https://www.repetier.com/documentation/repetier firmware/repetier-firmware-introduction/ (accessed Apr. 17, 2018). [22] B. Weiss, D. W. Storti, and M. A. Ganter, "Low-cost closedloop control of a 3D printer gantry," Rapid Prototyping Journal, vol. 21, no. 5, pp. 482-490, Aug. 2015, https://doi.org/10.1108/RPJ-09-2014-0108 [23] R. L. Zinniel and J. S. Batchelder, "Volumetric Feed Control for Flexible Filament," US 6085957, 2000. [24] W. J. Heij, Applied Metrology in Additive Manufacturing. Delft: Delft University of Technology, 2016. [25] G. P. Greeff and M. Schilling, "Closed loop control of slippage during filament transport in molten material extrusion," Additive Manufacturing, vol. 14, pp. 31-38, 2017, https://doi.org/10.1016/j.addma.2016.12.005 [26] G. P. Greeff, Applied Metrology in Additive Manufacturing, vol. 60. Berlin: Mensch und Buch, 2019. [27] G. P. Greeff and M. Schilling, "Comparing Retraction Methods with Volumetric Exit Flow Measurement in Molten Material Extrusion," in Special Interest Group meeting on Dimensional Accuracy and Surface Finish in Additive Manufacturing, 2017, no. October, pp. 70-74. [28] G. P. Greeff and M. Schilling, "Single print optimisation of fused filament fabrication parameters," The International Journal of Advanced Manufacturing Technology, Aug. 2018, https://doi.org/10.1007/s00170-018-2518-4 [29] A. Bellini, S. Güçeri, and M. Bertoldi, "Liquefier Dynamics in Fused Deposition," Journal of Manufacturing Science and Engineering, vol. 126, no. 2, p. 237, 2004, https://doi.org/10.1115/1.1688377 [30] P. Virtanen et al., "SciPy 1.0: fundamental algorithms for scientific computing in Python," Nature Methods, vol. 17, no. 3, pp. 261-272, Mar. 
2020, https://doi.org/10.1038/s41592-019-0686-2 PMid:32015543 PMCid:PMC7056644 (Print: ISSN 1931-5775) (Online: ISSN 2381-0580) © 2021 NCSL International Software to Maximize End-User Uptake of Conformity Assessment with Measurement Uncertainty, Including Bivariate Cases. The European EMPIR CASoft Project

Zhipu AI. Cogvideox-3: Text-to-video diffusion models. https://chatglm.cn/video, 2026. Accessed: 2026-4-18.

A. Additional Experimental Details

A.1. PDI-Dataset Construction

The PDI-Dataset consists of 183 video sequences in total, partitioned into real-world and synthetic subsets.

Real-world sequences. The real-world portion of PDI-Dataset contains 15 sh...

All synthetic videos presented in our benchmark reflect the baseline commercial performance available to end-users at the time of evaluation. Note that the Sora samples in our dataset were generated using the $20 monthly consumer subscription rather than the enterprise API, representing the baseline commercial performance of the model. The 28 text prompts...

A handheld following shot of a red vintage car driving away on a straight desert highway, harsh noon light and heat haze on the horizon, subtle shake and lateral drift

A high-speed train moving toward the viewer on a straight track, low-angle handheld perspective, rails and gravel receding toward a clear vanishing point

A yellow school bus driving away on a straight tree-lined suburban street, the shot tracking from a low position behind, morning light and clean asphalt

A silver metallic sphere rolling away on a long reflective marble floor in a bright gallery, the shot following closely with slight sway

A heavy cargo truck moving away on a straight bridge at night, tail lights glowing, subtle frame shake, city lights in the distance

A large shipping container being pushed away on a straight industrial dock, cranes and water behind, moving viewpoint, overcast industrial light

Category: Dynamic Tracking

A handheld following shot of a red sports car driving on a straight multi-lane highway, city skyline and roadside trees in the background receding rapidly with parallax

A smooth following shot of an autonomous suitcase moving through a vast airport terminal, repeated columns and floor patterns rushing past in frame

A close handheld shot following a large chrome sphere rolling along a straight, reflective museum corridor, exhibits and windows flowing past

A following shot from a vehicle alongside, keeping pace with a large truck carrying a blue container on a long bridge, waves and bridge cables creating dynamic background motion

A smooth following shot of a metal logistics crate moving along a straight automated conveyor, complex factory machinery in the background rushing past

A handheld following shot of a large metal ball rolling through a straight modern art gallery, surrounding artworks and viewers receding rapidly with parallax

Table 4 (continued). Columns: Category, Text Prompt

Category: Biological Motion

A smooth following shot of a large eagle flying at high speed parallel to a cliff, rock face and sea below, clear sky

A following shot from a moving boat of a dolphin swimming and leaping in the waves alongside, spray and sunlight

A handheld shot of a large octopus swimming away in a complex coral reef, tentacles waving, colorful fish and coral, blue water and light shafts

A backward-moving shot following a snake slithering through dense colorful flowers on the ground, petals and stems, soft daylight

A moving shot following a peacock walking and shaking its tail feathers in a palace garden, fountains and trimmed hedges, ornate tiles

Category: Curved Motion

A handheld tracking perspective follows a silver compact SUV navigating a sharp hairpin turn on a winding mountain road. The view orbits slightly to capture the vehicle transitioning from a front-view to a side-view against the pine forest background

A low-angle shot follows a sports car drifting through a 90-degree corner on a professional race track. The car rotates intensely while the moving shot emphasizes the shifting vanishing lines of the curb and tire marks

A cinematic tracking shot follows a city bus driving through a large, ornate stone roundabout. The view maintains a side perspective, showing the bus constantly changing its orientation relative to the central fountain and surrounding city traffic

A ground-level perspective tracking a small delivery robot as it makes a sharp turn at a sidewalk corner. The shot stays close, highlighting the rotation of the robot’s boxy frame against the detailed brickwork

A handheld shot follows a green tractor making a wide turn at the edge of a plowed field. The view moves with the vehicle, capturing the shifting angles of the heavy wheels and mechanical parts against the vast landscape

Category: Partial Occlusion

A car driving along a street at night, wheels briefly obscured by a low roadside guardrail for under a second, handheld shot moving alongside, street lamps and storefronts

A train passing behind a row of thin vertical power line poles, the shot tracking its movement from a moving platform, sky and industrial landscape

A bus moving through a city street, briefly partially hidden by a thin traffic sign, the shot following from the sidewalk

A vintage car driving past a row of thin trees, never fully leaving the moving view, autumn leaves and road

A boat sailing behind a thin pier support, remaining partially visible throughout, handheld shot from the dock, sea and sky

A robot crate moving through a warehouse, passing behind a thin metal rack, the shot following alongside, shelves and boxes, industrial lighting

Reconstruction-aware weighting. The final PDI score is synthesized as a weighted sum of three orthogonal physical residuals:

$$\mathrm{PDI\ Score} = w_1 \cdot \mathrm{RMSE}(\epsilon_{\mathrm{scale}}) + w_2 \cdot \mathrm{RMSE}(\epsilon_{\mathrm{traj}}) + w_3 \cdot \epsilon_{\mathrm{rigidity}}, \tag{11}$$

where $\sum_i w_i = 1$....
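To make the shape of Eq. (11) concrete, here is a minimal Python sketch of the weighted combination. The function names `rmse` and `pdi_score` and the equal default weights are illustrative assumptions; this excerpt does not state how the reconstruction-aware weights are actually chosen.

```python
import math


def rmse(residuals):
    """Root-mean-square of a sequence of per-frame residuals."""
    values = list(residuals)
    if not values:
        raise ValueError("need at least one residual")
    return math.sqrt(sum(r * r for r in values) / len(values))


def pdi_score(eps_scale, eps_traj, eps_rigidity, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum with the form of Eq. (11).

    eps_scale, eps_traj: per-frame residual sequences.
    eps_rigidity: a single scalar rigidity residual.
    weights: (w1, w2, w3) summing to 1. The equal default is a
    placeholder assumption, not the weighting used in the paper.
    """
    w1, w2, w3 = weights
    if not math.isclose(w1 + w2 + w3, 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return w1 * rmse(eps_scale) + w2 * rmse(eps_traj) + w3 * eps_rigidity


# Hypothetical residuals for one clip.
score = pdi_score([2.0, 2.0], [1.0, 1.0, 1.0], 0.5, weights=(0.5, 0.25, 0.25))
print(score)
```

Because the weights sum to 1, the score stays on the same scale as the individual residuals, so a lower value indicates smaller geometric error.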

3D Pairwise Rigidity (Primary). We sample world-space points $q_t^n$ from Mega-SAM pointmaps at CoTracker locations. Anchor pairs are selected at $t = 0$ by triple filtering: (i) visibility filtering, (ii) depth-gradient reliability filtering, and (iii) pair scoring that favors both large 3D separation and interior-region reliability (distance to mask boundary). Fo...

3D Height Stability (Fallback when Strategy 1 is not entered). If 3D points are valid but Strategy 1 is unavailable at the dispatcher level, we compute the per-frame 3D object height from the foreground $y$-span:

$$h_t^{3D} = P_{95}(y_t) - P_{5}(y_t),$$

and use its coefficient of variation:

$$\epsilon_{\mathrm{rigid}}^{(2)} = \frac{\mathrm{std}(\{h_t^{3D}\}_{t=1}^{T})}{\mathrm{mean}(\{h_t^{3D}\}_{t=1}^{T}) + \epsilon}.$$
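The height-stability fallback can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the authors' implementation: the helper names, the linearly interpolated percentile, the population standard deviation, and the value of `eps` are all choices made here, since the excerpt does not specify them.

```python
import math
import statistics


def percentile(values, q):
    """Linearly interpolated q-th percentile, 0 <= q <= 100 (assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def height_stability(y_per_frame, eps=1e-8):
    """Coefficient of variation of the per-frame P95 - P5 height.

    y_per_frame: one list per frame holding the y-coordinates of the
    foreground 3D points. eps guards against division by zero; its
    value here is an assumption.
    """
    heights = [percentile(y, 95) - percentile(y, 5) for y in y_per_frame]
    return statistics.pstdev(heights) / (statistics.fmean(heights) + eps)


# A translating object keeps its height, so the residual is zero.
rigid = [[float(v + shift) for v in range(11)] for shift in (0, 5, 9)]
print(height_stability(rigid))

# An object whose 3D height doubles between frames gives a large residual.
stretched = [[float(v) for v in range(11)], [float(2 * v) for v in range(11)]]
print(height_stability(stretched))
```

Using the 5th and 95th percentiles instead of the minimum and maximum makes the height estimate less sensitive to a few outlier points near the mask boundary.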

2D Pairwise Consistency (Degraded fallback). When 3D evidence is unavailable, we use 2D CoTracker pairwise distance ratios:

$$r_{ij}^{2D}(t) = \frac{d_{ij}(t)}{d_{ij}(0)}, \qquad \rho_t^{2D} = \frac{\mathrm{std}(\{r_{ij}^{2D}(t)\})}{\mathrm{mean}(\{r_{ij}^{2D}(t)\}) + \epsilon},$$

and compute

$$\epsilon_{\mathrm{rigid}}^{(3)} = \frac{1}{T} \sum_{t=1}^{T} \rho_t^{2D}.$$

Finally, the rigidity component used by PDI is $\epsilon_{\mathrm{rigid}} = \epsilon_{\mathrm{rigid}}^{(1)}$, if Strategy 1 is sel...
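As a rough illustration of the 2D fallback, the sketch below computes the per-frame coefficient of variation of pairwise distance ratios and averages it over frames. The data layout (`tracks[t][i]` as an `(x, y)` tuple, frame 0 as the reference), the function name, the population standard deviation, and `eps` are assumptions made for this example; how pairs are selected is not covered here.

```python
import math
import statistics


def pairwise_consistency_2d(tracks, pairs, eps=1e-8):
    """Mean over frames t = 1..T of the CoV of distance ratios d_ij(t) / d_ij(0).

    tracks: list of frames; tracks[t][i] is the (x, y) position of point i.
            Frame 0 is the reference frame.
    pairs:  list of (i, j) index pairs (hypothetical, chosen by the caller).
    """
    if len(tracks) < 2:
        raise ValueError("need a reference frame plus at least one more frame")

    def dist(frame, i, j):
        (xi, yi), (xj, yj) = frame[i], frame[j]
        return math.hypot(xi - xj, yi - yj)

    d0 = {(i, j): dist(tracks[0], i, j) for (i, j) in pairs}
    if any(d == 0.0 for d in d0.values()):
        raise ValueError("reference distance of zero; drop that pair")

    rhos = []
    for frame in tracks[1:]:
        ratios = [dist(frame, i, j) / d0[(i, j)] for (i, j) in pairs]
        rhos.append(statistics.pstdev(ratios) / (statistics.fmean(ratios) + eps))
    return statistics.fmean(rhos)


pairs = [(0, 1), (0, 2), (1, 2)]
ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Uniform scaling: every ratio is identical, so the residual is about zero.
scaled = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(pairwise_consistency_2d([ref, scaled], pairs))

# One point drifts on its own: ratios disagree and the residual grows.
warped = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0)]
print(pairwise_consistency_2d([ref, warped], [(0, 1), (0, 2)]))
```

Note that the ratio-based form is unchanged when all image-plane distances scale by the same factor, which is what happens when a rigid object simply moves toward or away from the camera; only disagreement between pairs contributes to the residual.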

[1] K. Allen, C. Doersch, G. Zhou, M. Suhail, D. Driess, I. Rocco, Y. Rubanova, T. Kipf, M. S. M. Sajjadi, K. Murphy, J. Carreira, and S. van Steenkiste. Direct motion models for assessing generated videos, URL https://arxiv.org/abs/2505.00209

[3] M. Asim, C. Wewer, T. Wimmer, B. Schiele, and J. E. Lenssen. Met3r: Measuring multi-view consistency in generated images, 2026. URL https://arxiv.org/abs/2501.06336

[4] H. Bansal, Z. Lin, T. Xie, Z. Zong, M. Yarom, Y. Bitton, C. Jiang, Y. Sun, K.-W. Chang, and A. Grover. Videophy: Evaluating physical commonsense for video generation, 2024. URL https://arxiv.org/abs/2406.03520

[5] A. Blattmann, T. Dockhorn, S. Kulal, D. Mendelevitch, M. Kilian, D. Lorenz, Y. Levi, Z. English, V. Voleti, A. Letts, V. Jampani, and R. Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023. URL https://arxiv.org/abs/2311.15127

[6] ByteDance. Doubao: A family of large language models. https://www.volcengine.com/product/doubao, 2026. Accessed: 2026-05-06

[7] ByteDance. Seedance 2.0 fast: High-efficiency video generation foundation model. https://www.doubao.com/, 2026. Accessed: 2026-04-19

[8] W. Chow, J. Mao, B. Li, D. Seita, V. Guizilini, and Y. Wang. Physbench: Benchmarking and enhancing vision-language models for physical world understanding, 2025. URL https://arxiv.org/abs/2501.16411

[9] H. Duan, H.-X. Yu, S. Chen, L. Fei-Fei, and J. Wu. Worldscore: A unified evaluation benchmark for world generation, 2025. URL https://arxiv.org/abs/2504.00983

[10] Google. Flow: Where the next wave of storytelling happens. https://labs.google/fx/tools/flow, 2026. Accessed: 2026-03-04

[11] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models, 2020. URL https://arxiv.org/abs/2006.11239

[12] X. Huang, Z. Li, G. He, M. Zhou, and E. Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion, 2025. URL https://arxiv.org/abs/2506.08009

[13] Z. Huang, Y. He, J. Yu, F. Zhang, C. Si, Y. Jiang, Y. Zhang, T. Wu, Q. Jin, N. Chanpaisit, Y. Wang, X. Chen, L. Wang, D. Lin, Y. Qiao, and Z. Liu. Vbench: Comprehensive benchmark suite for video generative models, 2023. URL https://arxiv.org/abs/2311.17982

[14] N. Karaev, I. Makarov, J. Wang, N. Neverova, A. Vedaldi, and C. Rupprecht. Cotracker3: Simpler and better point tracking by pseudo-labelling real videos, 2024. URL https://arxiv.org/abs/2410.11831

[15] W. Kong, Q. Tian, Z. Zhang, R. Min, Z. Dai, J. Zhou, J. Xiong, X. Li, B. Wu, J. Zhang, K. Wu, Q. Lin, J. Yuan, Y. Long, A. Wang, A. Wang, C. Li, D. Huang, F. Yang, H. Tan, H. Wang, J. Song, J. Bai, J. Wu, J. Xue, J. Wang, K. Wang, M. Liu, P. Li, S. Li, W. Wang, W. Yu, X. Deng, Y. Li, Y. Chen, Y. Cui, Y. Peng, Z. Yu, Z. He, Z. Xu, Z. Zhou, Z. Xu, Y. Tao... URL https://arxiv.org/abs/2412.03603

[17] D. Li, Y. Fang, Y. Chen, S. Yang, S. Cao, J. Wong, M. Luo, X. Wang, H. Yin, J. E. Gonzalez, I. Stoica, S. Han, and Y. Lu. Worldmodelbench: Judging video generation models as world models, 2025. URL https://arxiv.org/abs/2502.20694

[18] Z. Li, R. Tucker, F. Cole, Q. Wang, L. Jin, V. Ye, A. Kanazawa, A. Holynski, and N. Snavely. Megasam: Accurate, fast, and robust structure and motion from casual dynamic videos, 2024. URL https://arxiv.org/abs/2412.04463

[19] Y. Liu, K. Zhang, Y. Li, Z. Yan, C. Gao, R. Chen, Z. Yuan, Y. Huang, H. Sun, J. Gao, L. He, and L. Sun. Sora: A review on background, technology, limitations, and opportunities of large vision models, URL https://arxiv.org/abs/2402.17177

[21] F. Meng, J. Liao, X. Tan, W. Shao, Q. Lu, K. Zhang, Y. Cheng, D. Li, Y. Qiao, and P. Luo. Towards world simulator: Crafting physical commonsense-based benchmark for video generation, 2024. URL https://arxiv.org/abs/2410.05363

[22] OpenAI. Sora: Creating video from text. https://openai.com/sora, 2025. Accessed: 2026-03-20

[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision, 2021. URL https://arxiv.org/abs/2103.00020

[24] N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson, E. Mintun, J. Pan, K. V. Alwala, N. Carion, C.-Y. Wu, R. Girshick, P. Dollár, and C. Feichtenhofer. Sam 2: Segment anything in images and videos, 2024. URL https://arxiv.org/abs/2408.00714

[25] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans, 2016. URL https://arxiv.org/abs/1606.03498

[26] K. Sun, K. Huang, X. Liu, Y. Wu, Z. Xu, Z. Li, and X. Liu. T2v-compbench: A comprehensive benchmark for compositional text-to-video generation, 2025. URL https://arxiv.org/abs/2407.14505

[27] T. Unterthiner, S. van Steenkiste, K. Kurach, R. Marinier, M. Michalski, and S. Gelly. Towards accurate generative models of video: A new metric & challenges, 2019. URL https://arxiv.org/abs/1812.01717

[28] R. Upadhyay, H. Zhang, J. Solomon, A. Agrawal, P. Boreddy, S. S. Narayana, Y. Ba, A. Wong, C. M. de Melo, and A. Kadambi. Worldbench: Disambiguating physics for diagnostic evaluation of world models, 2026. URL https://arxiv.org/abs/2601.21282

[29] T. Wan, A. Wang, B. Ai, B. Wen, C. Mao, C.-W. Xie, D. Chen, F. Yu, H. Zhao, J. Yang, J. Zeng, J. Wang, J. Zhang, J. Zhou, J. Wang, J. Chen, K. Zhu, K. Zhao, K. Yan, L. Huang, M. Feng, N. Zhang, P. Li, P. Wu, R. Chu, R. Feng, S. Zhang, S. Sun, T. Fang, T. Wang, T. Gui, T. Weng, T. Shen, W. Lin, W. Wang, W. Wang, W. Zhou, W. Wang, W. Shen, W. Yu, X. Shi, X....

[30] Wan-Video. Wan2.2: Wan: Open and advanced large-scale video generative models. https://github.com/Wan-Video/Wan2.2, 2025. GitHub repository

[31] B. Xiao, H. Wu, W. Xu, X. Dai, H. Hu, Y. Lu, M. Zeng, C. Liu, and L. Yuan. Florence-2: Advancing a unified representation for a variety of vision tasks, 2023. URL https://arxiv.org/abs/2311.06242
