Semantic-Aware, Physics-Informed, Geometry-Grounded Weather Video Synthesis
Pith reviewed 2026-06-30 09:16 UTC · model grok-4.3
The pith
Factoring weather video synthesis into semantic, dynamic, and geometric signals enables diverse and physically realistic weather effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By factorizing synthesis into semantic-aware appearance anchoring from scene semantics and user input, physics-informed dynamic simulation of a Gaussian-represented particle field under gravity, wind, and turbulence, and geometry-grounded video synthesis that aligns particles with target scene geometry, an off-the-shelf video editor can be steered to produce diverse global appearances and detailed particle dynamics that are both physically and visually realistic while preserving scene identity, structure, and motion.
What carries the argument
Factorization into three conditional signals—semantics for appearance, physics-informed particle simulation for temporal evolution, and geometry for spatial placement—that together steer an off-the-shelf video editor.
If this is right
- The method generates diverse, physically and visually realistic weather effects with explicit control over temporal evolution and particle motion.
- Synthesized videos preserve original scene identity, structure, and motion.
- Training on the synthesized adverse-weather videos significantly improves the robustness of autonomous driving semantic segmentation models under real adverse conditions.
Where Pith is reading between the lines
- The same three-signal factorization could be adapted to other particle phenomena such as smoke or dust by changing only the force terms in the simulation.
- Synthetic weather data produced this way may reduce the volume of real adverse-weather recordings needed to train robust perception systems.
- Explicit physics and geometry conditioning may help other generative video models overcome suppression of complex dynamic effects.
Load-bearing premise
An off-the-shelf general-purpose video editor will reliably generate dense particle effects when supplied with the three additional conditional signals instead of suppressing them.
What would settle it
Run the method on a set of clear-weather driving videos and measure whether the outputs contain dense, temporally coherent particles whose motion matches the simulated gravity-wind-turbulence field; separately, train a semantic segmentation model on the generated adverse-weather videos and check whether accuracy on real adverse-weather test sets fails to rise above a baseline trained only on clear data.
Figures
read the original abstract
Weather synthesis aims to add weather effects to input videos while preserving scene identity, structure, and motion. The key limitation of existing methods is the lack of diversity in weather appearance and effective control over weather dynamics (e.g., temporal evolution and particle motion). Most approaches rely on text prompts, which are inherently underspecified and often fail to produce detailed weather characteristics. Additionally, general-purpose video editors optimized for clean and aesthetic outputs tend to suppress heavy weather phenomena, making dense particle effects difficult to generate. To address these, we propose a Semantic-Aware, Physics-Informed, and Geometry-Grounded framework that steers an off-the-shelf video editor to synthesize diverse global appearances and detailed particle dynamics. We factorize the synthesis into three conditional signals, so that each provides a distinct and stable source of guidance: semantics specifies what the weather should look like, dynamics governs how it evolves over time, and geometry determines where it should appear in the scene. Specifically, we introduce (1) semantic-aware appearance anchoring to establish the target appearance from scene semantics and user input; (2) physics-informed dynamic simulation to generate particle effects by simulating a Gaussian-represented particle field under gravity, wind, and turbulence; and (3) geometry-grounded video synthesis to align the simulated particles with target scene geometry and synthesize the final video. Experiments demonstrate that our method produces diverse, physically and visually realistic weather effects. Furthermore, we show that our synthesized data significantly improves the robustness of autonomous driving semantic segmentation under adverse weather conditions. Project page: https://jumponthemoon.github.io/w-crafter/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Semantic-Aware, Physics-Informed, Geometry-Grounded framework that steers an off-the-shelf video editor via three conditional signals—semantic-aware appearance anchoring, physics-informed Gaussian particle simulation under gravity/wind/turbulence, and geometry-grounded alignment—to synthesize weather effects in videos while preserving scene identity. It claims this overcomes limitations of text-prompt methods and editor suppression of heavy weather, producing diverse, physically and visually realistic effects, and that the synthesized data significantly improves robustness of autonomous driving semantic segmentation under adverse weather.
Significance. If the central claims hold, the work would offer a controllable approach to weather data synthesis that could aid training of robust perception models, particularly for autonomous driving in adverse conditions. The explicit factorization into distinct semantic, dynamic, and geometric signals is a reasonable response to underspecification in prior methods.
major comments (2)
- [Abstract] Abstract: the claim that experiments 'demonstrate that our method produces diverse, physically and visually realistic weather effects' and that synthesized data 'significantly improves the robustness' of semantic segmentation rests on asserted experimental support, yet the provided text contains no quantitative metrics, ablation details, dataset descriptions, or error analysis. This is load-bearing for both the realism and downstream claims.
- [Abstract] Abstract: the method asserts that the three conditional signals will steer the off-the-shelf editor to produce dense particle effects despite the documented tendency of such editors to 'suppress heavy weather phenomena,' but no explicit mechanism, loss term, or enforcement strategy is described for overriding the editor's training objective. This premise is central to the particle-dynamics claim.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below, clarifying the role of the abstract and the conditioning strategy.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that experiments 'demonstrate that our method produces diverse, physically and visually realistic weather effects' and that synthesized data 'significantly improves the robustness' of semantic segmentation rests on asserted experimental support, yet the provided text contains no quantitative metrics, ablation details, dataset descriptions, or error analysis. This is load-bearing for both the realism and downstream claims.
Authors: The abstract follows standard conventions by summarizing outcomes at a high level without embedding full metrics or ablations. The complete manuscript presents these details in Section 4 (Experiments), including quantitative metrics (FID, LPIPS for realism; mIoU gains on Cityscapes/ACDC for downstream robustness), ablation studies in 4.3, dataset specifications, and error analysis. We will revise the abstract to include one or two key quantitative results for better self-containment. revision: partial
-
Referee: [Abstract] Abstract: the method asserts that the three conditional signals will steer the off-the-shelf editor to produce dense particle effects despite the documented tendency of such editors to 'suppress heavy weather phenomena,' but no explicit mechanism, loss term, or enforcement strategy is described for overriding the editor's training objective. This premise is central to the particle-dynamics claim.
Authors: The steering occurs via dense, structured conditioning inputs derived from the three signals: semantic appearance maps, time-varying particle position fields from the Gaussian physics simulation, and geometry-aligned depth/normal maps. These are injected through the editor's native conditioning pathways (detailed in Section 3), providing explicit per-frame guidance that overrides suppression without modifying the editor or adding custom losses. We will expand the method description with a dedicated paragraph on the conditioning injection process. revision: partial
Circularity Check
No circularity: method is a compositional framework with no self-referential derivations or fitted predictions.
full rationale
The provided abstract and description present the approach as a factorization into three external conditional signals (semantics, dynamics via Gaussian particle simulation, geometry) that steer an off-the-shelf video editor. No equations, parameter fitting, or derivation chain are described that reduce outputs to inputs by construction. Claims rest on empirical experiments and downstream task improvement rather than mathematical self-reference. This matches the default expectation of a non-circular engineering composition.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption General-purpose video editors can be steered with additional conditional signals to produce dense particle weather effects instead of suppressing them.
Reference graph
Works this paper leans on
-
[1]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report.arXiv preprint arXiv:2303.08774, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
arXiv preprint (2025), arXiv:2503.14492 2
Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, et al. Cosmos-Transfer1: Conditional world generation with adaptive multimodal control.arXiv preprint arXiv:2503.14492, 2025
-
[3]
World Simulation with Video Foundation Models for Physical AI
Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, et al. World simulation with video foundation models for physical AI.arXiv preprint arXiv:2511.00062, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[4]
Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, et al. VD3D: Taming large video diffusion transformers for 3D camera control.arXiv preprint arXiv:2407.12781, 2024
-
[5]
ReCamMaster: Camera-controlled generative rendering from a single video
Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, et al. ReCamMaster: Camera-controlled generative rendering from a single video. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 14834–14844, 2025
2025
-
[6]
MUSES: The multi-sensor semantic perception dataset for driving under uncertainty
Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, and Luc Van Gool. MUSES: The multi-sensor semantic perception dataset for driving under uncertainty. InEuropean Conference on Computer Vision, pages 21–38. Springer, 2024
2024
-
[7]
nuScenes: A multimodal dataset for autonomous driving
Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11621–11631, 2020
2020
-
[8]
Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, and Liang Lin. Control-a-video: Controllable text-to-video diffusion models with motion prior and reward feedback learning.arXiv preprint arXiv:2305.13840, 2023
-
[9]
Yiyang Chen, Xuanhua He, Xiujun Ma, and Yue Ma. ContextFlow: Training-free video object editing via adaptive context enrichment.arXiv preprint arXiv:2509.17818, 2025
-
[10]
EditMGT: Unleashing potentials of masked generative transformers in image editing
Wei Chow, Linfeng Li, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, and Songhua Liu. EditMGT: Unleashing potentials of masked generative transformers in image editing. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 38038–38048, 2026
2026
-
[11]
Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan- Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, and Sen He. FLATTEN: optical flow-guided attention for consistent text-to-video editing.arXiv preprint arXiv:2310.05922, 2023. 12
-
[12]
RainyGS: Efficient rain synthesis with physically-based gaussian splatting
Qiyu Dai, Xingyu Ni, Qianfan Shen, Wenzheng Chen, Baoquan Chen, and Mengyu Chu. RainyGS: Efficient rain synthesis with physically-based gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16153–16162, 2025
2025
-
[13]
Unreal Engine.Retrieved from Unreal Engine: https://www
Unreal Engine. Unreal Engine.Retrieved from Unreal Engine: https://www. unrealengine. com/en-US/what-is-unreal-engine-4, 2018
2018
-
[14]
CCEdit: Creative and controllable video editing via diffusion models
Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, and Baining Guo. CCEdit: Creative and controllable video editing via diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6712–6722, 2024
2024
-
[15]
Gal Fiebelman, Hadar Averbuch-Elor, and Sagie Benaim. Let it snow! animating 3D gaussian scenes with dynamic weather effects via physics-guided score distillation.arXiv preprint arXiv:2504.05296, 2025
-
[16]
StyleGAN-NADA: CLIP-guided domain adaptation of image generators.ACM Transactions on Graphics, 41(4):1–13, 2022
Rinon Gal, Or Patashnik, Haggai Maron, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. StyleGAN-NADA: CLIP-guided domain adaptation of image generators.ACM Transactions on Graphics, 41(4):1–13, 2022
2022
-
[17]
Qijun Gan, Yi Ren, Chen Zhang, Zhenhui Ye, Pan Xie, Xiang Yin, Zehuan Yuan, Bingyue Peng, and Jianke Zhu. HumanDiT: Pose-guided diffusion transformer for long-form human motion video generation.arXiv preprint arXiv:2502.04847, 2025
-
[18]
PISCO: Precise video instance insertion with sparse control.arXiv preprint arXiv:2602.08277, 2026
Xiangbo Gao, Renjie Li, Xinghao Chen, Yuheng Wu, Suofei Feng, Qing Yin, and Zhengzhong Tu. PISCO: Precise video instance insertion with sparse control.arXiv preprint arXiv:2602.08277, 2026
-
[19]
An implicit compressible SPH solver for snow simulation.ACM Transactions on Graphics, 39(4): 36–1, 2020
Christoph Gissler, Andreas Henne, Stefan Band, Andreas Peer, and Matthias Teschner. An implicit compressible SPH solver for snow simulation.ACM Transactions on Graphics, 39(4): 36–1, 2020
2020
-
[20]
SparseCtrl: Adding sparse controls to text-to-video diffusion models
Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai. SparseCtrl: Adding sparse controls to text-to-video diffusion models. InEuropean Conference on Computer Vision, pages 330–348. Springer, 2024
2024
-
[21]
LTX-Video: Realtime Video Latent Diffusion
Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richard- son, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, and Ofir Bibi. LTX-Video: Realtime video latent diffusion.arXiv preprint arXiv:2501.00103, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[22]
Semantic understanding of foggy scenes with purely synthetic data
Martin Hahner, Dengxin Dai, Christos Sakaridis, Jan-Nico Zaech, and Luc Van Gool. Semantic understanding of foggy scenes with purely synthetic data. InIEEE Intelligent Transportation Systems Conference, pages 3675–3681, 2019
2019
-
[23]
Is your HD map constructor reliable under sensor corruptions? InAdvances in Neural Information Processing Systems, volume 37, pages 22441–22482, 2024
Xiaoshuai Hao et al. Is your HD map constructor reliable under sensor corruptions? InAdvances in Neural Information Processing Systems, volume 37, pages 22441–22482, 2024
2024
-
[24]
SafeMap: Robust HD map construction from incomplete observations
Xiaoshuai Hao et al. SafeMap: Robust HD map construction from incomplete observations. In International Conference on Machine Learning, pages 22091–22102. PMLR, 2025
2025
-
[25]
DAFormer: Improving network architectures and training strategies for domain-adaptive semantic segmentation
Lukas Hoyer, Dengxin Dai, and Luc Van Gool. DAFormer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9924–9935, 2022
2022
-
[26]
HRDA: Context-aware high-resolution domain- adaptive semantic segmentation
Lukas Hoyer, Dengxin Dai, and Luc Van Gool. HRDA: Context-aware high-resolution domain- adaptive semantic segmentation. InEuropean conference on computer vision, pages 372–391. Springer, 2022
2022
-
[27]
V ACE: All-in- one video creation and editing
Zeyinzi Jiang, Zhen Han, Chaojie Mao, Jingfeng Zhang, Yulin Pan, and Yu Liu. V ACE: All-in- one video creation and editing. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 17191–17202, 2025. 13
2025
-
[28]
Tuning-free visual effect transfer across videos.arXiv preprint arXiv:2601.07833, 2026
Maxwell Jones, Rameen Abdal, Or Patashnik, Ruslan Salakhutdinov, Sergey Tulyakov, Jun-Yan Zhu, and Kuan-Chieh Jackson Wang. Tuning-free visual effect transfer across videos.arXiv preprint arXiv:2601.07833, 2026
-
[29]
3D Gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3D Gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023
2023
-
[30]
Robo3D: Towards robust and reliable 3D perception against corruptions
Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Robo3D: Towards robust and reliable 3D perception against corruptions. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 19994–20006, 2023
2023
-
[31]
Cottereau, and Wei Tsang Ooi
Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, and Wei Tsang Ooi. RoboDepth: Robust out-of-distribution depth estimation under corruptions. InAdvances in Neural Information Processing Systems, volume 36, pages 21298–21342, 2023
2023
-
[32]
Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, Zeliang Ma, Dengyi Ji, Haiwen Li,...
-
[33]
Multi-modal data-efficient 3D scene understanding for autonomous driving
Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, and Ziwei Liu. Multi-modal data-efficient 3D scene understanding for autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3748–3765, 2025
2025
-
[34]
Lingdong Kong, Wesley Yang, Jianbiao Mei, Youquan Liu, Ao Liang, Dekai Zhu, Dongyue Lu, Wei Yin, Xiaotao Hu, Mingkai Jia, Junyuan Deng, Kaiwen Zhang, Yang Wu, Tianyi Yan, Shenyuan Gao, Song Wang, Linfeng Li, Liang Pan, Yong Liu, Jianke Zhu, Wei Tsang Ooi, Steven C. H. Hoi, and Ziwei Liu. 3D and 4D world modeling: A survey.arXiv preprint arXiv:2509.07996, 2025
-
[35]
HunyuanVideo: A Systematic Framework For Large Video Generative Models
Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. HunyuanVideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[36]
Max Ku, Cong Wei, Weiming Ren, Harry Yang, and Wenhu Chen. AnyV2V: A tuning-free framework for any video-to-video editing tasks.arXiv preprint arXiv:2403.14468, 2024
-
[37]
Collaborative video diffusion: Consistent multi-video generation with camera control.Advances in Neural Information Processing Systems, 37:16240–16271, 2024
Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas J Guibas, and Gordon Wetzstein. Collaborative video diffusion: Consistent multi-video generation with camera control.Advances in Neural Information Processing Systems, 37:16240–16271, 2024
2024
-
[38]
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. FLUX. 1 Kontext: Flow matching for in-context image generation and editing in latent space.arXiv preprint arXiv:2506.15742, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[39]
Xuelong Li, Kai Kou, and Bin Zhao. Weather GAN: Multi-domain weather translation using generative adversarial networks.arXiv preprint arXiv:2103.05422, 2021
-
[40]
ClimateNeRF: Ex- treme weather synthesis in neural radiance field
Yuan Li, Zhi-Hao Lin, David Forsyth, Jia-Bin Huang, and Shenlong Wang. ClimateNeRF: Ex- treme weather synthesis in neural radiance field. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3227–3238, 2023. 14
2023
-
[41]
Cottereau, Changxin Gao, Liang Pan, Wei Tsang Ooi, and Ziwei Liu
Ao Liang, Lingdong Kong, Tianyi Yan, Hongsi Liu, Wesley Yang, Ziqi Huang, Wei Yin, Jialong Zuo, Yixuan Hu, Dekai Zhu, Dongyue Lu, Youquan Liu, Guangfeng Jiang, Linfeng Li, Xiangtai Li, Long Zhuo, Lai Xing Ng, Benoit R. Cottereau, Changxin Gao, Liang Pan, Wei Tsang Ooi, and Ziwei Liu. WorldLens: Full-spectrum evaluations of driving world models in real wor...
2026
-
[42]
Controllable weather synthesis and removal with video diffusion models
Chih-Hao Lin, Zian Wang, Ruofan Liang, Yuxuan Zhang, Sanja Fidler, Shenlong Wang, and Zan Gojcic. Controllable weather synthesis and removal with video diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 13580–13591, 2025
2025
-
[43]
Depth Anything 3: Recovering the Visual Space from Any Views
Haotong Lin, Sili Chen, Junhao Liew, Donny Y Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views.arXiv preprint arXiv:2511.10647, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[44]
DL3DV-10K: A large-scale scene dataset for deep learning-based 3D vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. DL3DV-10K: A large-scale scene dataset for deep learning-based 3D vision. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024
2024
-
[45]
StableV2V: Stabilizing shape consistency in video-to-video editing.IEEE Transactions on Circuits and Systems for Video Technology, 2025
Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, and Dong Liu. StableV2V: Stabilizing shape consistency in video-to-video editing.IEEE Transactions on Circuits and Systems for Video Technology, 2025
2025
-
[46]
PhysGen: Rigid-body physics-grounded image-to-video generation
Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, and Shenlong Wang. PhysGen: Rigid-body physics-grounded image-to-video generation. InEuropean Conference on Computer Vision, pages 360–378. Springer, 2024
2024
-
[47]
InfiniCube: Unbounded and controllable dynamic 3D driving scene generation with world-guided video models
Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, et al. InfiniCube: Unbounded and controllable dynamic 3D driving scene generation with world-guided video models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 27272–27283, 2025
2025
-
[48]
Follow your pose: Pose-guided text-to-video generation using pose-free videos
Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Siran Chen, Xiu Li, and Qifeng Chen. Follow your pose: Pose-guided text-to-video generation using pose-free videos. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 4117–4125, 2024
2024
-
[49]
NeRF: Representing scenes as neural radiance fields for view synthesis
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoor- thi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021
2021
-
[50]
Vision in bad weather
Shree K Nayar and Srinivasa G Narasimhan. Vision in bad weather. InProceedings of the IEEE/CVF International Conference on Computer Vision, volume 2, pages 820–827, 1999
1999
-
[51]
DreamDance: Animating human images by enriching 3D geometry cues from 2D poses
Yatian Pang, Bin Zhu, Bin Lin, Mingzhe Zheng, Francis EH Tay, Ser-Nam Lim, Harry Yang, and Li Yuan. DreamDance: Animating human images by enriching 3D geometry cues from 2D poses. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 14039–14050, 2025
2025
-
[52]
One- step image translation with text-to-image models,
Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, and Jun-Yan Zhu. One-step image translation with text-to-image models.arXiv preprint arXiv:2403.12036, 2024
-
[53]
A benchmark dataset and evaluation methodology for video object segmentation
Federico Perazzi, Jordi Pont-Tuset, Brian McWilliams, Luc Van Gool, Markus Gross, and Alexander Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 724–732, 2016
2016
-
[54]
WeatherDG: LLM-assisted proce- dural weather generation for domain-generalized semantic segmentation.IEEE Robotics and Automation Letters, 10:5919–5926, 2025
Chenghao Qian, Yuhu Guo, Yuhong Mo, and Wenjing Li. WeatherDG: LLM-assisted proce- dural weather generation for domain-generalized semantic segmentation.IEEE Robotics and Automation Letters, 10:5919–5926, 2025
2025
-
[55]
Chenghao Qian, Wenjing Li, Yuhu Guo, and Gustav Markkula. WeatherEdit: Controllable weather editing with 4D gaussian field.arXiv preprint arXiv:2505.20471, 2025. 15
-
[56]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational Conference on Machine Learning, pages 8748–8763. PMLR, 2021
2021
-
[57]
ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding
Christos Sakaridis, Dengxin Dai, and Luc Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10765–10775, 2021
2021
-
[58]
Chen Sang, Yeqiang Qian, Jiale Zhang, Chunxiang Wang, and Ming Yang. Weather-Magician: Reconstruction and rendering framework for 4D weather synthesis in real time.arXiv preprint arXiv:2505.19919, 2025
-
[59]
Victor Schmidt, Alexandra Sasha Luccioni, Mélisande Teng, Tianyu Zhang, Alexia Reynaud, Sunand Raghupathi, Gautier Cosne, Adrien Juraver, Vahe Vardanyan, Alex Hernandez-Garcia, et al. ClimateGAN: Raising climate change awareness by generating images of floods.arXiv preprint arXiv:2110.02871, 2021
-
[60]
A material point method for snow simulation.ACM Transactions on Graphics, 32(4):1–10, 2013
Alexey Stomakhin, Craig Schroeder, Lawrence Chai, Joseph Teran, and Andrew Selle. A material point method for snow simulation.ACM Transactions on Graphics, 32(4):1–10, 2013
2013
-
[61]
Rain rendering for evaluating and improving robustness to bad weather.International Journal of Computer Vision, 129(2):341–360, 2021
Maxime Tremblay, Shirsendu Sukanta Halder, Raoul De Charette, and Jean-François Lalonde. Rain rendering for evaluating and improving robustness to bad weather.International Journal of Computer Vision, 129(2):341–360, 2021
2021
-
[62]
VideoDirector: Precise video editing via text-to-video models
Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, and Yulan Guo. VideoDirector: Precise video editing via text-to-video models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2589–2598, 2025
2025
-
[63]
Weathercity: Urban scene reconstruction with controllable multi-weather transformation
Wenhua Wu, Huai Guan, Zhe Liu, and Hesheng Wang. Weathercity: Urban scene reconstruction with controllable multi-weather transformation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 40949–40958, 2026
2026
-
[64]
PandaSet: Advanced sensor suite dataset for autonomous driving
Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. PandaSet: Advanced sensor suite dataset for autonomous driving. InIEEE International Intelligent Transportation Systems Conference, pages 3095–3101, 2021
2021
-
[65]
Are VLMs ready for autonomous driving? an empirical study from the reliability, data, and metric perspectives
Shaoyuan Xie, Lingdong Kong, Yuhao Dong, Chonghao Sima, Wenwei Zhang, Qi Alfred Chen, Ziwei Liu, and Liang Pan. Are VLMs ready for autonomous driving? an empirical study from the reliability, data, and metric perspectives. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6585–6597, 2025
2025
-
[66]
Benchmarking and improving bird’s eye view perception robustness in autonomous driving
Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Benchmarking and improving bird’s eye view perception robustness in autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3878–3894, 2025
2025
-
[67]
Yuezhen Xie, Meiying Zhang, and Qi Hao. ClimateGS: Real-time climate simulation with 3D gaussian style transfer.arXiv preprint arXiv:2503.14845, 2025
-
[68]
Make-your-video: Customized video generation using textual and structural guidance.IEEE Transactions on Visualization and Computer Graphics, 31(2):1526–1541, 2024
Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, et al. Make-your-video: Customized video generation using textual and structural guidance.IEEE Transactions on Visualization and Computer Graphics, 31(2):1526–1541, 2024
2024
-
[69]
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. CogVideoX: Text-to-video diffusion models with an expert transformer.arXiv preprint arXiv:2408.06072, 2024. 16
work page internal anchor Pith review Pith/arXiv arXiv 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.