A Trajectory-Driven Spatio-Temporal Refinement Solution for CVPR 2026 8th UG2+ Challenge Track 3: DOST
Pith reviewed 2026-06-28 18:52 UTC · model grok-4.3
The pith
Data expansion with simulated turbulence plus trajectory-based cleanup lifts a motion segmentation baseline to second place on the DOST challenge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our method builds upon the Segment Any Motion framework by expanding the training distribution through selected DAVIS sequences subjected to simulated atmospheric degradations and by adding a spatio-temporal post-processing module that removes boundary-connected false foregrounds and short-lived fragmented noise while strictly preserving genuine small targets and maintaining original individual labels across frames.
What carries the argument
The spatio-temporal post-processing module that uses trajectory information to filter masks by persistence and boundary connectivity.
If this is right
- Training on a mixture of clean and synthetically degraded sequences increases robustness to geometric distortions caused by turbulence.
- Trajectory-aware cleanup removes persistent false positives without erasing small moving objects.
- The same post-processing rules preserve per-object identity labels across time.
- The overall pipeline achieves second place among submitted solutions on the DOST benchmark.
Where Pith is reading between the lines
- The approach implies that targeted post-processing can compensate for residual errors that remain after domain-adapted training.
- Similar trajectory filtering may be useful in other video tasks where noise produces short-lived or boundary-attached artifacts.
- Performance gains may shrink if real turbulence statistics differ substantially from the chosen simulation method.
Load-bearing premise
The chosen clean videos, the particular simulated degradations, and the hand-designed cleaning rules will transfer to the hidden test videos without overfitting to the public leaderboard.
What would settle it
Running the trained model on a fresh collection of real turbulence videos recorded under different conditions and measuring whether the reported ranking holds.
Figures
read the original abstract
In this work, we present our solution for the 8th UG2+ Challenge (CVPR 2026) Track 3: Dynamic Object Segmentation in Turbulence (DOST). Our method is built upon the strong baseline framework Segment Any Motion (SegAnyMo), which provides powerful mask generation and motion tracking capabilities. To further boost the segmentation performance under severe atmospheric distortions, we propose two key improvements. First, we employ a data-centric domain adaptation strategy. We significantly expand our training data by incorporating selected sequences from the DAVIS dataset alongside a subset of the DOST dataset, and apply simulated atmospheric fluctuation degradations to enhance the model's robustness against complex geometric distortions. Second, we introduce a spatio-temporal post-processing module. This refinement step effectively removes persistent boundary-connected false foregrounds and short-lived fragmented noise, while strictly preserving genuine small targets and maintaining original individual labels across frames. With these combined strategies, our proposed method ranks the 2st place in the challenge.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a solution for Track 3 (Dynamic Object Segmentation in Turbulence) of the 8th UG2+ Challenge at CVPR 2026. It builds on the Segment Any Motion (SegAnyMo) baseline by adding a data-centric domain adaptation step that augments training data with selected DAVIS sequences and a DOST subset subjected to simulated atmospheric degradations, plus a spatio-temporal post-processing module that removes boundary-connected false foregrounds and short-lived fragments while preserving small targets and label consistency. The work claims a 2nd-place ranking on the challenge leaderboard.
Significance. If the ranking can be attributed to the proposed adaptations rather than leaderboard feedback and if the simulation and rule-based refinement generalize, the combination of domain adaptation via simulated turbulence and lightweight spatio-temporal cleanup could offer a practical route to improving motion-based segmentation under atmospheric distortion. The approach reuses an existing strong baseline and public datasets, which is efficient, but the manuscript supplies no internal evidence that would allow readers to assess whether these components drive the reported ranking.
major comments (2)
- [Abstract] Abstract: The central performance claim (2nd-place ranking) is stated without any quantitative metrics, ablation tables, error analysis, or comparison against the SegAnyMo baseline, so the contribution of the domain-adaptation strategy and the post-processing module cannot be evaluated from the manuscript itself.
- [Abstract] Abstract: The atmospheric degradation simulation model, its parameters, and the exact rules/thresholds of the spatio-temporal refinement module are not described; without these details the claim that the method improves robustness to turbulence and generalizes to the hidden test set remains untestable.
minor comments (1)
- [Abstract] Abstract: Typo in the ranking statement ('ranks the 2st place' should read 'ranks 2nd place').
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our challenge solution manuscript. We address the major comments point-by-point below and will revise the paper to incorporate additional details and evidence as noted.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central performance claim (2nd-place ranking) is stated without any quantitative metrics, ablation tables, error analysis, or comparison against the SegAnyMo baseline, so the contribution of the domain-adaptation strategy and the post-processing module cannot be evaluated from the manuscript itself.
Authors: We agree that the abstract and manuscript lack quantitative metrics, ablations, or direct comparisons to the SegAnyMo baseline, which limits the ability to isolate the contribution of our adaptations. As this is a concise challenge report, the leaderboard ranking serves as the primary outcome metric. To strengthen the paper, we will add a results section with baseline comparisons and any available internal metrics in the revised manuscript. revision: yes
-
Referee: [Abstract] Abstract: The atmospheric degradation simulation model, its parameters, and the exact rules/thresholds of the spatio-temporal refinement module are not described; without these details the claim that the method improves robustness to turbulence and generalizes to the hidden test set remains untestable.
Authors: We acknowledge that the current manuscript does not provide the specific parameters of the atmospheric degradation simulation or the exact rules and thresholds used in the spatio-temporal refinement module. We will expand the method description in the revised version to include these implementation details, enabling reproducibility and evaluation of the claims. revision: yes
Circularity Check
No significant circularity; ranking anchored to external challenge evaluation.
full rationale
The paper presents an engineering solution extending the SegAnyMo baseline via domain adaptation (DAVIS + DOST sequences with simulated degradations) and a hand-designed spatio-temporal post-processing module. The sole quantitative claim (2nd-place ranking) is reported from the external UG2+ challenge leaderboard rather than derived from any internal equations, fitted parameters, or self-referential predictions. No self-citations, uniqueness theorems, ansatzes, or renamings appear; the derivation chain does not reduce to its inputs by construction and remains self-contained against the external benchmark.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption SegAnyMo already supplies strong mask generation and motion tracking
Reference graph
Works this paper leans on
-
[1]
Bootstap: Bootstrapped training for tracking-any-point
Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, Joao Carreira, et al. Bootstap: Bootstrapped training for tracking-any-point. InProceedings of the Asian Conference on Computer Vision, pages 3257–3274, 2024. 3
2024
-
[2]
Segment any motion in videos
Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, and Qianqian Wang. Segment any motion in videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3406–3416, 2025. 1
2025
-
[3]
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023. 3
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[4]
The 2017 DAVIS Challenge on Video Object Segmentation
Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Ar- bel´aez, Alexander Sorkine-Hornung, and Luc Van Gool. The 2017 davis challenge on video object segmentation. arXiv:1704.00675, 2017. 1
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[5]
Unsupervised moving object segmentation with atmospheric turbulence
Dehao Qin, Ripon Kumar Saha, Woojeh Chung, Suren Jaya- suriya, Jinwei Ye, and Nianyi Li. Unsupervised moving object segmentation with atmospheric turbulence. InEuropean Con- ference on Computer Vision, pages 18–37. Springer, 2024. 1
2024
-
[6]
Sam 2: Segment any- thing in images and videos
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R ¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment any- thing in images and videos. InInternational Conference on Learning Representations, pages 28085–28128, 2025. 1
2025
-
[7]
YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark
Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, and Thomas Huang. Youtube-vos: A large-scale video object segmentation benchmark.arXiv preprint arXiv:1809.03327, 2018. 3
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[8]
Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024
Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024. 3 4
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.