A Trajectory-Driven Spatio-Temporal Refinement Solution for CVPR 2026 8th UG2+ Challenge Track 3: DOST

Fengjie Zhu; Hongzhen Li; Leilei Cao; Miao Yu; Yingfang Zhu; Youwei Pan

arxiv: 2606.00522 · v2 · pith:KLZOE323new · submitted 2026-05-30 · 💻 cs.CV

A Trajectory-Driven Spatio-Temporal Refinement Solution for CVPR 2026 8th UG2+ Challenge Track 3: DOST

Hongzhen Li , Miao Yu , Leilei Cao , Youwei Pan , Yingfang Zhu , Fengjie Zhu This is my paper

Pith reviewed 2026-06-28 18:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords dynamic object segmentationatmospheric turbulencevideo segmentationdomain adaptationspatio-temporal refinementchallenge solution

0 comments

The pith

Data expansion with simulated turbulence plus trajectory-based cleanup lifts a motion segmentation baseline to second place on the DOST challenge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work describes a practical pipeline for segmenting moving objects in video sequences corrupted by atmospheric turbulence. It starts from an existing motion-aware segmentation model and augments its training set by adding selected clean videos that have been synthetically degraded. A separate post-processing stage then examines object trajectories across frames to discard persistent false positives attached to boundaries and fleeting noise fragments while retaining genuine small targets and their per-frame identities. These two additions together produce the second-ranked entry in the challenge track.

Core claim

Our method builds upon the Segment Any Motion framework by expanding the training distribution through selected DAVIS sequences subjected to simulated atmospheric degradations and by adding a spatio-temporal post-processing module that removes boundary-connected false foregrounds and short-lived fragmented noise while strictly preserving genuine small targets and maintaining original individual labels across frames.

What carries the argument

The spatio-temporal post-processing module that uses trajectory information to filter masks by persistence and boundary connectivity.

If this is right

Training on a mixture of clean and synthetically degraded sequences increases robustness to geometric distortions caused by turbulence.
Trajectory-aware cleanup removes persistent false positives without erasing small moving objects.
The same post-processing rules preserve per-object identity labels across time.
The overall pipeline achieves second place among submitted solutions on the DOST benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach implies that targeted post-processing can compensate for residual errors that remain after domain-adapted training.
Similar trajectory filtering may be useful in other video tasks where noise produces short-lived or boundary-attached artifacts.
Performance gains may shrink if real turbulence statistics differ substantially from the chosen simulation method.

Load-bearing premise

The chosen clean videos, the particular simulated degradations, and the hand-designed cleaning rules will transfer to the hidden test videos without overfitting to the public leaderboard.

What would settle it

Running the trained model on a fresh collection of real turbulence videos recorded under different conditions and measuring whether the reported ranking holds.

Figures

Figures reproduced from arXiv: 2606.00522 by Fengjie Zhu, Hongzhen Li, Leilei Cao, Miao Yu, Yingfang Zhu, Youwei Pan.

read the original abstract

In this work, we present our solution for the 8th UG2+ Challenge (CVPR 2026) Track 3: Dynamic Object Segmentation in Turbulence (DOST). Our method is built upon the strong baseline framework Segment Any Motion (SegAnyMo), which provides powerful mask generation and motion tracking capabilities. To further boost the segmentation performance under severe atmospheric distortions, we propose two key improvements. First, we employ a data-centric domain adaptation strategy. We significantly expand our training data by incorporating selected sequences from the DAVIS dataset alongside a subset of the DOST dataset, and apply simulated atmospheric fluctuation degradations to enhance the model's robustness against complex geometric distortions. Second, we introduce a spatio-temporal post-processing module. This refinement step effectively removes persistent boundary-connected false foregrounds and short-lived fragmented noise, while strictly preserving genuine small targets and maintaining original individual labels across frames. With these combined strategies, our proposed method ranks the 2st place in the challenge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a routine competition report that extends SegAnyMo with data mixing and heuristic post-processing but supplies zero metrics or ablations to back the ranking claim.

read the letter

The paper is a solution write-up for the DOST track. It starts from the cited SegAnyMo baseline, adds selected DAVIS sequences plus simulated turbulence degradations for training, and applies a post-processing module that drops boundary-connected false positives and short fragments while keeping small targets and label consistency across frames. The description of these steps is clear and practical.

Nothing in the method is new. The data expansion is ordinary domain adaptation, the simulation is not detailed, and the cleaning rules are hand-designed heuristics rather than a learned component. The only result reported is the external 2nd-place ranking.

The central weakness is the total absence of evidence. There are no IoU numbers, no baseline comparison on the same data, no ablation on the added DAVIS data or the post-processor, and no error analysis. The ranking therefore cannot be attributed to the described changes versus leaderboard tuning. The stress-test point about unverified generalization of the simulation parameters and rule thresholds is accurate on the text given.

This work is useful only to other teams trying to reproduce an entry in the same narrow challenge. It adds no reusable technique or insight for the rest of computer vision. I would not bring it to a reading group, would not cite it, and would not send it for peer review; it belongs as a short challenge report rather than a research paper.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a solution for Track 3 (Dynamic Object Segmentation in Turbulence) of the 8th UG2+ Challenge at CVPR 2026. It builds on the Segment Any Motion (SegAnyMo) baseline by adding a data-centric domain adaptation step that augments training data with selected DAVIS sequences and a DOST subset subjected to simulated atmospheric degradations, plus a spatio-temporal post-processing module that removes boundary-connected false foregrounds and short-lived fragments while preserving small targets and label consistency. The work claims a 2nd-place ranking on the challenge leaderboard.

Significance. If the ranking can be attributed to the proposed adaptations rather than leaderboard feedback and if the simulation and rule-based refinement generalize, the combination of domain adaptation via simulated turbulence and lightweight spatio-temporal cleanup could offer a practical route to improving motion-based segmentation under atmospheric distortion. The approach reuses an existing strong baseline and public datasets, which is efficient, but the manuscript supplies no internal evidence that would allow readers to assess whether these components drive the reported ranking.

major comments (2)

[Abstract] Abstract: The central performance claim (2nd-place ranking) is stated without any quantitative metrics, ablation tables, error analysis, or comparison against the SegAnyMo baseline, so the contribution of the domain-adaptation strategy and the post-processing module cannot be evaluated from the manuscript itself.
[Abstract] Abstract: The atmospheric degradation simulation model, its parameters, and the exact rules/thresholds of the spatio-temporal refinement module are not described; without these details the claim that the method improves robustness to turbulence and generalizes to the hidden test set remains untestable.

minor comments (1)

[Abstract] Abstract: Typo in the ranking statement ('ranks the 2st place' should read 'ranks 2nd place').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our challenge solution manuscript. We address the major comments point-by-point below and will revise the paper to incorporate additional details and evidence as noted.

read point-by-point responses

Referee: [Abstract] Abstract: The central performance claim (2nd-place ranking) is stated without any quantitative metrics, ablation tables, error analysis, or comparison against the SegAnyMo baseline, so the contribution of the domain-adaptation strategy and the post-processing module cannot be evaluated from the manuscript itself.

Authors: We agree that the abstract and manuscript lack quantitative metrics, ablations, or direct comparisons to the SegAnyMo baseline, which limits the ability to isolate the contribution of our adaptations. As this is a concise challenge report, the leaderboard ranking serves as the primary outcome metric. To strengthen the paper, we will add a results section with baseline comparisons and any available internal metrics in the revised manuscript. revision: yes
Referee: [Abstract] Abstract: The atmospheric degradation simulation model, its parameters, and the exact rules/thresholds of the spatio-temporal refinement module are not described; without these details the claim that the method improves robustness to turbulence and generalizes to the hidden test set remains untestable.

Authors: We acknowledge that the current manuscript does not provide the specific parameters of the atmospheric degradation simulation or the exact rules and thresholds used in the spatio-temporal refinement module. We will expand the method description in the revised version to include these implementation details, enabling reproducibility and evaluation of the claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; ranking anchored to external challenge evaluation.

full rationale

The paper presents an engineering solution extending the SegAnyMo baseline via domain adaptation (DAVIS + DOST sequences with simulated degradations) and a hand-designed spatio-temporal post-processing module. The sole quantitative claim (2nd-place ranking) is reported from the external UG2+ challenge leaderboard rather than derived from any internal equations, fitted parameters, or self-referential predictions. No self-citations, uniqueness theorems, ansatzes, or renamings appear; the derivation chain does not reduce to its inputs by construction and remains self-contained against the external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unexamined reliability of the SegAnyMo baseline and the challenge test set; no new free parameters, axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption SegAnyMo already supplies strong mask generation and motion tracking
Invoked in the first sentence of the abstract as the foundation for all subsequent improvements.

pith-pipeline@v0.9.1-grok · 5726 in / 1028 out tokens · 19562 ms · 2026-06-28T18:52:13.718093+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

8 extracted references · 3 canonical work pages · 3 internal anchors

[1]

Bootstap: Bootstrapped training for tracking-any-point

Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, Joao Carreira, et al. Bootstap: Bootstrapped training for tracking-any-point. InProceedings of the Asian Conference on Computer Vision, pages 3257–3274, 2024. 3

2024
[2]

Segment any motion in videos

Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, and Qianqian Wang. Segment any motion in videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3406–3416, 2025. 1

2025
[3]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023. 3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

The 2017 DAVIS Challenge on Video Object Segmentation

Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Ar- bel´aez, Alexander Sorkine-Hornung, and Luc Van Gool. The 2017 davis challenge on video object segmentation. arXiv:1704.00675, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

Unsupervised moving object segmentation with atmospheric turbulence

Dehao Qin, Ripon Kumar Saha, Woojeh Chung, Suren Jaya- suriya, Jinwei Ye, and Nianyi Li. Unsupervised moving object segmentation with atmospheric turbulence. InEuropean Con- ference on Computer Vision, pages 18–37. Springer, 2024. 1

2024
[6]

Sam 2: Segment any- thing in images and videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R ¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment any- thing in images and videos. InInternational Conference on Learning Representations, pages 28085–28128, 2025. 1

2025
[7]

YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark

Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, and Thomas Huang. Youtube-vos: A large-scale video object segmentation benchmark.arXiv preprint arXiv:1809.03327, 2018. 3

work page internal anchor Pith review Pith/arXiv arXiv 2018
[8]

Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024. 3 4

2024

[1] [1]

Bootstap: Bootstrapped training for tracking-any-point

Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, Joao Carreira, et al. Bootstap: Bootstrapped training for tracking-any-point. InProceedings of the Asian Conference on Computer Vision, pages 3257–3274, 2024. 3

2024

[2] [2]

Segment any motion in videos

Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, and Qianqian Wang. Segment any motion in videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3406–3416, 2025. 1

2025

[3] [3]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023. 3

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

The 2017 DAVIS Challenge on Video Object Segmentation

Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Ar- bel´aez, Alexander Sorkine-Hornung, and Luc Van Gool. The 2017 davis challenge on video object segmentation. arXiv:1704.00675, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017

[5] [5]

Unsupervised moving object segmentation with atmospheric turbulence

Dehao Qin, Ripon Kumar Saha, Woojeh Chung, Suren Jaya- suriya, Jinwei Ye, and Nianyi Li. Unsupervised moving object segmentation with atmospheric turbulence. InEuropean Con- ference on Computer Vision, pages 18–37. Springer, 2024. 1

2024

[6] [6]

Sam 2: Segment any- thing in images and videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R ¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment any- thing in images and videos. InInternational Conference on Learning Representations, pages 28085–28128, 2025. 1

2025

[7] [7]

YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark

Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, and Thomas Huang. Youtube-vos: A large-scale video object segmentation benchmark.arXiv preprint arXiv:1809.03327, 2018. 3

work page internal anchor Pith review Pith/arXiv arXiv 2018

[8] [8]

Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37: 21875–21911, 2024. 3 4

2024