OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback
Pith reviewed 2026-05-18 01:51 UTC · model grok-4.3
The pith
OmniTrack++ refines panoramic multi-object tracking by feeding trajectory cues back to stabilize features and associations under 360-degree distortion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues to address panoramic distortion, large search space, and identity ambiguity under a 360-degree FoV. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics.
What carries the argument
The feedback-driven framework that progressively refines perception by injecting trajectory cues into feature stabilization, localization, and long-term memory consolidation.
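As a rough illustration of what "feeding trajectory cues back" can mean, the toy loop below lets the previous frame's tracks gate the search in the current frame. It is a minimal sketch, not the paper's method: the `Track` class, `wrapped_distance`, `step`, the greedy nearest-neighbour matching, and the pixel `gate` are all hypothetical, and the horizontal wrap-around assumes an equirectangular panorama whose left and right edges meet.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Track:
    track_id: int
    center: tuple                      # (x, y) in pixels
    history: list = field(default_factory=list)


def wrapped_distance(a, b, width):
    """Euclidean distance when the x axis wraps around at `width`."""
    dx = abs(a[0] - b[0]) % width
    dx = min(dx, width - dx)           # shorter way around the panorama seam
    dy = a[1] - b[1]
    return (dx * dx + dy * dy) ** 0.5


def step(tracks, detections, width, gate, ids):
    """Process one frame: each existing track claims its nearest detection
    if that detection lies within `gate` pixels of the track's last position."""
    unused = list(detections)
    for track in tracks:
        if not unused:
            break
        best = min(unused, key=lambda d: wrapped_distance(track.center, d, width))
        if wrapped_distance(track.center, best, width) <= gate:
            track.history.append(track.center)
            track.center = best
            unused.remove(best)
    # Detections that no existing track claimed start new tracks.
    tracks.extend(Track(next(ids), d) for d in unused)
    return tracks


ids = count(1)
# Frame 1: two new objects. Frame 2: the first one crosses the image seam.
tracks = step([], [(10, 50), (960, 300)], width=1920, gate=40, ids=ids)
tracks = step(tracks, [(1915, 50), (970, 300)], width=1920, gate=40, ids=ids)
print([(t.track_id, t.center) for t in tracks])
```

In this toy case the first object keeps its identity across the seam only because the distance wraps; with a plain Euclidean distance it would fall outside the gate and receive a new ID.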
If this is right
- Stabilized panoramic features reduce the impact of geometric distortion on subsequent detection and association steps.
- Trajectory-informed feedback in FlexiTrack Instances improves short-term localization accuracy in large search regions.
- ExpertTrack Memory with mixture-of-experts design recovers from track fragmentation and lowers long-term identity drift.
- Adaptive switching in Tracklet Management balances accuracy and efficiency across varying scene dynamics.
- The EmboTrack benchmark with QuadTrack and BipTrack sequences provides a testbed that spans diverse robotic motion patterns.
Where Pith is reading between the lines
- Similar trajectory feedback loops could be tested on fisheye or other non-panoramic wide-field cameras used in vehicles or drones.
- The mixture-of-experts memory structure might be adapted to handle frequent occlusions in crowded indoor scenes.
- Combining the adaptive mode switch with depth or motion sensors could further reduce errors during rapid robot turns.
- Long-term memory consolidation may offer gains in multi-camera setups where tracks cross between overlapping 360-degree views.
Load-bearing premise
Trajectory cues from earlier tracks can be used to refine current features and associations without accumulating errors or causing identity switches when panoramic distortion and large search spaces are present.
What would settle it
Ablating the DynamicSSM block or the ExpertTrack Memory on JRDB or EmboTrack and measuring whether HOTA scores fall or identity switches rise relative to the full OmniTrack++ model.
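One of the two quantities in this test, the number of identity switches, can be counted with a few lines of code. The sketch below uses a simplified definition (a switch is counted whenever a ground-truth object is matched to a different predicted ID than at its last matched frame). The function name `count_identity_switches`, the input layout, and the toy data are assumptions for illustration and do not come from the paper or its benchmarks.

```python
def count_identity_switches(assignments):
    """Count identity switches over all ground-truth tracks.

    `assignments` maps a ground-truth id to the predicted id it was matched
    to in each frame, with None for frames where it was not matched.
    """
    switches = 0
    for pred_ids in assignments.values():
        last = None
        for pid in pred_ids:
            if pid is None:
                continue               # a missed frame is not itself a switch
            if last is not None and pid != last:
                switches += 1
            last = pid
    return switches


# Toy data, not taken from JRDB or EmboTrack.
toy = {
    "gt_a": [1, 1, None, 1, 4, 4],     # one switch: 1 -> 4
    "gt_b": [2, 2, 2, 2, 2, 2],        # no switch
    "gt_c": [3, None, None, 5, 5, 3],  # two switches: 3 -> 5 and 5 -> 3
}
total_switches = count_identity_switches(toy)
print(total_switches)
```

Running the same count on the outputs of the full model and of each ablated variant, over identical sequences, gives the comparison described above.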
Original abstract
To address panoramic distortion, large search space, and identity ambiguity under a 360° FoV, OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics, offering a balanced and scalable solution for panoramic MOT. To support rigorous evaluation, we establish the EmboTrack benchmark, a comprehensive dataset for panoramic MOT that includes QuadTrack, captured with a quadruped robot, and BipTrack, collected with a bipedal wheel-legged robot. Together, these datasets span wide-angle environments and diverse motion patterns, providing a challenging testbed for real-world panoramic perception. Extensive experiments on JRDB and EmboTrack demonstrate that OmniTrack++ achieves state-of-the-art performance, yielding substantial HOTA improvements of +3.94 on JRDB and +15.03 on QuadTrack over the original OmniTrack. These results highlight the effectiveness of trajectory-informed feedback, adaptive paradigm switching, and robust long-term memory in advancing panoramic multi-object tracking. Datasets and code will be made available at https://github.com/xifen523/OmniTrack.
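The abstract names a Mixture-of-Experts design for the memory but this page does not describe its experts or its gating. For readers unfamiliar with the general idea, the sketch below shows a generic soft mixture: a gate scores each expert, a softmax turns the scores into weights, and the output is the weighted sum of the expert outputs. The experts, the gate, and the names `softmax` and `mixture_of_experts` are placeholders chosen for illustration and are not taken from OmniTrack++.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate):
    """Return the gate-weighted sum of expert outputs and the weights used."""
    weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
             for i in range(dim)]
    return mixed, weights


# Two placeholder experts and a placeholder gate (one score per expert).
experts = [
    lambda v: [2.0 * a for a in v],    # expert 0 doubles the feature
    lambda v: [-a for a in v],         # expert 1 negates the feature
]
gate = lambda v: [sum(v), 0.0]

mixed, weights = mixture_of_experts([1.0, 0.0], experts, gate)
print(weights, mixed)
```

In a tracking memory the input would presumably be an appearance embedding and the experts learned modules, but that mapping is an assumption here rather than something the abstract states.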
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents OmniTrack++, a feedback-driven framework for omnidirectional multi-object tracking that addresses panoramic distortion, large search spaces, and identity ambiguity under 360° FoV. Key components include a DynamicSSM block for stabilizing panoramic features, FlexiTrack Instances that use trajectory-informed feedback for localization and short-term association, an ExpertTrack Memory module employing a Mixture-of-Experts design to consolidate appearance cues and recover from fragmented tracks, and a Tracklet Management module for adaptive switching between end-to-end and tracking-by-detection modes. The authors introduce the EmboTrack benchmark (including QuadTrack and BipTrack datasets) and report state-of-the-art results with HOTA improvements of +3.94 on JRDB and +15.03 on QuadTrack over the original OmniTrack.
Significance. If the empirical claims hold under rigorous verification, this work advances panoramic MOT by demonstrating how trajectory feedback can refine perception in challenging wide-FoV settings. The EmboTrack benchmark is a useful addition for evaluating methods on robotic platforms with diverse motion patterns. The Mixture-of-Experts approach in ExpertTrack Memory offers a plausible mechanism for long-term robustness. However, the overall significance is limited by insufficient experimental detail, which prevents full assessment of whether the reported gains generalize or stem from the proposed feedback mechanisms.
major comments (2)
- [Section 4] Section 4 (Experiments): The SOTA claims rest on HOTA deltas of +3.94 on JRDB and +15.03 on QuadTrack, yet the section provides no details on the exact baselines, per-component ablation studies (e.g., isolating DynamicSSM or ExpertTrack Memory), error bars, dataset splits, or statistical significance tests. This makes it impossible to verify that the gains arise from the trajectory-informed feedback rather than implementation choices or post-hoc tuning.
- [Section 3.3] Section 3.3 (ExpertTrack Memory): The description of the Mixture-of-Experts design for recovering from fragmented tracks and reducing identity drift lacks any analysis of error propagation in the feedback loop, per-component identity-switch rates versus track duration, or quantification of how panoramic distortion impacts the normalized representations used by FlexiTrack. Without such analysis, the central premise that the memory module reliably avoids cumulative errors under large-FoV fragmentation remains untested.
minor comments (2)
- [Abstract] The abstract states that datasets and code will be released at a GitHub link, but the manuscript should explicitly confirm in the experiments section that the link is active and includes the EmboTrack data splits used for the reported results.
- [Section 3] Notation for components (DynamicSSM, FlexiTrack, ExpertTrack Memory) would benefit from a single overview figure or pseudocode algorithm in Section 3 to clarify data flow between the feedback modules.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight areas where additional detail can strengthen the presentation of our results. We address each major comment below and commit to incorporating the suggested analyses and clarifications in the revised manuscript.
Point-by-point responses
- Referee: [Section 4] Section 4 (Experiments): The SOTA claims rest on HOTA deltas of +3.94 on JRDB and +15.03 on QuadTrack, yet the section provides no details on the exact baselines, per-component ablation studies (e.g., isolating DynamicSSM or ExpertTrack Memory), error bars, dataset splits, or statistical significance tests. This makes it impossible to verify that the gains arise from the trajectory-informed feedback rather than implementation choices or post-hoc tuning.
  Authors: We agree that the current experimental section lacks the granularity needed for full verification. In the revised manuscript we will expand Section 4 to provide: explicit implementation details and hyper-parameters for all baselines; comprehensive per-component ablations that isolate DynamicSSM, FlexiTrack Instances, ExpertTrack Memory, and Tracklet Management; error bars obtained from multiple independent runs; precise descriptions of the training/validation/test splits on JRDB and EmboTrack; and statistical significance tests (e.g., paired t-tests) on the reported HOTA improvements. These additions will allow readers to confirm that the observed gains originate from the trajectory-feedback mechanisms rather than other factors.
  Revision: yes
- Referee: [Section 3.3] Section 3.3 (ExpertTrack Memory): The description of the Mixture-of-Experts design for recovering from fragmented tracks and reducing identity drift lacks any analysis of error propagation in the feedback loop, per-component identity-switch rates versus track duration, or quantification of how panoramic distortion impacts the normalized representations used by FlexiTrack. Without such analysis, the central premise that the memory module reliably avoids cumulative errors under large-FoV fragmentation remains untested.
  Authors: We concur that further empirical analysis is required to substantiate the robustness claims for ExpertTrack Memory. In the revision we will augment Section 3.3 and the experimental results with: a study of error propagation through the feedback loop; identity-switch rates broken down by track duration and by component; and quantitative measurements (together with qualitative examples) of how panoramic distortion affects the normalized representations inside FlexiTrack, demonstrating the stabilizing effect of the Mixture-of-Experts design. These analyses will directly test the premise that the memory module limits cumulative errors under large-FoV fragmentation.
  Revision: yes
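The paired t-test mentioned in the first response can be sketched with the standard library alone. The example below computes the t statistic and degrees of freedom for per-sequence scores of two trackers evaluated on the same sequences. The function name `paired_t_statistic` and the numbers are placeholders for illustration and are not results from the paper. Turning the statistic into a p-value requires the Student t distribution, which a statistics package or a t table would supply.

```python
import math
import statistics


def paired_t_statistic(baseline, proposed):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)       # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean_diff / (sd / math.sqrt(n)), n - 1


# Placeholder per-sequence scores, one pair per sequence (not real results).
baseline = [20.0, 22.0, 21.0, 23.0]
proposed = [21.0, 24.0, 24.0, 27.0]
t_stat, dof = paired_t_statistic(baseline, proposed)
print(round(t_stat, 3), dof)
```

Pairing by sequence matters because sequences differ widely in difficulty; the test then asks whether the per-sequence differences are consistently positive rather than whether the two averages differ.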
Circularity Check
No circularity; empirical claims rest on external benchmarks and prior method comparisons
Full rationale
The manuscript describes an algorithmic pipeline (DynamicSSM for feature stabilization, FlexiTrack for trajectory-informed localization, ExpertTrack Memory with Mixture-of-Experts, and adaptive Tracklet Management) evaluated on JRDB and the newly introduced EmboTrack (QuadTrack/BipTrack) datasets. Reported HOTA gains (+3.94 on JRDB, +15.03 on QuadTrack) are presented as direct experimental outcomes against the original OmniTrack baseline. No equations, fitted parameters, or predictions are defined in terms of the target metrics; no self-citation chain is invoked to justify uniqueness or force a result; the derivation chain consists of design choices validated externally rather than reducing to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Trajectory cues from prior detections provide reliable feedback that can progressively refine perception, localization, and association under 360-degree distortion.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "A DynamicSSM block first stabilizes panoramic features... ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design... Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Extensive experiments on JRDB and EmboTrack... HOTA improvements of +3.94 on JRDB and +15.03 on QuadTrack"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.