
arxiv: 2511.00510 · v2 · submitted 2025-11-01 · 💻 cs.CV · cs.RO · eess.IV

OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback

Pith reviewed 2026-05-18 01:51 UTC · model grok-4.3

classification 💻 cs.CV · cs.RO · eess.IV
keywords omnidirectional multi-object tracking · panoramic distortion · trajectory feedback · 360 degree field of view · multi-object tracking · feature stabilization · long-term association · robotic perception

The pith

OmniTrack++ refines panoramic multi-object tracking by feeding trajectory cues back to stabilize features and associations under 360-degree distortion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a feedback-driven framework can overcome panoramic distortion, large search spaces, and identity ambiguity in omnidirectional multi-object tracking by progressively incorporating trajectory information from prior detections. DynamicSSM blocks first normalize features to reduce geometric warping, FlexiTrack Instances then apply trajectory feedback for precise short-term localization, and ExpertTrack Memory uses a mixture-of-experts structure to retain appearance cues across fragmented sequences. Tracklet Management further adapts between full end-to-end and detection-based modes based on scene conditions. A sympathetic reader would care because these mechanisms enable more reliable tracking in wide-field robotic and surveillance settings where conventional narrow-view trackers lose objects or switch identities.

Core claim

OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues to address panoramic distortion, large search space, and identity ambiguity under a 360 degree FoV. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics, offering a balanced and scalable solution for panoramic MOT.

What carries the argument

The feedback-driven framework that progressively refines perception by injecting trajectory cues into feature stabilization, localization, and long-term memory consolidation.

If this is right

  • Stabilized panoramic features reduce the impact of geometric distortion on subsequent detection and association steps.
  • Trajectory-informed feedback in FlexiTrack Instances improves short-term localization accuracy in large search regions.
  • ExpertTrack Memory with mixture-of-experts design recovers from track fragmentation and lowers long-term identity drift.
  • Adaptive switching in Tracklet Management balances accuracy and efficiency across varying scene dynamics.
  • The EmboTrack benchmark with QuadTrack and BipTrack sequences provides a testbed that spans diverse robotic motion patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar trajectory feedback loops could be tested on fisheye or other non-panoramic wide-field cameras used in vehicles or drones.
  • The mixture-of-experts memory structure might be adapted to handle frequent occlusions in crowded indoor scenes.
  • Combining the adaptive mode switch with depth or motion sensors could further reduce errors during rapid robot turns.
  • Long-term memory consolidation may offer gains in multi-camera setups where tracks cross between overlapping 360-degree views.

Load-bearing premise

Trajectory cues from earlier tracks can be used to refine current features and associations without accumulating errors or causing identity switches when panoramic distortion and large search spaces are present.

What would settle it

Ablating the DynamicSSM block or the ExpertTrack Memory on JRDB or EmboTrack and measuring whether HOTA scores fall or identity switches rise relative to the full OmniTrack++ model.
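
For readers unfamiliar with the metric named above: the page does not define HOTA, so the block below restates the standard definition from the MOT evaluation literature rather than anything specific to this paper. At a localization threshold α, detections are split into true positives, false negatives and false positives, and each true positive c receives an association score A(c) from its true-positive, false-negative and false-positive associations (TPA, FNA, FPA).

```latex
\mathrm{HOTA}_{\alpha}
  = \sqrt{\frac{\sum_{c \in \mathrm{TP}_{\alpha}} \mathcal{A}(c)}
               {|\mathrm{TP}_{\alpha}| + |\mathrm{FN}_{\alpha}| + |\mathrm{FP}_{\alpha}|}}
  = \sqrt{\mathrm{DetA}_{\alpha} \cdot \mathrm{AssA}_{\alpha}},
\qquad
\mathcal{A}(c)
  = \frac{|\mathrm{TPA}(c)|}
         {|\mathrm{TPA}(c)| + |\mathrm{FNA}(c)| + |\mathrm{FPA}(c)|}
```

The reported score is the average of HOTA_α over a range of thresholds. Because the metric is the geometric mean of a detection term and an association term, an ablation that mainly hurts identity consistency should show up in AssA and in identity-switch counts even if DetA barely moves.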

Figures

Figures reproduced from arXiv: 2511.00510 by Fei Teng, Hao Shi, Kailun Yang, Kai Luo, Kaiwei Wang, Kunyu Peng, Sheng Wu.

Figure 1: Comparison of mainstream tracking paradigms. (a) il…
Figure 2: Overview of the EmboTrack benchmark (BipTrack and QuadTrack) and MOT results on the QuadTrack test set. (a) BipTrack…
Figure 3: Pipeline overview of OmniTrack++. At frame t, the panoramic input is processed by a shared backbone, a DynamicSSM block, and an encoder to produce learnable instances for the current frame. In parallel, FlexiTrack Instances from frame t−1 are retrieved from the ExpertTrack Memory. These two sets of tokens are concatenated and fed into the decoder to generate object proposals. A Dual-Branch Adapter then ro…
Figure 4: The proposed DynamicSSM Block is integrated into a standard DAB encoder as a plug-in enhancement. Rather than explicitly modeling panoramic geometry, it implicitly calibrates spatial and photometric feature distributions to mitigate geometric distortions and illumination variation. This adaptation yields more robust and stable representations, enabling more reliable decoding and multi-object tracking in pa…
Figure 5: ExpertTrack Memory framework. The module integrates long-term Stable Identity Memory (SIM) and short-term Dynamic Interaction Memory (DIM) to jointly maintain identity consistency and adapt to rapid appearance changes under panoramic distortions. A Hierarchical Memory Controller (HMC) assigns high-confidence features to SIM and recent-frame updates to DIM. A Router then selects the top-Kr features across b…
Figure 7: Instance motion trajectories over time. The horizontal axis…
Figure 8: Distribution of bounding box sizes in EmboTrack. The plots illustrate the box size distributions of the QuadTrack and BipTrack datasets across both training and test sets.
Figure 9: Visualization of query localization. The first row…
Figure 10: Effects of the trajectory initialization threshold and update…
Figure 11: Comparison of different MOT methods [15], [30], [113] on the JRDB dataset [120], visualized for frames 300∼310 (every other frame) of the sequence nvidia-aud-2019-04-18 0. As shown in the visualizations, OmniTrack++ demonstrates robust tracking performance, effectively maintaining consistent associations even under challenging conditions such as occlusions and motion dynamics.
Figure 12: Analysis of failure cases on JRDB [120], highlighting scenarios where OmniTrack++ struggles compared to ByteTrack [30]. [Adjacent body text captured with the caption:] …two frames. A similar issue can be observed in the second row with ByteTrack [30], where partial occlusion leads to trajectory fragmentation and identity inconsistency. In contrast, OC-SORT [113] (third row) successfully preserves the target trajectory under occlusion, yet introduces ov…
Original abstract

To address panoramic distortion, large search space, and identity ambiguity under a 360° FoV, OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics, offering a balanced and scalable solution for panoramic MOT. To support rigorous evaluation, we establish the EmboTrack benchmark, a comprehensive dataset for panoramic MOT that includes QuadTrack, captured with a quadruped robot, and BipTrack, collected with a bipedal wheel-legged robot. Together, these datasets span wide-angle environments and diverse motion patterns, providing a challenging testbed for real-world panoramic perception. Extensive experiments on JRDB and EmboTrack demonstrate that OmniTrack++ achieves state-of-the-art performance, yielding substantial HOTA improvements of +3.94 on JRDB and +15.03 on QuadTrack over the original OmniTrack. These results highlight the effectiveness of trajectory-informed feedback, adaptive paradigm switching, and robust long-term memory in advancing panoramic multi-object tracking. Datasets and code will be made available at https://github.com/xifen523/OmniTrack.
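
To make the two-store memory idea concrete, here is a toy sketch loosely following the Figure 5 caption: a controller keeps recent features in a short-term store and high-confidence features in a long-term store, and a router returns the top-k stored features most similar to a query. This is not the authors' implementation. The class name `ToyTrackMemory`, the confidence threshold, the store capacities, the use of cosine similarity, and the choice to draw from both stores are all assumptions made for illustration.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyTrackMemory:
    """Hypothetical per-track memory with a long-term and a short-term store."""

    def __init__(self, conf_thresh=0.8, long_size=8, short_size=4):
        # All three defaults are illustrative assumptions, not values from the paper.
        self.conf_thresh = conf_thresh
        self.long_term = deque(maxlen=long_size)    # stable identity features
        self.short_term = deque(maxlen=short_size)  # most recent features

    def update(self, feature, confidence):
        # Controller step: every new feature enters the short-term store;
        # only confident ones are also kept long-term.
        self.short_term.append(feature)
        if confidence >= self.conf_thresh:
            self.long_term.append(feature)

    def route(self, query, k=3):
        # Router step: rank stored features by similarity to the query
        # and return the k best, drawn from both stores.
        pool = list(self.long_term) + list(self.short_term)
        return sorted(pool, key=lambda f: cosine(query, f), reverse=True)[:k]
```

A confident observation lands in both stores while a low-confidence one stays short-term only, so a briefly degraded appearance ages out of the bounded short-term store without overwriting the stable features.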

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents OmniTrack++, a feedback-driven framework for omnidirectional multi-object tracking that addresses panoramic distortion, large search spaces, and identity ambiguity under 360° FoV. Key components include a DynamicSSM block for stabilizing panoramic features, FlexiTrack Instances that use trajectory-informed feedback for localization and short-term association, an ExpertTrack Memory module employing a Mixture-of-Experts design to consolidate appearance cues and recover from fragmented tracks, and a Tracklet Management module for adaptive switching between end-to-end and tracking-by-detection modes. The authors introduce the EmboTrack benchmark (including QuadTrack and BipTrack datasets) and report state-of-the-art results with HOTA improvements of +3.94 on JRDB and +15.03 on QuadTrack over the original OmniTrack.

Significance. If the empirical claims hold under rigorous verification, this work advances panoramic MOT by demonstrating how trajectory feedback can refine perception in challenging wide-FoV settings. The EmboTrack benchmark is a useful addition for evaluating methods on robotic platforms with diverse motion patterns. The Mixture-of-Experts approach in ExpertTrack Memory offers a plausible mechanism for long-term robustness. However, the overall significance is limited by insufficient experimental detail, which prevents full assessment of whether the reported gains generalize or stem from the proposed feedback mechanisms.

major comments (2)
  1. [Section 4] Section 4 (Experiments): The SOTA claims rest on HOTA deltas of +3.94 on JRDB and +15.03 on QuadTrack, yet the section provides no details on the exact baselines, per-component ablation studies (e.g., isolating DynamicSSM or ExpertTrack Memory), error bars, dataset splits, or statistical significance tests. This makes it impossible to verify that the gains arise from the trajectory-informed feedback rather than implementation choices or post-hoc tuning.
  2. [Section 3.3] Section 3.3 (ExpertTrack Memory): The description of the Mixture-of-Experts design for recovering from fragmented tracks and reducing identity drift lacks any analysis of error propagation in the feedback loop, per-component identity-switch rates versus track duration, or quantification of how panoramic distortion impacts the normalized representations used by FlexiTrack. Without such analysis, the central premise that the memory module reliably avoids cumulative errors under large-FoV fragmentation remains untested.
minor comments (2)
  1. [Abstract] The abstract states that datasets and code will be released at a GitHub link, but the manuscript should explicitly confirm in the experiments section that the link is active and includes the EmboTrack data splits used for the reported results.
  2. [Section 3] Notation for components (DynamicSSM, FlexiTrack, ExpertTrack Memory) would benefit from a single overview figure or pseudocode algorithm in Section 3 to clarify data flow between the feedback modules.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where additional detail can strengthen the presentation of our results. We address each major comment below and commit to incorporating the suggested analyses and clarifications in the revised manuscript.

Point-by-point responses
  1. Referee: [Section 4] Section 4 (Experiments): The SOTA claims rest on HOTA deltas of +3.94 on JRDB and +15.03 on QuadTrack, yet the section provides no details on the exact baselines, per-component ablation studies (e.g., isolating DynamicSSM or ExpertTrack Memory), error bars, dataset splits, or statistical significance tests. This makes it impossible to verify that the gains arise from the trajectory-informed feedback rather than implementation choices or post-hoc tuning.

    Authors: We agree that the current experimental section lacks the granularity needed for full verification. In the revised manuscript we will expand Section 4 to provide: explicit implementation details and hyper-parameters for all baselines; comprehensive per-component ablations that isolate DynamicSSM, FlexiTrack Instances, ExpertTrack Memory, and Tracklet Management; error bars obtained from multiple independent runs; precise descriptions of the training/validation/test splits on JRDB and EmboTrack; and statistical significance tests (e.g., paired t-tests) on the reported HOTA improvements. These additions will allow readers to confirm that the observed gains originate from the trajectory-feedback mechanisms rather than other factors. revision: yes

  2. Referee: [Section 3.3] Section 3.3 (ExpertTrack Memory): The description of the Mixture-of-Experts design for recovering from fragmented tracks and reducing identity drift lacks any analysis of error propagation in the feedback loop, per-component identity-switch rates versus track duration, or quantification of how panoramic distortion impacts the normalized representations used by FlexiTrack. Without such analysis, the central premise that the memory module reliably avoids cumulative errors under large-FoV fragmentation remains untested.

    Authors: We concur that further empirical analysis is required to substantiate the robustness claims for ExpertTrack Memory. In the revision we will augment Section 3.3 and the experimental results with: a study of error propagation through the feedback loop; identity-switch rates broken down by track duration and by component; and quantitative measurements (together with qualitative examples) of how panoramic distortion affects the normalized representations inside FlexiTrack, demonstrating the stabilizing effect of the Mixture-of-Experts design. These analyses will directly test the premise that the memory module limits cumulative errors under large-FoV fragmentation. revision: yes
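
The simulated rebuttal proposes paired t-tests over multiple independent runs. A minimal sketch of that computation, using only the Python standard library, is given below. The function name `paired_t_statistic` is hypothetical, and the sketch assumes each baseline run is paired with a candidate run under the same seed or data split; it returns the statistic only and leaves the p-value lookup to a t-table or a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(baseline, candidate):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Per-pair improvement of the candidate over the baseline.
    diffs = [c - b for b, c in zip(baseline, candidate)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if d_sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t = d_mean / (d_sd / math.sqrt(n))
    return d_mean, t, n - 1
```

The resulting t value would be compared against a t-distribution with n − 1 degrees of freedom. With only a handful of runs the test has little power, so reporting the per-run differences alongside the statistic is usually more informative than the p-value alone.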

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external benchmarks and prior method comparisons

full rationale

The manuscript describes an algorithmic pipeline (DynamicSSM for feature stabilization, FlexiTrack for trajectory-informed localization, ExpertTrack Memory with Mixture-of-Experts, and adaptive Tracklet Management) evaluated on JRDB and the newly introduced EmboTrack (QuadTrack/BipTrack) datasets. Reported HOTA gains (+3.94 on JRDB, +15.03 on QuadTrack) are presented as direct experimental outcomes against the original OmniTrack baseline. No equations, fitted parameters, or predictions are defined in terms of the target metrics; no self-citation chain is invoked to justify uniqueness or force a result; the derivation chain consists of design choices validated externally rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach depends on standard deep-learning assumptions for feature extraction and association plus the domain-specific premise that trajectory cues are informative for panoramic correction; no explicit free parameters or new physical entities are detailed in the abstract.

axioms (1)
  • domain assumption: Trajectory cues from prior detections provide reliable feedback that can progressively refine perception, localization, and association under 360-degree distortion.
    This premise underpins the entire feedback-driven framework and the roles of DynamicSSM, FlexiTrack, and ExpertTrack Memory.

pith-pipeline@v0.9.0 · 5831 in / 1613 out tokens · 44862 ms · 2026-05-18T01:51:39.776584+00:00 · methodology



Reference graph

Works this paper leans on

128 extracted references · 128 canonical work pages · 1 internal anchor

  1. [1]

    A survey of representation learning, optimization strategies, and applications for omnidirectional vision,

    H. Ai, Z. Cao, and L. Wang, “A survey of representation learning, optimization strategies, and applications for omnidirectional vision,” International Journal of Computer Vision, 2025

  2. [2]

    Spherical DNNs and their applications in 360° images and videos,

    Y. Xu, Z. Zhang, and S. Gao, “Spherical DNNs and their applications in 360° images and videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

  3. [3]

    One flight over the gap: A survey from perspective to panoramic vision,

    X. Lin, X. Ge, D. Zhang, Z. Wan, X. Wang, X. Li, W. Jiang, B. Du, D. Tao, M.-H. Yang, and L. Qi, “One flight over the gap: A survey from perspective to panoramic vision,” arXiv preprint arXiv:2509.04444, 2025

  4. [4]

    Panacea: Panoramic and controllable video generation for autonomous driving,

    Y. Wen, Y. Zhao, Y. Liu, F. Jia, Y. Wang, C. Luo, C. Zhang, T. Wang, X. Sun, and X. Zhang, “Panacea: Panoramic and controllable video generation for autonomous driving,” in CVPR, 2024

  5. [5]

    Occlusion-aware seamless segmentation,

    Y. Cao, J. Zhang, H. Shi, K. Peng, Y. Zhang, H. Zhang, R. Stiefelhagen, and K. Yang, “Occlusion-aware seamless segmentation,” in ECCV, 2024

  6. [6]

    Visual route following for tiny autonomous robots,

    T. van Dijk, C. D. Wagter, and G. C. H. E. de Croon, “Visual route following for tiny autonomous robots,” Science Robotics, 2024

  7. [7]

    PanoFlow: Learning 360° optical flow for surrounding temporal understanding,

    H. Shi, Y. Zhou, K. Yang, X. Yin, Z. Wang, Y. Ye, Z. Yin, S. Meng, P. Li, and K. Wang, “PanoFlow: Learning 360° optical flow for surrounding temporal understanding,” IEEE Transactions on Intelligent Transportation Systems, 2023

  8. [8]

    The effect of AR-HUD takeover assistance types on driver situation awareness in highly automated driving: A 360-degree panorama experiment,

    Z. Wu, L. Zhao, G. Liu, J. Chai, J. Huang, and X. Ai, “The effect of AR-HUD takeover assistance types on driver situation awareness in highly automated driving: A 360-degree panorama experiment,” International Journal of Human-Computer Interaction, 2024

  9. [9]

    Panoramic human activity recognition,

    R. Han, H. Yan, J. Li, S. Wang, W. Feng, and S. Wang, “Panoramic human activity recognition,” in ECCV, 2022

  10. [10]

    HumanoidPano: Hybrid spherical panoramic-LiDAR cross-modal perception for humanoid robots,

    Q. Zhang, Z. Zhang, W. Cui, J. Sun, J. Cao, Y. Guo, G. Han, W. Zhao, J. Wang, C. Sun, L. Zhang, H. Cheng, Y. Chen, L. Wang, J. Tang, and R. Xu, “HumanoidPano: Hybrid spherical panoramic-LiDAR cross-modal perception for humanoid robots,” arXiv preprint arXiv:2503.09010, 2025

  11. [11]

    LiMo-Calib: On-site fast LiDAR-motor calibration for quadruped robot-based panoramic 3D sensing system,

    J. Li, Z. Liu, X. Xu, J. Liu, S. Yuan, F. Xu, and L. Xie, “LiMo-Calib: On-site fast LiDAR-motor calibration for quadruped robot-based panoramic 3D sensing system,” arXiv preprint arXiv:2502.12655, 2025

  12. [12]

    360Loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries,

    H. Huang, C. Liu, Y. Zhu, H. Cheng, T. Braud, and S.-K. Yeung, “360Loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries,” in CVPR, 2024

  13. [13]

    360VOT: A new benchmark dataset for omnidirectional visual object tracking,

    H. Huang, Y. Xu, Y. Chen, and S.-K. Yeung, “360VOT: A new benchmark dataset for omnidirectional visual object tracking,” in ICCV, 2023

  14. [14]

    Simple online and realtime tracking,

    A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple online and realtime tracking,” in ICIP, 2016

  15. [15]

    Simple online and realtime tracking with a deep association metric,

    N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in ICIP, 2017

  16. [16]

    Beyond MOT: Semantic multi-object tracking,

    Y. Li, Q. Li, H. Wang, X. Ma, J. Yao, S. Dong, H. Fan, and L. Zhang, “Beyond MOT: Semantic multi-object tracking,” in ECCV, 2024

  17. [17]

    Delving into multi-modal multi-task foundation models for road scene understanding: From learning paradigm perspectives,

    S. Luo, W. Chen, W. Tian, R. Liu, L. Hou, X. Zhang, H. Shen, R. Wu, S. Geng, Y. Zhou, L. Shao, Y. Yang, B. Gao, Q. Li, and G. Wu, “Delving into multi-modal multi-task foundation models for road scene understanding: From learning paradigm perspectives,” IEEE Transactions on Intelligent Vehicles, 2024

  18. [18]

    USVTrack: A benchmark for multi-object tracking in complex water surface scenes,

    B. Xue, Y. Cheng, K. Ding, C. Pan, and S. Xiang, “USVTrack: A benchmark for multi-object tracking in complex water surface scenes,” IEEE Transactions on Circuits and Systems for Video Technology, 2025

  19. [19]

    Preformer MOT: A transformer-based approach for multi-object tracking with global trajectory prediction,

    Y. Wang, Y. Qing, K. Huang, C. Dang, and Z. Wu, “Preformer MOT: A transformer-based approach for multi-object tracking with global trajectory prediction,” Fundamental Research, 2025

  20. [20]

    MotionTrack: Learning robust short-term and long-term motions for multi-object tracking,

    Z. Qin, S. Zhou, L. Wang, J. Duan, G. Hua, and W. Tang, “MotionTrack: Learning robust short-term and long-term motions for multi-object tracking,” in CVPR, 2023

  21. [21]

    PNAS-MOT: Multi-modal object tracking with pareto neural architecture search,

    C. Peng, Z. Zeng, J. Gao, J. Zhou, M. Tomizuka, X. Wang, C. Zhou, and N. Ye, “PNAS-MOT: Multi-modal object tracking with pareto neural architecture search,” IEEE Robotics and Automation Letters, 2024

  22. [22]

    Temporal task and motion planning with metric time for multiple object navigation,

    E. Tosello, A. Valentini, and A. Micheli, “Temporal task and motion planning with metric time for multiple object navigation,” in AAAI, 2025

  23. [23]

    Planning-oriented autonomous driving,

    Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li, “Planning-oriented autonomous driving,” in CVPR, 2023

  24. [24]

    Delving into the trajectory long-tail distribution for muti-object tracking,

    S. Chen, E. Yu, J. Li, and W. Tao, “Delving into the trajectory long-tail distribution for muti-object tracking,” in CVPR, 2024

  25. [25]

    DiffMOT: A real-time diffusion-based multiple object tracker with non-linear prediction,

    W. Lv, Y. Huang, N. Zhang, R.-S. Lin, M. Han, and D. Zeng, “DiffMOT: A real-time diffusion-based multiple object tracker with non-linear prediction,” in CVPR, 2024

  26. [26]

    Multi-object tracking model based on detection tracking paradigm in panoramic scenes,

    J. Shen and H. Yang, “Multi-object tracking model based on detection tracking paradigm in panoramic scenes,” Applied Sciences, 2024

  27. [27]

    MOTR: End-to-end multiple-object tracking with transformer,

    F. Zeng, B. Dong, Y. Zhang, T. Wang, X. Zhang, and Y. Wei, “MOTR: End-to-end multiple-object tracking with transformer,” in ECCV, 2022

  28. [28]

    MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors,

    Y. Zhang, T. Wang, and X. Zhang, “MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors,” in CVPR, 2023

  29. [29]

    ADA-Track++: End-to-end multi-camera 3D multi-object tracking with alternating detection and association,

    S. Ding, L. Schneider, M. Cordts, and J. Gall, “ADA-Track++: End-to-end multi-camera 3D multi-object tracking with alternating detection and association,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  30. [30]

    ByteTrack: Multi-object tracking by associating every detection box,

    Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang, “ByteTrack: Multi-object tracking by associating every detection box,” in ECCV, 2022

  31. [31]

    Hybrid-SORT: Weak cues matter for online multi-object tracking,

    M. Yang, G. Han, B. Yan, W. Zhang, J. Qi, H. Lu, and D. Wang, “Hybrid-SORT: Weak cues matter for online multi-object tracking,” in AAAI, 2024

  32. [32]

    Omnidirectional multi-object tracking,

    K. Luo, H. Shi, S. Wu, F. Teng, M. Duan, C. Huang, Y. Wang, K. Wang, and K. Yang, “Omnidirectional multi-object tracking,” in CVPR, 2025

  33. [33]

    MOT16: A Benchmark for Multi-Object Tracking

    A. Milan, L. Leal-Taixé, I. D. Reid, S. Roth, and K. Schindler, “MOT16: A benchmark for multi-object tracking,” arXiv preprint arXiv:1603.00831, 2016

  34. [34]

    nuScenes: A multimodal dataset for autonomous driving,

    H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in CVPR, 2020

  35. [35]

    MOT20: A benchmark for multi object tracking in crowded scenes

    P. Dendorfer, H. Rezatofighi, A. Milan, J. Shi, D. Cremers, I. D. Reid, S. Roth, K. Schindler, and L. Leal-Taixé, “MOT20: A benchmark for multi object tracking in crowded scenes,” arXiv preprint arXiv:2003.09003, 2020

  36. [36]

    SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences,

    J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences,” in ICCV, 2019

  37. [37]

    BDD100K: A diverse driving dataset for heterogeneous multitask learning,

    F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” in CVPR, 2020

  38. [38]

    SportsMOT: A large multi-object tracking dataset in multiple sports scenes,

    Y. Cui, C. Zeng, X. Zhao, Y. Yang, G. Wu, and L. Wang, “SportsMOT: A large multi-object tracking dataset in multiple sports scenes,” in ICCV, 2023

  39. [39]

    MVA 2025 small multi-object tracking for spotting birds challenge: Dataset, methods, and results,

    Y. Kondo, N. Ukita, R. Kanayama, Y. Yoshida, T. Yamaguchi, X. Yu, G. Liang, X. Liu, G. Wang, W. Chu, B. Chuang, J. Lee, P. Kuo, I. Chu, Y. Hsiao, C. Wu, P. Wu, J. Tsou, H. Liu, C. Lee, Y. Yang, K. Shigematsu, A. Shin, and B. Tran, “MVA 2025 small multi-object tracking for spotting birds challenge: Dataset, methods, and results,” in MVA, 2025

  40. [40]

    360+x: A panoptic multi-modal scene understanding dataset,

    H. Chen, Y. Hou, C. Qu, I. Testini, X. Hong, and J. Jiao, “360+x: A panoptic multi-modal scene understanding dataset,” in CVPR, 2024

  41. [41]

    PanoContext-Former: Panoramic total scene understanding with a transformer,

    Y. Dong, C. Fang, L. Bo, Z. Dong, and P. Tan, “PanoContext-Former: Panoramic total scene understanding with a transformer,” in CVPR, 2024

  42. [42]

    JRDB-Act: A large-scale dataset for spatio-temporal action, social group and activity detection,

    M. Ehsanpour, F. S. Saleh, S. Savarese, I. D. Reid, and H. Rezatofighi, “JRDB-Act: A large-scale dataset for spatio-temporal action, social group and activity detection,” in CVPR, 2022

  43. [43]

    Minimalist and high-quality panoramic imaging with PSF-aware transformers,

    Q. Jiang, S. Gao, Y. Gao, K. Yang, Z. Yi, H. Shi, L. Sun, and K. Wang, “Minimalist and high-quality panoramic imaging with PSF-aware transformers,” IEEE Transactions on Image Processing, 2024

  44. [44]

    PANDORA: A panoramic detection dataset for object with orientation,

    H. Xu, Q. Zhao, Y. Ma, X. Li, P. Yuan, B. Feng, C. Yan, and F. Dai, “PANDORA: A panoramic detection dataset for object with orientation,” in ECCV, 2022

  45. [45]

    Spatio-temporal proximity-aware dual-path model for panoramic activity recognition,

    S. Lee, Y. Wang, S. Woo, and C. Kim, “Spatio-temporal proximity-aware dual-path model for panoramic activity recognition,” in ECCV, 2024

  46. [46]

    Unified audio-visual saliency model for omnidirectional videos with spatial audio,

    D. Zhu, K. Zhang, N. Zhang, Q. Zhou, X. Min, G. Zhai, and X. Yang, “Unified audio-visual saliency model for omnidirectional videos with spatial audio,” IEEE Transactions on Multimedia, 2024

  47. [47]

    360SFUDA++: Towards source-free UDA for panoramic segmentation by learning reliable category prototypes,

    X. Zheng, P. Zhou, A. V. Vasilakos, and L. Wang, “360SFUDA++: Towards source-free UDA for panoramic segmentation by learning reliable category prototypes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  48. [48]

    PanoVOS: Bridging non-panoramic and panoramic views with transformer for video segmentation,

    S. Yan, X. Xu, R. Zhang, L. Hong, W. Chen, W. Zhang, and W. Zhang, “PanoVOS: Bridging non-panoramic and panoramic views with transformer for video segmentation,” in ECCV, 2024

  49. [49]

    GoodSAM: Bridging domain and capacity gaps via segment anything model for distortion-aware panoramic semantic segmentation,

    W. Zhang, Y. Liu, X. Zheng, and L. Wang, “GoodSAM: Bridging domain and capacity gaps via segment anything model for distortion-aware panoramic semantic segmentation,” in CVPR, 2024

  50. [50]

    OmniSAM: Omnidirectional segment anything model for UDA in panoramic semantic segmentation,

    D. Zhong, X. Zheng, C. Liao, Y. Lyu, J. Chen, S. Wu, L. Zhang, and X. Hu, “OmniSAM: Omnidirectional segment anything model for UDA in panoramic semantic segmentation,” in ICCV, 2025

  51. [51]

    Multi-source domain adaptation for panoramic semantic segmentation,

    J. Jiang, S. Zhao, J. Zhu, W. Tang, Z. Xu, J. Yang, G. Liu, T. Xing, P. Xu, and H. Yao, “Multi-source domain adaptation for panoramic semantic segmentation,” Information Fusion, 2025

  52. [52]

    GLPanoDepth: Global-to-local panoramic depth estimation,

    J. Bai, H. Qin, S. Lai, J. Guo, and Y. Guo, “GLPanoDepth: Global-to-local panoramic depth estimation,” IEEE Transactions on Image Processing, 2024

  53. [53]

    Elite360D: Towards efficient 360 depth estimation via semantic- and distance-aware bi-projection fusion,

    H. Ai and L. Wang, “Elite360D: Towards efficient 360 depth estimation via semantic- and distance-aware bi-projection fusion,” in CVPR, 2024

  54. [54]

    BiFuse++: Self-supervised and efficient bi-projection fusion for 360° depth estimation,

    F.-E. Wang, Y.-H. Yeh, Y.-H. Tsai, W.-C. Chiu, and M. Sun, “BiFuse++: Self-supervised and efficient bi-projection fusion for 360° depth estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  55. [55]

    PanoFormer: Panorama transformer for indoor 360° depth estimation,

    Z. Shen, C. Lin, K. Liao, L. Nie, Z. Zheng, and Y. Zhao, “PanoFormer: Panorama transformer for indoor 360° depth estimation,” in ECCV, 2022

  56. [56]

    Depth estimation from indoor panoramas with neural scene representation,

    W. Chang, Y. Zhang, and Z. Xiong, “Depth estimation from indoor panoramas with neural scene representation,” in CVPR, 2023

  57. [57]

    SPDET: Edge-aware self-supervised panoramic depth estimation transformer with spherical geometry,

    C. Zhuang, Z. Lu, Y. Wang, J. Xiao, and Y. Wang, “SPDET: Edge-aware self-supervised panoramic depth estimation transformer with spherical geometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  58. [58]

    PanelNet: Understanding 360 indoor environment via panel representation,

    H. Yu, L. He, B. Jian, W. Feng, and S. Liu, “PanelNet: Understanding 360 indoor environment via panel representation,” in CVPR, 2023

  59. [59]

    Disentangling orthogonal planes for indoor panoramic room layout estimation with cross-scale distortion awareness,

    Z. Shen, Z. Zheng, C. Lin, L. Nie, K. Liao, S. Zheng, and Y. Zhao, “Disentangling orthogonal planes for indoor panoramic room layout estimation with cross-scale distortion awareness,” in CVPR, 2023

  60. [60]

    PanoSwin: a pano-style swin transformer for panorama understanding,

    Z. Ling, Z. Xing, X. Zhou, M. Cao, and G. Zhou, “PanoSwin: a pano-style swin transformer for panorama understanding,” in CVPR, 2023

  61. [61]

    360 layout estimation via orthogonal planes disentanglement and multi-view geometric consistency perception,

    Z. Shen, C. Lin, J. Zhang, L. Nie, K. Liao, and Y. Zhao, “360 layout estimation via orthogonal planes disentanglement and multi-view geometric consistency perception,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  62. [62]

    DreamScene360: Unconstrained text-to-3D scene generation with panoramic gaussian splatting,

    S. Zhou, Z. Fan, D. Xu, H. Chang, P. Chari, T. Bharadwaj, S. You, Z. Wang, and A. Kadambi, “DreamScene360: Unconstrained text-to-3D scene generation with panoramic gaussian splatting,” in ECCV, 2024

  63. [63]

    360DVD: Controllable panorama video generation with 360-degree video diffusion model,

    Q. Wang, W. Li, C. Mou, X. Cheng, and J. Zhang, “360DVD: Controllable panorama video generation with 360-degree video diffusion model,” in CVPR, 2024

  64. [64]

    PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation,

    J. Li and M. Bansal, “PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation,” in NeurIPS, 2023

  65. [65]

    DiffPano: Scalable and consistent text to panorama generation with spherical epipolar-aware diffusion,

    W. Ye, C. Ji, Z. Chen, J. Gao, X. Huang, S.-H. Zhang, W. Ouyang, T. He, C. Zhao, and G. Zhang, “DiffPano: Scalable and consistent text to panorama generation with spherical epipolar-aware diffusion,” in NeurIPS, 2024

  66. [66]

    PERF: Panoramic neural radiance field from a single panorama,

    G. Wang, P. Wang, Z. Chen, W. Wang, C. C. Loy, and Z. Liu, “PERF: Panoramic neural radiance field from a single panorama,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  67. [67]

    PanopticNeRF-360: Panoramic 3D-to-2D label transfer in urban scenes,

    X. Fu, S. Zhang, T. Chen, Y. Lu, X. Zhou, A. Geiger, and Y. Liao, “PanopticNeRF-360: Panoramic 3D-to-2D label transfer in urban scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  68. [68]

    PanoSplatt3R: Leveraging perspective pretraining for generalized unposed wide-baseline panorama reconstruction,

    J. Ren, M. Xiang, J. Zhu, and Y. Dai, “PanoSplatt3R: Leveraging perspective pretraining for generalized unposed wide-baseline panorama reconstruction,” in ICCV, 2025

  69. [69]

    Deep 360° optical flow estimation based on multi-projection fusion,

    Y. Li, C. Barnes, K. Huang, and F.-L. Zhang, “Deep 360° optical flow estimation based on multi-projection fusion,” in ECCV, 2022

  70. [70]

    PriOr-Flow: Enhancing primitive panoramic optical flow with orthogonal view,

    L. Liu, M. Feng, J. Cheng, J. Xiang, X. Zhu, and X. Yang, “PriOr-Flow: Enhancing primitive panoramic optical flow with orthogonal view,” in ICCV, 2025

  71. [71]

    Fully-automatic reflection removal for 360-degree images,

    J. Park, H. Kim, E. Park, and J.-Y. Sim, “Fully-automatic reflection removal for 360-degree images,” in WACV, 2024

  72. [72]

    Fully geometric panoramic localization,

    J. Kim, J. Jeong, and Y. M. Kim, “Fully geometric panoramic localization,” in CVPR, 2024

  73. [73]

    Learned scanpaths aid blind panoramic video quality assessment,

    K. Fan, W. Wen, M. Li, Y. Peng, and K. Ma, “Learned scanpaths aid blind panoramic video quality assessment,” in CVPR, 2024

  74. [74]

    PAR2Net: End-to-end panoramic image reflection removal,

    Y. Hong, Q. Zheng, L. Zhao, X. Jiang, A. C. Kot, and B. Shi, “PAR2Net: End-to-end panoramic image reflection removal,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  75. [75]

    Saliency-free and aesthetic-aware panoramic video navigation,

    C. Chen, G. Ma, W. Song, S. Li, A. Hao, and H. Qin, “Saliency-free and aesthetic-aware panoramic video navigation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  76. [76]

    Spherical vision transformers for audio-visual saliency prediction in 360-degree videos,

    M. Cokelek, H. Ozsoy, N. Imamoglu, C. Ozcinar, I. Ayhan, E. Erdem, and A. Erdem, “Spherical vision transformers for audio-visual saliency prediction in 360-degree videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  77. [77]

    UniFuse: Unidirectional fusion for 360° panorama depth estimation,

    H. Jiang, Z. Sheng, S. Zhu, Z. Dong, and R. Huang, “UniFuse: Unidirectional fusion for 360° panorama depth estimation,” IEEE Robotics and Automation Letters, 2021

  78. [78]

    SphereUFormer: A U-shaped transformer for spherical 360 perception,

    Y. Benny and L. Wolf, “SphereUFormer: A U-shaped transformer for spherical 360 perception,” in CVPR, 2025

  79. [79]

    Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation,

    J. Zhang, K. Yang, H. Shi, S. Reiß, K. Peng, C. Ma, H. Fu, P. H. S. Torr, K. Wang, and R. Stiefelhagen, “Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  80. [80]

    SGAT4PASS: Spherical geometry-aware transformer for panoramic semantic segmentation,

    X. Li, T. Wu, Z. Qi, G. Wang, Y. Shan, and X. Li, “SGAT4PASS: Spherical geometry-aware transformer for panoramic semantic segmentation,” in IJCAI, 2023

Showing first 80 references.