Recognition: 2 theorem links
CalibFree: Self-Supervised View Feature Separation for Calibration-Free Multi-Camera Multi-Object Tracking
Pith reviewed 2026-05-12 04:44 UTC · model grok-4.3
The pith
Self-supervised separation of view-agnostic and view-specific features enables multi-camera tracking without calibration or labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By promoting separation between view-agnostic and view-specific representations via single-view distillation and cross-view reconstruction, CalibFree performs multi-camera multi-object tracking without calibration information or labels, yielding higher accuracy and F1 scores on standard benchmarks while adapting to dynamic camera configurations.
What carries the argument
The view-agnostic feature representation produced by single-view distillation together with cross-view reconstruction, which isolates identity-preserving information independent of camera perspective.
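The decomposition this carrier relies on can be caricatured as a combined objective L = L_sep + L_distill + L_recon. The toy numpy sketch below illustrates plausible forms for the three terms on a single object seen from two cameras; the feature shapes, noise scales, and loss definitions are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Toy features for one object seen from two cameras (hypothetical values).
agnostic_v1 = rng.normal(size=dim)                        # identity-carrying part, view 1
specific_v1 = rng.normal(size=dim)                        # camera-specific part, view 1
agnostic_v2 = agnostic_v1 + 0.01 * rng.normal(size=dim)   # same identity, view 2

# L_sep: push the view-agnostic and view-specific parts toward decorrelation.
L_sep = float(np.dot(agnostic_v1, specific_v1) ** 2) / dim

# L_distill: single-view distillation toward a (frozen) teacher embedding.
teacher_v1 = agnostic_v1 + 0.05 * rng.normal(size=dim)
L_distill = mse(agnostic_v1, teacher_v1)

# L_recon: view-agnostic features from one view should reconstruct the other's.
L_recon = mse(agnostic_v1, agnostic_v2)

L_total = L_sep + L_distill + L_recon
```

Minimizing L_recon only over the view-agnostic part is what is supposed to force identity information out of the camera-specific channel.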
If this is right
- Tracking systems can operate in settings where installing calibrated cameras is impractical or expensive.
- No manual labeling is required to train or adapt the tracker to new camera networks.
- Performance remains stable under viewpoint changes and scene dynamics.
- Cross-view association improves without explicit geometric alignment.
Where Pith is reading between the lines
- The same separation idea could replace hand-crafted calibration in other multi-view tasks such as 3D pose estimation.
- Temporary camera arrays for events or robotics could adopt this approach with little setup effort.
- Adding temporal consistency signals might further strengthen identity preservation over long sequences.
- The learned invariance might transfer to non-visual sensors if similar reconstruction objectives are defined.
Load-bearing premise
That single-view distillation combined with cross-view reconstruction produces features that keep object identities consistent across uncalibrated views without any external supervision.
What would settle it
Running the method on a new multi-camera dataset with no calibration data and no labels, then checking whether cross-view identity consistency collapses when the separation losses are removed.
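A minimal version of that test can be mocked up directly: measure cross-view identity consistency by nearest-neighbour association, and compare embeddings that keep mostly identity information against embeddings swamped by view-specific noise. Everything here (data, metric, noise scales) is synthetic and hypothetical, not the paper's evaluation protocol.

```python
import numpy as np

def cross_view_id_accuracy(feat_a, feat_b):
    """Greedy nearest-neighbour association between two camera views.

    feat_a[i] and feat_b[i] embed the same identity i; accuracy is the
    fraction of identities matched back to themselves.
    """
    dists = np.linalg.norm(feat_a[:, None] - feat_b[None, :], axis=-1)
    matches = dists.argmin(axis=1)
    return float(np.mean(matches == np.arange(len(feat_a))))

rng = np.random.default_rng(1)
ids = rng.normal(size=(5, 8))          # shared identity codes
view_noise = rng.normal(size=(5, 8))   # view-specific clutter

# With separation: association uses nearly view-agnostic features.
with_sep = cross_view_id_accuracy(ids + 0.05 * rng.normal(size=(5, 8)),
                                  ids + 0.05 * rng.normal(size=(5, 8)))

# Without separation: view-specific noise dominates each embedding.
without_sep = cross_view_id_accuracy(ids + 3.0 * view_noise,
                                     ids + 3.0 * rng.normal(size=(5, 8)))
```

If removing the separation losses pushed a tracker's embeddings from the first regime toward the second, cross-view identity consistency would visibly collapse.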
Original abstract
Multi-camera multi-object tracking (MCMOT) faces significant challenges in maintaining consistent object identities across varying camera perspectives, particularly when precise calibration and extensive annotations are required. In this paper, we present CalibFree, a self-supervised representation learning framework that does not need any calibration or manual labeling for the MCMOT task. By promoting feature separation between view-agnostic and view-specific representations through single-view distillation and cross-view reconstruction, our method adapts to complex, dynamic scenarios with minimal overhead. Experiments on the MMP-MvMHAT dataset show a 3% improvement in overall accuracy and a 7.5% increase in the average F1 score over state-of-the-art approaches, confirming the effectiveness of our calibration-free design. Moreover, on the more diverse MvMHAT dataset, our approach demonstrates superior over-time tracking and strong cross-view performance, highlighting its adaptability to a wide range of camera configurations. Code will be publicly available upon acceptance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CalibFree, a self-supervised framework for multi-camera multi-object tracking (MCMOT) that separates view-agnostic and view-specific features via single-view distillation and cross-view reconstruction, eliminating the need for camera calibration or manual labels. It reports a 3% gain in overall accuracy and 7.5% gain in average F1 score on the MMP-MvMHAT dataset, plus strong cross-view and over-time tracking results on the more diverse MvMHAT dataset.
Significance. If the central claim is substantiated, the work would be significant for practical MCMOT deployment in uncalibrated, dynamic settings by removing geometric priors and annotation burdens. The self-supervised design and public code commitment are positive attributes that could enable wider adoption if the identity-consistency mechanism is shown to hold.
major comments (3)
- [Method and Abstract] The central claim that single-view distillation plus cross-view reconstruction yields view-agnostic features with consistent object identities across cameras rests on an unverified assumption; the reconstruction objective (described at high level in the method) appears to operate at feature or pixel level without an explicit identity-aware or association term, which risks permitting identity permutations that still satisfy the loss (see skeptic note on data-association).
- [Experiments] Only aggregate performance numbers are reported; no ablation studies, error analysis, or derivation details are provided to isolate the contribution of the view-feature separation or to confirm that gains arise from multi-view consistency rather than improved single-view tracking.
- [Experiments] The 3% accuracy / 7.5% F1 improvements on MMP-MvMHAT are presented without statistical significance tests, variance across runs, or comparisons against strong calibration-free baselines, weakening attribution to the proposed design.
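For concreteness, one standard way to supply the missing statistical test is a paired bootstrap over per-sequence score differences (method minus baseline). The sketch below uses invented numbers and is not tied to the paper's actual results:

```python
import numpy as np

def paired_bootstrap_p(deltas, n_boot=10000, seed=0):
    """One-sided paired bootstrap: estimated probability that the mean
    per-sequence improvement is <= 0 under resampling."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    idx = rng.integers(0, len(deltas), size=(n_boot, len(deltas)))
    means = deltas[idx].mean(axis=1)
    return float(np.mean(means <= 0.0))

# Hypothetical per-sequence accuracy differences on a benchmark.
p = paired_bootstrap_p([0.04, 0.01, 0.05, -0.01, 0.03, 0.02])
```

Reporting such a p-value alongside run-to-run standard deviations would make the 3% / 7.5% gains attributable rather than anecdotal.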
minor comments (1)
- [Abstract] The abstract's claim of 'strong cross-view performance' on MvMHAT would benefit from explicit per-camera or cross-view ID consistency metrics rather than qualitative description.
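One explicit metric of the kind this comment asks for is a cross-view identity F1 over predicted association pairs, in the spirit of IDF1 [51]. The definition below is an illustrative simplification, not the benchmark's official protocol:

```python
def id_f1(true_pairs, pred_pairs):
    """Cross-view ID F1 over sets of (id_in_view1, id_in_view2) pairs."""
    tp = len(true_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)

truth = {(0, 0), (1, 1), (2, 2)}
pred = {(0, 0), (1, 2), (2, 2)}   # one identity switch across views
score = id_f1(truth, pred)        # tp=2, precision=recall=2/3 -> F1=2/3
```

Per-camera breakdowns of this score would substantiate "strong cross-view performance" quantitatively.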
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. We address each major comment below, providing clarifications from the manuscript and indicating planned revisions to strengthen the presentation.
Point-by-point responses
- Referee: [Method and Abstract] The central claim that single-view distillation plus cross-view reconstruction yields view-agnostic features with consistent object identities across cameras rests on an unverified assumption; the reconstruction objective (described at high level in the method) appears to operate at feature or pixel level without an explicit identity-aware or association term, which risks permitting identity permutations that still satisfy the loss (see skeptic note on data-association).
  Authors: We appreciate this observation on the implicit nature of identity consistency. In Section 3, the single-view distillation loss enforces intra-view feature consistency for the same object, while the cross-view reconstruction operates exclusively on the separated view-agnostic features; the separation itself is intended to isolate identity information from view-specific cues, reducing the likelihood of permutations that would violate reconstruction across views. The training data implicitly provides the association through simultaneous multi-view captures of the same scenes. To address the concern directly, we will expand the method section with a formal derivation of the combined objective and a discussion of the data-association assumption in the revision. · revision: partial
- Referee: [Experiments] Only aggregate performance numbers are reported; no ablation studies, error analysis, or derivation details are provided to isolate the contribution of the view-feature separation or to confirm that gains arise from multi-view consistency rather than improved single-view tracking.
  Authors: We agree that isolating the contributions is important. Although the original submission focused on overall results due to space limits, we have performed component ablations (distillation alone vs. full model) and error analysis on identity switches and cross-view consistency. These will be added to the experiments section in the revised manuscript, along with expanded loss derivations, to demonstrate that the gains stem from the multi-view feature separation rather than single-view improvements alone. · revision: yes
- Referee: [Experiments] The 3% accuracy / 7.5% F1 improvements on MMP-MvMHAT are presented without statistical significance tests, variance across runs, or comparisons against strong calibration-free baselines, weakening attribution to the proposed design.
  Authors: We acknowledge the value of statistical rigor and clearer baseline attribution. In the revision we will report standard deviations over multiple runs, include paired significance tests, and explicitly categorize the baselines to highlight calibration-free methods. This will better substantiate that the reported gains are attributable to the proposed self-supervised separation. · revision: yes
Circularity Check
No circularity: empirical self-supervised framework with independent evaluation
Full rationale
The paper introduces a self-supervised method that separates view-agnostic and view-specific features via single-view distillation and cross-view reconstruction losses, then reports tracking accuracy gains on external datasets (MMP-MvMHAT, MvMHAT). No derivation chain, equation, or fitted quantity is shown to reduce to its own inputs by construction. The central result is an empirical performance claim, not a mathematical prediction forced by self-definition or self-citation. Self-citations, if present, are not load-bearing for the uniqueness or correctness of the reported metrics.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: View-agnostic and view-specific representations can be separated effectively through single-view distillation and cross-view reconstruction.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "By promoting feature separation between view-agnostic and view-specific representations through single-view distillation and cross-view reconstruction"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: L = L_sep + L_distill + L_recon
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] https://iccv2021-mmp.github.io/
- [2] Akbari, H., Yuan, L., Qian, R., Chuang, W.H., Chang, S.F., Cui, Y., Gong, B.: VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. In: Advances in Neural Information Processing Systems, vol. 34, pp. 24206–24221. Curran Associates, Inc. (2021)
- [3] Bastani, F., He, S., Madden, S.: Self-supervised multi-object tracking with cross-input consistency. In: Advances in Neural Information Processing Systems, vol. 34, pp. 13695–13706. Curran Associates, Inc. (2021)
- [4] Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 941–951 (2019)
- [5] Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing 2008, 1–10 (2008)
- [6] Cai, J., Xu, M., Li, W., Xiong, Y., Xia, W., Tu, Z., Soatto, S.: MeMOT: Multi-object tracking with memory. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8090–8100 (2022)
- [7] Cai, Y., Medioni, G.: Exploring context information for inter-camera multiple target tracking. In: IEEE Winter Conference on Applications of Computer Vision, pp. 761–768. IEEE (2014)
- [8] Cao, J., Pang, J., Weng, X., Khirodkar, R., Kitani, K.: Observation-centric SORT: Rethinking SORT for robust multi-object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9686–9696 (2023)
- [9] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229. Springer (2020)
- [10] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.E.: A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709 (2020)
- [11] Chen, X., Huang, K., Tan, T.: Object tracking across non-overlapping views by learning inter-camera transfer models. Pattern Recognition 47(3), 1126–1137 (2014)
- [12] Cheng, C.C., Qiu, M.X., Chiang, C.K., Lai, S.H.: ReST: A reconfigurable spatial-temporal graph model for multi-camera multi-object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10051–10060 (2023)
- [13]
- [14] Chu, P., Ling, H.: FAMNet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6172–6181 (2019)
- [15] Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1422–1430 (2015)
- [16] Dong, J., Fang, Q., Jiang, W.B., Yang, Y., Huang, Q.X., Bao, H., Zhou, X.: Fast and robust multi-person 3D pose estimation and tracking from multiple views. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 6981–6992 (2021)
- [17] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2021)
- [18] Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578 (2019)
- [19] Feng, W., Wang, F., Han, R., Gan, Y., Qian, Z., Hou, J., Wang, S.: Unveiling the power of self-supervision for multi-view multi-human association and tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)
- [20] Fleuret, F., Berclaz, J., Lengagne, R., Fua, P.: Multicamera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(2), 267–282 (2008)
- [21] Gan, Y., Han, R., Yin, L., Feng, W., Wang, S.: Self-supervised multi-view multi-human association and tracking. In: Proceedings of the 29th ACM International Conference on Multimedia (2021)
- [22] Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
- [23] Gilbert, A., Bowden, R.: Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity. In: Computer Vision – ECCV 2006, Part II, pp. 125–136. Springer (2006)
- [24]
- [25] Gu, J., Hu, C., Zhang, T., Chen, X., Wang, Y., Wang, Y., Zhao, H.: ViP3D: End-to-end visual trajectory prediction via 3D agent queries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5496–5506 (2023)
- [26] Han, R., Feng, W., Zhang, Y., Zhao, J., Wang, S.: Multiple human association and tracking from egocentric and complementary top views. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5225–5242 (2021)
- [27] Han, R., Feng, W., Zhao, J., Niu, Z., Zhang, Y., Wan, L., Wang, S.: Complementary-view multiple human tracking. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 10917–10924 (2020)
- [28] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.B.: Masked autoencoders are scalable vision learners. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15979–15988 (2022)
- [29] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.B.: Momentum contrast for unsupervised visual representation learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9726–9735 (2020)
- [30] He, Y., Wei, X., Hong, X., Shi, W., Gong, Y.: Multi-target multi-camera tracking by tracklet-to-target assignment. IEEE Transactions on Image Processing 29, 5191–5205 (2020)
- [31] Huang, Z., Jin, X., Lu, C., Hou, Q., Cheng, M.M., Fu, D., Shen, X., Feng, J.: Contrastive masked autoencoders are stronger vision learners. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 2506–2517 (2022)
- [32] Javed, O., Shafique, K., Shah, M.: Appearance modeling for tracking in multiple non-overlapping cameras. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 2, pp. 26–33. IEEE (2005)
- [33] Kim, D., Cho, D., Kweon, I.S.: Self-supervised video representation learning with space-time cubic puzzles. arXiv preprint arXiv:1811.09795 (2018)
- [34] Kim, D., Cho, D., Yoo, D., Kweon, I.S.: Learning image representations by completing damaged jigsaw puzzles. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 793–802. IEEE (2018)
- [35] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
- [36] Larsson, G., Maire, M., Shakhnarovich, G.: Colorization as a proxy task for visual understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6874–6883 (2017)
- [37] Leal-Taixé, L., Canton-Ferrer, C., Schindler, K.: Learning by tracking: Siamese CNN for robust target association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 33–40 (2016)
- [38] Lu, Z., Shuai, B., Chen, Y., Xu, Z., Modolo, D.: Self-supervised multi-object tracking with path consistency. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19016–19026 (2024)
- [39] Luiten, J., Osep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L., Leibe, B.: HOTA: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision 129, 548–578 (2021)
- [40] Maksai, A., Wang, X., Fleuret, F., Fua, P.: Non-Markovian globally consistent multi-object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2544–2554 (2017)
- [41] Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: TrackFormer: Multi-object tracking with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8844–8854 (2022)
- [42] Murtagh, F., Contreras, P.: Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(1), 86–97 (2012)
- [43] Nguyen, D.M., Henschel, R., Rosenhahn, B., Sonntag, D., Swoboda, P.: LMGP: Lifted multicut meets geometry projections for multi-camera multi-object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8866–8875 (2022)
- [44] Niculescu-Mizil, A., Patel, D., Melvin, I.: MCTR: Multi camera tracking transformer. arXiv preprint arXiv:2408.13243 (2024)
- [45] Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. arXiv preprint arXiv:1603.09246 (2016)
- [46] van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
- [47] Pang, Z., Li, J., Tokmakov, P., Chen, D., Zagoruyko, S., Wang, Y.X.: Standing between past and future: Spatio-temporal modeling for multi-camera 3D multi-object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17928–17938 (2023)
- [48] Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)
- [49]
- [50] Quach, K.G., Nguyen, P., Le, H., Truong, T.D., Duong, C.N., Tran, M.T., Luu, K.: DyGLIP: A dynamic graph model with link prediction for accurate multi-camera multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13784–13793 (2021)
- [51] Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: European Conference on Computer Vision, pp. 17–35. Springer (2016)
- [52] Ristani, E., Tomasi, C.: Features for multi-target multi-camera tracking and re-identification. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6036–6046 (2018)
- [53] Schulter, S., Vernaza, P., Choi, W., Chandraker, M.: Deep network flow for multi-object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6951–6960 (2017)
- [54] Sun, P., Cao, J., Jiang, Y., Zhang, R., Xie, E., Yuan, Z., Wang, C., Luo, P.: TransTrack: Multiple object tracking with transformer. arXiv preprint arXiv:2012.15460 (2020)
- [55] Tesfaye, Y.T., Zemene, E., Prati, A., Pelillo, M., Shah, M.: Multi-target tracking in multiple non-overlapping cameras using constrained dominant sets. arXiv preprint arXiv:1706.06196 (2017)
- [56] Tesfaye, Y.T., Zemene, E., Prati, A., Pelillo, M., Shah, M.: Multi-target tracking in multiple non-overlapping cameras using fast-constrained dominant sets. International Journal of Computer Vision 127, 1303–1320 (2019)
- [57] Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
- [58] Wang, J., Jiao, J., Liu, Y.H.: Self-supervised video representation learning by pace prediction. In: Computer Vision – ECCV 2020, pp. 504–521. Springer International Publishing, Cham (2020)
- [59] Wang, L., Luc, P., Recasens, A., Alayrac, J.B., van den Oord, A.: Multimodal self-supervised learning of general audio representations. arXiv preprint arXiv:2104.12807 (2021)
- [60] Wang, Y.X., Zhang, Y.J.: Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering 25(6), 1336–1353 (2012)
- [61]
- [62] Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649. IEEE (2017)
- [63] Wu, J., Cao, J., Song, L., Wang, Y., Yang, M., Yuan, J.: Track to detect and segment: An online multi-object tracker. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12347–12356 (2021)
- [64] Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://github.com/facebookresearch/detectron2 (2019)
- [65] Xu, J., Cao, Y., Zhang, Z., Hu, H.: Spatial-temporal relation networks for multi-object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3988–3998 (2019)
- [66] Xu, Y., Osep, A., Ban, Y., Horaud, R., Leal-Taixé, L., Alameda-Pineda, X.: How to train your deep multi-object tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6787–6796 (2020)
- [67] Xu, Y., Liu, X., Liu, Y., Zhu, S.C.: Multi-view people tracking via hierarchical trajectory composition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)
- [68] Yin, Y., Hua, Y., Song, T., Ma, R., Guan, H.: Self-supervised multi-object tracking with cycle-consistency. In: Conference on Multimedia Modeling (2023)
- [69] You, Q., Jiang, H.: Real-time 3D deep multi-camera tracking. arXiv preprint arXiv:2003.11753 (2020)
- [70] Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: MOTR: End-to-end multiple-object tracking with transformer. In: European Conference on Computer Vision, pp. 659–675. Springer (2022)
- [71] Zhang, T., Chen, X., Wang, Y., Wang, Y., Zhao, H.: MUTR3D: A multi-camera tracking framework via 3D-to-2D queries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4537–4546 (2022)
- [72] Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., Wang, X.: ByteTrack: Multi-object tracking by associating every detection box (2022)
- [73] Zhang, Y., Wang, T., Zhang, X.: MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22056–22065 (2023)
- [74] Zhao, Z., Wu, Z., Zhuang, Y., Li, B., Jia, J.: Tracking objects as pixel-wise distributions. In: European Conference on Computer Vision, pp. 76–94. Springer (2022)
- [75] Zhong, Z., Zheng, L., Zheng, Z., Li, S., Yang, Y.: Camera style adaptation for person re-identification. arXiv preprint arXiv:1711.10295 (2017)
- [76] Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. arXiv preprint arXiv:2004.01177 (2020)