YOLOX: Exceeding YOLO Series in 2021
Pith reviewed 2026-05-13 10:27 UTC · model grok-4.3
The pith
YOLOX turns YOLO detectors anchor-free with a decoupled head and SimOTA assignment to reach higher accuracy at real-time speeds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Switching YOLO to anchor-free detection, adding a decoupled classification-regression head, and replacing prior label assignment with SimOTA yields consistent gains across model scales, reaching 50.0 percent AP on COCO for YOLOX-L at 68.9 FPS on Tesla V100, which exceeds YOLOv5-L by 1.8 percent AP while also topping the CVPR 2021 Streaming Perception Challenge with one model.
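Under stated assumptions, the anchor-free prediction this claim rests on can be illustrated with a short decoding sketch: each grid cell at each FPN stride directly predicts a center offset and a log-scale box size, with no anchor priors. Function and variable names here are illustrative, not the repository's API.

```python
import numpy as np

def decode_anchor_free(preds, stride):
    """Decode anchor-free YOLOX-style regression outputs for one FPN level.

    preds: (H, W, 4) raw outputs per grid cell: (dx, dy, log_w, log_h).
           Center offsets are added to the cell's grid coordinates and
           scaled by the stride; width/height are exponentiated.
           This mirrors the anchor-free decoding the claim describes;
           exact names and conventions are assumptions.
    Returns: (H*W, 4) boxes as (cx, cy, w, h) in image pixels.
    """
    H, W, _ = preds.shape
    gy, gx = np.mgrid[0:H, 0:W]                 # per-cell grid coordinates
    cx = (preds[..., 0] + gx) * stride          # center x in pixels
    cy = (preds[..., 1] + gy) * stride          # center y in pixels
    w = np.exp(preds[..., 2]) * stride          # box width in pixels
    h = np.exp(preds[..., 3]) * stride          # box height in pixels
    return np.stack([cx, cy, w, h], axis=-1).reshape(-1, 4)
```

With zero raw outputs, each cell decodes to a stride-sized box centered on its own grid position, which makes the "one prediction per location" structure of the anchor-free head concrete.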
What carries the argument
Anchor-free center-point prediction paired with a decoupled head and SimOTA label assignment, which dynamically matches positive samples via optimal transport to improve training stability and final accuracy.
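As a rough illustration of that dynamic matching, here is a simplified dynamic-k assigner in the spirit of SimOTA: each ground-truth box gets a budget k derived from its top IoUs, claims its k cheapest candidate predictions, and conflicts are resolved in favor of the cheapest ground truth. The actual method also applies a center prior and a specific cost weighting, so treat this as a sketch, not the reference implementation.

```python
import numpy as np

def simota_assign(cost, ious, q=10):
    """Simplified SimOTA-style dynamic-k label assignment.

    cost: (num_gt, num_anchors) pairwise assignment cost
          (in the paper, a weighted sum of classification and IoU loss
          over center-prior candidates).
    ious: (num_gt, num_anchors) pairwise IoU between GT boxes and
          predicted boxes.
    Returns: (num_anchors,) assigned GT index per prediction, -1 = negative.
    """
    num_gt, num_anchors = cost.shape
    matching = np.zeros((num_gt, num_anchors), dtype=bool)

    for g in range(num_gt):
        # dynamic k: sum of the top-q IoUs for this GT, at least 1
        top_ious = np.sort(ious[g])[::-1][:q]
        k = max(int(top_ious.sum()), 1)
        # claim the k lowest-cost candidates
        idx = np.argsort(cost[g])[:k]
        matching[g, idx] = True

    # a prediction claimed by several GTs keeps only its cheapest GT
    assigned = np.full(num_anchors, -1, dtype=int)
    for a in range(num_anchors):
        gts = np.nonzero(matching[:, a])[0]
        if len(gts) > 0:
            assigned[a] = gts[np.argmin(cost[gts, a])]
    return assigned
```

The dynamic k is what distinguishes this from fixed top-k assignment: well-localized objects with many high-IoU candidates receive more positives, which is the mechanism credited with improving training stability.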
If this is right
- YOLOX variants deliver better accuracy-speed trade-offs than prior YOLO models at every size from 0.91 M parameters upward.
- A single YOLOX-L model suffices to win streaming perception benchmarks without ensemble methods.
- The architecture supports direct export to ONNX, TensorRT, NCNN, and OpenVINO for deployment.
Where Pith is reading between the lines
- The same three changes could raise accuracy in other single-stage detectors that still rely on anchors.
- In video pipelines the higher frame rate and accuracy together reduce the need for separate tracking modules.
- Because the gains hold across scales, the approach may generalize to new backbone families without redesigning the head.
Load-bearing premise
The reported accuracy lifts come mainly from the anchor-free shift, decoupled head, and SimOTA rather than from any extra training epochs, data augmentation, or hyperparameter tuning that differs from the YOLOv4 and YOLOv5 baselines.
What would settle it
Train a YOLOv5-L model from scratch using exactly the same data augmentations, optimizer schedule, and hyperparameters reported for YOLOX-L, then measure whether the 1.8 percent AP gap on COCO disappears.
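One way to frame that settling experiment is a shared-recipe configuration in which only the architecture varies, so any surviving AP gap is attributable to the architecture. All field names and values below are illustrative placeholders for the recipe described above, not the authors' actual configuration files.

```python
# Hypothetical controlled-comparison setup: both runs share one training
# recipe (the 300-epoch Mosaic+MixUp schedule the review describes) and
# differ only in the "arch" field.
shared_recipe = {
    "epochs": 300,
    "augmentations": ["mosaic", "mixup"],
    "optimizer": "SGD",
    "lr_schedule": "cosine",
    "dataset": "COCO train2017",
}

def make_run(arch, recipe):
    # Each experiment record is the shared recipe plus the architecture name.
    return {"arch": arch, **recipe}

runs = [make_run(a, shared_recipe) for a in ("yolox_l", "yolov5_l")]
```

If the 1.8 percent AP gap persists under this matched protocol, the attribution to the anchor-free design, decoupled head, and SimOTA is strengthened; if it shrinks, part of the gain was the training recipe.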
read the original abstract
In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA to achieve state-of-the-art results across a large scale range of models: For YOLO-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L with roughly the same amount of parameters as YOLOv4-CSP, YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on Tesla V100, exceeding YOLOv5-L by 1.8% AP. Further, we won the 1st Place on Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model. We hope this report can provide useful experience for developers and researchers in practical scenes, and we also provide deploy versions with ONNX, TensorRT, NCNN, and Openvino supported. Source code is at https://github.com/Megvii-BaseDetection/YOLOX.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces YOLOX, an anchor-free reformulation of the YOLO detector that incorporates a decoupled classification/regression head and the SimOTA label assignment strategy. It reports COCO results across scales, including YOLOX-Nano (0.91M params, 25.3% AP), an improved YOLOv3 (47.3% AP), and YOLOX-L (50.0% AP at 68.9 FPS on V100, exceeding YOLOv5-L by 1.8% AP at comparable parameters), plus first place in the CVPR 2021 Streaming Perception Challenge; code and deployment support (ONNX, TensorRT, NCNN, OpenVINO) are released.
Significance. If the reported gains are attributable to the architectural changes rather than training-protocol differences, the work supplies a practical, high-performance real-time detector that updates the widely used YOLO family with modern components while maintaining strong speed-accuracy trade-offs. The open-source release and deployment tools directly support reproducibility and industrial adoption.
major comments (1)
- [Abstract and experimental results] Abstract and experimental results section: the central claim that YOLOX-L exceeds YOLOv5-L by 1.8% AP rests on comparisons that use a 300-epoch schedule with Mosaic+MixUp augmentations for YOLOX but do not retrain the YOLOv5 architecture under the identical recipe; internal ablations vary components inside YOLOX only, leaving the fraction of the AP delta due to schedule/hyperparameter differences unquantified and weakening attribution to the anchor-free design, decoupled head, and SimOTA.
minor comments (2)
- [Abstract] Abstract: state the exact parameter count and FLOPs for YOLOX-L to enable immediate side-by-side comparison with the cited YOLOv4-CSP and YOLOv5-L baselines.
- [Methods/experimental setup] Methods or experimental setup: explicitly tabulate the training schedule, augmentation pipeline, and optimizer settings used for YOLOX versus those reported in the original YOLOv5 and YOLOv4 papers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the experimental comparisons. We address the major comment point-by-point below and propose targeted revisions to improve clarity on attribution.
read point-by-point responses
-
Referee: [Abstract and experimental results] Abstract and experimental results section: the central claim that YOLOX-L exceeds YOLOv5-L by 1.8% AP rests on comparisons that use a 300-epoch schedule with Mosaic+MixUp augmentations for YOLOX but do not retrain the YOLOv5 architecture under the identical recipe; internal ablations vary components inside YOLOX only, leaving the fraction of the AP delta due to schedule/hyperparameter differences unquantified and weakening attribution to the anchor-free design, decoupled head, and SimOTA.
Authors: We agree that the comparison would be stronger with a controlled re-training of YOLOv5-L under the exact 300-epoch Mosaic+MixUp schedule used for YOLOX. The reported YOLOv5-L numbers are taken directly from the official YOLOv5 repository (using its recommended protocol), while our ablations isolate the effect of each YOLOX component (anchor-free, decoupled head, SimOTA) within a fixed training recipe. In the revised version we will (1) explicitly state the training-protocol differences in the experimental section and abstract, (2) add a short paragraph quantifying the contribution of our components via the existing ablations, and (3) include a new row showing a YOLOv3 baseline trained with the same 300-epoch recipe for reference. These changes clarify attribution without requiring a full external re-implementation.
revision: partial
Circularity Check
No circularity: empirical results rest on direct experimental comparisons
full rationale
The paper's central claims consist of reported AP and FPS numbers obtained by training modified YOLO architectures (anchor-free, decoupled head, SimOTA label assignment) under a stated 300-epoch schedule with Mosaic+MixUp. These are direct empirical measurements against published baseline numbers for YOLOv5-L, YOLOv4-CSP, etc.; no equations, predictions, or first-principles derivations are presented that reduce to fitted parameters or self-citations by construction. Internal ablations vary components inside the YOLOX recipe but do not create self-referential loops. The result is therefore self-contained against external benchmarks and receives score 0.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: COCO dataset annotations and evaluation protocol are accurate and representative for object-detection performance measurement
Forward citations
Cited by 28 Pith papers
- Clip-level Uncertainty and Temporal-aware Active Learning for End-to-End Multi-Object Tracking. CUTAL scores multi-frame clips for uncertainty and enforces temporal diversity to train transformer MOT models to near full-supervision performance with 50% of the labels.
- LAMP: Localization Aware Multi-camera People Tracking in Metric 3D World. LAMP tracks 3D human motion from moving multi-camera headsets by converting 2D detections to a unified metric 3D world frame via device localization and fitting with an end-to-end spatio-temporal transformer.
- AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics. AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional ani...
- AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics. AniMatrix generates anime videos using a production knowledge taxonomy, dual-channel conditioning, style-motion curriculum, and deformation-aware preference optimization, outperforming baselines in animator evaluation...
- AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics. AniMatrix generates anime videos using a structured taxonomy of artistic production variables, dual-channel conditioning, a style-motion curriculum, and deformation-aware optimization to prioritize art over physics.
- WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects. WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.
- A Data Efficiency Study of Synthetic Fog for Object Detection Using the Clear2Fog Pipeline. Clear2Fog generates realistic synthetic fog from clear scenes, enabling mixed-density training that outperforms full fixed-density data and improves real-world performance by 1.67 mAP after learning-rate adjustment.
- CalibFree: Self-Supervised View Feature Separation for Calibration-Free Multi-Camera Multi-Object Tracking. CalibFree enables calibration-free multi-camera tracking via self-supervised feature separation through single-view distillation and cross-view reconstruction, reporting 3% higher accuracy and 7.5% better F1 on tested...
- FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging. FUN is an end-to-end Focal U-Net that performs joint hyperspectral image reconstruction and object detection via multi-task learning with focal modulation, achieving SOTA results with 40% fewer parameters and a new 36...
- GateMOT: Q-Gated Attention for Dense Object Tracking. GateMOT proposes Q-Gated Attention to enable linear-complexity, spatially aware attention for state-of-the-art dense object tracking on benchmarks like BEE24.
- CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras. CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
- Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization. VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.
- Scale-Gest: Scalable Model-Space Synthesis and Runtime Selection for On-Device Gesture Detection. Scale-Gest creates a runtime-selectable family of tiny-YOLO models with device-calibrated ACE profiles and an ROI gate that cuts per-frame energy by 4x while holding event-level F1 at 0.8-0.9 on a new driving-gesture dataset.
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. DINO reaches 51.3 AP on COCO val2017 with a ResNet-50 backbone after 24 epochs, a +2.7 AP gain over the prior best DETR variant.
- Portable Active Learning for Object Detection. PAL is a portable active learning method for object detection that uses class-specific logistic classifiers for uncertainty and image-level diversity to select annotation batches, showing better label efficiency than ...
- Utility-Aware Progressive Inference over UDP Packet Blocks for Emergency Communications. Utility-aware progressive inference on UDP packet blocks enables early hazard recognition, reducing packet budget by 34.2% and decision delay by 1209 ms while retaining 91.5% of full-reception accuracy.
- SAMOFT: Robust Multi-Object Tracking via Region and Flow. SAMOFT improves multi-object tracking by using SAM segmentation and optical flow for pixel-level motion matching, flexible centroid correction, and training-free motion pattern fixes on top of standard Kalman and ReID...
- Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking. TCMP achieves SOTA MOT metrics (HOTA 63.4%, IDF1 65.0%, AssA 49.1%) with 0.014x parameters and 0.05x FLOPs of the previous best method by using a simple dilated TCN regressor.
- SocialMirror: Reconstructing 3D Human Interaction Behaviors from Monocular Videos with Semantic and Geometric Guidance. SocialMirror reconstructs 3D meshes of closely interacting humans from monocular videos using semantic guidance from vision-language models and geometric constraints in a diffusion model to handle occlusions and maint...
- Hypergraph-State Collaborative Reasoning for Multi-Object Tracking. HyperSSM integrates hypergraphs and state space models to let correlated objects mutually refine motion estimates, stabilizing trajectories under noise and occlusion for state-of-the-art multi-object tracking.
- Empirical Evaluation of PDF Parsing and Chunking for Financial Question Answering with RAG. Systematic tests show that specific PDF parsers combined with overlapping chunking strategies better preserve structure and improve RAG answer correctness on financial QA benchmarks including the new TableQuest dataset.
- Hierarchical Prompting with Dual LLM Modules for Robotic Task and Motion Planning. A dual-LLM hierarchical framework for robotic task and motion planning, integrating object detection, achieves 86% success across 24 test scenarios ranging from simple spatial commands to infeasible requests.
- Hybrid Visual Telemetry for Bandwidth-Constrained Robotic Vision: A Pilot Study with HEVC Base Video and JPEG ROI Stills. A hybrid scheme using HEVC video for continuous awareness plus selective JPEG ROI stills for detail refinement is formalized and experimentally compared to video-only transmission under matched bitrate budgets for rob...
- Fast Online 3D Multi-Camera Multi-Object Tracking and Pose Estimation. An efficient implementation of a Bayes-optimal filter performs fast 3D multi-camera tracking and pose estimation from 2D inputs while handling intermittent camera disconnections.
- InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard. InsightBoard integrates synchronized multi-metric plots, correlation analysis, and group fairness indicators into TensorBoard to reveal subgroup disparities that aggregate metrics hide during model training.
- World Simulation with Video Foundation Models for Physical AI. Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.
- 4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview. The report overviews five maritime computer vision benchmark challenges, their datasets, protocols, quantitative results, and top team approaches from the MaCVi 2026 workshop.
- YOLOv11 Demystified: A Practical Guide to High-Performance Object Detection. YOLOv11 delivers higher mean average precision on standard benchmarks than prior YOLO versions while keeping real-time inference speed through C3K2, SPPF, and C2PSA modules.
Reference graph
Works this paper leans on
- [1] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
- [2] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.
- [3] Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, and Jian Sun. You only look one-level feature. In CVPR, 2021.
- [4] Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, and Jian Sun. OTA: Optimal transport assignment for object detection. In CVPR, 2021.
- [5] Zheng Ge, Jianfeng Wang, Xin Huang, Songtao Liu, and Osamu Yoshie. LLA: Loss-aware label assignment for dense pedestrian detection. arXiv preprint arXiv:2101.04307, 2021.
- [6] Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. In CVPR, 2021.
- [7] Glenn Jocher et al. YOLOv5. https://github.com/ultralytics/yolov5, 2021.
- [8] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
- [9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
- [10] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In ICLR, 2018.
- [11] Xin Huang, Xinxin Wang, Wenyu Lv, Xiaying Bai, Xiang Long, Kaipeng Deng, Qingqing Dang, Shumin Han, Qiwen Liu, Xiaoguang Hu, et al. PP-YOLOv2: A practical object detector. arXiv preprint arXiv:2104.10419, 2021.
- [12] Kang Kim and Hee Seok Lee. Probabilistic anchor assignment with IoU prediction for object detection. In ECCV, 2020.
- [13] Seung-Wook Kim, Hyong-Keun Kook, Jee-Young Sun, Mun-Cheon Kang, and Sung-Jea Ko. Parallel feature pyramid network for object detection. In ECCV, 2018.
- [14] Hei Law and Jia Deng. CornerNet: Detecting objects as paired keypoints. In ECCV, 2018.
- [15] Mengtian Li, Yuxiong Wang, and Deva Ramanan. Towards streaming perception. In ECCV, 2020.
- [16] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
- [17] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- [18] Songtao Liu, Di Huang, and Yunhong Wang. Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516, 2019.
- [19] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. In CVPR, 2018.
- [20] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. In CVPR, 2018.
- [21] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021.
- [22] Yuchen Ma, Songtao Liu, Zeming Li, and Jian Sun. IQDet: Instance-wise quality distribution sampling for object detection. In CVPR, 2021.
- [23] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016.
- [24] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In CVPR, 2017.
- [25] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- [26] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
- [27] Guanglu Song, Yu Liu, and Xiaogang Wang. Revisiting the sibling head in object detector. In CVPR, 2020.
- [28] Mingxing Tan, Ruoming Pang, and Quoc V. Le. EfficientDet: Scalable and efficient object detection. In CVPR, 2020.
- [29] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. In ICCV, 2019.
- [30] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Scaled-YOLOv4: Scaling cross stage partial network. arXiv preprint arXiv:2011.08036, 2020.
- [31] Chien-Yao Wang, Hong-Yuan Mark Liao, Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh, and I-Hau Yeh. CSPNet: A new backbone that can enhance learning capability of CNN. In CVPR Workshops, 2020.
- [32] Jianfeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, and Nanning Zheng. End-to-end object detection with fully convolutional network. In CVPR, 2020.
- [33] Jianfeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, and Nanning Zheng. End-to-end object detection with fully convolutional network. In CVPR, 2021.
- [35] Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, and Yun Fu. Rethinking classification and localization for object detection. In CVPR, 2020.
- [36] Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, and Stan Z. Li. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In CVPR, 2020.
- [37] Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, and Qixiang Ye. FreeAnchor: Learning to match anchors for visual object detection. In NeurIPS, 2019.
- [38] Zhi Zhang, Tong He, Hang Zhang, Zhongyuan Zhang, Junyuan Xie, and Mu Li. Bag of freebies for training object detection neural networks. arXiv preprint arXiv:1902.04103, 2019.
- [39] Qiang Zhou, Chaohui Yu, Chunhua Shen, Zhibin Wang, and Hao Li. Object detection made simpler by eliminating heuristic NMS. arXiv preprint arXiv:2101.11782, 2021.
- [40] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
- [41] Benjin Zhu, Jianfeng Wang, Zhengkai Jiang, Fuhang Zong, Songtao Liu, Zeming Li, and Jian Sun. AutoAssign: Differentiable label assignment for dense object detection. arXiv preprint arXiv:2007.03496, 2020.
discussion (0)