How much real data do we actually need: Analyzing object detection performance using synthetic and real data
Pith reviewed 2026-05-24 20:47 UTC · model grok-4.3
The pith
Domain similarity between synthetic and real datasets predicts how well mixed training sets perform on real test images for object detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors argue that measuring domain similarity across multiple synthetic and real datasets, and testing mixed training regimes for object detection, yields a methodological procedure for estimating how much real data is needed to reach a target performance on real test images.
What carries the argument
Domain similarity measurement between synthetic and real datasets, used to predict and guide the composition of mixed training sets for object detectors.
If this is right
- Synthetic data can substitute for a substantial fraction of real annotated data while preserving detection performance on real tests when domain similarity is high.
- Limited real data combined with domain-similar synthetic data yields better real-test results than either alone.
- A repeatable procedure emerges for selecting dataset mixes based on similarity scores rather than trial-and-error.
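To make the idea of a mixed training set concrete, here is a minimal sketch of building one at a fixed size with a chosen share of real data. The function name `mix_training_set`, the file names, and the 25% ratio are illustrative assumptions and are not taken from the paper.

```python
import random

def mix_training_set(real_items, synthetic_items, total_size, real_fraction, seed=0):
    """Return `total_size` training items, `real_fraction` of them real.

    Hypothetical helper for illustration; the paper's own sampling
    procedure is not described in this summary.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(total_size * real_fraction)
    n_synth = total_size - n_real
    if n_real > len(real_items) or n_synth > len(synthetic_items):
        raise ValueError("not enough items to build the requested mix")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = rng.sample(real_items, n_real) + rng.sample(synthetic_items, n_synth)
    rng.shuffle(mixed)  # avoid a real-then-synthetic ordering
    return mixed

# Placeholder file lists standing in for annotated images.
real = [f"real_{i:03d}.png" for i in range(100)]
synthetic = [f"synth_{i:03d}.png" for i in range(500)]

train = mix_training_set(real, synthetic, total_size=200, real_fraction=0.25)
n_real = sum(name.startswith("real_") for name in train)
print(n_real, len(train) - n_real)  # 50 150
```

Holding `total_size` constant while sweeping `real_fraction` isolates the effect of replacing real data from the effect of simply adding more data.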
Where Pith is reading between the lines
- The same similarity-guided mixing logic could be tested on other vision tasks such as segmentation or pose estimation.
- If the procedure generalizes, it would reduce annotation budgets for new detection domains by prioritizing collection of only the most dissimilar real samples.
- A follow-up test could apply the derived procedure to an unseen pair of synthetic and real datasets and check whether predicted performance matches observed performance.
Load-bearing premise
That measurable domain similarity between the chosen synthetic and real datasets will reliably predict how well mixed training sets will perform on real test images.
What would settle it
An experiment showing two pairs of synthetic-real datasets with nearly identical domain similarity scores but markedly different accuracy outcomes when the same proportion of real data is mixed in.
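Such a check could be automated along the following lines. This is a sketch under stated assumptions: the record layout, the tolerance `sim_tol`, the gap `map_gap`, and all numbers are made up for illustration and do not come from the paper.

```python
from itertools import combinations

def find_counterexamples(records, sim_tol=0.02, map_gap=5.0):
    """Find dataset pairs with near-equal similarity but different mAP.

    records: list of (name, similarity, mAP) tuples, all measured at the
    same proportion of real data. Thresholds are hypothetical.
    """
    hits = []
    for (name_a, sim_a, map_a), (name_b, sim_b, map_b) in combinations(records, 2):
        similar = abs(sim_a - sim_b) <= sim_tol
        diverging = abs(map_a - map_b) >= map_gap
        if similar and diverging:
            hits.append((name_a, name_b))
    return hits

# Made-up values, not results from the paper.
records = [
    ("synthA->realX", 0.81, 42.0),
    ("synthB->realX", 0.80, 33.5),
    ("synthC->realY", 0.55, 30.0),
]
print(find_counterexamples(records))
# [('synthA->realX', 'synthB->realX')]
```

Any non-empty result would count against the load-bearing premise, provided the mAP gap exceeds run-to-run training noise.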
Original abstract
In recent years, deep learning models have resulted in a huge amount of progress in various areas, including computer vision. By nature, the supervised training of deep models requires a large amount of data to be available. This ideal case is usually not tractable as the data annotation is a tremendously exhausting and costly task to perform. An alternative is to use synthetic data. In this paper, we take a comprehensive look into the effects of replacing real data with synthetic data. We further analyze the effects of having a limited amount of real data. We use multiple synthetic and real datasets along with a simulation tool to create large amounts of cheaply annotated synthetic data. We analyze the domain similarity of each of these datasets. We provide insights about designing a methodological procedure for training deep networks using these datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines the effects of replacing portions of real training data with synthetic data for object detection, analyzes domain similarity across multiple real and synthetic datasets (including a simulation tool), and derives methodological insights for training deep networks when real annotated data is limited.
Significance. If the domain-similarity analysis reliably predicts mixed-training performance gains, the work would supply practical guidance on data mixing strategies that could reduce annotation costs in computer vision pipelines.
major comments (1)
- [Results] Results section (performance curves and similarity analysis): the manuscript presents domain similarity values alongside separate mAP curves for mixed training sets but does not report any correlation, regression, or ranking test demonstrating that the chosen similarity measure predicts or orders the observed performance deltas when real data is partially replaced by synthetic data. This missing link is load-bearing for the central claim that the analysis yields a generalizable methodological procedure.
minor comments (1)
- [Abstract] Abstract: the description of experiments across datasets mentions no quantitative results, error bars, or dataset exclusion criteria, which reduces the standalone informativeness of the abstract.
Simulated Authors' Rebuttal
We thank the referee for the constructive feedback. We address the single major comment below.
- Referee: [Results] Results section (performance curves and similarity analysis): the manuscript presents domain similarity values alongside separate mAP curves for mixed training sets but does not report any correlation, regression, or ranking test demonstrating that the chosen similarity measure predicts or orders the observed performance deltas when real data is partially replaced by synthetic data. This missing link is load-bearing for the central claim that the analysis yields a generalizable methodological procedure.
Authors: We agree that the manuscript would be strengthened by an explicit statistical demonstration that the domain similarity measure predicts or ranks the observed mAP changes. The current text derives qualitative insights from the juxtaposition of similarity values and performance curves but does not include a correlation, regression, or ranking test. In the revision we will add this analysis to the Results section: we will compute and report Spearman rank correlations (and, where appropriate, linear regression R²) between the similarity scores and the mAP deltas for each real-to-synthetic replacement ratio across the evaluated datasets. These quantitative results will be presented alongside the existing figures to directly support the claim of a generalizable procedure. revision: yes
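The analysis promised in this response could be computed roughly as follows. This is a standard-library sketch, not the authors' code: the helper names are hypothetical and the similarity scores and mAP deltas are placeholders invented for illustration.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def linear_r2(x, y):
    """R^2 of a one-variable least-squares fit with intercept."""
    return pearson(x, y) ** 2

# Placeholder values for one replacement ratio, not results from the paper.
similarity = [0.35, 0.50, 0.62, 0.78, 0.90]
map_delta = [-9.0, -6.5, -7.0, -2.0, -1.5]

rho = spearman(similarity, map_delta)
r2 = linear_r2(similarity, map_delta)
print(f"Spearman rho = {rho:.2f}, R^2 = {r2:.2f}")
```

Spearman is the natural first test because the claim is about ordering: higher similarity should mean a smaller mAP drop, whether or not the relationship is linear. With only a handful of dataset pairs, the coefficient should be reported with its sample size rather than treated as conclusive.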
Circularity Check
Empirical dataset comparison contains no circular derivations or self-referential predictions
The paper performs an empirical analysis by training object detectors on varying mixtures of existing synthetic and real datasets, measuring mAP, and computing domain similarity statistics between those fixed datasets. No equations, fitted parameters, or predictions are defined in terms of the target performance quantities; the reported insights follow directly from the experimental outcomes on held-out real test images. No self-citation chains or uniqueness theorems are invoked to justify the procedure, and the work does not rename known results or smuggle ansatzes. The central claim therefore remains independent of its own inputs.