SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data
Pith reviewed 2026-05-22 06:47 UTC · model grok-4.3
The pith
A metric that fuses image appearance similarity with geometric consistency predicts how useful a synthetic dataset will be for real-world computer vision tasks without any model training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SADGE measures synthetic-to-real utility through a constrained bilinear interaction that combines an appearance similarity score with a geometric consistency score, and this combined value shows stronger linear and rank correlations with downstream transfer performance than any appearance-only or geometry-only baseline across the tested benchmarks and tasks.
What carries the argument
Constrained bilinear interaction that fuses an appearance similarity score with a geometric consistency score.
If this is right
- Researchers can rank candidate synthetic datasets by SADGE score and train only on the highest-ranked ones instead of testing every option.
- Synthetic data pipelines can be iterated by adjusting generation parameters and immediately scoring the result against real images to guide improvements.
- Tasks that rely on both visual cues and spatial layout benefit when datasets are selected or created to raise the fused score rather than optimizing appearance or geometry in isolation.
Where Pith is reading between the lines
- The same fusion idea could be tried in other simulation-to-reality settings such as robotics or autonomous driving to forecast which simulated environments will transfer best.
- Data generators might be trained or tuned to directly maximize the SADGE score, potentially producing more efficient synthetic data with less manual tuning.
- Over repeated use, SADGE could help reduce the volume of real labeled data needed by making synthetic substitutes more predictable in advance.
Load-bearing premise
The correlations found on the current set of synthetic generation methods, tasks, and real distributions will continue to hold for new generation techniques, different downstream models, and previously unseen real-world data.
What would settle it
Apply SADGE to a new collection of synthetic datasets paired with real images outside the original benchmarks, train models on each synthetic set, and check whether the SADGE ordering matches the observed real-task performance ordering.
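The check described above compares two orderings of the same synthetic datasets. Below is a minimal sketch. It assumes one SADGE score and one observed real-task metric per synthetic dataset. The function name `kendall_tau` and the example numbers are hypothetical and do not come from the paper. The sketch uses Kendall's tau-a, defined as concordant minus discordant dataset pairs divided by the number of pairs. That quantity directly answers whether the SADGE ordering matches the performance ordering.

```python
from itertools import combinations


def kendall_tau(scores, performance):
    """Kendall's tau-a between predicted scores and observed performance.

    Over all pairs of datasets, counts how often the two lists order the
    pair the same way (concordant) versus oppositely (discordant). Tied
    pairs count as neither, so ties pull the value toward zero.
    """
    if len(scores) != len(performance) or len(scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        sign = (scores[i] - scores[j]) * (performance[i] - performance[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical SADGE scores and real-task mAP for four synthetic datasets.
sadge = [0.41, 0.63, 0.55, 0.80]
real_map = [22.0, 31.5, 35.0, 40.2]
print(kendall_tau(sadge, real_map))
```

In this example, five of the six dataset pairs are ordered consistently. Only the second and third datasets are swapped, so tau = (5 - 1) / 6 ≈ 0.67. A value of 1 would mean the SADGE ranking reproduces the observed ranking exactly.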
Original abstract
We propose SADGE, a quantitative similarity metric that predicts the performance of synthetic image datasets for common computer vision tasks without downstream model training. Estimating whether a synthetic dataset will lead to a model that performs well on real-world data remains a bottleneck in model development. Existing evaluation metrics (e.g., PSNR, FID, CLIP) primarily measure semantic alignment between real and synthetic images (Appearance Similarity Score). Less commonly, structural similarity between images is considered to assess the domain gap (Geometric Similarity Score). However, to the best of our knowledge, there exist no studies that evaluate which similarity metric is the best downstream predictor for a given synthetic dataset. In this paper, we show across a wide variety of synthetic datasets and downstream tasks that neither appearance nor geometry alone can reliably predict downstream performance; rather, it is their non-linear interplay that dictates synthetic data utility. Specifically, we measure how commonly used Appearance and Geometric Similarity metrics computed between synthetic and real images correlate with downstream performance in object detection, semantic segmentation, and pose estimation. Across five public synthetic-to-real benchmark families and 15 dataset-level variants (79k image pairs), SADGE achieves the strongest association with downstream transfer performance under both linear and rank-based criteria, reaching Pearson r=0.88 and Spearman rho=0.77. We compute for each combination of geometry-based methods and appearance-based approaches SADGE scores across all benchmark families. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline.
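The headline numbers are dataset-level correlations, built from one similarity score and one downstream result per variant. Below is a minimal pure-Python sketch of both criteria under that assumed setup. The helper names are hypothetical. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, which is the standard tie-aware definition.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-variant similarity scores and downstream results.
similarity = [0.2, 0.4, 0.6, 0.8]
downstream = [10, 20, 30, 45]
print(pearson(similarity, downstream))
print(spearman(similarity, downstream))  # 1.0: the two orderings agree exactly
```

The example shows why the two criteria can diverge. Spearman only sees the ordering, which is identical here. Pearson also penalizes the uneven jump in the last variant. Constant inputs make the denominator zero, and the sketch does not guard against that case.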
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SADGE, a metric to predict synthetic dataset utility for downstream CV tasks (object detection, semantic segmentation, pose estimation) without training. It claims neither appearance nor geometric similarity alone suffices; instead, their non-linear interplay via constrained bilinear fusion of DINOv3 appearance features and MASt3R geometric consistency yields the strongest predictor. Evaluated across five synthetic-to-real benchmark families, 15 dataset variants, and 79k image pairs, the best SADGE configuration reports Pearson r=0.88 and Spearman rho=0.77, outperforming single-modality baselines.
Significance. If the correlations prove robust after addressing selection and validation concerns, SADGE would offer a practical, training-free tool for synthetic data assessment, filling a gap where existing metrics like FID or PSNR fall short. The empirical finding that fusion outperforms isolated modalities is a concrete contribution, though claims of superiority require safeguards against post-selection bias to support broader adoption.
major comments (2)
- [Abstract] Abstract: The text states that all combinations of geometry- and appearance-based metrics were computed across benchmark families, after which the best configuration (DINOv3 + MASt3R) and its bilinear parameters were selected. Without explicit family-level cross-validation or a held-out benchmark family for parameter fitting and selection, the reported r=0.88 and rho=0.77 are vulnerable to optimistic bias from post-hoc choice among multiple tested variants.
- [Abstract] Abstract and evaluation description: No information is given on the total number of configurations tested, whether bilinear interaction parameters were tuned on the same 79k pairs used for final reporting, or error bars/confidence intervals on the correlation coefficients. This omission makes it impossible to evaluate whether the claimed strongest association reflects genuine predictive power or selection effects.
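The held-out protocol requested above can be sketched as a leave-one-family-out loop. This is a minimal illustration built on several hypothetical choices: the dictionary layout, the function names, and selection by pooled Pearson r on the training families. A full version would also refit any bilinear weights inside each fold. Here the configuration choice stands in for all fitted quantities.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_r(scores, performance, families, config):
    """Pearson r of one configuration, pooling all variants of the given families."""
    xs = [v for f in families for v in scores[f][config]]
    ys = [v for f in families for v in performance[f]]
    return pearson(xs, ys)


def leave_one_family_out(scores, performance):
    """scores[family][config]: per-variant similarity scores.
    performance[family]: per-variant downstream results, in the same order.

    Each fold picks a configuration using only the other families, then
    reports that configuration's correlation on the held-out family.
    """
    families = sorted(performance)
    configs = sorted(scores[families[0]])
    folds = []
    for held_out in families:
        train = [f for f in families if f != held_out]
        best = max(configs, key=lambda cfg: pooled_r(scores, performance, train, cfg))
        r = pearson(scores[held_out][best], performance[held_out])
        folds.append((held_out, best, r))
    rs = [r for _, _, r in folds]
    return folds, mean(rs), pstdev(rs)
```

There are five families and 15 variants, so each held-out correlation rests on about three points on average. For that reason, the fold-to-fold spread is as informative as the mean.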
minor comments (2)
- [Abstract] Abstract: The phrase 'constrained bilinear interaction' is introduced without defining the constraints or the exact form of the fusion function; a brief equation or pseudocode would clarify reproducibility.
- The manuscript would benefit from a table summarizing all tested appearance/geometry combinations and their individual correlations to allow direct comparison with the fused SADGE result.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. The concerns regarding potential selection bias and lack of details on the experimental configurations are well-taken. We provide point-by-point responses below and outline the revisions we will make to address these issues.
Point-by-point responses
- Referee: [Abstract] Abstract: The text states that all combinations of geometry- and appearance-based metrics were computed across benchmark families, after which the best configuration (DINOv3 + MASt3R) and its bilinear parameters were selected. Without explicit family-level cross-validation or a held-out benchmark family for parameter fitting and selection, the reported r=0.88 and rho=0.77 are vulnerable to optimistic bias from post-hoc choice among multiple tested variants.
Authors: We agree that selecting the best configuration after evaluating multiple variants on the full set of benchmarks can lead to optimistic bias in the reported correlation values. Our original approach involved computing SADGE scores for all combinations of geometry- and appearance-based metrics across the benchmark families and then identifying the top performer. To rigorously address this, we will revise the evaluation section to include a leave-one-family-out cross-validation scheme for both configuration selection and bilinear parameter tuning. In this setup, the configuration and parameters will be chosen based on four families, and the correlation will be computed on the remaining held-out family. We will report the mean Pearson r and Spearman rho across all such folds, along with their standard deviations. This will provide a more conservative and generalizable estimate of SADGE's predictive performance. revision: yes
- Referee: [Abstract] Abstract and evaluation description: No information is given on the total number of configurations tested, whether bilinear interaction parameters were tuned on the same 79k pairs used for final reporting, or error bars/confidence intervals on the correlation coefficients. This omission makes it impossible to evaluate whether the claimed strongest association reflects genuine predictive power or selection effects.
Authors: We agree that these details should have been included for full transparency. We will update the manuscript to report the total number of configurations evaluated, which included all combinations of the geometry-based and appearance-based metrics we considered. The bilinear interaction parameters were tuned on the same 79k image pairs used for the final correlation reporting. We will also add 95% confidence intervals for the Pearson r and Spearman rho values, computed using bootstrap resampling. These revisions will allow readers to better gauge the impact of selection effects and the statistical reliability of our results. revision: yes
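The rebuttal commits to 95% bootstrap confidence intervals. Below is a minimal percentile-bootstrap sketch. It assumes the resampling unit is the dataset-level variant, i.e. the 15 points behind each correlation, rather than individual image pairs. The rebuttal does not state which unit will be used, and the function names here are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson r, or None when either input is constant (r undefined)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else None


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with a constant column
            stats.append(r)
    if not stats:
        raise ValueError("every resample was degenerate")
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

The sketch resamples whole variants, which keeps each variant's score and downstream result paired. This keeps the interval about the correlation rather than about either measurement alone.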
Circularity Check
SADGE's reported Pearson r=0.88 is obtained by selecting the best fusion after evaluating all combinations on the same 79k-pair benchmark set.
specific steps
- fitted input called prediction
[Abstract]
"We compute for each combination of geometry-based methods and appearance-based approaches SADGE scores across all benchmark families. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline."
The paper evaluates every geometry-appearance pairing on the full set of five benchmark families (79k pairs), selects the single best configuration and its bilinear parameters, then presents that selected fusion as achieving the highest Pearson r=0.88 and Spearman rho=0.77 on the same data; the reported association is therefore the outcome of the selection step rather than a prediction from a fixed, pre-chosen metric.
full rationale
The paper's central claim is that the non-linear interplay of appearance and geometry (via the chosen SADGE configuration) is the reliable predictor of downstream performance. However, the abstract states that all combinations were computed across the benchmark families and the best configuration (DINOv3 + MASt3R with constrained bilinear interaction) was then obtained and reported as outperforming baselines with r=0.88. This selection and any associated parameter fitting on the identical evaluation data makes the headline correlation a post-selection result rather than an independent test of a pre-specified metric. Individual modality measurements may remain non-circular, but the fused SADGE claim reduces to the fitted choice on the reported data.
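A small simulation makes the post-selection concern concrete. This sketch is not from the paper. It generates candidate "metrics" that are pure noise, keeps the one with the highest in-sample r against performance, and then re-tests on fresh data. The 15 variants match the abstract. The 40 candidate configurations are a hypothetical count, since the paper does not report how many it tested.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def winners_curse(n_variants=15, n_configs=40, seed=0):
    """Select the best of n_configs candidate metrics on one sample, then re-test.

    Every candidate is independent Gaussian noise, so its true correlation
    with performance is zero; any r the winner shows in-sample comes from
    the selection step alone. Re-testing on fresh variants therefore means
    drawing new, equally uninformative scores.
    """
    rng = random.Random(seed)
    perf = [rng.gauss(0, 1) for _ in range(n_variants)]
    in_sample = []
    for _ in range(n_configs):
        metric = [rng.gauss(0, 1) for _ in range(n_variants)]
        in_sample.append(pearson(metric, perf))
    best_r = max(in_sample)
    fresh_perf = [rng.gauss(0, 1) for _ in range(n_variants)]
    fresh_metric = [rng.gauss(0, 1) for _ in range(n_variants)]
    return best_r, pearson(fresh_metric, fresh_perf)


in_r, out_r = winners_curse()
print(f"selected in-sample r = {in_r:.2f}, held-out r = {out_r:.2f}")
```

Typically the winner's in-sample r is clearly positive while its held-out r scatters around zero. That gap is the optimistic bias the circularity check describes.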
Axiom & Free-Parameter Ledger
free parameters (1)
- bilinear interaction parameters
axioms (1)
- domain assumption: Similarity metrics computed on image pairs can predict downstream task performance on real data without training the target model.
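The ledger lists the bilinear weights as the only free parameters. Below is a minimal sketch of fitting them. The paper's fitting procedure is not described here, so this rests on assumptions. It assumes the weights are a, b, c in the form a·Ĝ + b·Â + c·Ĝ·Â with a, b, c ≥ 0, as quoted under the Lean theorems below. It fits them by non-negative least squares against downstream performance with an unconstrained intercept. `fit_sadge_weights`, `center`, and `solve` are hypothetical helpers. A library NNLS routine would do the same job at larger scale.

```python
from itertools import combinations


def center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def solve(M, v):
    """Solve a small square system M x = v by Gaussian elimination with
    partial pivoting; returns None if M is numerically singular."""
    n = len(M)
    aug = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_sadge_weights(G, A, y):
    """Fit y ~ const + a*G + b*A + c*G*A subject to a, b, c >= 0.

    Centering every column absorbs the free intercept. With only three
    weights, the constrained optimum is found exactly by solving ordinary
    least squares on every subset of terms and keeping the lowest-error
    solution whose weights are all non-negative.
    """
    n = len(y)
    cols = [center(G), center(A), center([g * a for g, a in zip(G, A)])]
    yc = center(y)
    best, best_sse = (0.0, 0.0, 0.0), sum(v * v for v in yc)
    for k in (1, 2, 3):
        for support in combinations(range(3), k):
            X = [cols[j] for j in support]
            gram = [[sum(p * q for p, q in zip(X[i], X[j])) for j in range(k)] for i in range(k)]
            rhs = [sum(p * q for p, q in zip(X[i], yc)) for i in range(k)]
            coef = solve(gram, rhs)
            if coef is None or min(coef) < 0:
                continue
            w = [0.0, 0.0, 0.0]
            for j, cj in zip(support, coef):
                w[j] = cj
            sse = sum((yc[i] - sum(w[j] * cols[j][i] for j in range(3))) ** 2 for i in range(n))
            if sse < best_sse:
                best, best_sse = tuple(w), sse
    return best
```

Only correlations are reported, so the overall scale of (a, b, c) does not matter. The fit determines the relative weight of each term. This fitting step is what would have to run inside each training fold to avoid the circularity flagged above.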
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/BranchSelection.lean: RCLCombiner_isCoupling_iff (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
We define SADGE as a bilinear interaction because it is the simplest function that captures complementarity while staying monotone and low-capacity. ... $\mathrm{SADGE} = a\hat{G} + b\hat{A} + c\,\hat{G}\hat{A}$, where $a, b, c \geq 0$
- IndisputableMonolith/Foundation/BranchSelection.lean: branch_selection (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
the contribution of geometry similarity grows when appearance similarity is already high
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.