SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

Bartosz Kotrys; Dominik Michels; Patryk Bartkowiak; Soren Pirk; Wojtek Palubicki

arxiv: 2605.22467 · v1 · pith:YNMOCX7Bnew · submitted 2026-05-21 · 💻 cs.CV

Pith reviewed 2026-05-22 06:47 UTC · model grok-4.3

keywords: synthetic data · domain gap · transfer performance · appearance similarity · geometric consistency · bilinear interaction · computer vision · dataset evaluation

The pith

A metric that fuses image appearance similarity with geometric consistency predicts how useful a synthetic dataset will be for real-world computer vision tasks without any model training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SADGE to estimate the domain gap between synthetic and real images by showing that downstream performance on tasks like detection and segmentation depends on the interplay of how images look and how their structures align. Separate checks for visual resemblance or structural match each fail to forecast transfer results reliably across varied benchmarks. A specific non-linear fusion of the two factors produces scores that track actual accuracy gains far more closely than either factor by itself. This approach matters because generating and vetting synthetic data is costly, so an upfront predictor could let researchers test ideas faster and focus effort on promising datasets. The evaluation covers multiple public synthetic-to-real families and three common vision tasks using tens of thousands of image pairs.

Core claim

SADGE measures synthetic-to-real utility through a constrained bilinear interaction that combines an appearance similarity score with a geometric consistency score, and this combined value shows stronger linear and rank correlations with downstream transfer performance than any appearance-only or geometry-only baseline across the tested benchmarks and tasks.

What carries the argument

Constrained bilinear interaction that fuses an appearance similarity score with a geometric consistency score.
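
As a minimal sketch of this fusion, the Python snippet below evaluates the bilinear form quoted from the paper further down this page, SADGE = a Ĝ + b Â + c Ĝ Â with a, b, c ≥ 0. The names `BilinearWeights` and `sadge_score` and the example weights are hypothetical. The snippet also assumes both inputs are already normalized to [0, 1], which this page does not state, and it does not reproduce the paper's fitted coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilinearWeights:
    """Coefficients of SADGE = a*G + b*A + c*G*A, each constrained to be >= 0."""

    a: float  # weight on the geometric consistency score G
    b: float  # weight on the appearance similarity score A
    c: float  # weight on the interaction term G*A

    def __post_init__(self) -> None:
        if min(self.a, self.b, self.c) < 0:
            raise ValueError("a, b and c must be non-negative")


def sadge_score(g_hat: float, a_hat: float, w: BilinearWeights) -> float:
    """Fuse a geometry score and an appearance score with a bilinear interaction."""
    return w.a * g_hat + w.b * a_hat + w.c * g_hat * a_hat


# Illustrative weights only; the paper's fitted values are not given on this page.
weights = BilinearWeights(a=0.2, b=0.3, c=0.5)
print(sadge_score(g_hat=0.5, a_hat=0.8, w=weights))
```

With a non-negative c, the gain from improving geometry is a + c·Â, so it grows with appearance similarity. That matches the paper's remark that geometry contributes more when appearance similarity is already high.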

If this is right

Researchers can rank candidate synthetic datasets by SADGE score and train only on the highest-ranked ones instead of testing every option.
Synthetic data pipelines can be iterated by adjusting generation parameters and immediately scoring the result against real images to guide improvements.
Tasks that rely on both visual cues and spatial layout benefit when datasets are selected or created to raise the fused score rather than optimizing appearance or geometry in isolation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fusion idea could be tried in other simulation-to-reality settings such as robotics or autonomous driving to forecast which simulated environments will transfer best.
Data generators might be trained or tuned to directly maximize the SADGE score, potentially producing more efficient synthetic data with less manual tuning.
Over repeated use, SADGE could help reduce the volume of real labeled data needed by making synthetic substitutes more predictable in advance.

Load-bearing premise

The correlations found on the current set of synthetic generation methods, tasks, and real distributions will continue to hold for new generation techniques, different downstream models, and previously unseen real-world data.

What would settle it

Apply SADGE to a new collection of synthetic datasets paired with real images outside the original benchmarks, train models on each synthetic set, and check whether the SADGE ordering matches the observed real-task performance ordering.

Figures

Figures reproduced from arXiv: 2605.22467 by Bartosz Kotrys, Dominik Michels, Patryk Bartkowiak, Soren Pirk, Wojtek Palubicki.

**Figure 2.** Pearson correlation with downstream performance on all datasets (top-left panel) and ...

**Figure 3.** We used the datasets DIMO (a), RarePlanes (b), TUD-L (c), VKITTI2 (d), and ASD (e).

**Figure 4.** Scatter plots of downstream task performance versus (left) SADGE, (center) MASt3R ...

**Figure 5.** Sensitivity analysis of SADGE coefficients in Eq. 6. The figure reports three Pearson ...

Original abstract

We propose SADGE, a quantitative similarity metric that predicts the performance of synthetic image datasets for common computer vision tasks without downstream model training. Estimating whether a synthetic dataset will lead to a model that performs well on real-world data remains a bottleneck in model development. Existing evaluation metrics (e.g., PSNR, FID, CLIP) primarily measure semantic alignment between real and synthetic images (Appearance Similarity Score). Less commonly, structural similarity between images is considered to assess the domain gap (Geometric Similarity Score). However, to the best of our knowledge there exist no studies that evaluate which similarity metric is the best downstream predictor for a given synthetic dataset. In this paper, we show over a wide variety of different synthetic datasets and downstream tasks that neither appearance nor geometry alone can reliably predict downstream performance; rather, it is their non-linear interplay that dictates synthetic data utility. Specifically, we measure how commonly used Appearance and Geometric Similarity metrics computed between synthetic and real images correlate with downstream performance in object detection, semantic segmentation, and pose estimation. Across five public synthetic-to-real benchmark families and 15 dataset-level variants (79k image pairs), SADGE achieves the strongest association with downstream transfer performance under both linear and rank-based criteria, reaching Pearson r=0.88 and Spearman rho=0.77. We compute for each combination of geometry-based methods and appearance-based approaches SADGE scores across all benchmark families. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SADGE fuses DINOv3 appearance with MASt3R geometry via bilinear interaction and reports strong correlations with downstream transfer on synthetic-to-real benchmarks, but post-hoc selection of the best configuration likely inflates those numbers.

The main point is that SADGE combines two off-the-shelf extractors in a constrained bilinear way and gets Pearson r=0.88 and Spearman rho=0.77 with transfer performance across five benchmark families. That is the concrete result worth noting. The paper shows that neither appearance metrics nor geometry metrics alone track downstream object detection, segmentation, or pose estimation as well as their non-linear combination, and it backs this with 79k image pairs from 15 dataset variants. That comparison is systematic and directly addresses a practical bottleneck in synthetic data work. They also give credit to the base models rather than claiming a wholly new architecture, which keeps the contribution focused. The empirical pattern holds up on the reported numbers: single-modality baselines underperform the fused version.

The soft spot is the selection process. The abstract indicates they computed every geometry-appearance pairing and then chose the top bilinear configuration. Without family-level cross-validation or a fully held-out benchmark family for both parameter fitting and final reporting, the quoted correlations carry some optimistic bias. That does not invalidate the core observation that interplay matters, but it does mean the exact strength of SADGE needs checking against fresh data generators and tasks.

The paper is aimed at researchers who build or curate synthetic datasets for computer vision and want a cheap predictor before running full training loops. A reader working on domain adaptation or data selection will find the numbers and the five-family scope useful even if they later adjust the fusion.

It deserves a serious referee because the question is real, the evaluation scale is reasonable, and the methods are reproducible enough to test. I would send it for review with a request to clarify the exact data splits used for bilinear parameter selection and to add error bars or bootstrap intervals on the reported r and rho values.

Referee Report

2 major / 2 minor

Summary. The paper proposes SADGE, a metric to predict synthetic dataset utility for downstream CV tasks (object detection, semantic segmentation, pose estimation) without training. It claims neither appearance nor geometric similarity alone suffices; instead, their non-linear interplay via constrained bilinear fusion of DINOv3 appearance features and MASt3R geometric consistency yields the strongest predictor. Evaluated across five synthetic-to-real benchmark families, 15 dataset variants, and 79k image pairs, the best SADGE configuration reports Pearson r=0.88 and Spearman rho=0.77, outperforming single-modality baselines.

Significance. If the correlations prove robust after addressing selection and validation concerns, SADGE would offer a practical, training-free tool for synthetic data assessment, filling a gap where existing metrics like FID or PSNR fall short. The empirical finding that fusion outperforms isolated modalities is a concrete contribution, though claims of superiority require safeguards against post-selection bias to support broader adoption.

major comments (2)

[Abstract] Abstract: The text states that all combinations of geometry- and appearance-based metrics were computed across benchmark families, after which the best configuration (DINOv3 + MASt3R) and its bilinear parameters were selected. Without explicit family-level cross-validation or a held-out benchmark family for parameter fitting and selection, the reported r=0.88 and rho=0.77 are vulnerable to optimistic bias from post-hoc choice among multiple tested variants.
[Abstract] Abstract and evaluation description: No information is given on the total number of configurations tested, whether bilinear interaction parameters were tuned on the same 79k pairs used for final reporting, or error bars/confidence intervals on the correlation coefficients. This omission makes it impossible to evaluate whether the claimed strongest association reflects genuine predictive power or selection effects.

minor comments (2)

[Abstract] Abstract: The phrase 'constrained bilinear interaction' is introduced without defining the constraints or the exact form of the fusion function; a brief equation or pseudocode would clarify reproducibility.
The manuscript would benefit from a table summarizing all tested appearance/geometry combinations and their individual correlations to allow direct comparison with the fused SADGE result.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. The concerns regarding potential selection bias and lack of details on the experimental configurations are well-taken. We provide point-by-point responses below and outline the revisions we will make to address these issues.

Referee: [Abstract] Abstract: The text states that all combinations of geometry- and appearance-based metrics were computed across benchmark families, after which the best configuration (DINOv3 + MASt3R) and its bilinear parameters were selected. Without explicit family-level cross-validation or a held-out benchmark family for parameter fitting and selection, the reported r=0.88 and rho=0.77 are vulnerable to optimistic bias from post-hoc choice among multiple tested variants.

Authors: We agree that selecting the best configuration after evaluating multiple variants on the full set of benchmarks can lead to optimistic bias in the reported correlation values. Our original approach involved computing SADGE scores for all combinations of geometry- and appearance-based metrics across the benchmark families and then identifying the top performer. To rigorously address this, we will revise the evaluation section to include a leave-one-family-out cross-validation scheme for both configuration selection and bilinear parameter tuning. In this setup, the configuration and parameters will be chosen based on four families, and the correlation will be computed on the remaining held-out family. We will report the mean Pearson r and Spearman rho across all such folds, along with their standard deviations. This will provide a more conservative and generalizable estimate of SADGE's predictive performance.

revision: yes

Referee: [Abstract] Abstract and evaluation description: No information is given on the total number of configurations tested, whether bilinear interaction parameters were tuned on the same 79k pairs used for final reporting, or error bars/confidence intervals on the correlation coefficients. This omission makes it impossible to evaluate whether the claimed strongest association reflects genuine predictive power or selection effects.

Authors: We agree that these details should have been included for full transparency. We will update the manuscript to report the total number of configurations evaluated, which included all combinations of the geometry-based and appearance-based metrics we considered. The bilinear interaction parameters were tuned on the same 79k image pairs used for the final correlation reporting. We will also add 95% confidence intervals for the Pearson r and Spearman rho values, computed using bootstrap resampling. These revisions will allow readers to better gauge the impact of selection effects and the statistical reliability of our results.

revision: yes

Circularity Check

1 step flagged

SADGE's reported Pearson r=0.88 obtained by selecting best fusion after evaluating all combinations on the same 79k-pair benchmark set

specific steps

fitted input called prediction [Abstract]
"We compute for each combination of geometry-based methods and appearance-based approaches SADGE scores across all benchmark families. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline."

The paper evaluates every geometry-appearance pairing on the full set of five benchmark families (79k pairs), selects the single best configuration and its bilinear parameters, then presents that selected fusion as achieving the highest Pearson r=0.88 and Spearman rho=0.77 on the same data; the reported association is therefore the outcome of the selection step rather than a prediction from a fixed, pre-chosen metric.

full rationale

The paper's central claim is that the non-linear interplay of appearance and geometry (via the chosen SADGE configuration) is the reliable predictor of downstream performance. However, the abstract states that all combinations were computed across the benchmark families and the best configuration (DINOv3 + MASt3R with constrained bilinear interaction) was then obtained and reported as outperforming baselines with r=0.88. This selection and any associated parameter fitting on the identical evaluation data makes the headline correlation a post-selection result rather than an independent test of a pre-specified metric. Individual modality measurements may remain non-circular, but the fused SADGE claim reduces to the fitted choice on the reported data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

SADGE rests on pre-trained models and an empirical fusion whose parameters are not fully specified in the abstract; no new physical entities are introduced.

free parameters (1)

bilinear interaction parameters
The constrained bilinear fusion between appearance and geometry scores requires weights or constraints whose exact values or fitting procedure are not detailed.

axioms (1)

domain assumption: Similarity metrics computed on image pairs can predict downstream task performance on real data without training the target model.
Central premise that allows the metric to be used as a proxy without running full transfer experiments.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean RCLCombiner_isCoupling_iff echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We define SADGE as a bilinear interaction because it is the simplest function that captures complementarity while staying monotone and low-capacity. ... SADGE = a Ĝ + b Â + c Ĝ Â where a, b, c ≥ 0

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

the contribution of geometry similarity grows when appearance similarity is already high

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 2 internal anchors

[1]

Alaa, Boris van Breugel, Evgeny Saveliev, and Mihaela van der Schaar

Ahmed M. Alaa, Boris van Breugel, Evgeny Saveliev, and Mihaela van der Schaar. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162, pages 290–306. PMLR, 2022

work page 2022
[2]

Why do deep convolutional networks generalize so poorly to small image transformations?Journal of Machine Learning Research, 20(184):1–25, 2019

Aharon Azulay and Yair Weiss. Why do deep convolutional networks generalize so poorly to small image transformations?Journal of Machine Learning Research, 20(184):1–25, 2019

work page 2019
[3]

Pros and cons of GAN evaluation measures: New developments.Computer Vision and Image Understanding, 215:103329, 2022

Ali Borji. Pros and cons of GAN evaluation measures: New developments.Computer Vision and Image Understanding, 215:103329, 2022

work page 2022
[4]

Virtual kitti 2, 2020

Yohann Cabon, Naila Murray, and Martin Humenberger. Virtual kitti 2, 2020

work page 2020
[5]

Michels, Soren Pirk, Chia-Chun Fu, and Wojciech Palubicki

Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hadrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Soren Pirk, Chia-Chun Fu, and Wojciech Palubicki. Generating Diverse Agricultural Data for Vision-Based Farming Applications . In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (...

work page 2024
[6]

Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner

Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5828–5839, 2017

work page 2017
[7]

Dataset of industrial metal objects.arXiv preprint, 2022

Peter De Roovere, Steven Moonen, Nick Michiels, and Francis Wyffels. Dataset of industrial metal objects.arXiv preprint, 2022

work page 2022
[8]

Superpoint: Self-supervised interest point detection and description

Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 224–236, 2018

work page 2018
[9]

Understanding dataset difficulty with V-usable information

Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta. Understanding dataset difficulty with V-usable information. InProceedings of the 39th International Conference on Machine Learning (ICML), volume 162, pages 5988–6008. PMLR, 2022

work page 2022
[10]

Eversberg and J

L. Eversberg and J. Lambrecht. Generating images with physics-based rendering for an industrial object detection task: Realism versus domain randomization.Sensors, 21(23):7901, 2021

work page 2021
[11]

Vision meets robotics: The kitti dataset.Int

A Geiger, P Lenz, C Stiller, and R Urtasun. Vision meets robotics: The kitti dataset.Int. J. Rob. Res., 32(11):1231–1237, September 2013

work page 2013
[12]

Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam H

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam H. Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Öztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S...

work page 2022
[13]

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. InAdvances in Neural Information Processing Systems, pages 6626–6637, 2017

work page 2017
[14]

T-LESS: An RGB-D dataset for 6d pose estimation of texture-less objects

Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis Lourakis, and Xenophon Zabulis. T-LESS: An RGB-D dataset for 6d pose estimation of texture-less objects. In2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 880–888, 2017. 10

work page 2017
[15]

BOP: Benchmark for 6d object pose estimation

Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, and Carsten Rother. BOP: Benchmark for 6d object pose estimation. InProceedings of the European Conference on Computer Vision (ECCV), 2018

work page 2018
[16]

Horváth, G

D. Horváth, G. Erd ˝os, Z. Istenes, T. Horváth, and S. Földi. Object detection using sim2real domain randomization for robotic applications.IEEE Transactions on Robotics, 39(2):1225– 1243, 2023

work page 2023
[17]

Lawrence Zitnick, and Ross Girshick

Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

work page 2017
[18]

Meta-sim: Learning to generate synthetic datasets

Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, and Sanja Fidler. Meta-sim: Learning to generate synthetic datasets. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019

work page 2019
[19]

Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023

work page 2023
[20]

Synbench: Task-agnostic benchmarking of pretrained representations using synthetic data.CoRR, abs/2210.02989, 2022

Ching-Yun Ko, Pin-Yu Chen, Jeet Mohapatra, Payel Das, and Luca Daniel. Synbench: Task-agnostic benchmarking of pretrained representations using synthetic data.CoRR, abs/2210.02989, 2022

work page arXiv 2022
[21]

Improved precision and recall metric for assessing generative models

Tuomas Kynkäanniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. InAdvances in Neural Information Processing Systems, pages 3927–3936, 2019

work page 2019
[22]

Ground- ing image matching in 3d with mast3r

Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3R: Grounding image matching in 3d.arXiv preprint arXiv:2406.09756, 2024

work page arXiv 2024
[23]

Benchmarking and analyzing generative data for visual recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(9):7675–7688, 2025

Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, and Ziwei Liu. Benchmarking and analyzing generative data for visual recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(9):7675–7688, 2025

work page 2025
[24]

Lightglue: Local feature matching at light speed

Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. Lightglue: Local feature matching at light speed. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17627–17638, 2023

work page 2023
[25]

InProceedings of the 18th Confer- ence of the European Chapter of the Association for Computational Linguistics, pages 139–151

Yingzhou Lu, Huazheng Wang, and Wenqi Wei. Machine learning for synthetic data generation: a review.CoRR, abs/2302.04062, 2023

work page arXiv 2023
[26]

Castro-Vargas, Alberto Garcia-Garcia, Sergio Orts-Escolano, Jose Garcia-Rodriguez, and Markus Vincze

Pablo Martinez-Gonzalez, Sergiu Oprea, John A. Castro-Vargas, Alberto Garcia-Garcia, Sergio Orts-Escolano, Jose Garcia-Rodriguez, and Markus Vincze. Unrealrox+: An improved tool for acquiring synthetic data from virtual 3d environments.CoRR, abs/2104.11776, 2021

work page arXiv 2021
[27]

A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation

Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

work page 2016
[28]

John McCormac, Ankur Handa, Stefan Leutenegger, and Andrew J. Davison. Scenenet rgb-d: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation? In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017

work page 2017
[29]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc-Antoine Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaa El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Patrick Labatut, Arman...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[30]

MAUVE scores for generative models: Theory and practice.Journal of Machine Learning Research, 24(356):1–92, 2023

Krishna Pillutla, Linyi Liu, John Thickstun, Sean Welleck, Julian McAuley, and Luke Zettle- moyer. MAUVE scores for generative models: Theory and practice.Journal of Machine Learning Research, 24(356):1–92, 2023

work page 2023
[31]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InProceed- ings of the 38th International Conference on Machine Learning (ICML), volume 139, pages...

work page 2021
[32]

Infinite photorealistic worlds using procedural generation

Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, Alejandro Newell, Hei Law, Ankit Goyal, Kaiyu Yang, and Jia Deng. Infinite photorealistic worlds using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), page...

work page 2023
[33]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

work page 2022
[34]

German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M. Lopez. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

work page 2016
[35]

Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen

Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. InAdvances in Neural Information Processing Systems, pages 2234–2242, 2016

work page 2016
[36]

Indoor synthetic data generation: A systematic review.Computer Vision and Image Understanding, 240:103907, 2024

Hannah Schieber, Kubilay Can Demir, Constantin Kleinbeck, Seung Hee Yang, and Daniel Roth. Indoor synthetic data generation: A systematic review.Computer Vision and Image Understanding, 240:103907, 2024

work page 2024
[37]

Rareplanes: Synthetic data takes flight

Jacob Shermeyer, Thomas Hossler, Adam Van Etten, Daniel Hogan, Ryan Lewis, and Daeil Kim. Rareplanes: Synthetic data takes flight. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 207–217, 2021

work page 2021
[38]

Indoor segmentation and support inference from RGBD images

Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. InProceedings of the European Conference on Computer Vision (ECCV), 2012

work page 2012
[39]

DINOv3

Oriane Siméoni, Huy V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Ma...

[40]

Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8922–8931, 2021.

[41]

Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, and Yejin Choi. Dataset cartography: Mapping and diagnosing datasets with training dynamics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9275–9293, 2020.

[42]

Antonio Torralba and Alexei A. Efros. Unbiased look at dataset bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

[43]

D. Turaga, O. Verscheure, and P. Frossard. No reference PSNR estimation for compressed pictures. Signal Processing: Image Communication, 19(2):173–184, 2004.

[44]

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[45]

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. Segformer: simple and efficient design for semantic segmentation with transformers. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY, USA, 2021. Curran Associates Inc.

[46]

Ayush Zenith, Arnold Zumbrun, Neel Raut, and Jing Lin. Sdqm: Synthetic data quality metric for object detection dataset evaluation. arXiv preprint arXiv:2510.06596, 2025.

[47]

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. SigLIP: Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11975–11986, 2023.

[48]

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018.

[49]

Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, and Sanja Fidler. Datasetgan: Efficient labeled data factory with minimal human effort. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10145–10155, 2021.

[50]

X. Zhu, T. Bilal, P. Mårtensson, L. Hanson, M. Björkman, and A. Maki. Towards sim-to-real industrial parts classification with synthetic dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 4454–4463, 2023.

[51]

X. Zhu, P. Mårtensson, L. Hanson, M. Björkman, and A. Maki. Automated assembly quality inspection by deep learning with 2d and 3d synthetic cad data. Journal of Intelligent Manufacturing, pages 1–16, 2024.

A Implementation Details

Unless otherwise stated, the reported SADGE results use the best-performing appearance–geometry pair selected by an exhaustiv...

Institutional review board (IRB) approvals or equivalent for research with human subjects

Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

Therefore, IRB or equivalent approval is not applicable.

[1]

Ahmed M. Alaa, Boris van Breugel, Evgeny Saveliev, and Mihaela van der Schaar. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162, pages 290–306. PMLR, 2022.

[2]

Aharon Azulay and Yair Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? Journal of Machine Learning Research, 20(184):1–25, 2019.

[3]

Ali Borji. Pros and cons of GAN evaluation measures: New developments. Computer Vision and Image Understanding, 215:103329, 2022.

[4]

Yohann Cabon, Naila Murray, and Martin Humenberger. Virtual kitti 2, 2020.

[5]

Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hadrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Soren Pirk, Chia-Chun Fu, and Wojciech Palubicki. Generating Diverse Agricultural Data for Vision-Based Farming Applications. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (...

[6]

Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5828–5839, 2017.

[7]

Peter De Roovere, Steven Moonen, Nick Michiels, and Francis Wyffels. Dataset of industrial metal objects. arXiv preprint, 2022.

[8]

Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 224–236, 2018.

[9]

Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta. Understanding dataset difficulty with V-usable information. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162, pages 5988–6008. PMLR, 2022.

[10]

L. Eversberg and J. Lambrecht. Generating images with physics-based rendering for an industrial object detection task: Realism versus domain randomization. Sensors, 21(23):7901, 2021.

[11]

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The kitti dataset. Int. J. Rob. Res., 32(11):1231–1237, September 2013.

[12]

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam H. Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Öztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S... 2022.

[13]

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.

[14]

Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis Lourakis, and Xenophon Zabulis. T-LESS: An RGB-D dataset for 6d pose estimation of texture-less objects. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 880–888, 2017.

[15]

Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, and Carsten Rother. BOP: Benchmark for 6d object pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.

[16]

D. Horváth, G. Erdős, Z. Istenes, T. Horváth, and S. Földi. Object detection using sim2real domain randomization for robotic applications. IEEE Transactions on Robotics, 39(2):1225–1243, 2023.

[17]

Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[18]

Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, and Sanja Fidler. Meta-sim: Learning to generate synthetic datasets. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

[19]

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023.

[20]

Ching-Yun Ko, Pin-Yu Chen, Jeet Mohapatra, Payel Das, and Luca Daniel. Synbench: Task-agnostic benchmarking of pretrained representations using synthetic data. CoRR, abs/2210.02989, 2022.

[21]

Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. In Advances in Neural Information Processing Systems, pages 3927–3936, 2019.

[22]

Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3R: Grounding image matching in 3d. arXiv preprint arXiv:2406.09756, 2024.

[23]

Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, and Ziwei Liu. Benchmarking and analyzing generative data for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(9):7675–7688, 2025.

[24]

Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17627–17638, 2023.

[25]

Yingzhou Lu, Huazheng Wang, and Wenqi Wei. Machine learning for synthetic data generation: a review. CoRR, abs/2302.04062, 2023.

[26]

Pablo Martinez-Gonzalez, Sergiu Oprea, John A. Castro-Vargas, Alberto Garcia-Garcia, Sergio Orts-Escolano, Jose Garcia-Rodriguez, and Markus Vincze. Unrealrox+: An improved tool for acquiring synthetic data from virtual 3d environments. CoRR, abs/2104.11776, 2021.

[27]

Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[28]

John McCormac, Ankur Handa, Stefan Leutenegger, and Andrew J. Davison. Scenenet rgb-d: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation? In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

[29]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc-Antoine Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaa El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Patrick Labatut, Arman... DINOv2: Learning Robust Visual Features without Supervision. arXiv preprint, 2023.

[30]

Krishna Pillutla, Linyi Liu, John Thickstun, Sean Welleck, Julian McAuley, and Luke Zettlemoyer. MAUVE scores for generative models: Theory and practice. Journal of Machine Learning Research, 24(356):1–92, 2023.

[31]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pages... 2021.

[32]

Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, Alejandro Newell, Hei Law, Ankit Goyal, Kaiyu Yang, and Jia Deng. Infinite photorealistic worlds using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), page... 2023.
