Efficiently Linking Real Scenes with Synthetic Data Generation for AI-based Cognitive Robotics and Computer Vision Applications
Pith reviewed 2026-06-26 16:54 UTC · model grok-4.3
The pith
Linking real scenes to synthetic data bridges domain gaps in robotic vision training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The domain gap between simulation and real-world data limits the precision and scalability of AI models for tasks such as 6D pose estimation and grasping; linking real scenes directly with synthetic data generation during training data creation provides a practical way to bridge that gap.
What carries the argument
The linking mechanism that combines real scenes with synthetic data generation to produce training datasets.
If this is right
- AI architectures can reach higher precision in 6D pose estimation and grasping when trained on linked data.
- Training data generation can scale more efficiently for both industrial and household robotics scenarios.
- Synergies between data-generation methods and model architectures become usable to address current limits.
- Domain-gap problems in semantic environment analysis can be reduced without collecting exhaustive real-world datasets.
Where Pith is reading between the lines
- The same linking approach might extend to other robotics perception tasks such as navigation or object manipulation.
- Implementation details of the linking step would need to be tested to confirm they work at scale.
- Related simulation-reality transfer problems in non-robotics computer vision could benefit from similar linkage methods.
Load-bearing premise
That connecting real scenes to synthetic data will be enough to overcome domain gaps even though no specific linking technique or performance result is shown.
What would settle it
Train the same AI pose-estimation model on three datasets (purely real, purely synthetic, and linked real-synthetic) and evaluate each on held-out real scenes. If the linked version yields no measurable gain in accuracy or robustness over the other two, the central claim fails.
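The three-way comparison could be scored with the ADD metric that is common in 6D pose estimation. ADD is the mean distance between the object's model points under the ground-truth pose and under the estimated pose. A pose counts as correct when ADD falls below 10% of the object diameter. The sketch below is a minimal, hypothetical scoring harness built on that assumption. The paper does not specify any of these: the metric choice, the 10% threshold, the condition names `real`/`synthetic`/`linked`, the function names, or the `min_gain` margin. Before calling a gain "measurable", a real study would also need repeated training runs and a significance test.

```python
import math
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform p -> R p + t to a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_error(R_gt: Mat3, t_gt: Vec3, R_est: Mat3, t_est: Vec3,
              model_points: Sequence[Vec3]) -> float:
    """ADD: mean distance between model points under the ground-truth and estimated poses."""
    total = sum(
        math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
        for p in model_points
    )
    return total / len(model_points)


def object_diameter(model_points: Sequence[Vec3]) -> float:
    """Largest pairwise distance between model points (brute force, O(n^2))."""
    return max(math.dist(a, b) for a in model_points for b in model_points)


def add_accuracy(errors: Sequence[float], diameter: float, fraction: float = 0.1) -> float:
    """Share of test poses whose ADD error is below `fraction` of the object diameter."""
    threshold = fraction * diameter
    return sum(e < threshold for e in errors) / len(errors)


def linked_gain(accuracy: Dict[str, float], min_gain: float = 0.0) -> Tuple[float, bool]:
    """Gain of the hypothetical 'linked' condition over the better of 'real' and 'synthetic'.

    Returns (gain, gain_exceeds_margin). `min_gain` is a hypothetical margin;
    a real study would replace this check with a statistical test over repeated runs.
    """
    best_baseline = max(accuracy["real"], accuracy["synthetic"])
    gain = accuracy["linked"] - best_baseline
    return gain, gain > min_gain


if __name__ == "__main__":
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Toy object with four model points; placeholder geometry, not data from the paper.
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    # A pure 0.05 translation offset shifts every point equally, so ADD is 0.05.
    print(add_error(identity, (0.0, 0.0, 0.0), identity, (0.05, 0.0, 0.0), points))

    diameter = object_diameter(points)  # sqrt(2) for this toy object
    # Placeholder per-condition ADD errors, invented purely for illustration.
    errors = {
        "real": [0.05, 0.20, 0.12],
        "synthetic": [0.30, 0.10, 0.25],
        "linked": [0.05, 0.08, 0.12],
    }
    accuracy = {name: add_accuracy(errs, diameter) for name, errs in errors.items()}
    print(accuracy, linked_gain(accuracy))
```

Comparing against the better of the two baselines, rather than against either one alone, is a deliberate choice. It asks whether linking adds value beyond simply choosing the stronger single-source dataset.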
Original abstract
AI vision models are a driving factor for potential use-case scenarios of cognitive robotics in industrial and household applications. A large array of methods, from semantic environment analysis to 6D and grasping pose estimation, has been proposed based on the latest AI achievements. However, such advancements require further strong and efficient methods with respect to training data and AI architectures that, in synergy, can tackle current challenges, precision limits, and scalability beyond domain gaps. In this paper, we discuss these current limits and the trends in the related state of the art that challenge them. Further, we discuss our current work in progress on bridging the domain gap between simulation and real-world applications by linking the two in training data generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript discusses current limits and trends in state-of-the-art AI vision methods for cognitive robotics applications, such as semantic analysis, 6D pose estimation, and grasping. It highlights challenges in training data, AI architectures, precision, scalability, and domain gaps between simulation and reality. The paper further describes the authors' ongoing work-in-progress on bridging these domain gaps by linking real scenes with synthetic data generation during training data creation.
Significance. The general topic of domain gap reduction via mixed real-synthetic training data is relevant to scalable robotic vision systems. However, because the manuscript contains no specific methods, algorithms, datasets, experiments, or quantitative results, its potential significance cannot be assessed beyond a high-level overview of challenges and intent. No machine-checked proofs, reproducible code, or falsifiable predictions are provided.
major comments (1)
- [Abstract] Abstract: The manuscript positions itself as a discussion of ongoing work without presenting any concrete mechanism, equation, algorithm, or preliminary validation for 'linking real scenes with synthetic data generation.' This absence means the central claim of addressing domain gaps cannot be evaluated for correctness or novelty.
Simulated Author's Rebuttal
We thank the referee for their review. The manuscript is positioned as a discussion of challenges and work-in-progress rather than a complete technical contribution with algorithms or results. We address the major comment below.
Referee: [Abstract] Abstract: The manuscript positions itself as a discussion of ongoing work without presenting any concrete mechanism, equation, algorithm, or preliminary validation for 'linking real scenes with synthetic data generation.' This absence means the central claim of addressing domain gaps cannot be evaluated for correctness or novelty.
Authors: We agree that the paper presents no concrete mechanisms, equations, algorithms, datasets, or validation results. It is explicitly a discussion paper reviewing limits in AI vision for robotics (semantic analysis, 6D pose estimation, grasping) and describing ongoing work-in-progress on linking real scenes with synthetic data generation to address domain gaps. The abstract and introduction state this scope directly. No claim is made to a novel evaluated method; the contribution is the overview of trends and the high-level intent of the linking approach. Such discussion papers can usefully frame open problems even without quantitative results. revision: no
Circularity Check
No significant circularity; purely descriptive discussion with no derivations or fitted claims
The manuscript is explicitly positioned as a discussion of state-of-the-art limits plus ongoing work-in-progress on linking real scenes to synthetic data generation. No concrete mechanism, algorithm, equation, dataset, or result is asserted as solved or demonstrated. Consequently there are no load-bearing technical assumptions, predictions, self-citations, or derivations whose failure would falsify a central claim, and no steps reduce to inputs by construction. The text contains no mathematical content at all.