WildDet3D: Scaling Promptable 3D Detection in the Wild
Recognition: 2 theorem links
Pith reviewed 2026-05-10 17:31 UTC · model grok-4.3
The pith
A unified model detects 3D objects from single images using text, point, or box prompts and gains further accuracy from depth cues at inference time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WildDet3D is a geometry-aware architecture that natively accepts text, point, and box prompts for monocular 3D object detection and incorporates auxiliary depth signals at inference time. It is paired with WildDet3D-Data, the largest open 3D detection dataset, built by generating candidate 3D boxes from 2D annotations and retaining only human-verified ones across 13.5K categories in diverse real-world scenes. Together they reach 22.6/24.8 AP3D on the new WildDet3D-Bench with text and box prompts, 34.2/36.4 AP3D on Omni3D, and 40.3/48.9 ODS in zero-shot evaluation on Argoverse 2 and ScanNet, with depth cues adding +20.7 AP on average across settings.
What carries the argument
WildDet3D, a unified geometry-aware architecture that accepts multiple prompt modalities and integrates depth signals during inference.
Load-bearing premise
Generating candidate 3D boxes from 2D annotations and human verification produces accurate, unbiased 3D ground truth across 13.5K categories and real scenes without systematic errors or selection bias.
What would settle it
Independent re-annotation of a random sample of the dataset's 3D boxes to measure error rates against the verified labels, or evaluation on a new benchmark containing categories and scenes entirely absent from the construction process.
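A minimal sketch of what such a re-annotation audit could compute, assuming each matched 3D box is stored as center, size, and yaw; the array layout, tolerance, and field names here are illustrative assumptions, not details from the paper.

```python
# Hypothetical audit sketch: compare independently re-annotated 3D boxes against the
# dataset's human-verified labels on a random sample. The (center, size, yaw) layout
# and the 0.25 m center tolerance are illustrative assumptions, not from the paper.
import numpy as np

def reannotation_error_report(verified: np.ndarray, reannotated: np.ndarray,
                              center_tol_m: float = 0.25) -> dict:
    """Both arrays have shape (N, 7): [cx, cy, cz, w, h, l, yaw], matched row-by-row."""
    center_err = np.linalg.norm(verified[:, :3] - reannotated[:, :3], axis=1)  # meters
    size_err = np.abs(verified[:, 3:6] - reannotated[:, 3:6]).mean(axis=1)     # meters
    dyaw = verified[:, 6] - reannotated[:, 6]
    yaw_err = np.abs(np.arctan2(np.sin(dyaw), np.cos(dyaw)))                   # wrapped to [0, pi]
    return {
        "mean_center_err_m": float(center_err.mean()),
        "mean_size_err_m": float(size_err.mean()),
        "mean_yaw_err_rad": float(yaw_err.mean()),
        "frac_center_within_tol": float((center_err <= center_tol_m).mean()),
    }

# Example on placeholder data standing in for 500 sampled, matched boxes.
rng = np.random.default_rng(0)
verified = rng.normal(size=(500, 7))
reannotated = verified + rng.normal(scale=0.05, size=(500, 7))
print(reannotation_error_report(verified, reannotated))
```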
Original abstract
Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection--recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must generalize beyond closed-set categories, support diverse prompt modalities, and leverage geometric cues when available. Progress is hampered by two bottlenecks: existing methods are designed for a single prompt type and lack a mechanism to incorporate additional geometric cues, and current 3D datasets cover only narrow categories in controlled environments, limiting open-world transfer. In this work we address both gaps. First, we introduce WildDet3D, a unified geometry-aware architecture that natively accepts text, point, and box prompts and can incorporate auxiliary depth signals at inference time. Second, we present WildDet3D-Data, the largest open 3D detection dataset to date, constructed by generating candidate 3D boxes from existing 2D annotations and retaining only human-verified ones, yielding over 1M images across 13.5K categories in diverse real-world scenes. WildDet3D establishes a new state-of-the-art across multiple benchmarks and settings. In the open-world setting, it achieves 22.6/24.8 AP3D on our newly introduced WildDet3D-Bench with text and box prompts. On Omni3D, it reaches 34.2/36.4 AP3D with text and box prompts, respectively. In zero-shot evaluation, it achieves 40.3/48.9 ODS on Argoverse 2 and ScanNet. Notably, incorporating depth cues at inference time yields substantial additional gains (+20.7 AP on average across settings).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces WildDet3D, a unified geometry-aware architecture for monocular 3D object detection that natively supports text, point, and box prompts and incorporates auxiliary depth signals at inference. It also presents WildDet3D-Data, the largest open 3D detection dataset constructed by lifting 2D annotations to candidate 3D boxes and retaining only human-verified instances, spanning over 1M images and 13.5K categories in diverse scenes. The work claims new state-of-the-art results including 22.6/24.8 AP3D on the new WildDet3D-Bench (text/box prompts), 34.2/36.4 AP3D on Omni3D, zero-shot ODS of 40.3/48.9 on Argoverse 2 and ScanNet, and average gains of +20.7 AP from depth cues.
Significance. If the dataset supplies reliable, unbiased 3D ground truth and the empirical gains are reproducible, the work would meaningfully advance open-world monocular 3D detection by scaling promptable detection to thousands of categories while integrating geometric cues. The dataset scale and prompt flexibility address documented bottlenecks in the field. Credit is due for the empirical breadth across open-world, closed-set, and zero-shot settings.
major comments (2)
- [WildDet3D-Data section] Dataset construction: the pipeline of generating candidate 3D boxes from 2D annotations followed by human verification is described at a high level only, with no reported quantitative error analysis, inter-annotator agreement, per-category statistics, or comparison to LiDAR/multi-view references (an illustrative agreement-metric sketch follows this list). This is load-bearing for the central claim, as every reported AP3D and ODS number depends on the accuracy and lack of systematic bias in this supervision across 13.5K categories and 1M+ images.
- [Experiments section] The abstract and results tables report concrete AP3D/ODS numbers and depth gains without architecture diagrams, training-procedure details, ablation studies on prompt/depth components, or error bars/statistical tests. This prevents assessing whether the SOTA claims arise from the unified architecture or from dataset effects.
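As a concrete illustration of the kind of agreement statistic the first comment asks for, the sketch below computes Cohen's kappa over accept/reject verification decisions on a doubly-annotated sample; the protocol, sample size, and library choice are assumptions, not something the paper reports.

```python
# Illustrative sketch (assumed protocol, not from the paper): Cohen's kappa between two
# annotators' accept/reject decisions on a doubly-verified sample of candidate 3D boxes.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Placeholder decisions for 1,000 sampled candidate boxes: 1 = accept, 0 = reject.
rng = np.random.default_rng(42)
annotator_a = rng.integers(0, 2, size=1000)
# Simulate a second annotator who agrees with the first roughly 90% of the time.
annotator_b = np.where(rng.random(1000) < 0.9, annotator_a, 1 - annotator_a)

kappa = cohen_kappa_score(annotator_a, annotator_b)       # chance-corrected agreement
raw_agreement = float((annotator_a == annotator_b).mean())
print(f"raw agreement = {raw_agreement:.3f}, Cohen's kappa = {kappa:.3f}")
```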
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, agreeing where revisions are warranted and providing clarifications on the current manuscript content.
Point-by-point responses
- Referee: [WildDet3D-Data section] Dataset construction: the pipeline of generating candidate 3D boxes from 2D annotations followed by human verification is described at a high level only, with no reported quantitative error analysis, inter-annotator agreement, per-category statistics, or comparison to LiDAR/multi-view references. This is load-bearing for the central claim, as every reported AP3D and ODS number depends on the accuracy and lack of systematic bias in this supervision across 13.5K categories and 1M+ images.
Authors: We agree that the dataset construction requires more quantitative support to validate the 3D ground-truth quality. In the revised version we will expand the WildDet3D-Data section with: inter-annotator agreement metrics on a sampled subset, quantitative error analysis comparing verified boxes to available LiDAR or multi-view references, per-category statistics on instance counts and verification pass rates, and a discussion of potential systematic biases. We will also release the annotation guidelines and a verification subset to enable external assessment. Revision: yes.
- Referee: [Experiments section] The abstract and results tables report concrete AP3D/ODS numbers and depth gains without architecture diagrams, training-procedure details, ablation studies on prompt/depth components, or error bars/statistical tests. This prevents assessing whether the SOTA claims arise from the unified architecture or from dataset effects.
Authors: The manuscript already contains an architecture diagram (Figure 2) and training-procedure details (Section 4). However, we acknowledge the absence of targeted ablations on prompt modalities and depth integration, as well as error bars and statistical tests. We will add these in the revision: ablations isolating each prompt type and the depth cue, error bars from multiple runs, and significance tests on the reported gains (a minimal sketch of such a test follows these responses). This will clarify the sources of improvement and strengthen reproducibility. Revision: yes.
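A minimal sketch of the analysis the authors promise, assuming per-seed AP3D scores from repeated training runs are available; the numbers and the paired t-test choice are illustrative, not results from the paper.

```python
# Illustrative sketch (assumed analysis, not from the paper): error bars and a paired
# significance test for an AP3D gain across training runs with different random seeds.
import numpy as np
from scipy import stats

# Hypothetical per-seed AP3D scores for a baseline and for the full model (placeholders).
baseline   = np.array([31.8, 32.4, 31.5, 32.1, 31.9])
full_model = np.array([34.0, 34.5, 33.9, 34.6, 34.1])

gains = full_model - baseline
mean_gain = gains.mean()
std_err = gains.std(ddof=1) / np.sqrt(len(gains))         # standard error of the mean gain
t_stat, p_value = stats.ttest_rel(full_model, baseline)   # paired t-test over seeds

print(f"gain = {mean_gain:.2f} +/- {std_err:.2f} AP3D (SE), paired t-test p = {p_value:.4f}")
```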
Circularity Check
No circularity: purely empirical architecture and dataset with independent evaluation
Full rationale
The paper introduces WildDet3D as a promptable 3D detector and WildDet3D-Data via 2D-to-3D lifting plus human verification, then reports empirical AP3D/ODS metrics on new and existing benchmarks including zero-shot transfer. No equations, fitted parameters, or self-citations are presented that reduce any reported gain to a quantity defined by construction from the inputs; the derivation chain consists of standard training and evaluation steps whose outputs are not tautological with the dataset construction or model design. This is self-contained empirical work with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Human verification of candidate 3D boxes produces accurate and unbiased 3D ground truth across diverse categories and scenes.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (D=3 from linking) · tag: echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Matched passage: "we generate per-pixel ray directions r_{i,j} = K^{-1} [u, v, 1]^T and encode them using 8th-order real spherical harmonics: φ(r) = RSH_8(r / ||r||) ∈ R^{81}"
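A minimal sketch of the quoted construction, assuming a pinhole intrinsics matrix K and using e3nn for the real-spherical-harmonic basis (the paper does not name a library); degrees l = 0..8 give the 81-dimensional encoding.

```python
# Sketch of the quoted passage (not the authors' code): back-project each pixel to a
# camera ray r_{i,j} = K^{-1} [u, v, 1]^T and encode it with real spherical harmonics
# of degree 0..8, giving sum_{l=0..8} (2l+1) = 81 features per pixel.
# e3nn is an assumed library choice; any real-SH implementation would do.
import torch
from e3nn import o3

def ray_sh_features(K: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """Return an (H, W, 81) tensor of RSH-encoded unit ray directions."""
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)   # [u, v, 1]^T per pixel
    rays = pix @ torch.linalg.inv(K).T                      # r_{i,j} = K^{-1} [u, v, 1]^T
    # phi(r) = RSH_8(r / ||r||) in R^81; normalize=True divides by ||r|| internally.
    return o3.spherical_harmonics(list(range(9)), rays, normalize=True)

# Example: 640x480 image, focal length 500 px, principal point at the image center.
K = torch.tensor([[500.0,   0.0, 320.0],
                  [  0.0, 500.0, 240.0],
                  [  0.0,   0.0,   1.0]])
feats = ray_sh_features(K, H=480, W=640)    # feats.shape == (480, 640, 81)
```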
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.