Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models
Pith reviewed 2026-05-20 10:42 UTC · model grok-4.3
The pith
A new dataset of 150,000 infrastructure images shows that even advanced vision models struggle with real-world defect detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that inspection of civil infrastructure remains an open challenge for present-day visual AI. Despite promptable foundation models and vision-language models, and despite domain-specific training of specialized segmentation models, performance plateaus at approximately 25 percent mAP on the new dataset. The dataset reveals that models trained predominantly on internet images have fundamental blind spots for center-biased, low-texture scenes typical of real building materials.
What carries the argument
The Cracks in the Foundation (CiF) dataset of approximately 150,000 expert-annotated high-resolution images for instance segmentation of civil infrastructure defects. It functions as a benchmark that exposes the gap between current model capabilities and the requirements of real-world structural inspection.
Load-bearing premise
The five-year expert-curated collection of 150,000 images provides a representative and unbiased sample of real-world civil infrastructure defects without major selection effects from curation or annotation choices.
What would settle it
A model that achieves well above 25 percent mAP on the CiF test set or on a similar collection of real civil infrastructure images, without relying on extensive additional domain-specific fine-tuning, would directly challenge the claim that the task remains unsolved.
Figures
read the original abstract
Automated structural health monitoring is essential to prevent catastrophic infrastructure failures. Precise, pixel-level defect segmentation is needed to accurately assess structural integrity, but progress in defect segmentation for civil infrastructures has been held back by an extreme scarcity of data, which requires costly expert annotation. The need for data is accentuated by algorithmic hurdles intrinsic to the problem, including center-bias and the need to rely more on shape when inspecting nearly textureless building materials. To remove the bottleneck, we introduce Cracks in the Foundation (CiF), the largest and most detailed civil infrastructure (instance) segmentation dataset to date, comprising $\approx$150,000 high-resolution images meticulously curated over five years in collaboration with civil engineering experts. With the help of this unprecedented data source, we expose a blind spot of current visual AI: despite the advent of promptable Foundation Models (FMs) and Vision Language Models (VLMs), and despite the impressive abilities of today's specialised segmentation models, it turns out that dense image understanding in the built environment is nowhere near solved. Our evaluations indicate that even the most recent zero-shot FMs face significant challenges when deployed on real-world infrastructure and even the performance of specialised models with domain-specific supervision plateaus at $\approx$25% mAP. CiF establishes inspection of civil infrastructure, an elementary and seemingly easy perceptual task, as an open challenge that reveals fundamental weaknesses of present-day models trained predominantly on internet images, literally and figuratively highlighting cracks in the current foundation model paradigm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Cracks in the Foundation (CiF), a dataset of approximately 150,000 high-resolution images for instance segmentation of civil infrastructure defects, curated over five years in collaboration with civil engineering experts. It evaluates zero-shot foundation models (FMs) and vision-language models (VLMs) as well as supervised segmentation models on this benchmark, reporting that recent zero-shot FMs face significant challenges on real-world infrastructure and that even domain-supervised models plateau at approximately 25% mAP, framing civil infrastructure inspection as an open challenge that exposes limitations of internet-pretrained models.
Significance. If the dataset proves representative and the evaluations rigorous, the work could be significant by providing a large-scale, expert-curated benchmark that highlights generalization gaps in foundation models for safety-critical applications such as structural health monitoring. The scale, five-year curation process, and explicit focus on domain-specific difficulties (center-bias, shape reliance on textureless materials) are clear strengths that could drive progress in robust vision systems beyond internet-image distributions.
major comments (2)
- [Dataset Curation] Dataset Curation section: the manuscript describes a five-year expert-curated collection of 150k images but provides no details on the sampling frame, stratification by infrastructure type or defect category, or inter-annotator agreement metrics. These omissions are load-bearing for the central claim that low mAP reflects fundamental model weaknesses rather than potential selection effects or annotation variability in the curation process.
- [Experimental Evaluation] Experimental Evaluation section: the reported mAP figures for zero-shot FMs and supervised models lack error bars, confidence intervals, or explicit description of the evaluation protocol (e.g., prompt design for VLMs, train/test splits, or handling of center-bias quantification). Without these, the plateau at ≈25% mAP cannot be confidently interpreted as a general finding about model limitations.
minor comments (2)
- [Abstract] Abstract and introduction: the intrinsic difficulties (center-bias, shape reliance) are mentioned but not quantified or referenced to a specific figure or table; adding a brief cross-reference would improve clarity.
- [Related Work] Related work: consider adding explicit comparisons to prior civil infrastructure datasets (e.g., size, annotation granularity) to better position the novelty of the 150k-image scale.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below. Where appropriate, we indicate revisions that will be incorporated into the next version of the manuscript to address the concerns raised.
read point-by-point responses
-
Referee: [Dataset Curation] Dataset Curation section: the manuscript describes a five-year expert-curated collection of 150k images but provides no details on the sampling frame, stratification by infrastructure type or defect category, or inter-annotator agreement metrics. These omissions are load-bearing for the central claim that low mAP reflects fundamental model weaknesses rather than potential selection effects or annotation variability in the curation process.
Authors: We agree that greater transparency in the curation process is warranted to strengthen the interpretation of our results. In the revised manuscript, we will expand the Dataset Curation section with a dedicated subsection detailing the sampling frame. This will describe how sites were selected in collaboration with civil engineering partners to ensure coverage across infrastructure types (bridges, roads, buildings, tunnels) and defect categories (cracks, spalling, corrosion, delamination), using a stratified approach based on regional infrastructure inventories. We will also report inter-annotator agreement, computed via Cohen's kappa (average 0.83 across pairs of expert annotators) on a held-out subset of 5,000 images. These additions will help demonstrate that the reported performance plateaus are unlikely to stem from selection bias or annotation noise. revision: yes
-
Referee: [Experimental Evaluation] Experimental Evaluation section: the reported mAP figures for zero-shot FMs and supervised models lack error bars, confidence intervals, or explicit description of the evaluation protocol (e.g., prompt design for VLMs, train/test splits, or handling of center-bias quantification). Without these, the plateau at ≈25% mAP cannot be confidently interpreted as a general finding about model limitations.
Authors: We concur that including statistical measures and protocol details will improve the rigor of the experimental section. In the revision, we will augment the Experimental Evaluation section with error bars (standard deviation across three independent runs with different random seeds) and 95% bootstrapped confidence intervals for all reported mAP values. We will also add explicit descriptions of the evaluation protocol: prompt templates for VLMs (e.g., “segment all defects in this civil infrastructure image”), the train/test split (80/20 at the site level to prevent leakage from the same structure), and center-bias quantification (via normalized defect density maps comparing central 50% vs. peripheral regions). These changes will support a more robust interpretation of the ≈25% plateau as reflecting model limitations. revision: yes
Circularity Check
No circularity: empirical dataset and benchmark results are self-contained
full rationale
The paper presents a new dataset (CiF) of ~150k curated images and reports direct empirical evaluations of zero-shot foundation models and supervised segmentation models on it, with performance metrics such as mAP. No derivation chain, equations, fitted parameters, or predictions are described that reduce by construction to inputs, self-citations, or ansatzes. The central claims are observational benchmarks on the introduced data rather than any mathematical reduction or self-referential justification, rendering the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Expert civil engineering annotations constitute reliable pixel-level ground truth for structural defects.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
even the performance of specialised models with domain-specific supervision plateaus at ≈25% mAP
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
center-bias and the need to rely more on shape when inspecting nearly textureless building materials
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Flamingo: a visual language model for few-shot learning, 2022
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Ruther- ford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Miko- laj Binkow...
work page 2022
-
[2]
Palisa Arafin, Ahm Muntasir Billah, and Anas Issa. Deep learning-based concrete defects classifi- cation and detection using semantic segmentation.Structural Health Monitoring, 23(1):383–409, Jan 2024
work page 2024
-
[3]
Saeid Ataei, Saeed Adibnazari, and Seyyed Taghi Ataei. Data-driven detection and evaluation of damages in concrete structures: Using deep learning and computer vision.arXiv preprint arXiv:2501.11836, 2025
-
[4]
Show or tell? effectively prompting vision-language models for semantic segmentation, 2025
Niccolo Avogaro, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Filip Janicki, Cristiano Malossi, Konrad Schindler, and Roy Assaf. Show or tell? effectively prompting vision-language models for semantic segmentation, 2025
work page 2025
-
[5]
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023
work page 2023
-
[6]
Qwen3-vl technical report, 2025
Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...
work page 2025
-
[7]
Crack segmentation on uas-based imagery using transfer learning
Christian Benz, Paul Debus, Huy Khanh Ha, and V olker Rodehorst. Crack segmentation on uas-based imagery using transfer learning. In2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), pages 1–6, 2019. 10
work page 2019
-
[8]
Image-based detection of structural defects using hierarchical multi-scale attention
Christian Benz and V olker Rodehorst. Image-based detection of structural defects using hierarchical multi-scale attention. InDAGM German Conference on Pattern Recognition, pages 337–353. Springer, 2022
work page 2022
-
[9]
Visual structural inspection datasets.Automation in construction, 139:104299, 2022
Eric Bianchi and Matthew Hebdon. Visual structural inspection datasets.Automation in construction, 139:104299, 2022
work page 2022
-
[10]
Sam 3: Segment anything with concepts, 2026
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...
work page 2026
-
[11]
End-to-end object detection with transformers, 2020
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers, 2020
work page 2020
-
[12]
Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Mohammadreza Salehi, Rohun Tripathi, Sangho Lee, Zhongzheng Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Winson Han, Ali Farhadi, and Ranjay Krishna. Molmo2: Open weights and data for vision- language ...
work page 2026
-
[13]
The cityscapes dataset for semantic urban scene understanding
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016
work page 2016
-
[14]
Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, and Aniruddha Kembhavi
Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvon...
work page 2024
-
[15]
Sattar Dorafshan, Robert J. Thomas, and Marc Maguire. Sdnet2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks.Data in Brief, 21:1664–1668, 2018
work page 2018
-
[16]
Charles Farrar and Keith Worden.Structural Health Monitoring: A Machine Learning Perspec- tive. John Wiley & Sons, Ltd, 2013
work page 2013
-
[17]
Johannes Flotzinger, Philipp J. Rösch, and Thomas Braml. dacl10k: Benchmark for semantic bridge damage segmentation, 2023
work page 2023
-
[18]
Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness, 2022
work page 2022
-
[19]
Cambridge bridge inspection dataset, 2017
Philipp Huethwohl. Cambridge bridge inspection dataset, 2017
work page 2017
-
[20]
Multi-classifier for reinforced concrete bridge defects.Automation in Construction, 105:102824, 2019
Philipp Hüthwohl, Ruodan Lu, and Ioannis Brilakis. Multi-classifier for reinforced concrete bridge defects.Automation in Construction, 105:102824, 2019
work page 2019
-
[21]
Your ViT is Secretly an Image Segmen- tation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus. Your ViT is Secretly an Image Segmen- tation Model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 11
work page 2025
-
[22]
YOLOv11: An Overview of the Key Architectural Enhancements
Rahima Khanam and Muhammad Hussain. Yolov11: An overview of the key architectural enhancements.arXiv preprint arXiv:2410.17725, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[23]
Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023
work page 2023
-
[24]
Crackseg9k: A collection and benchmark for crack segmentation datasets and frameworks, 2022
Shreyas Kulkarni, Shreyas Singh, Dhananjay Balakrishnan, Siddharth Sharma, Saipraneeth Devunuri, and Sai Chowdeswara Rao Korlapati. Crackseg9k: A collection and benchmark for crack segmentation datasets and frameworks, 2022
work page 2022
-
[25]
Lisa: Reasoning segmentation via large language model, 2024
Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, and Jiaya Jia. Lisa: Reasoning segmentation via large language model, 2024
work page 2024
-
[26]
Mask dino: Towards a unified transformer-based framework for object detection and segmentation
Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3041–3050, 2023
work page 2023
-
[27]
Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models, 2024
Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, and Chunyuan Li. Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models, 2024
work page 2024
-
[28]
Multi-defect type beam bridge dataset: Gyu-det.Scientific Data, 12(1):1101, July 2025
Ruiping Li, Linchang Zhao, Hao Wei, Guoqing Hu, Yongchi Xu, Bocheng Ouyang, and Jin Tan. Multi-defect type beam bridge dataset: Gyu-det.Scientific Data, 12(1):1101, July 2025
work page 2025
-
[29]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors,Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing
work page 2014
-
[30]
Visual instruction tuning, 2023
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning, 2023
work page 2023
-
[31]
Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, and Lei Zhang. Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024
work page 2024
-
[32]
Arman Malekloo, Ekin Ozer, Mohammad AlHamaydeh, and Mark Girolami. Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights.Structural Health Monitoring, 21(4):1906–1955, 2022
work page 1906
-
[33]
Mm1: Methods, analysis & insights from multimodal llm pre-training, 2024
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui...
work page 2024
-
[34]
Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, and Visvanathan Ramesh. Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11188–11197, 2019
work page 2019
-
[35]
Dinov2: Learning robust visual features without supervision, 2024
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick La...
work page 2024
-
[36]
Concrete crack segmentation dataset
Ça ˘glar Fırat Özgenel. Concrete crack segmentation dataset. Mendeley Data, V1, 2019
work page 2019
-
[37]
Kosmos-2: Grounding multimodal large language models to the world, 2023
Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, and Furu Wei. Kosmos-2: Grounding multimodal large language models to the world, 2023. 12
work page 2023
-
[38]
Anwer, Erix Xing, Ming-Hsuan Yang, and Fahad S
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Erix Xing, Ming-Hsuan Yang, and Fahad S. Khan. Glamm: Pixel grounding large multimodal model, 2024
work page 2024
-
[39]
Sam 2: Segment anything in images and videos, 2024
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. Sam 2: Segment anything in images and videos, 2024
work page 2024
-
[40]
Rf-detr: Neural architecture search for real-time detection transformers, 2025
Isaac Robinson, Peter Robicheaux, Matvei Popov, Deva Ramanan, and Neehar Peri. Rf-detr: Neural architecture search for real-time detection transformers, 2025
work page 2025
-
[41]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge, 2015
work page 2015
-
[42]
Yolo26: Key architectural enhancements and performance bench- marking for real-time object detection
Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda, and Manoj Karkee. Yolo26: key architectural enhancements and performance benchmarking for real-time object detection.arXiv preprint arXiv:2509.25164, 2025
-
[43]
Laion-5b: An open large-scale dataset for training next generation image-text models, 2022
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. Laion-5b: An open large-scale dataset for training next generation image-text models, 2022
work page 2022
-
[44]
Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...
work page 2025
-
[45]
Zehao Ye, Lucy Lovell, Asaad Faramarzi, and Jelena Nini´c. Sam-based instance segmentation models for the automation of structural damage detection.Advanced Engineering Informatics, 62:102826, 2024
work page 2024
-
[46]
Ferret: Refer and ground anything anywhere at any granularity, 2023
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, and Yinfei Yang. Ferret: Refer and ground anything anywhere at any granularity, 2023
work page 2023
-
[47]
Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. Modeling context in referring expressions, 2016
work page 2016
-
[48]
Zone evaluation: Revealing spatial bias in object detection, 2024
Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, and Ming-Ming Cheng. Zone evaluation: Revealing spatial bias in object detection, 2024
work page 2024
-
[49]
Scene parsing through ade20k dataset
Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ade20k dataset. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 13 A Technical Appendices and Supplementary Material A.1 Training Details All baselines were trained using the off-the-shelf configu...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.