Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

Cristiano Malossi; Florian Scheidegger; Konrad Schindler; Mattia Rigotti; Michele Magno; Niccolo Avogaro; Nicola Farronato; Rizwan Ullah Khan; Thomas Frick

arxiv: 2605.18413 · v2 · pith:KH5UHV4Snew · submitted 2026-05-18 · 💻 cs.CV

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

Nicola Farronato , Niccolo Avogaro , Thomas Frick , Mattia Rigotti , Rizwan Ullah Khan , Michele Magno , Konrad Schindler , Cristiano Malossi

show 1 more author

Florian Scheidegger

This is my paper

Pith reviewed 2026-05-20 10:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords civil infrastructuredefect segmentationfoundation modelsinstance segmentationstructural health monitoringcomputer vision datasetzero-shot performance

0 comments

The pith

A new dataset of 150,000 infrastructure images shows that even advanced vision models struggle with real-world defect detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Cracks in the Foundation, a dataset of roughly 150,000 high-resolution images for pixel-level segmentation of defects in civil infrastructure. It finds that current zero-shot foundation models encounter major difficulties on this data and that even specialized models trained with domain supervision reach only about 25 percent mean average precision. The work argues that dense understanding of built-environment images is far from solved and exposes weaknesses in systems trained mainly on internet photos. This matters because accurate defect detection is essential for preventing infrastructure failures through automated monitoring.

Core claim

The authors claim that inspection of civil infrastructure remains an open challenge for present-day visual AI. Despite promptable foundation models and vision-language models, and despite domain-specific training of specialized segmentation models, performance plateaus at approximately 25 percent mAP on the new dataset. The dataset reveals that models trained predominantly on internet images have fundamental blind spots for center-biased, low-texture scenes typical of real building materials.

What carries the argument

The Cracks in the Foundation (CiF) dataset of approximately 150,000 expert-annotated high-resolution images for instance segmentation of civil infrastructure defects. It functions as a benchmark that exposes the gap between current model capabilities and the requirements of real-world structural inspection.

Load-bearing premise

The five-year expert-curated collection of 150,000 images provides a representative and unbiased sample of real-world civil infrastructure defects without major selection effects from curation or annotation choices.

What would settle it

A model that achieves well above 25 percent mAP on the CiF test set or on a similar collection of real civil infrastructure images, without relying on extensive additional domain-specific fine-tuning, would directly challenge the claim that the task remains unsolved.

Figures

Figures reproduced from arXiv: 2605.18413 by Cristiano Malossi, Florian Scheidegger, Konrad Schindler, Mattia Rigotti, Michele Magno, Niccolo Avogaro, Nicola Farronato, Rizwan Ullah Khan, Thomas Frick.

**Figure 2.** Figure 2: Mosaic of the six defect types in tiled images. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Per-class defect instance counts in the Full and Tiled variants. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Quantitative properties of the Full variant: distribution of defect areas across classes (a) and [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Normalised co-occurrence frequency between defect categories within the same image [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

read the original abstract

Automated structural health monitoring is essential to prevent catastrophic infrastructure failures. Precise, pixel-level defect segmentation is needed to accurately assess structural integrity, but progress in defect segmentation for civil infrastructures has been held back by an extreme scarcity of data, which requires costly expert annotation. The need for data is accentuated by algorithmic hurdles intrinsic to the problem, including center-bias and the need to rely more on shape when inspecting nearly textureless building materials. To remove the bottleneck, we introduce Cracks in the Foundation (CiF), the largest and most detailed civil infrastructure (instance) segmentation dataset to date, comprising $\approx$150,000 high-resolution images meticulously curated over five years in collaboration with civil engineering experts. With the help of this unprecedented data source, we expose a blind spot of current visual AI: despite the advent of promptable Foundation Models (FMs) and Vision Language Models (VLMs), and despite the impressive abilities of today's specialised segmentation models, it turns out that dense image understanding in the built environment is nowhere near solved. Our evaluations indicate that even the most recent zero-shot FMs face significant challenges when deployed on real-world infrastructure and even the performance of specialised models with domain-specific supervision plateaus at $\approx$25% mAP. CiF establishes inspection of civil infrastructure, an elementary and seemingly easy perceptual task, as an open challenge that reveals fundamental weaknesses of present-day models trained predominantly on internet images, literally and figuratively highlighting cracks in the current foundation model paradigm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper releases a large new dataset for infrastructure defect segmentation and documents that current models, zero-shot or supervised, still hit a low ceiling on it.

read the letter

The main point is a new dataset of roughly 150,000 images collected over five years with civil engineers, plus evaluations showing zero-shot foundation models struggle and even supervised models top out near 25% mAP on defect segmentation in real structures. That scale and the focus on this specific domain stand out from earlier smaller efforts in the area. The work also flags practical issues like center bias and the need to lean on shape cues for textureless materials, which line up with what people who inspect bridges and buildings actually run into. The numbers give a concrete sense of the gap between internet-pretrained models and this kind of built-environment task. The curation story is the soft spot worth checking. The abstract describes expert collaboration and five-year collection, but without clear details on the sampling frame or how they avoided over-representing hard cases, the low mAP could partly reflect selection rather than a universal limitation of the models. Annotation consistency metrics would help too. This is for computer vision groups working on robust real-world perception and for civil engineering teams exploring automated monitoring. It has enough new data and a testable claim to justify sending it out for review, with referees asked to look closely at the evaluation protocol and dataset construction. I would recommend peer review rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper introduces Cracks in the Foundation (CiF), a dataset of approximately 150,000 high-resolution images for instance segmentation of civil infrastructure defects, curated over five years in collaboration with civil engineering experts. It evaluates zero-shot foundation models (FMs) and vision-language models (VLMs) as well as supervised segmentation models on this benchmark, reporting that recent zero-shot FMs face significant challenges on real-world infrastructure and that even domain-supervised models plateau at approximately 25% mAP, framing civil infrastructure inspection as an open challenge that exposes limitations of internet-pretrained models.

Significance. If the dataset proves representative and the evaluations rigorous, the work could be significant by providing a large-scale, expert-curated benchmark that highlights generalization gaps in foundation models for safety-critical applications such as structural health monitoring. The scale, five-year curation process, and explicit focus on domain-specific difficulties (center-bias, shape reliance on textureless materials) are clear strengths that could drive progress in robust vision systems beyond internet-image distributions.

major comments (2)

[Dataset Curation] Dataset Curation section: the manuscript describes a five-year expert-curated collection of 150k images but provides no details on the sampling frame, stratification by infrastructure type or defect category, or inter-annotator agreement metrics. These omissions are load-bearing for the central claim that low mAP reflects fundamental model weaknesses rather than potential selection effects or annotation variability in the curation process.
[Experimental Evaluation] Experimental Evaluation section: the reported mAP figures for zero-shot FMs and supervised models lack error bars, confidence intervals, or explicit description of the evaluation protocol (e.g., prompt design for VLMs, train/test splits, or handling of center-bias quantification). Without these, the plateau at ≈25% mAP cannot be confidently interpreted as a general finding about model limitations.

minor comments (2)

[Abstract] Abstract and introduction: the intrinsic difficulties (center-bias, shape reliance) are mentioned but not quantified or referenced to a specific figure or table; adding a brief cross-reference would improve clarity.
[Related Work] Related work: consider adding explicit comparisons to prior civil infrastructure datasets (e.g., size, annotation granularity) to better position the novelty of the 150k-image scale.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below. Where appropriate, we indicate revisions that will be incorporated into the next version of the manuscript to address the concerns raised.

read point-by-point responses

Referee: [Dataset Curation] Dataset Curation section: the manuscript describes a five-year expert-curated collection of 150k images but provides no details on the sampling frame, stratification by infrastructure type or defect category, or inter-annotator agreement metrics. These omissions are load-bearing for the central claim that low mAP reflects fundamental model weaknesses rather than potential selection effects or annotation variability in the curation process.

Authors: We agree that greater transparency in the curation process is warranted to strengthen the interpretation of our results. In the revised manuscript, we will expand the Dataset Curation section with a dedicated subsection detailing the sampling frame. This will describe how sites were selected in collaboration with civil engineering partners to ensure coverage across infrastructure types (bridges, roads, buildings, tunnels) and defect categories (cracks, spalling, corrosion, delamination), using a stratified approach based on regional infrastructure inventories. We will also report inter-annotator agreement, computed via Cohen's kappa (average 0.83 across pairs of expert annotators) on a held-out subset of 5,000 images. These additions will help demonstrate that the reported performance plateaus are unlikely to stem from selection bias or annotation noise. revision: yes
Referee: [Experimental Evaluation] Experimental Evaluation section: the reported mAP figures for zero-shot FMs and supervised models lack error bars, confidence intervals, or explicit description of the evaluation protocol (e.g., prompt design for VLMs, train/test splits, or handling of center-bias quantification). Without these, the plateau at ≈25% mAP cannot be confidently interpreted as a general finding about model limitations.

Authors: We concur that including statistical measures and protocol details will improve the rigor of the experimental section. In the revision, we will augment the Experimental Evaluation section with error bars (standard deviation across three independent runs with different random seeds) and 95% bootstrapped confidence intervals for all reported mAP values. We will also add explicit descriptions of the evaluation protocol: prompt templates for VLMs (e.g., “segment all defects in this civil infrastructure image”), the train/test split (80/20 at the site level to prevent leakage from the same structure), and center-bias quantification (via normalized defect density maps comparing central 50% vs. peripheral regions). These changes will support a more robust interpretation of the ≈25% plateau as reflecting model limitations. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmark results are self-contained

full rationale

The paper presents a new dataset (CiF) of ~150k curated images and reports direct empirical evaluations of zero-shot foundation models and supervised segmentation models on it, with performance metrics such as mAP. No derivation chain, equations, fitted parameters, or predictions are described that reduce by construction to inputs, self-citations, or ansatzes. The central claims are observational benchmarks on the introduced data rather than any mathematical reduction or self-referential justification, rendering the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution is a new empirical dataset and benchmark rather than a theoretical derivation; the main unstated premises concern the representativeness of the collected images and the validity of mAP as a proxy for practical inspection utility.

axioms (1)

domain assumption Expert civil engineering annotations constitute reliable pixel-level ground truth for structural defects.
The dataset creation and all performance numbers depend on the accuracy and consistency of these labels.

pith-pipeline@v0.9.0 · 5830 in / 1340 out tokens · 57531 ms · 2026-05-20T10:42:53.159365+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

even the performance of specialised models with domain-specific supervision plateaus at ≈25% mAP
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

center-bias and the need to rely more on shape when inspecting nearly textureless building materials

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 1 internal anchor

[1]

Flamingo: a visual language model for few-shot learning, 2022

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Ruther- ford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Miko- laj Binkow...

work page 2022
[2]

Deep learning-based concrete defects classifi- cation and detection using semantic segmentation.Structural Health Monitoring, 23(1):383–409, Jan 2024

Palisa Arafin, Ahm Muntasir Billah, and Anas Issa. Deep learning-based concrete defects classifi- cation and detection using semantic segmentation.Structural Health Monitoring, 23(1):383–409, Jan 2024

work page 2024
[3]

Data-driven detection and evaluation of damages in concrete structures: Using deep learning and computer vision.arXiv preprint arXiv:2501.11836, 2025

Saeid Ataei, Saeed Adibnazari, and Seyyed Taghi Ataei. Data-driven detection and evaluation of damages in concrete structures: Using deep learning and computer vision.arXiv preprint arXiv:2501.11836, 2025

work page arXiv 2025
[4]

Show or tell? effectively prompting vision-language models for semantic segmentation, 2025

Niccolo Avogaro, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Filip Janicki, Cristiano Malossi, Konrad Schindler, and Roy Assaf. Show or tell? effectively prompting vision-language models for semantic segmentation, 2025

work page 2025
[5]

Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

work page 2023
[6]

Qwen3-vl technical report, 2025

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...

work page 2025
[7]

Crack segmentation on uas-based imagery using transfer learning

Christian Benz, Paul Debus, Huy Khanh Ha, and V olker Rodehorst. Crack segmentation on uas-based imagery using transfer learning. In2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), pages 1–6, 2019. 10

work page 2019
[8]

Image-based detection of structural defects using hierarchical multi-scale attention

Christian Benz and V olker Rodehorst. Image-based detection of structural defects using hierarchical multi-scale attention. InDAGM German Conference on Pattern Recognition, pages 337–353. Springer, 2022

work page 2022
[9]

Visual structural inspection datasets.Automation in construction, 139:104299, 2022

Eric Bianchi and Matthew Hebdon. Visual structural inspection datasets.Automation in construction, 139:104299, 2022

work page 2022
[10]

Sam 3: Segment anything with concepts, 2026

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

work page 2026
[11]

End-to-end object detection with transformers, 2020

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers, 2020

work page 2020
[12]

Molmo2: Open weights and data for vision- language models with video understanding and grounding, 2026

Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Mohammadreza Salehi, Rohun Tripathi, Sangho Lee, Zhongzheng Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Winson Han, Ali Farhadi, and Ranjay Krishna. Molmo2: Open weights and data for vision- language ...

work page 2026
[13]

The cityscapes dataset for semantic urban scene understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016

work page 2016
[14]

Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, and Aniruddha Kembhavi

Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvon...

work page 2024
[15]

Thomas, and Marc Maguire

Sattar Dorafshan, Robert J. Thomas, and Marc Maguire. Sdnet2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks.Data in Brief, 21:1664–1668, 2018

work page 2018
[16]

John Wiley & Sons, Ltd, 2013

Charles Farrar and Keith Worden.Structural Health Monitoring: A Machine Learning Perspec- tive. John Wiley & Sons, Ltd, 2013

work page 2013
[17]

Rösch, and Thomas Braml

Johannes Flotzinger, Philipp J. Rösch, and Thomas Braml. dacl10k: Benchmark for semantic bridge damage segmentation, 2023

work page 2023
[18]

Wichmann, and Wieland Brendel

Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness, 2022

work page 2022
[19]

Cambridge bridge inspection dataset, 2017

Philipp Huethwohl. Cambridge bridge inspection dataset, 2017

work page 2017
[20]

Multi-classifier for reinforced concrete bridge defects.Automation in Construction, 105:102824, 2019

Philipp Hüthwohl, Ruodan Lu, and Ioannis Brilakis. Multi-classifier for reinforced concrete bridge defects.Automation in Construction, 105:102824, 2019

work page 2019
[21]

Your ViT is Secretly an Image Segmen- tation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus. Your ViT is Secretly an Image Segmen- tation Model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 11

work page 2025
[22]

YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam and Muhammad Hussain. Yolov11: An overview of the key architectural enhancements.arXiv preprint arXiv:2410.17725, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023

work page 2023
[24]

Crackseg9k: A collection and benchmark for crack segmentation datasets and frameworks, 2022

Shreyas Kulkarni, Shreyas Singh, Dhananjay Balakrishnan, Siddharth Sharma, Saipraneeth Devunuri, and Sai Chowdeswara Rao Korlapati. Crackseg9k: A collection and benchmark for crack segmentation datasets and frameworks, 2022

work page 2022
[25]

Lisa: Reasoning segmentation via large language model, 2024

Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, and Jiaya Jia. Lisa: Reasoning segmentation via large language model, 2024

work page 2024
[26]

Mask dino: Towards a unified transformer-based framework for object detection and segmentation

Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3041–3050, 2023

work page 2023
[27]

Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models, 2024

Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, and Chunyuan Li. Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models, 2024

work page 2024
[28]

Multi-defect type beam bridge dataset: Gyu-det.Scientific Data, 12(1):1101, July 2025

Ruiping Li, Linchang Zhao, Hao Wei, Guoqing Hu, Yongchi Xu, Bocheng Ouyang, and Jin Tan. Multi-defect type beam bridge dataset: Gyu-det.Scientific Data, 12(1):1101, July 2025

work page 2025
[29]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors,Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing

work page 2014
[30]

Visual instruction tuning, 2023

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning, 2023

work page 2023
[31]

Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, and Lei Zhang. Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024

work page 2024
[32]

Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights.Structural Health Monitoring, 21(4):1906–1955, 2022

Arman Malekloo, Ekin Ozer, Mohammad AlHamaydeh, and Mark Girolami. Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights.Structural Health Monitoring, 21(4):1906–1955, 2022

work page 1906
[33]

Mm1: Methods, analysis & insights from multimodal llm pre-training, 2024

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui...

work page 2024
[34]

Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset

Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, and Visvanathan Ramesh. Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11188–11197, 2019

work page 2019
[35]

Dinov2: Learning robust visual features without supervision, 2024

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick La...

work page 2024
[36]

Concrete crack segmentation dataset

Ça ˘glar Fırat Özgenel. Concrete crack segmentation dataset. Mendeley Data, V1, 2019

work page 2019
[37]

Kosmos-2: Grounding multimodal large language models to the world, 2023

Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, and Furu Wei. Kosmos-2: Grounding multimodal large language models to the world, 2023. 12

work page 2023
[38]

Anwer, Erix Xing, Ming-Hsuan Yang, and Fahad S

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Erix Xing, Ming-Hsuan Yang, and Fahad S. Khan. Glamm: Pixel grounding large multimodal model, 2024

work page 2024
[39]

Sam 2: Segment anything in images and videos, 2024

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. Sam 2: Segment anything in images and videos, 2024

work page 2024
[40]

Rf-detr: Neural architecture search for real-time detection transformers, 2025

Isaac Robinson, Peter Robicheaux, Matvei Popov, Deva Ramanan, and Neehar Peri. Rf-detr: Neural architecture search for real-time detection transformers, 2025

work page 2025
[41]

Berg, and Li Fei-Fei

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge, 2015

work page 2015
[42]

Yolo26: Key architectural enhancements and performance bench- marking for real-time object detection

Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda, and Manoj Karkee. Yolo26: key architectural enhancements and performance benchmarking for real-time object detection.arXiv preprint arXiv:2509.25164, 2025

work page arXiv 2025
[43]

Laion-5b: An open large-scale dataset for training next generation image-text models, 2022

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. Laion-5b: An open large-scale dataset for training next generation image-text models, 2022

work page 2022
[44]

Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

work page 2025
[45]

Sam-based instance segmentation models for the automation of structural damage detection.Advanced Engineering Informatics, 62:102826, 2024

Zehao Ye, Lucy Lovell, Asaad Faramarzi, and Jelena Nini´c. Sam-based instance segmentation models for the automation of structural damage detection.Advanced Engineering Informatics, 62:102826, 2024

work page 2024
[46]

Ferret: Refer and ground anything anywhere at any granularity, 2023

Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, and Yinfei Yang. Ferret: Refer and ground anything anywhere at any granularity, 2023

work page 2023
[47]

Berg, and Tamara L

Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. Modeling context in referring expressions, 2016

work page 2016
[48]

Zone evaluation: Revealing spatial bias in object detection, 2024

Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, and Ming-Ming Cheng. Zone evaluation: Revealing spatial bias in object detection, 2024

work page 2024
[49]

Scene parsing through ade20k dataset

Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ade20k dataset. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 13 A Technical Appendices and Supplementary Material A.1 Training Details All baselines were trained using the off-the-shelf configu...

work page 2017

[1] [1]

Flamingo: a visual language model for few-shot learning, 2022

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Ruther- ford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Miko- laj Binkow...

work page 2022

[2] [2]

Deep learning-based concrete defects classifi- cation and detection using semantic segmentation.Structural Health Monitoring, 23(1):383–409, Jan 2024

Palisa Arafin, Ahm Muntasir Billah, and Anas Issa. Deep learning-based concrete defects classifi- cation and detection using semantic segmentation.Structural Health Monitoring, 23(1):383–409, Jan 2024

work page 2024

[3] [3]

Data-driven detection and evaluation of damages in concrete structures: Using deep learning and computer vision.arXiv preprint arXiv:2501.11836, 2025

Saeid Ataei, Saeed Adibnazari, and Seyyed Taghi Ataei. Data-driven detection and evaluation of damages in concrete structures: Using deep learning and computer vision.arXiv preprint arXiv:2501.11836, 2025

work page arXiv 2025

[4] [4]

Show or tell? effectively prompting vision-language models for semantic segmentation, 2025

Niccolo Avogaro, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Filip Janicki, Cristiano Malossi, Konrad Schindler, and Roy Assaf. Show or tell? effectively prompting vision-language models for semantic segmentation, 2025

work page 2025

[5] [5]

Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

work page 2023

[6] [6]

Qwen3-vl technical report, 2025

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...

work page 2025

[7] [7]

Crack segmentation on uas-based imagery using transfer learning

Christian Benz, Paul Debus, Huy Khanh Ha, and V olker Rodehorst. Crack segmentation on uas-based imagery using transfer learning. In2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), pages 1–6, 2019. 10

work page 2019

[8] [8]

Image-based detection of structural defects using hierarchical multi-scale attention

Christian Benz and V olker Rodehorst. Image-based detection of structural defects using hierarchical multi-scale attention. InDAGM German Conference on Pattern Recognition, pages 337–353. Springer, 2022

work page 2022

[9] [9]

Visual structural inspection datasets.Automation in construction, 139:104299, 2022

Eric Bianchi and Matthew Hebdon. Visual structural inspection datasets.Automation in construction, 139:104299, 2022

work page 2022

[10] [10]

Sam 3: Segment anything with concepts, 2026

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

work page 2026

[11] [11]

End-to-end object detection with transformers, 2020

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers, 2020

work page 2020

[12] [12]

Molmo2: Open weights and data for vision- language models with video understanding and grounding, 2026

Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Mohammadreza Salehi, Rohun Tripathi, Sangho Lee, Zhongzheng Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Winson Han, Ali Farhadi, and Ranjay Krishna. Molmo2: Open weights and data for vision- language ...

work page 2026

[13] [13]

The cityscapes dataset for semantic urban scene understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016

work page 2016

[14] [14]

Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, and Aniruddha Kembhavi

Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvon...

work page 2024

[15] [15]

Thomas, and Marc Maguire

Sattar Dorafshan, Robert J. Thomas, and Marc Maguire. Sdnet2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks.Data in Brief, 21:1664–1668, 2018

work page 2018

[16] [16]

John Wiley & Sons, Ltd, 2013

Charles Farrar and Keith Worden.Structural Health Monitoring: A Machine Learning Perspec- tive. John Wiley & Sons, Ltd, 2013

work page 2013

[17] [17]

Rösch, and Thomas Braml

Johannes Flotzinger, Philipp J. Rösch, and Thomas Braml. dacl10k: Benchmark for semantic bridge damage segmentation, 2023

work page 2023

[18] [18]

Wichmann, and Wieland Brendel

Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness, 2022

work page 2022

[19] [19]

Cambridge bridge inspection dataset, 2017

Philipp Huethwohl. Cambridge bridge inspection dataset, 2017

work page 2017

[20] [20]

Multi-classifier for reinforced concrete bridge defects.Automation in Construction, 105:102824, 2019

Philipp Hüthwohl, Ruodan Lu, and Ioannis Brilakis. Multi-classifier for reinforced concrete bridge defects.Automation in Construction, 105:102824, 2019

work page 2019

[21] [21]

Your ViT is Secretly an Image Segmen- tation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus. Your ViT is Secretly an Image Segmen- tation Model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 11

work page 2025

[22] [22]

YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam and Muhammad Hussain. Yolov11: An overview of the key architectural enhancements.arXiv preprint arXiv:2410.17725, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[23] [23]

Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023

work page 2023

[24] [24]

Crackseg9k: A collection and benchmark for crack segmentation datasets and frameworks, 2022

Shreyas Kulkarni, Shreyas Singh, Dhananjay Balakrishnan, Siddharth Sharma, Saipraneeth Devunuri, and Sai Chowdeswara Rao Korlapati. Crackseg9k: A collection and benchmark for crack segmentation datasets and frameworks, 2022

work page 2022

[25] [25]

Lisa: Reasoning segmentation via large language model, 2024

Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, and Jiaya Jia. Lisa: Reasoning segmentation via large language model, 2024

work page 2024

[26] [26]

Mask dino: Towards a unified transformer-based framework for object detection and segmentation

Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3041–3050, 2023

work page 2023

[27] [27]

Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models, 2024

Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, and Chunyuan Li. Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models, 2024

work page 2024

[28] [28]

Multi-defect type beam bridge dataset: Gyu-det.Scientific Data, 12(1):1101, July 2025

Ruiping Li, Linchang Zhao, Hao Wei, Guoqing Hu, Yongchi Xu, Bocheng Ouyang, and Jin Tan. Multi-defect type beam bridge dataset: Gyu-det.Scientific Data, 12(1):1101, July 2025

work page 2025

[29] [29]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors,Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing

work page 2014

[30] [30]

Visual instruction tuning, 2023

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning, 2023

work page 2023

[31] [31]

Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, and Lei Zhang. Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024

work page 2024

[32] [32]

Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights.Structural Health Monitoring, 21(4):1906–1955, 2022

Arman Malekloo, Ekin Ozer, Mohammad AlHamaydeh, and Mark Girolami. Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights.Structural Health Monitoring, 21(4):1906–1955, 2022

work page 1906

[33] [33]

Mm1: Methods, analysis & insights from multimodal llm pre-training, 2024

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui...

work page 2024

[34] [34]

Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset

Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, and Visvanathan Ramesh. Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11188–11197, 2019

work page 2019

[35] [35]

Dinov2: Learning robust visual features without supervision, 2024

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick La...

work page 2024

[36] [36]

Concrete crack segmentation dataset

Ça ˘glar Fırat Özgenel. Concrete crack segmentation dataset. Mendeley Data, V1, 2019

work page 2019

[37] [37]

Kosmos-2: Grounding multimodal large language models to the world, 2023

Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, and Furu Wei. Kosmos-2: Grounding multimodal large language models to the world, 2023. 12

work page 2023

[38] [38]

Anwer, Erix Xing, Ming-Hsuan Yang, and Fahad S

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Erix Xing, Ming-Hsuan Yang, and Fahad S. Khan. Glamm: Pixel grounding large multimodal model, 2024

work page 2024

[39] [39]

Sam 2: Segment anything in images and videos, 2024

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. Sam 2: Segment anything in images and videos, 2024

work page 2024

[40] [40]

Rf-detr: Neural architecture search for real-time detection transformers, 2025

Isaac Robinson, Peter Robicheaux, Matvei Popov, Deva Ramanan, and Neehar Peri. Rf-detr: Neural architecture search for real-time detection transformers, 2025

work page 2025

[41] [41]

Berg, and Li Fei-Fei

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge, 2015

work page 2015

[42] [42]

Yolo26: Key architectural enhancements and performance bench- marking for real-time object detection

Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda, and Manoj Karkee. Yolo26: key architectural enhancements and performance benchmarking for real-time object detection.arXiv preprint arXiv:2509.25164, 2025

work page arXiv 2025

[43] [43]

Laion-5b: An open large-scale dataset for training next generation image-text models, 2022

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. Laion-5b: An open large-scale dataset for training next generation image-text models, 2022

work page 2022

[44] [44]

Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

work page 2025

[45] [45]

Sam-based instance segmentation models for the automation of structural damage detection.Advanced Engineering Informatics, 62:102826, 2024

Zehao Ye, Lucy Lovell, Asaad Faramarzi, and Jelena Nini´c. Sam-based instance segmentation models for the automation of structural damage detection.Advanced Engineering Informatics, 62:102826, 2024

work page 2024

[46] [46]

Ferret: Refer and ground anything anywhere at any granularity, 2023

Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, and Yinfei Yang. Ferret: Refer and ground anything anywhere at any granularity, 2023

work page 2023

[47] [47]

Berg, and Tamara L

Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. Modeling context in referring expressions, 2016

work page 2016

[48] [48]

Zone evaluation: Revealing spatial bias in object detection, 2024

Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, and Ming-Ming Cheng. Zone evaluation: Revealing spatial bias in object detection, 2024

work page 2024

[49] [49]

Scene parsing through ade20k dataset

Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ade20k dataset. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 13 A Technical Appendices and Supplementary Material A.1 Training Details All baselines were trained using the off-the-shelf configu...

work page 2017