pith. machine review for the scientific record.

arxiv: 2605.04635 · v2 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 2 theorem links


UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords PCB defect detection · defect generation · diffusion models · multi-modal conditions · attention mechanisms · data augmentation · IIoT inspection · computer vision

The pith

A joint generation-detection framework for PCBs uses multi-modal synthesis to augment scarce defect data and reach 98.0 percent mAP@0.5.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that combining controlled defect synthesis with a specialized detector solves the twin problems of scarce imbalanced samples and weak feature representation in PCB inspection. A multi-modal generator extracts edge, depth, and text conditions in parallel, embeds them at multiple scales, and modulates them to produce structurally aligned defects that augment the training set. The detector then applies shift-wise attention for global-local context and gated cross-level fusion to handle complex backgrounds. This integrated pipeline yields higher detection accuracy than prior separate approaches while also improving generation quality over existing conditional methods. A sympathetic reader would care because industrial IIoT systems often lack enough real defects to train reliable models.
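To make the conditioning step concrete, here is a minimal sketch of parallel edge-depth-text extraction, under stated assumptions. Only the Canny edge operator is anchored in the paper's references; the depth map and text embedding below are dummy stand-ins for the pretrained models the authors actually use (e.g., Depth Anything v2 and a text encoder), and every function name here is ours.

```python
# A sketch only: parallel extraction of the three condition modalities.
# Canny is grounded in the paper's citations; the depth and text encoders
# are placeholders, and all names are illustrative.
import cv2
import numpy as np

def extract_conditions(image_bgr: np.ndarray, defect_text: str):
    """Return (edge map, depth map, text embedding) for one PCB image."""
    # Edge condition: Canny on the grayscale image (thresholds are guesses).
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)

    # Depth condition: placeholder gradient map standing in for a
    # monocular depth estimator such as Depth Anything v2.
    h, w = gray.shape
    depth = np.tile(np.linspace(0.0, 1.0, w, dtype=np.float32), (h, 1))

    # Text condition: placeholder 512-d vector standing in for a learned
    # embedding of the defect description.
    rng = np.random.default_rng(abs(hash(defect_text)) % (2**32))
    text_emb = rng.standard_normal(512).astype(np.float32)

    return edges, depth, text_emb

edges, depth, text_emb = extract_conditions(
    np.zeros((256, 256, 3), dtype=np.uint8), "open-circuit defect on trace")
```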

Core claim

The authors establish that a generation-assisted detection framework, with a Multi-modal Condition Generator feeding a ScaleEncoder and FiLM-style Condition Modulation for synthesis, plus an Inverted Residual Shift Attention and Cross-level Complementary Fusion Block for detection, jointly overcomes data scarcity and representation limits, delivering mAP@0.5 of 98.0 percent and mAP@0.5:0.95 of 61.8 percent on DsPCBSD+ while the generator reaches FID 129.61 and SSIM 0.619.
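FiLM conditioning has a standard published form (the paper cites Perez et al.'s FiLM), so the modulation step can be sketched even without the authors' code. Below is a minimal PyTorch sketch of a plausible spatially-adaptive variant at one of the four U-Net scales; the class name, channel counts, and normalization choice are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of FiLM-style spatially-adaptive modulation: a condition
# map predicts per-pixel scale (gamma) and shift (beta) that modulate
# U-Net features at one resolution. Names and choices are illustrative.
import torch
import torch.nn as nn

class ConditionModulation(nn.Module):
    def __init__(self, feat_ch: int, cond_ch: int):
        super().__init__()
        self.norm = nn.GroupNorm(8, feat_ch)
        # One conv predicts both gamma and beta from the condition map.
        self.to_gamma_beta = nn.Conv2d(cond_ch, 2 * feat_ch, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        # Spatially-adaptive FiLM: per-pixel affine transform of the features.
        return (1 + gamma) * self.norm(feat) + beta

# e.g. one of the four U-Net scales: 64-channel features, 32-channel condition
mod = ConditionModulation(feat_ch=64, cond_ch=32)
out = mod(torch.randn(1, 64, 32, 32), torch.randn(1, 32, 32, 32))
```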

What carries the argument

The Multi-modal Condition Generator with ScaleEncoder and Condition Modulation that synthesizes aligned defects from parallel edge-depth-text inputs, paired with the detector's Inverted Residual Shift Attention and Cross-level Complementary Fusion Block that fuses global context and local texture via shift convolution and pixel-level gates.
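The pixel-level gating named here admits a compact illustration. The sketch below shows one plausible reading of selective cross-level fusion: a sigmoid gate, predicted per pixel from the concatenated features, interpolates between a low-level (texture) map and an upsampled high-level (context) map. All names and the exact gating form are assumptions; the paper's Cross-level Complementary Fusion Block may differ in detail.

```python
# Minimal sketch of pixel-level gated cross-level fusion: a sigmoid gate
# decides, per pixel and channel, how much of the high-level vs. the
# low-level feature to keep. Illustrative only, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedCrossLevelFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # gate values in [0, 1]
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser high-level map to the low-level resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        g = self.gate(torch.cat([low, high], dim=1))
        return g * high + (1 - g) * low  # selective per-pixel fusion

fuse = GatedCrossLevelFusion(channels=64)
fused = fuse(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40))
```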

If this is right

  • Synthesized defects directly enrich the scarce IIoT training set, so gains in generation quality translate into higher detection mAP.
  • The multi-modal conditioning enables structurally aligned samples that help the detector handle complex circuit backgrounds better than single-condition methods.
  • The joint pipeline outperforms all compared detection and generation baselines on the DsPCBSD+ benchmark.
  • The IIoT pipeline supports real-time inspection by addressing both data volume and feature extraction challenges in one system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same multi-modal conditioning strategy could be tested on other industrial inspection tasks where defect samples are rare, such as weld or fabric defect detection.
  • Ablating the generation branch would show whether the attention and fusion blocks alone deliver part of the accuracy gain even without extra data.
  • If domain shift between generated and real defects proves larger than reported, the framework might need additional adaptation steps for new PCB manufacturing lines.

Load-bearing premise

The synthesized defect samples must be realistic enough and distributionally close enough to real IIoT PCB images that adding them to the training set improves detection accuracy on actual data rather than introducing harmful artifacts or shift.

What would settle it

Training the detector on real samples alone versus real samples plus the generated ones and measuring mAP on a held-out set of real PCB defects; if the augmented version shows no gain or a drop, the core benefit of generation assistance is refuted.
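A minimal sketch of that settling experiment, with the training and evaluation code stubbed out as hypothetical placeholders; only the comparison logic is the point here.

```python
# Sketch of the settling experiment: identical protocol, the only variable
# being whether synthesized defects are added to the training set.
# `train_detector` and `evaluate_map` are hypothetical placeholders.

def train_detector(train_set):          # placeholder: returns a trained model
    return {"trained_on": len(train_set)}

def evaluate_map(model, test_set):      # placeholder: returns mAP@0.5 on real data
    return 0.0

real_train, real_test = list(range(1000)), list(range(200))
generated = list(range(500))            # synthesized defect samples

baseline  = evaluate_map(train_detector(real_train), real_test)
augmented = evaluate_map(train_detector(real_train + generated), real_test)

# The generation-assisted claim survives only if `augmented > baseline`
# on the held-out *real* test set; a flat or negative delta refutes it.
print(f"delta mAP@0.5 = {augmented - baseline:+.3f}")
```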

Figures

Figures reproduced from arXiv: 2605.04635 by Huanqi Wu, Huan Zhang, Jiangzhong Cao, Lianghong Tan, Linwei Zhu, Xu Zhang, Yichu Xu.

Figure 1. Comparison between UniPCB and previous approaches for PCB …
Figure 2. Overview of the proposed defect generation framework. Real PCB images captured by an AOI machine are processed by the Multi-modal Condition …
Figure 3. (a) Structure of the ScaleEncoder, which encodes the condition map via …
Figure 4. Overview of the proposed detection framework. IRSA Blocks replace BasicBlocks in all four stages of the ResNet-18 backbone. Yellow and blue …
Figure 5. (a) Overview of the Inverted Residual Shift Attention Block. (b) …
Figure 6. (a) Overview of the Cross-level Complementary Fusion (CLCF) Block. …
Figure 7. Representative annotated samples from the six defect categories, …
Figure 8. Visualization of detection results across different models. Zoom-in for best view.
Figure 9. Visualization of generation results across different models. For better …
Figure 10. Examples of synthesized defect images with bounding-box annotations …
Original abstract

In the Industrial Internet of Things (IIoT), enabling intelligent, real-time Printed Circuit Board (PCB) defect inspection is critical for ensuring product reliability. However, existing IIoT-based visual inspection systems face two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection within an IIoT-enabled pipeline. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation applies FiLM-style spatially-adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis to augment the scarce IIoT dataset. On the detection side, an Inverted Residual Shift Attention couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and SSIM of 0.619, outperforming existing conditional generation approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes UniPCB, a generation-assisted framework for PCB defect inspection that integrates a Multi-modal Condition Generator (using parallel edge, depth, and text conditions fed via ScaleEncoder and Condition Modulation into a diffusion U-Net) for synthesizing defect samples to augment scarce IIoT data, with a detection network employing Inverted Residual Shift Attention and Cross-level Complementary Fusion for improved feature representation. On the DsPCBSD+ dataset, it reports mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% for detection (surpassing compared methods) alongside generation metrics of FID 129.61 and SSIM 0.619 (outperforming existing conditional generators), claiming that synthesized samples directly enrich training and compound with architectural improvements.

Significance. If the central claim holds, the work offers a practical pipeline for addressing data imbalance in industrial PCB inspection by jointly optimizing synthesis and detection, which could improve real-world IIoT reliability. The explicit reporting of both generation quality metrics and end-task mAP provides a basis for comparison, and the multi-modal conditioning approach is a concrete technical contribution. However, the lack of isolating experiments limits attribution of gains.

major comments (3)
  1. [Abstract / Experiments] The headline claim that 'the synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection' is load-bearing for the generation-assisted framing, yet no ablation is described that trains the detection branch (Inverted Residual Shift Attention + Cross-level Complementary Fusion) on real DsPCBSD+ data only versus real + generated samples. Without this, the mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% cannot be attributed to the Multi-modal Condition Generator rather than the detection modules alone.
  2. [Abstract] The reported mAP and generation metrics are presented without error bars, statistical significance tests (e.g., paired t-tests across runs), details on train/validation/test splits, or full baseline re-implementation protocols. This makes it impossible to assess whether the gains over compared methods are robust or sensitive to implementation choices.
  3. [Abstract / Methods] Generation branch description: The Multi-modal Condition Generator is said to produce 'structurally aligned and defect-aware' samples, but the abstract provides no quantitative measure of distributional alignment (e.g., feature-space distance to real defects) or qualitative failure cases, leaving the weakest assumption—that the FID 129.61 / SSIM 0.619 outputs avoid harmful domain shift—unverified.
minor comments (2)
  1. [Abstract] The abstract uses 'DsPCBSD+' without defining the dataset or citing its source; this should be clarified with a reference or brief description in the main text.
  2. [Methods] Notation for the ScaleEncoder and Condition Modulation (FiLM-style) is introduced without equations; adding a short mathematical formulation would improve reproducibility.
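For reference, the standard FiLM formulation the second minor comment asks for, extended to the spatially-adaptive, per-scale form the abstract describes; the notation is ours, not the paper's.

```latex
% h^{(s)}: U-Net feature map at scale s; c: fused edge-depth-text condition;
% gamma^{(s)}, beta^{(s)}: conv-predicted per-pixel scale and shift maps.
\[
  \tilde{h}^{(s)} \;=\; \gamma^{(s)}(c) \odot \mathrm{Norm}\!\bigl(h^{(s)}\bigr) \;+\; \beta^{(s)}(c),
  \qquad s \in \{1, 2, 3, 4\}.
\]
```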

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental validation and reporting that we will address to strengthen the paper. Below we respond point-by-point to the major comments.

Point-by-point responses
  1. Referee: [Abstract / Experiments] The headline claim that 'the synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection' is load-bearing for the generation-assisted framing, yet no ablation is described that trains the detection branch (Inverted Residual Shift Attention + Cross-level Complementary Fusion) on real DsPCBSD+ data only versus real + generated samples. Without this, the mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% cannot be attributed to the Multi-modal Condition Generator rather than the detection modules alone.

    Authors: We agree that an explicit ablation isolating the contribution of the synthesized samples is necessary to substantiate the generation-assisted claim. In the revised manuscript, we will add this ablation: training the full detection network (Inverted Residual Shift Attention + Cross-level Complementary Fusion) on real DsPCBSD+ data only, and comparing it directly to training on the combined real + generated set under identical hyperparameters and splits. This will quantify the mAP gains attributable to the Multi-modal Condition Generator. revision: yes

  2. Referee: [Abstract] The reported mAP and generation metrics are presented without error bars, statistical significance tests (e.g., paired t-tests across runs), details on train/validation/test splits, or full baseline re-implementation protocols. This makes it impossible to assess whether the gains over compared methods are robust or sensitive to implementation choices.

    Authors: We will revise the experiments section to report mean mAP values with standard deviations across multiple independent runs (e.g., 5 seeds), include paired t-tests or equivalent significance tests against baselines, explicitly state the train/validation/test split ratios and sampling strategy on DsPCBSD+, and provide complete re-implementation details (hyperparameters, data augmentation, and training schedules) for all compared methods to enable robust assessment of the gains. revision: yes

  3. Referee: [Abstract / Methods] Generation branch description: The Multi-modal Condition Generator is said to produce 'structurally aligned and defect-aware' samples, but the abstract provides no quantitative measure of distributional alignment (e.g., feature-space distance to real defects) or qualitative failure cases, leaving the weakest assumption—that the FID 129.61 / SSIM 0.619 outputs avoid harmful domain shift—unverified.

    Authors: FID and SSIM are established metrics for generation fidelity and structural similarity. To further verify distributional alignment and absence of harmful domain shift, the revised version will add quantitative analysis (e.g., average feature-space L2 distances using embeddings from a pre-trained ResNet on real vs. generated defect patches) and a dedicated qualitative section showing representative success cases alongside any observed failure modes (e.g., over-generated artifacts or misalignment). revision: yes
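Both empirical checks the rebuttal promises are easy to pin down. First, the significance test from response 2: a paired t-test over per-seed mAP values, sketched here with made-up placeholder numbers rather than the paper's results.

```python
# Paired t-test over per-seed mAP@0.5; all values are hypothetical.
from scipy.stats import ttest_rel

unipcb_map   = [0.980, 0.978, 0.981, 0.979, 0.982]   # 5 seeds, placeholder
baseline_map = [0.971, 0.969, 0.973, 0.970, 0.972]   # 5 seeds, placeholder

t_stat, p_value = ttest_rel(unipcb_map, baseline_map)
# A small p suggests the gain is not seed noise; the threshold is the
# experimenter's choice, not anything the paper specifies.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Second, the distributional-alignment check from response 3: the average distance between pooled ResNet-18 embeddings of real and generated defect patches. The data loading is stubbed with random tensors, and in practice the patches would first receive the usual ImageNet normalization.

```python
# Centroid distance between real and generated patches in a pre-trained
# ResNet-18 feature space; the patch tensors are random stand-ins.
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()   # expose the 512-d pooled embedding
model.eval()

@torch.no_grad()
def embed(patches: torch.Tensor) -> torch.Tensor:
    return model(patches)        # (N, 512)

real_patches = torch.randn(16, 3, 224, 224)   # stand-in for real defect crops
gen_patches  = torch.randn(16, 3, 224, 224)   # stand-in for synthesized crops

# Smaller centroid distance = tighter distributional alignment.
dist = torch.linalg.norm(embed(real_patches).mean(0) - embed(gen_patches).mean(0))
print(f"centroid L2 distance: {dist.item():.3f}")
```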

Circularity Check

0 steps flagged

No significant circularity; empirical framework with external validation

full rationale

The paper presents an architectural framework (Multi-modal Condition Generator with ScaleEncoder and Condition Modulation; Inverted Residual Shift Attention and Cross-level Complementary Fusion) whose performance claims rest on empirical metrics (mAP@0.5 98.0%, mAP@0.5:0.95 61.8%, FID 129.61, SSIM 0.619) measured on the external DsPCBSD+ dataset. No equations, derivations, or first-principles results are described that reduce by construction to fitted inputs, self-citations, or renamed patterns. The generation-assisted claim is presented as an empirical outcome rather than a tautological restatement of training objectives, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the framework rests on standard deep-learning assumptions about diffusion models and attention rather than novel axioms, but many implementation details (loss balancing, conditioning strength, training schedules) remain unspecified.

axioms (2)
  • domain assumption Multi-modal conditions (edge, depth, text) can be extracted in parallel and embedded to produce structurally aligned defect images
    Invoked in the generation branch description
  • domain assumption Coupling self-attention with shift-wise convolution and cross-level gating improves feature representation under complex circuit backgrounds
    Invoked in the detection branch description

pith-pipeline@v0.9.0 · 5629 in / 1568 out tokens · 62189 ms · 2026-05-12T00:46:45.340438+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: the paper's claim is directly supported by a theorem in the formal canon.
supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: the paper appears to rely on the theorem as machinery.
contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages

  1. [1] Y. Zhou, M. Yuan, J. Zhang, G. Ding, and S. Qin, “Review of vision-based defect detection research and its perspectives for printed circuit board,” J. Manuf. Syst., vol. 70, pp. 557–578, 2023.
  2. [2] Z. He, Y. Lian, Y. Wang, and Z. Lu, “A comprehensive review of research on surface defect detection of PCBs based on machine vision,” Results Eng., p. 106437, 2025.
  3. [3] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. ICLR, 2014.
  4. [4] A. Van Den Oord, O. Vinyals et al., “Neural discrete representation learning,” Proc. NeurIPS, 2017.
  5. [5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. NeurIPS, 2014.
  6. [6] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Proc. NeurIPS, pp. 6840–6851, 2020.
  7. [7] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Proc. ICLR, 2021.
  8. [8] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. ICLR, 2021.
  9. [9] X. Zhang, H. Zhang, G. Wang, Q. Zhang, L. Zhang, and B. Du, “UniUIR: Considering underwater image restoration as an all-in-one learner,” IEEE Trans. Image Process., vol. 34, pp. 6963–6977, 2025.
  10. [10] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. CVPR, 2022, pp. 10684–10695.
  11. [11] X. Zhang, J. Huang, and L. Zhang, “Any2RSI: Controllable remote sensing text-to-image generation via any control and enriched description,” in Proc. AAAI, 2026, pp. 12852–12860.
  12. [12] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 10850–10869, 2023.
  13. [13] Y. He, S. Li, X. Wen, and J. Xu, “A survey on surface defect inspection based on generative models in manufacturing,” Appl. Sci., vol. 14, no. 15, p. 6774, 2024.
  14. [14] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proc. ECCV, 2020, pp. 213–229.
  15. [15] Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, “DETRs beat YOLOs on real-time object detection,” in Proc. CVPR, 2024, pp. 16965–16974.
  16. [16] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and efficient object detection,” in Proc. CVPR, 2020, pp. 10781–10790.
  17. [17] D. I. Ural and A. Sezen, “Research on PCB defect detection using artificial intelligence: A systematic mapping study,” Evol. Intell., vol. 17, no. 5, pp. 3101–3111, 2024.
  18. [18] Z. Du, L. Gao, and X. Li, “A new contrastive GAN with data augmentation for surface defect recognition under limited data,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–13, 2022.
  19. [19] Y. Duan, Y. Hong, L. Niu, and L. Zhang, “Few-shot defect image generation via defect-aware feature manipulation,” in Proc. AAAI, 2023, pp. 571–578.
  20. [20] Y. Liang, S. Feng, Y. Zhang, F. Xue, F. Shen, and J. Guo, “A stable diffusion enhanced YOLOv5 model for metal stamped part defect detection based on improved network structure,” J. Manuf. Process., vol. 111, pp. 21–31, 2024.
  21. [21] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” in Proc. ICML, 2019, pp. 7354–7363.
  22. [22] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proc. CVPR, 2019, pp. 2337–2346.
  23. [23] B. Trabucco, K. Doherty, M. Gurinas, and R. Salakhutdinov, “Effective data augmentation with diffusion models,” in Proc. ICLR, 2024.
  24. [24] T. Hu, J. Zhang, R. Yi, Y. Du, X. Chen, L. Liu, Y. Wang, and C. Wang, “AnomalyDiffusion: Few-shot anomaly image generation with diffusion model,” in Proc. AAAI, 2024, pp. 8526–8534.
  25. [25] W. Deng, L. Yan, and C. Wang, “DDMF: A PCB surface defect detection model based on conditional denoising diffusion and multiscale feature fusion,” J. Supercomput., vol. 81, no. 15, pp. 1–33, 2025.
  26. [26] C. Mou, X. Wang, L. Xie, Y. Wu, J. Zhang, Z. Qi, Y. Shan, and X. Qie, “T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models,” in Proc. AAAI, 2024, pp. 4296–4304.
  27. [27] C. Qin, S. Bai, Y. Shen, L. Chen, B. Ni, J. Liu, Y. Liu, and X. Liu, “UniControl: A unified diffusion model for controllable visual generation in the wild,” in Proc. NeurIPS, 2023.
  28. [28] C. Zhang, G. Shi, H. Li, M. Yang, Y. Li, Z. Bing, W. Chen, Z. Wang, F. Yu, and V. C. M. Leung, “Circuit board welding defect detection based on industrial IoVT,” IEEE Internet Things J., vol. 13, no. 7, pp. 14003–14018, 2026.
  29. [29] M. Yuan, Y. Zhou, X. Ren, H. Zhi, J. Zhang, and H. Chen, “YOLO-HMC: An improved method for PCB surface defect detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–11, 2024.
  30. [30] L. Lei, H.-X. Li, and H.-D. Yang, “Reliable and lightweight adaptive convolution network for PCB surface defect detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–8, 2024.
  31. [31] X. Liu, “An adaptive defect-aware attention network for accurate PCB-defect detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–11, 2024.
  32. [32] W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li, X. Wang, and Y. Qiao, “InternImage: Exploring large-scale vision foundation models with deformable convolutions,” in Proc. CVPR, 2023, pp. 14408–14419.
  33. [33] C. Mo, Z. Hu, J. Wang, and X. Xiao, “SGT-YOLO: A lightweight method for PCB defect detection,” IEEE Trans. Instrum. Meas., vol. 74, pp. 1–11, 2025.
  34. [34] T. Liu, G.-Z. Cao, Z. He, and S. Xie, “Refined defect detector with deformable transformer and pyramid feature fusion for PCB detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–11, 2023.
  35. [35] Q. Li, L. Wu, H. Xiao, and C. Huang, “PCB-DETR: A detection network of PCB surface defect with spatial attention offset module,” IEEE Access, vol. 12, pp. 158436–158445, 2024.
  36. [36] F. Guo, Z. Chen, B. Chen, M. Jing, and L. Zuo, “Multi-granularity relation enhancement network for tiny defect detection on printed circuit board,” IEEE Trans. Instrum. Meas., vol. 74, pp. 1–11, 2025.
  37. [37] X. Dai, Y. Chen, B. Xiao, D. Chen, M. Liu, L. Yuan, and L. Zhang, “Dynamic head: Unifying object detection heads with attentions,” in Proc. CVPR, 2021, pp. 7373–7382.
  38. [38] J. Yang, Z. Liu, W. Du, and S. Zhang, “A PCB defect detector based on coordinate feature refinement,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–10, 2023.
  39. [39] J. Cao, H. Wu, X. Zhang, L. Tan, and H. Zhang, “MRC-DETR: An adaptive multi-residual coupled transformer for bare board PCB defect detection,” arXiv preprint arXiv:2507.03386, 2025.
  40. [40] L. Ji, C. Huang, H. Li, W. Han, and L. Yi, “MS-DETR: A real-time multi-scale detection transformer for PCB defect detection,” Signal Image Video Process., vol. 19, no. 3, p. 203, 2025.
  41. [41] L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proc. ICCV, 2023, pp. 3836–3847.
  42. [42] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., no. 6, pp. 679–698, 2009.
  43. [43] N. Otsu et al., “A threshold selection method from gray-level histograms,” Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
  44. [44] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, “Depth Anything V2,” in Proc. NeurIPS, 2024, pp. 21875–21911.
  45. [45] G. Jocher and J. Qiu, “Ultralytics YOLO11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics
  46. [46] S. Zhao, D. Chen, Y.-C. Chen, J. Bao, S. Hao, L. Yuan, and K.-Y. K. Wong, “Uni-ControlNet: All-in-one control to text-to-image diffusion models,” Proc. NeurIPS, pp. 11127–11150, 2023.
  47. [47] R. Sunkara and T. Luo, “No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects,” in Proc. ECML-PKDD, 2022, pp. 443–459.
  48. [48] E. Perez, F. Strub, H. De Vries, V. Dumoulin, and A. Courville, “FiLM: Visual reasoning with a general conditioning layer,” in Proc. AAAI, vol. 32, no. 1, 2018.
  49. [49] P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” in Proc. NeurIPS, 2021, pp. 8780–8794.
  50. [50] D. Li, L. Li, Z. Chen, and J. Li, “ShiftwiseConv: Small convolutional kernel with large kernel effect,” in Proc. CVPR, 2025, pp. 25281–25291.
  51. [51] A. Ali, H. Touvron, M. Caron, P. Bojanowski, M. Douze, A. Joulin, I. Laptev, N. Neverova, G. Synnaeve, J. Verbeek, and H. Jégou, “XCiT: Cross-covariance image transformers,” in Proc. NeurIPS, 2021, pp. 20014–20027.
  52. [52] S. Lv, B. Ouyang, Z. Deng, T. Liang, S. Jiang, K. Zhang, J. Chen, and Z. Li, “A dataset for deep learning based detection of printed circuit board surface defect,” Sci. Data, vol. 11, no. 1, p. 811, 2024.
  53. [53] W. Huang, P. Wei, M. Zhang, and H. Liu, “HRIPCB: A challenging dataset for PCB defects detection and classification,” The Journal of Engineering, vol. 2020, no. 13, pp. 303–309, 2020.
  54. [54] S. Tang, F. He, X. Huang, and J. Yang, “Online PCB defect detector on a new PCB defect dataset,” arXiv preprint arXiv:1902.06197, 2019.
  55. [55] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Proc. NeurIPS, 2017.
  56. [56] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proc. CVPR, 2018, pp. 586–595.
  57. [57] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  58. [58] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” in Proc. ICLR, 2021.
  59. [59] S. Liu, F. Li, H. Zhang, X. Yang, X. Qi, H. Su, J. Zhu, and L. Zhang, “DAB-DETR: Dynamic anchor boxes are better queries for DETR,” in Proc. ICLR, 2022.
  60. [60] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” in Proc. ICLR, 2023.
  61. [61] Y. Peng, H. Li, P. Wu, Y. Zhang, X. Sun, and F. Wu, “D-FINE: Redefine regression task of DETRs as fine-grained distribution refinement,” in Proc. ICLR, 2025.
  62. [62] S. Huang, Z. Lu, X. Cun, Y. Yu, X. Zhou, and X. Shen, “DEIM: DETR with improved matching for fast convergence,” in Proc. CVPR, 2025, pp. 15162–15171.
  63. [63] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
  64. [64] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, “YOLOv10: Real-time end-to-end object detection,” in Proc. NeurIPS, 2024, pp. 107984–108011.
  65. [65] Y. Sun, Y. Liu, Y. Tang, W. Pei, and K. Chen, “AnyControl: Create your artwork with versatile control on text-to-image generation,” in Proc. ECCV, 2024, pp. 92–109.
  66. [66] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. CVPR, 2018, pp. 7132–7141.
  67. [67] Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in Proc. CVPR, 2021, pp. 13713–13722.
  68. [68] L. Yang, R.-Y. Zhang, L. Li, and X. Xie, “SimAM: A simple, parameter-free attention module for convolutional neural networks,” in Proc. ICML, 2021, pp. 11863–11874.
  69. [69] D. Ouyang, S. He, G. Zhang, M. Luo, H. Guo, J. Zhan, and Z. Huang, “Efficient multi-scale attention module with cross-spatial learning,” in Proc. ICASSP, 2023, pp. 1–5.