pith. machine review for the scientific record.

arxiv: 2605.04635 · v2 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 2 theorem links


UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords PCB defect detection · defect generation · diffusion models · multi-modal conditions · attention mechanisms · data augmentation · IIoT inspection · computer vision

The pith

A joint generation-detection framework for PCBs uses multi-modal synthesis to augment scarce defect data and reach 98.0 percent mAP@0.5.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that combining controlled defect synthesis with a specialized detector solves the twin problems of scarce imbalanced samples and weak feature representation in PCB inspection. A multi-modal generator extracts edge, depth, and text conditions in parallel, embeds them at multiple scales, and modulates them to produce structurally aligned defects that augment the training set. The detector then applies shift-wise attention for global-local context and gated cross-level fusion to handle complex backgrounds. This integrated pipeline yields higher detection accuracy than prior separate approaches while also improving generation quality over existing conditional methods. A sympathetic reader would care because industrial IIoT systems often lack enough real defects to train reliable models.
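To make the conditioning step concrete, here is a minimal sketch of parallel edge-depth-text extraction, under stated assumptions. Only the Canny edge operator is anchored in the paper's references; the depth map and text embedding below are dummy stand-ins for the pretrained models the authors actually use (e.g., Depth Anything v2 and a text encoder), and every function name here is ours.

```python
# A sketch only: parallel extraction of the three condition modalities.
# Canny is grounded in the paper's citations; the depth and text encoders
# are placeholders, and all names are illustrative.
import cv2
import numpy as np

def extract_conditions(image_bgr: np.ndarray, defect_text: str):
    """Return (edge map, depth map, text embedding) for one PCB image."""
    # Edge condition: Canny on the grayscale image (thresholds are guesses).
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)

    # Depth condition: placeholder gradient map standing in for a
    # monocular depth estimator such as Depth Anything v2.
    h, w = gray.shape
    depth = np.tile(np.linspace(0.0, 1.0, w, dtype=np.float32), (h, 1))

    # Text condition: placeholder 512-d vector standing in for a learned
    # embedding of the defect description.
    rng = np.random.default_rng(abs(hash(defect_text)) % (2**32))
    text_emb = rng.standard_normal(512).astype(np.float32)

    return edges, depth, text_emb

edges, depth, text_emb = extract_conditions(
    np.zeros((256, 256, 3), dtype=np.uint8), "open-circuit defect on trace")
```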

Core claim

The authors establish that a generation-assisted detection framework, with a Multi-modal Condition Generator feeding a ScaleEncoder and FiLM-style Condition Modulation for synthesis, plus an Inverted Residual Shift Attention and Cross-level Complementary Fusion Block for detection, jointly overcomes data scarcity and representation limits, delivering mAP@0.5 of 98.0 percent and mAP@0.5:0.95 of 61.8 percent on DsPCBSD+ while the generator reaches FID 129.61 and SSIM 0.619.
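FiLM conditioning has a standard published form (the paper cites Perez et al.'s FiLM), so the modulation step can be sketched even without the authors' code. Below is a minimal PyTorch sketch of a plausible spatially-adaptive variant at one of the four U-Net scales; the class name, channel counts, and normalization choice are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of FiLM-style spatially-adaptive modulation: a condition
# map predicts per-pixel scale (gamma) and shift (beta) that modulate
# U-Net features at one resolution. Names and choices are illustrative.
import torch
import torch.nn as nn

class ConditionModulation(nn.Module):
    def __init__(self, feat_ch: int, cond_ch: int):
        super().__init__()
        self.norm = nn.GroupNorm(8, feat_ch)
        # One conv predicts both gamma and beta from the condition map.
        self.to_gamma_beta = nn.Conv2d(cond_ch, 2 * feat_ch, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        # Spatially-adaptive FiLM: per-pixel affine transform of the features.
        return (1 + gamma) * self.norm(feat) + beta

# e.g. one of the four U-Net scales: 64-channel features, 32-channel condition
mod = ConditionModulation(feat_ch=64, cond_ch=32)
out = mod(torch.randn(1, 64, 32, 32), torch.randn(1, 32, 32, 32))
```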

What carries the argument

The Multi-modal Condition Generator with ScaleEncoder and Condition Modulation that synthesizes aligned defects from parallel edge-depth-text inputs, paired with the detector's Inverted Residual Shift Attention and Cross-level Complementary Fusion Block that fuses global context and local texture via shift convolution and pixel-level gates.
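The pixel-level gating named here admits a compact illustration. The sketch below shows one plausible reading of selective cross-level fusion: a sigmoid gate, predicted per pixel from the concatenated features, interpolates between a low-level (texture) map and an upsampled high-level (context) map. All names and the exact gating form are assumptions; the paper's Cross-level Complementary Fusion Block may differ in detail.

```python
# Minimal sketch of pixel-level gated cross-level fusion: a sigmoid gate
# decides, per pixel and channel, how much of the high-level vs. the
# low-level feature to keep. Illustrative only, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedCrossLevelFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # gate values in [0, 1]
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser high-level map to the low-level resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        g = self.gate(torch.cat([low, high], dim=1))
        return g * high + (1 - g) * low  # selective per-pixel fusion

fuse = GatedCrossLevelFusion(channels=64)
fused = fuse(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40))
```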

If this is right

  • Synthesized defects directly enrich the scarce IIoT training set, so gains in generation quality translate into higher detection mAP.
  • The multi-modal conditioning enables structurally aligned samples that help the detector handle complex circuit backgrounds better than single-condition methods.
  • The joint pipeline outperforms all compared detection and generation baselines on the DsPCBSD+ benchmark.
  • The IIoT pipeline supports real-time inspection by addressing both data volume and feature extraction challenges in one system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same multi-modal conditioning strategy could be tested on other industrial inspection tasks where defect samples are rare, such as weld or fabric defect detection.
  • Ablating the generation branch would show whether the attention and fusion blocks alone deliver part of the accuracy gain even without extra data.
  • If domain shift between generated and real defects proves larger than reported, the framework might need additional adaptation steps for new PCB manufacturing lines.

Load-bearing premise

The synthesized defect samples must be realistic enough and distributionally close enough to real IIoT PCB images that adding them to the training set improves detection accuracy on actual data rather than introducing harmful artifacts or shift.

What would settle it

Training the detector on real samples alone versus real samples plus the generated ones and measuring mAP on a held-out set of real PCB defects; if the augmented version shows no gain or a drop, the core benefit of generation assistance is refuted.
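A minimal sketch of that settling experiment, with the training and evaluation code stubbed out as hypothetical placeholders; only the comparison logic is the point here.

```python
# Sketch of the settling experiment: identical protocol, the only variable
# being whether synthesized defects are added to the training set.
# `train_detector` and `evaluate_map` are hypothetical placeholders.

def train_detector(train_set):          # placeholder: returns a trained model
    return {"trained_on": len(train_set)}

def evaluate_map(model, test_set):      # placeholder: returns mAP@0.5 on real data
    return 0.0

real_train, real_test = list(range(1000)), list(range(200))
generated = list(range(500))            # synthesized defect samples

baseline  = evaluate_map(train_detector(real_train), real_test)
augmented = evaluate_map(train_detector(real_train + generated), real_test)

# The generation-assisted claim survives only if `augmented > baseline`
# on the held-out *real* test set; a flat or negative delta refutes it.
print(f"delta mAP@0.5 = {augmented - baseline:+.3f}")
```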

Figures

Figures reproduced from arXiv: 2605.04635 by Huanqi Wu, Huan Zhang, Jiangzhong Cao, Lianghong Tan, Linwei Zhu, Xu Zhang, Yichu Xu.

Figure 1. Comparison between UniPCB and previous approaches for PCB …
Figure 2. Overview of the proposed defect generation framework. Real PCB images captured by an AOI machine are processed by the Multi-modal Condition …
Figure 3. (a) Structure of the ScaleEncoder, which encodes the condition map via …
Figure 4. Overview of the proposed detection framework. IRSA Blocks replace BasicBlocks in all four stages of the ResNet-18 backbone. Yellow and blue …
Figure 5. (a) Overview of the Inverted Residual Shift Attention Block. (b) …
Figure 6. (a) Overview of the Cross-level Complementary Fusion (CLCF) Block. …
Figure 7. Representative annotated samples from the six defect categories, …
Figure 8. Visualization of detection results across different models. Zoom-in for best view.
Figure 9. Visualization of generation results across different models. For better …
Figure 10. Examples of synthesized defect images with bounding-box annotations …
Original abstract

In the Industrial Internet of Things (IIoT), enabling intelligent, real-time Printed Circuit Board (PCB) defect inspection is critical for ensuring product reliability. However, existing IIoT-based visual inspection systems face two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection within an IIoT-enabled pipeline. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation applies FiLM-style spatially-adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis to augment the scarce IIoT dataset. On the detection side, an Inverted Residual Shift Attention couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and SSIM of 0.619, outperforming existing conditional generation approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes UniPCB, a generation-assisted framework for PCB defect inspection that integrates a Multi-modal Condition Generator (using parallel edge, depth, and text conditions fed via ScaleEncoder and Condition Modulation into a diffusion U-Net) for synthesizing defect samples to augment scarce IIoT data, with a detection network employing Inverted Residual Shift Attention and Cross-level Complementary Fusion for improved feature representation. On the DsPCBSD+ dataset, it reports mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% for detection (surpassing compared methods) alongside generation metrics of FID 129.61 and SSIM 0.619 (outperforming existing conditional generators), claiming that synthesized samples directly enrich training and compound with architectural improvements.

Significance. If the central claim holds, the work offers a practical pipeline for addressing data imbalance in industrial PCB inspection by jointly optimizing synthesis and detection, which could improve real-world IIoT reliability. The explicit reporting of both generation quality metrics and end-task mAP provides a basis for comparison, and the multi-modal conditioning approach is a concrete technical contribution. However, the lack of isolating experiments limits attribution of gains.

major comments (3)
  1. [Abstract / Experiments] The headline claim that 'the synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection' is load-bearing for the generation-assisted framing, yet no ablation is described that trains the detection branch (Inverted Residual Shift Attention + Cross-level Complementary Fusion) on real DsPCBSD+ data only versus real + generated samples. Without this, the mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% cannot be attributed to the Multi-modal Condition Generator rather than the detection modules alone.
  2. [Abstract] The reported mAP and generation metrics are presented without error bars, statistical significance tests (e.g., paired t-tests across runs), details on train/validation/test splits, or full baseline re-implementation protocols. This makes it impossible to assess whether the gains over compared methods are robust or sensitive to implementation choices.
  3. [Abstract / Methods] Generation branch description: The Multi-modal Condition Generator is said to produce 'structurally aligned and defect-aware' samples, but the abstract provides no quantitative measure of distributional alignment (e.g., feature-space distance to real defects) or qualitative failure cases, leaving the weakest assumption—that the FID 129.61 / SSIM 0.619 outputs avoid harmful domain shift—unverified.
minor comments (2)
  1. [Abstract] The abstract uses 'DsPCBSD+' without defining the dataset or citing its source; this should be clarified with a reference or brief description in the main text.
  2. [Methods] Notation for the ScaleEncoder and Condition Modulation (FiLM-style) is introduced without equations; adding a short mathematical formulation would improve reproducibility.
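For reference, the standard FiLM formulation the second minor comment asks for, extended to the spatially-adaptive, per-scale form the abstract describes; the notation is ours, not the paper's.

```latex
% h^{(s)}: U-Net feature map at scale s; c: fused edge-depth-text condition;
% gamma^{(s)}, beta^{(s)}: conv-predicted per-pixel scale and shift maps.
\[
  \tilde{h}^{(s)} \;=\; \gamma^{(s)}(c) \odot \mathrm{Norm}\!\bigl(h^{(s)}\bigr) \;+\; \beta^{(s)}(c),
  \qquad s \in \{1, 2, 3, 4\}.
\]
```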

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental validation and reporting that we will address to strengthen the paper. Below we respond point-by-point to the major comments.

Point-by-point responses
  1. Referee: [Abstract / Experiments] The headline claim that 'the synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection' is load-bearing for the generation-assisted framing, yet no ablation is described that trains the detection branch (Inverted Residual Shift Attention + Cross-level Complementary Fusion) on real DsPCBSD+ data only versus real + generated samples. Without this, the mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% cannot be attributed to the Multi-modal Condition Generator rather than the detection modules alone.

    Authors: We agree that an explicit ablation isolating the contribution of the synthesized samples is necessary to substantiate the generation-assisted claim. In the revised manuscript, we will add this ablation: training the full detection network (Inverted Residual Shift Attention + Cross-level Complementary Fusion) on real DsPCBSD+ data only, and comparing it directly to training on the combined real + generated set under identical hyperparameters and splits. This will quantify the mAP gains attributable to the Multi-modal Condition Generator. revision: yes

  2. Referee: [Abstract] The reported mAP and generation metrics are presented without error bars, statistical significance tests (e.g., paired t-tests across runs), details on train/validation/test splits, or full baseline re-implementation protocols. This makes it impossible to assess whether the gains over compared methods are robust or sensitive to implementation choices.

    Authors: We will revise the experiments section to report mean mAP values with standard deviations across multiple independent runs (e.g., 5 seeds), include paired t-tests or equivalent significance tests against baselines, explicitly state the train/validation/test split ratios and sampling strategy on DsPCBSD+, and provide complete re-implementation details (hyperparameters, data augmentation, and training schedules) for all compared methods to enable robust assessment of the gains. revision: yes

  3. Referee: [Abstract / Methods] Generation branch description: The Multi-modal Condition Generator is said to produce 'structurally aligned and defect-aware' samples, but the abstract provides no quantitative measure of distributional alignment (e.g., feature-space distance to real defects) or qualitative failure cases, leaving the weakest assumption—that the FID 129.61 / SSIM 0.619 outputs avoid harmful domain shift—unverified.

    Authors: FID and SSIM are established metrics for generation fidelity and structural similarity. To further verify distributional alignment and absence of harmful domain shift, the revised version will add quantitative analysis (e.g., average feature-space L2 distances using embeddings from a pre-trained ResNet on real vs. generated defect patches) and a dedicated qualitative section showing representative success cases alongside any observed failure modes (e.g., over-generated artifacts or misalignment). revision: yes
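Both empirical checks the rebuttal promises are easy to pin down. First, the significance test from response 2: a paired t-test over per-seed mAP values, sketched here with made-up placeholder numbers rather than the paper's results.

```python
# Paired t-test over per-seed mAP@0.5; all values are hypothetical.
from scipy.stats import ttest_rel

unipcb_map   = [0.980, 0.978, 0.981, 0.979, 0.982]   # 5 seeds, placeholder
baseline_map = [0.971, 0.969, 0.973, 0.970, 0.972]   # 5 seeds, placeholder

t_stat, p_value = ttest_rel(unipcb_map, baseline_map)
# A small p suggests the gain is not seed noise; the threshold is the
# experimenter's choice, not anything the paper specifies.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Second, the distributional-alignment check from response 3: the average distance between pooled ResNet-18 embeddings of real and generated defect patches. The data loading is stubbed with random tensors, and in practice the patches would first receive the usual ImageNet normalization.

```python
# Centroid distance between real and generated patches in a pre-trained
# ResNet-18 feature space; the patch tensors are random stand-ins.
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()   # expose the 512-d pooled embedding
model.eval()

@torch.no_grad()
def embed(patches: torch.Tensor) -> torch.Tensor:
    return model(patches)        # (N, 512)

real_patches = torch.randn(16, 3, 224, 224)   # stand-in for real defect crops
gen_patches  = torch.randn(16, 3, 224, 224)   # stand-in for synthesized crops

# Smaller centroid distance = tighter distributional alignment.
dist = torch.linalg.norm(embed(real_patches).mean(0) - embed(gen_patches).mean(0))
print(f"centroid L2 distance: {dist.item():.3f}")
```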

Circularity Check

0 steps flagged

No significant circularity; empirical framework with external validation

full rationale

The paper presents an architectural framework (Multi-modal Condition Generator with ScaleEncoder and Condition Modulation; Inverted Residual Shift Attention and Cross-level Complementary Fusion) whose performance claims rest on empirical metrics (mAP@0.5 98.0%, mAP@0.5:0.95 61.8%, FID 129.61, SSIM 0.619) measured on the external DsPCBSD+ dataset. No equations, derivations, or first-principles results are described that reduce by construction to fitted inputs, self-citations, or renamed patterns. The generation-assisted claim is presented as an empirical outcome rather than a tautological restatement of training objectives, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the framework rests on standard deep-learning assumptions about diffusion models and attention rather than novel axioms, but many implementation details (loss balancing, conditioning strength, training schedules) remain unspecified.

axioms (2)
  • domain assumption Multi-modal conditions (edge, depth, text) can be extracted in parallel and embedded to produce structurally aligned defect images
    Invoked in the generation branch description
  • domain assumption Coupling self-attention with shift-wise convolution and cross-level gating improves feature representation under complex circuit backgrounds
    Invoked in the detection branch description

pith-pipeline@v0.9.0 · 5629 in / 1568 out tokens · 62189 ms · 2026-05-12T00:46:45.340438+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: the paper's claim is directly supported by a theorem in the formal canon.
supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: the paper appears to rely on the theorem as machinery.
contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages

  1. [1] Y. Zhou, M. Yuan, J. Zhang, G. Ding, and S. Qin, “Review of vision-based defect detection research and its perspectives for printed circuit board,” J. Manuf. Syst., vol. 70, pp. 557–578, 2023.
  2. [2] Z. He, Y. Lian, Y. Wang, and Z. Lu, “A comprehensive review of research on surface defect detection of PCBs based on machine vision,” Results Eng., p. 106437, 2025.
  3. [3] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. ICLR, 2014.
  4. [4] A. Van Den Oord, O. Vinyals et al., “Neural discrete representation learning,” Proc. NeurIPS, 2017.
  5. [5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. NeurIPS, 2014.
  6. [6] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Proc. NeurIPS, pp. 6840–6851, 2020.
  7. [7] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Proc. ICLR, 2021.
  8. [8] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. ICLR, 2021.
  9. [9] X. Zhang, H. Zhang, G. Wang, Q. Zhang, L. Zhang, and B. Du, “UniUIR: Considering underwater image restoration as an all-in-one learner,” IEEE Trans. Image Process., vol. 34, pp. 6963–6977, 2025.
  10. [10] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. CVPR, 2022, pp. 10684–10695.
  11. [11] X. Zhang, J. Huang, and L. Zhang, “Any2RSI: Controllable remote sensing text-to-image generation via any control and enriched description,” in Proc. AAAI, 2026, pp. 12852–12860.
  12. [12] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 10850–10869, 2023.
  13. [13] Y. He, S. Li, X. Wen, and J. Xu, “A survey on surface defect inspection based on generative models in manufacturing,” Appl. Sci., vol. 14, no. 15, p. 6774, 2024.
  14. [14] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proc. ECCV, 2020, pp. 213–229.
  15. [15] Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, “DETRs beat YOLOs on real-time object detection,” in Proc. CVPR, 2024, pp. 16965–16974.
  16. [16] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and efficient object detection,” in Proc. CVPR, 2020, pp. 10781–10790.
  17. [17] D. I. Ural and A. Sezen, “Research on PCB defect detection using artificial intelligence: A systematic mapping study,” Evol. Intell., vol. 17, no. 5, pp. 3101–3111, 2024.
  18. [18] Z. Du, L. Gao, and X. Li, “A new contrastive GAN with data augmentation for surface defect recognition under limited data,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–13, 2022.
  19. [19] Y. Duan, Y. Hong, L. Niu, and L. Zhang, “Few-shot defect image generation via defect-aware feature manipulation,” in Proc. AAAI, 2023, pp. 571–578.
  20. [20] Y. Liang, S. Feng, Y. Zhang, F. Xue, F. Shen, and J. Guo, “A stable diffusion enhanced YOLOv5 model for metal stamped part defect detection based on improved network structure,” J. Manuf. Process., vol. 111, pp. 21–31, 2024.
  21. [21] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” in Proc. ICML, 2019, pp. 7354–7363.
  22. [22] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proc. CVPR, 2019, pp. 2337–2346.
  23. [23] B. Trabucco, K. Doherty, M. Gurinas, and R. Salakhutdinov, “Effective data augmentation with diffusion models,” in Proc. ICLR, 2024.
  24. [24] T. Hu, J. Zhang, R. Yi, Y. Du, X. Chen, L. Liu, Y. Wang, and C. Wang, “AnomalyDiffusion: Few-shot anomaly image generation with diffusion model,” in Proc. AAAI, 2024, pp. 8526–8534.
  25. [25] W. Deng, L. Yan, and C. Wang, “DDMF: A PCB surface defect detection model based on conditional denoising diffusion and multiscale feature fusion,” J. Supercomput., vol. 81, no. 15, pp. 1–33, 2025.
  26. [26] C. Mou, X. Wang, L. Xie, Y. Wu, J. Zhang, Z. Qi, Y. Shan, and X. Qie, “T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models,” in Proc. AAAI, 2024, pp. 4296–4304.
  27. [27] C. Qin, S. Bai, Y. Shen, L. Chen, B. Ni, J. Liu, Y. Liu, and X. Liu, “UniControl: A unified diffusion model for controllable visual generation in the wild,” in Proc. NeurIPS, 2023.
  28. [28] C. Zhang, G. Shi, H. Li, M. Yang, Y. Li, Z. Bing, W. Chen, Z. Wang, F. Yu, and V. C. M. Leung, “Circuit board welding defect detection based on industrial IoVT,” IEEE Internet Things J., vol. 13, no. 7, pp. 14003–14018, 2026.
  29. [29] M. Yuan, Y. Zhou, X. Ren, H. Zhi, J. Zhang, and H. Chen, “YOLO-HMC: An improved method for PCB surface defect detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–11, 2024.
  30. [30] L. Lei, H.-X. Li, and H.-D. Yang, “Reliable and lightweight adaptive convolution network for PCB surface defect detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–8, 2024.
  31. [31] X. Liu, “An adaptive defect-aware attention network for accurate PCB-defect detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–11, 2024.
  32. [32] W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li, X. Wang, and Y. Qiao, “InternImage: Exploring large-scale vision foundation models with deformable convolutions,” in Proc. CVPR, 2023, pp. 14408–14419.
  33. [33] C. Mo, Z. Hu, J. Wang, and X. Xiao, “SGT-YOLO: A lightweight method for PCB defect detection,” IEEE Trans. Instrum. Meas., vol. 74, pp. 1–11, 2025.
  34. [34] T. Liu, G.-Z. Cao, Z. He, and S. Xie, “Refined defect detector with deformable transformer and pyramid feature fusion for PCB detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–11, 2023.
  35. [35] Q. Li, L. Wu, H. Xiao, and C. Huang, “PCB-DETR: A detection network of PCB surface defect with spatial attention offset module,” IEEE Access, vol. 12, pp. 158436–158445, 2024.
  36. [36] F. Guo, Z. Chen, B. Chen, M. Jing, and L. Zuo, “Multi-granularity relation enhancement network for tiny defect detection on printed circuit board,” IEEE Trans. Instrum. Meas., vol. 74, pp. 1–11, 2025.
  37. [37] X. Dai, Y. Chen, B. Xiao, D. Chen, M. Liu, L. Yuan, and L. Zhang, “Dynamic head: Unifying object detection heads with attentions,” in Proc. CVPR, 2021, pp. 7373–7382.
  38. [38] J. Yang, Z. Liu, W. Du, and S. Zhang, “A PCB defect detector based on coordinate feature refinement,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–10, 2023.
  39. [39] J. Cao, H. Wu, X. Zhang, L. Tan, and H. Zhang, “MRC-DETR: An adaptive multi-residual coupled transformer for bare board PCB defect detection,” arXiv preprint arXiv:2507.03386, 2025.
  40. [40] L. Ji, C. Huang, H. Li, W. Han, and L. Yi, “MS-DETR: A real-time multi-scale detection transformer for PCB defect detection,” Signal Image Video Process., vol. 19, no. 3, p. 203, 2025.
  41. [41] L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proc. ICCV, 2023, pp. 3836–3847.
  42. [42] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., no. 6, pp. 679–698, 2009.
  43. [43] N. Otsu et al., “A threshold selection method from gray-level histograms,” Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
  44. [44] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, “Depth Anything V2,” in Proc. NeurIPS, 2024, pp. 21875–21911.
  45. [45] G. Jocher and J. Qiu, “Ultralytics YOLO11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics
  46. [46] S. Zhao, D. Chen, Y.-C. Chen, J. Bao, S. Hao, L. Yuan, and K.-Y. K. Wong, “Uni-ControlNet: All-in-one control to text-to-image diffusion models,” Proc. NeurIPS, pp. 11127–11150, 2023.
  47. [47] R. Sunkara and T. Luo, “No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects,” in Proc. ECML-PKDD, 2022, pp. 443–459.
  48. [48] E. Perez, F. Strub, H. De Vries, V. Dumoulin, and A. Courville, “FiLM: Visual reasoning with a general conditioning layer,” in Proc. AAAI, vol. 32, no. 1, 2018.
  49. [49] P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” in Proc. NeurIPS, 2021, pp. 8780–8794.
  50. [50] D. Li, L. Li, Z. Chen, and J. Li, “ShiftwiseConv: Small convolutional kernel with large kernel effect,” in Proc. CVPR, 2025, pp. 25281–25291.
  51. [51] A. Ali, H. Touvron, M. Caron, P. Bojanowski, M. Douze, A. Joulin, I. Laptev, N. Neverova, G. Synnaeve, J. Verbeek, and H. Jégou, “XCiT: Cross-covariance image transformers,” in Proc. NeurIPS, 2021, pp. 20014–20027.
  52. [52] S. Lv, B. Ouyang, Z. Deng, T. Liang, S. Jiang, K. Zhang, J. Chen, and Z. Li, “A dataset for deep learning based detection of printed circuit board surface defect,” Sci. Data, vol. 11, no. 1, p. 811, 2024.
  53. [53] W. Huang, P. Wei, M. Zhang, and H. Liu, “HRIPCB: A challenging dataset for PCB defects detection and classification,” The Journal of Engineering, vol. 2020, no. 13, pp. 303–309, 2020.
  54. [54] S. Tang, F. He, X. Huang, and J. Yang, “Online PCB defect detector on a new PCB defect dataset,” arXiv preprint arXiv:1902.06197, 2019.
  55. [55] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Proc. NeurIPS, 2017.
  56. [56] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proc. CVPR, 2018, pp. 586–595.
  57. [57] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  58. [58] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” in Proc. ICLR, 2021.
  59. [59] S. Liu, F. Li, H. Zhang, X. Yang, X. Qi, H. Su, J. Zhu, and L. Zhang, “DAB-DETR: Dynamic anchor boxes are better queries for DETR,” in Proc. ICLR, 2022.
  60. [60] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” in Proc. ICLR, 2023.
  61. [61] Y. Peng, H. Li, P. Wu, Y. Zhang, X. Sun, and F. Wu, “D-FINE: Redefine regression task of DETRs as fine-grained distribution refinement,” in Proc. ICLR, 2025.
  62. [62] S. Huang, Z. Lu, X. Cun, Y. Yu, X. Zhou, and X. Shen, “DEIM: DETR with improved matching for fast convergence,” in Proc. CVPR, 2025, pp. 15162–15171.
  63. [63] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
  64. [64] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, “YOLOv10: Real-time end-to-end object detection,” in Proc. NeurIPS, 2024, pp. 107984–108011.
  65. [65] Y. Sun, Y. Liu, Y. Tang, W. Pei, and K. Chen, “AnyControl: Create your artwork with versatile control on text-to-image generation,” in Proc. ECCV, 2024, pp. 92–109.
  66. [66] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. CVPR, 2018, pp. 7132–7141.
  67. [67] Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in Proc. CVPR, 2021, pp. 13713–13722.
  68. [68] L. Yang, R.-Y. Zhang, L. Li, and X. Xie, “SimAM: A simple, parameter-free attention module for convolutional neural networks,” in Proc. ICML, 2021, pp. 11863–11874.
  69. [69] D. Ouyang, S. He, G. Zhang, M. Luo, H. Guo, J. Zhan, and Z. Huang, “Efficient multi-scale attention module with cross-spatial learning,” in Proc. ICASSP, 2023, pp. 1–5.