Pattern-Enhanced RT-DETR for Multi-Class Battery Detection

Enyuan Hu; Xu Zhong

arxiv: 2605.13670 · v1 · pith:LP3PZTMInew · submitted 2026-05-13 · 💻 cs.CV

Pith reviewed 2026-05-14 20:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords battery detection · object detection · RT-DETR · transformer detector · query generation · multi-class detection · computer vision

The pith

PaQ-RT-DETR adds pattern-based dynamic query generation to RT-DETR and raises multi-class battery detection mAP@50 to 0.782.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks five existing detectors on approximately 8,591 battery images and introduces PaQ-RT-DETR, which inserts pattern-based dynamic query generation into the RT-DETR transformer pipeline. This change targets query activation imbalance and delivers the top score of 0.782 mAP@50, a 2.8-point lift over the unmodified RT-DETR-X model (0.754), with gains across all six battery classes including the scarcest one. A reader would care because accurate, fast battery detection supports recycling, quality control, and sorting lines, and the modification is reported to add negligible compute while improving an already competitive transformer detector.

Core claim

PaQ-RT-DETR introduces pattern-based dynamic query generation into RT-DETR to alleviate query activation imbalance. On a public dataset of approximately 8,591 annotated images, the resulting PaQ-RT-DETR-X model records an overall mAP@50 of 0.782, exceeding the base RT-DETR-X (0.754) by 2.8 percentage points and showing consistent per-class improvements across all six battery categories, including the data-scarce Bike Battery class.
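The "+2.8%" figure is easy to misread, so a short worked check may help. The sketch below uses only the two mAP@50 values reported on this page (0.782 for PaQ-RT-DETR-X and 0.754 for RT-DETR-X); the variable names are illustrative. It suggests that the reported gain is an absolute difference in percentage points, and that the relative improvement is somewhat larger.

```python
# mAP@50 values as reported on this page (not re-measured here).
baseline_map50 = 0.754   # RT-DETR-X
paq_map50 = 0.782        # PaQ-RT-DETR-X

# Absolute gain, expressed in percentage points of mAP@50.
absolute_gain_points = (paq_map50 - baseline_map50) * 100

# Relative gain, expressed as a percentage of the baseline score.
relative_gain_percent = (paq_map50 - baseline_map50) / baseline_map50 * 100

print(f"absolute gain: {absolute_gain_points:.1f} points")
print(f"relative gain: {relative_gain_percent:.1f} %")
```

Whichever convention a reader prefers, quoting the two raw scores alongside the delta avoids the ambiguity.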

What carries the argument

Pattern-based dynamic query generation, which modifies the query initialization step inside RT-DETR to reduce activation imbalance across object classes.
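To make the mechanism concrete, here is a minimal sketch of the convex-combination form that this page quotes from the paper, q^C_i = sum_j w^D_ij · q^P_j over shared learnable patterns Q^P. Everything beyond that formula is an assumption for illustration: the function names are hypothetical, the weights are obtained with a softmax over raw scores, and the scores are random stand-ins for whatever the model predicts from encoder features. It is not the authors' implementation.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_queries(patterns, weight_logits):
    """Build each content query as a convex combination of shared patterns.

    patterns:      M pattern vectors of length d, shared across all images.
    weight_logits: N rows of M raw scores (assumed to be predicted per image;
                   how they are produced is not specified on this page).
    """
    dim = len(patterns[0])
    queries = []
    for logits in weight_logits:
        weights = softmax(logits)  # w^D_i: one weight per pattern
        # q^C_i[k] = sum_j w^D_ij * q^P_j[k]
        query = [sum(w * p[k] for w, p in zip(weights, patterns))
                 for k in range(dim)]
        queries.append(query)
    return queries

# Illustrative sizes only: 3 patterns, 5 queries, 4-dimensional embeddings.
random.seed(0)
patterns = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
weight_logits = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
queries = dynamic_queries(patterns, weight_logits)
print(len(queries), "queries of dimension", len(queries[0]))
```

Because the weights are non-negative and sum to 1, every query stays inside the span of the shared patterns, which is one plausible reading of why such a scheme could spread activation more evenly than independent per-query embeddings.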

If this is right

PaQ-RT-DETR-X outperforms the strongest CNN baseline YOLO11n while retaining transformer advantages on the same dataset.
Per-class gains hold for every battery type, including those with few training examples.
The added pattern mechanism incurs only negligible computational overhead.
The benchmark supplies direct model-selection guidance for industrial battery-sorting pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same query-pattern idea could be tested on other imbalanced object-detection tasks such as defect inspection or rare-part counting.
If the pattern mechanism proves robust across datasets, it offers a lightweight upgrade path for other DETR-family detectors.
The consistent lift on the scarcest class suggests the method may help any detection setting where some categories appear infrequently.

Load-bearing premise

Introducing pattern-based dynamic query generation reliably reduces query activation imbalance and produces the observed accuracy gains without hidden dataset-specific artifacts or extra costs.

What would settle it

Re-running the exact training and evaluation protocol on a new battery image collection of similar size and class distribution and checking whether the 2.8-point mAP@50 gain over baseline RT-DETR-X reappears.

Figures

Figures reproduced from arXiv: 2605.13670 by Enyuan Hu, Xu Zhong.

**Figure 2.** Overview of PaQ-RT-DETR. The pattern-based dynamic query module replaces RT-DETR’s direct top- … (caption truncated in source)

**Figure 3.** Qualitative results of PaQ-RT-DETR-L on the validation set, covering … (caption truncated in source)

Original abstract

Accurate and efficient battery detection is increasingly important for applications in electronic waste recycling, industrial quality control, and automated sorting systems. In this paper, we present both a comprehensive benchmark and a novel method for multi-class battery detection. We systematically compare three CNN-based detectors (YOLOv8n, YOLOv8s, YOLO11n) and two transformer-based detectors (RT-DETR-L, RT-DETR-X) on a publicly available dataset of approximately 8,591 annotated images under identical experimental conditions, and further propose PaQ-RT-DETR, which introduces pattern-based dynamic query generation into RT-DETR to alleviate query activation imbalance with negligible computational overhead. Among baselines, YOLO11n achieves the best CNN-based accuracy (mAP@50: 0.779) at only 2.6M parameters, while YOLOv8n delivers the fastest inference at ~1,667 FPS. PaQ-RT-DETR-X achieves the highest overall mAP@50 of 0.782, surpassing RT-DETR-X by +2.8% with consistent per-class gains across all six battery categories including the data-scarce Bike Battery class. Our findings provide practical guidance for selecting object detection models in battery-related industrial applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper benchmarks detectors on battery images and adds a pattern-based query tweak to RT-DETR for a 2.8-point mAP@50 lift, but lacks ablations to confirm that the tweak is what drives the gain.

The main takeaway is a straightforward benchmark of YOLO and RT-DETR models on an 8,591-image battery dataset, plus a small modification to RT-DETR that produces the highest mAP@50 of 0.782. PaQ-RT-DETR-X beats the base RT-DETR-X by 2.8 points with gains across all six classes, including the low-data Bike Battery category. YOLO11n looks competitive on accuracy with only 2.6M parameters, and the work gives usable numbers for anyone choosing a model for recycling or sorting tasks.

The new element is the pattern-based dynamic query generation meant to reduce activation imbalance with little added cost. The comparison under identical conditions is a plus and the per-class consistency is worth noting.

The soft spot is the missing isolation of that query change. There is no ablation that swaps only the pattern generator while holding training schedule, augmentations, and seeds fixed, nor any before-and-after measure of query activation. Without those controls the reported delta could trace to other unstated differences rather than the claimed mechanism. The abstract also omits error bars and the precise formulation of the pattern generator.

This is applied work aimed at industrial battery detection. Readers who need a quick model comparison on this domain will find the numbers helpful, but the paper does not move the broader DETR literature forward. I would send it to peer review and ask the authors to add the missing ablation and implementation details; the empirical core is clear enough to be worth referee time.

Referee Report

3 major / 2 minor

Summary. The paper benchmarks CNN-based (YOLOv8n/s, YOLO11n) and transformer-based (RT-DETR-L/X) detectors on an 8,591-image multi-class battery dataset under identical conditions, then introduces PaQ-RT-DETR by adding pattern-based dynamic query generation to RT-DETR to mitigate query activation imbalance. It reports PaQ-RT-DETR-X reaching the highest mAP@50 of 0.782 (+2.8% over RT-DETR-X) with consistent per-class gains, including on the data-scarce Bike Battery class, while claiming negligible computational overhead.

Significance. If the reported gains are shown to stem specifically from the pattern-based query mechanism rather than training variations, the work supplies a practical, low-overhead enhancement to RT-DETR for industrial battery detection tasks and offers useful model-selection guidance on a public dataset. The consistent per-class improvements and inclusion of a scarce class strengthen applicability claims.

major comments (3)

[Method] Method section: the exact formulation of the pattern generator (including how patterns are extracted, encoded, and injected into the query mechanism) is not provided, preventing reproduction and direct verification that the component alleviates activation imbalance as claimed.
[Experiments] Experiments section: no ablation isolates the pattern-based dynamic query generation while freezing all other hyperparameters, data splits, optimizer schedule, and random seeds; the +2.8% mAP@50 delta cannot be confidently attributed to the proposed mechanism rather than unstated training differences.
[Results] Results section: no quantitative metric (e.g., query activation histograms or imbalance scores) is reported before versus after the modification, leaving the central claim that the change alleviates activation imbalance unsupported by direct evidence.
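As an illustration of the kind of imbalance score the third comment asks for, the sketch below computes two simple statistics over per-query activation counts: normalized entropy and the variance of activation shares. The definition of "activation" (for example, how often each query is matched to a ground-truth box) and the counts themselves are assumptions made up for this example; the paper does not specify either, and the function names are hypothetical.

```python
import math
from statistics import pvariance

def activation_shares(counts):
    """Fraction of all activations that each query receives."""
    total = sum(counts)
    return [c / total for c in counts]

def normalized_entropy(counts):
    """1.0 means perfectly even use of queries; values near 0 mean a few dominate."""
    shares = activation_shares(counts)
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(len(counts))

def share_variance(counts):
    """Population variance of activation shares; 0 means perfectly balanced."""
    return pvariance(activation_shares(counts))

# Made-up counts for illustration only, not measurements from the paper.
balanced = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

for name, counts in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(normalized_entropy(counts), 3), round(share_variance(counts), 4))
```

Reporting such a score for RT-DETR-X and PaQ-RT-DETR-X on the same validation images would give the imbalance claim a direct, checkable number.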

minor comments (2)

[Abstract] Abstract and results tables omit error bars or standard deviations across multiple runs, making it difficult to assess whether the reported deltas exceed run-to-run variance.
[Experiments] Implementation details (exact learning-rate schedule, augmentation pipeline, and query initialization) are insufficiently specified to allow exact replication of the baseline RT-DETR-X versus PaQ-RT-DETR-X comparison.
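The first minor comment can be made operational with a few lines of code. The sketch below summarizes repeated runs as mean and sample standard deviation, then applies a deliberately crude rule of thumb: the gap between means should exceed k times the larger standard deviation. The rule, the threshold k, the function names, and all scores are assumptions chosen for illustration; the scores are synthetic placeholders, not results from the paper, and a proper significance test would be preferable in a revision.

```python
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return mean(scores), stdev(scores)

def delta_exceeds_noise(baseline_scores, variant_scores, k=2.0):
    """Crude check: is the mean gap larger than k times the larger per-model std?"""
    b_mean, b_std = summarize(baseline_scores)
    v_mean, v_std = summarize(variant_scores)
    return (v_mean - b_mean) > k * max(b_std, v_std)

# Synthetic placeholder scores for three seeds each (NOT from the paper).
stable_baseline = [0.50, 0.52, 0.51]
stable_variant = [0.60, 0.61, 0.62]
noisy_baseline = [0.40, 0.60, 0.50]
noisy_variant = [0.45, 0.65, 0.55]

print(delta_exceeds_noise(stable_baseline, stable_variant))
print(delta_exceeds_noise(noisy_baseline, noisy_variant))
```

With three or more seeds per model, a table of mean ± standard deviation would let readers judge the 2.8-point delta against run-to-run variance directly.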

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. We address each major comment point by point below and will revise the manuscript to enhance reproducibility, experimental rigor, and evidential support.

Referee: [Method] Method section: the exact formulation of the pattern generator (including how patterns are extracted, encoded, and injected into the query mechanism) is not provided, preventing reproduction and direct verification that the component alleviates activation imbalance as claimed.

Authors: We agree that the Method section lacked the precise formulation needed for full reproducibility. In the revised manuscript we will add a dedicated subsection providing the exact mathematical formulation of the pattern generator. This will detail pattern extraction from encoder features, the encoding into dynamic pattern embeddings, and the precise injection into RT-DETR's query mechanism, enabling direct verification of the activation-imbalance mitigation.
Revision: yes

Referee: [Experiments] Experiments section: no ablation isolates the pattern-based dynamic query generation while freezing all other hyperparameters, data splits, optimizer schedule, and random seeds; the +2.8% mAP@50 delta cannot be confidently attributed to the proposed mechanism rather than unstated training differences.

Authors: We concur that a controlled ablation is required to isolate the contribution of the pattern-based query generation. We will run and report a new ablation study in the revised Experiments section, adding or removing only the pattern-based dynamic query component while strictly freezing all other factors (hyperparameters, data splits, optimizer schedule, and random seeds). The results will be presented alongside the original numbers to attribute the performance delta specifically to the proposed mechanism.
Revision: yes

Referee: [Results] Results section: no quantitative metric (e.g., query activation histograms or imbalance scores) is reported before versus after the modification, leaving the central claim that the change alleviates activation imbalance unsupported by direct evidence.

Authors: We acknowledge that direct quantitative evidence for query activation imbalance alleviation was not provided. In the revised Results section we will add query activation histograms together with imbalance metrics (e.g., activation variance and entropy) computed before and after the pattern-based modification. These visualizations and scores will directly support the claim that the proposed component reduces activation imbalance.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains shown via held-out comparison, no self-referential derivations

The paper introduces PaQ-RT-DETR by adding pattern-based dynamic query generation to RT-DETR and reports mAP@50 improvements (0.782 vs 0.754 for RT-DETR-X) through direct evaluation on a held-out portion of the 8,591-image dataset. No equations, fitted parameters, or first-principles derivations are supplied whose outputs reduce by construction to the inputs; the central claim rests on empirical metrics rather than any self-definitional loop, fitted-input prediction, or load-bearing self-citation. The method is presented as a practical modification with stated negligible overhead, and results are externally falsifiable via the reported per-class gains and baseline comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The claim rests on standard assumptions of deep learning training and the effectiveness of transformer query mechanisms; no explicit free parameters, axioms, or invented entities are detailed beyond the new query generation component.

pith-pipeline@v0.9.0 · 5518 in / 1020 out tokens · 50328 ms · 2026-05-14T20:33:14.460809+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem)
Paper passage: "introduces pattern-based dynamic query generation into RT-DETR to alleviate query activation imbalance... $q^C_i = \sum_j w^D_{ij}\, q^P_j$ (convex combination of shared learnable patterns $Q^P$)"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem)
Paper passage: "PaQ-RT-DETR-X achieves the highest overall mAP@50 of 0.782, surpassing RT-DETR-X by +2.8%"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Z. J. Baum, R. E. Bird et al., “Lithium-ion battery recycling—overview of techniques and trends,” ACS Energy Letters, vol. 7, no. 2, pp. 712–719, 2022.
[2] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,” https://github.com/ultralytics/ultralytics, 2023.
[3] Ultralytics, “YOLO11,” https://github.com/ultralytics/ultralytics, 2024.
[4] N. Carion, F. Massa et al., “End-to-end object detection with transformers,” in ECCV, 2020, pp. 213–229.
[5] H. Zhang, F. Li et al., “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” in ICLR, 2023.
[6] X. Zhu, W. Su et al., “Deformable DETR: Deformable transformers for end-to-end object detection,” in ICLR, 2021.
[7] Y. Zhao, W. Lv et al., “DETRs beat YOLOs on real-time object detection,” in CVPR, 2024, pp. 16965–16974.
[8] Z. Kang, J. Zhuang et al., “PaQ-DETR: Learning pattern and quality-aware dynamic queries for object detection,” arXiv preprint arXiv:2603.06917, 2026.
[9] J. Redmon, S. Divvala et al., “You only look once: Unified, real-time object detection,” in CVPR, 2016, pp. 779–788.
[10] T. Ueda, S. Koyanaka, and T. Oki, “In-line sorting system with battery detection capabilities in E-waste using combination of X-ray transmission scanning and deep learning,” Resources, Conservation and Recycling, vol. 201, p. 107345, 2024.
[11] X. Zhao, Y. Pang et al., “Towards automatic power battery detection: New challenge, benchmark dataset and baseline,” in CVPR, 2024, pp. 22020–22029.
[12] X. C. Acaro Chacón et al., “RecyBat24: A dataset for detecting lithium-ion batteries in electronic waste disposal,” Scientific Data, vol. 12, no. 1, p. 843, 2025.
[13] Project TICS, “Battery detection dataset,” https://universe.roboflow.com/project-tics-ylrlr/battery-detection-sszwf, 2024.
