Error-Decomposed Class-Conditional Fusion for Statistically Guaranteed Hard-Category Robust Perception

arxiv: 2605.17591 · v1 · pith:SMT7MHUXnew · submitted 2026-05-17 · 💻 cs.CV

Guowei Luo (1), Ziqi Shi (2), Zhao Xie (1) ((1) Hefei University of Technology, Hefei, China; (2) Lishui University, Lishui, China)

Pith reviewed 2026-05-20 14:05 UTC · model grok-4.3

classification 💻 cs.CV

keywords Hard-Category Reliability Problem · Error-Decomposed Class-Conditional Fusion · object detection · class-conditional fusion · quad-state error taxonomy · robust perception · safety-critical systems · performance calibration

The pith

Error-Decomposed Class-Conditional Fusion rectifies vulnerable detection classes via quad-state error taxonomy while preserving global performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the Hard-Category Reliability Problem as the persistent masking of catastrophic failures in long-tail minority classes by aggregate detection metrics. It proposes Error-Decomposed Class-Conditional Fusion as a decision-layer framework that projects predictions into a quad-state error taxonomy and activates calibration pathways only when data empirically justifies it. On a constrained 600-image benchmark isolating one critical class, the method delivers a 22.4 percent relative mAP50 gain for that class alongside a modest overall mAP50 rise and passes strict statistical tests across fifty paired trials. A sympathetic reader cares because safety-critical perception systems require reliable fixes for repeatable, operationally dangerous errors without introducing new trade-offs on stable categories.

Core claim

The paper claims that Error-Decomposed Class-Conditional Fusion formally dismantles the Hard-Category Reliability Problem by decomposing predictions into a quad-state error taxonomy and conditionally applying class-specific calibration pathways, thereby elevating the vulnerable class mAP50 from 0.089343 to 0.109353 while raising global mAP50 from 0.581925 to 0.584864 and demonstrating a 96 percent win rate with Bonferroni-corrected significance across fifty subset trials.
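The headline percentages follow directly from the four mAP50 values quoted in the claim. The short check below recomputes the absolute and relative deltas; it uses only numbers stated above, and the helper name `relative_gain` is introduced here for illustration.

```python
# mAP50 values quoted in the core claim (before and after ED-CCF).
cz_before, cz_after = 0.089343, 0.109353
all_before, all_after = 0.581925, 0.584864


def relative_gain(before: float, after: float) -> float:
    """Relative change, (after - before) / before."""
    return (after - before) / before


cz_abs = cz_after - cz_before
all_abs = all_after - all_before
cz_rel = relative_gain(cz_before, cz_after)
all_rel = relative_gain(all_before, all_after)

print(f"cz:  {cz_abs:+.6f} absolute, {cz_rel:+.1%} relative")
print(f"all: {all_abs:+.6f} absolute, {all_rel:+.1%} relative")
```

The 22.4 percent relative figure corresponds to an absolute cz gain of about 0.020 mAP50 on a baseline below 0.09, and the global gain is about 0.003, or roughly half a percent relative. Both absolute deltas agree with the values reported alongside Figure 4.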

What carries the argument

Error-Decomposed Class-Conditional Fusion, the decision-layer framework that projects predictions into a quad-state error taxonomy to dynamically activate calibration pathways exclusively for vulnerable classes.
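The text reproduced with Figure 1 describes the routing idea: stable classes stay on the all-class source, and a hard class is assigned to a repair source. The sketch below is one minimal reading of that rule, not the authors' implementation. The record layout, the function name `class_conditional_select`, the `car` class, and all scores are assumptions made for illustration; the decision about which classes count as hard is taken as an input because the page does not specify the activation thresholds.

```python
from typing import Dict, List, Set

# Hypothetical detection record: {"cls": str, "score": float, "box": [x1, y1, x2, y2]}
Detection = Dict[str, object]


def class_conditional_select(
    all_class_preds: List[Detection],
    repair_preds: List[Detection],
    hard_classes: Set[str],
) -> List[Detection]:
    """Route each class to exactly one source.

    Stable classes come from the all-class source; classes listed in
    hard_classes come from the repair source. Boxes are not merged or rescored.
    """
    stable = [d for d in all_class_preds if d["cls"] not in hard_classes]
    repaired = [d for d in repair_preds if d["cls"] in hard_classes]
    return stable + repaired


# Toy predictions for one image (made-up values, not from the paper).
all_class = [
    {"cls": "car", "score": 0.91, "box": [10, 10, 50, 40]},
    {"cls": "cz", "score": 0.20, "box": [60, 60, 80, 90]},
]
repair = [
    {"cls": "car", "score": 0.55, "box": [11, 9, 49, 41]},
    {"cls": "cz", "score": 0.47, "box": [61, 59, 81, 92]},
]

fused = class_conditional_select(all_class, repair, hard_classes={"cz"})
unchanged = class_conditional_select(all_class, repair, hard_classes=set())
print([(d["cls"], d["score"]) for d in fused])
```

With an empty `hard_classes` set the output equals the all-class source, which is the property that keeps stable classes untouched when no pathway is activated.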

If this is right

Vulnerable classes receive targeted mAP50 gains of roughly 22 percent relative without compromising Pareto-optimal global stability.
Output-level fusion becomes an auditable, statistically guaranteed process rather than a heuristic post-processing step.
Repeatable failures masked by aggregate metrics can be isolated and rectified under stringent validation protocols.
The framework maintains performance boundaries of stable classes while rectifying hard categories in constrained benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same quad-state decomposition could be tested on segmentation or pose estimation tasks that exhibit similar long-tail class imbalances.
Integration with upstream hard-example mining during training might produce larger compounded robustness gains than inference-time fusion alone.
Replication on datasets with varying image resolutions or sensor types would clarify whether the dynamic activation rules transfer beyond the current benchmark conditions.

Load-bearing premise

The quad-state error taxonomy and the empirical rules for activating calibration pathways are assumed to remain justified outside the specific 600-image benchmark and the chosen vulnerable class.

What would settle it

A new validation set using a different vulnerable class where the method produces no statistically significant mAP50 gain for the target class while preserving or improving the aggregate score would falsify the claimed general applicability.

Figures

Figures reproduced from arXiv: 2605.17591 by Guowei Luo, Ziqi Shi, and Zhao Xie.

**Figure 1.** Left: the HCRP workflow from hard-category definition through error buckets to class-conditional fusion. Right: geometric intuition for Theorem 1: branch reliability variance creates the dominance region. Adjacent text (Section 4.2, Output Decision Layer): A uniform branch rule assigns every class to the same source. ED-CCF instead keeps stable classes on the all-class source and assigns a hard class to a controlled repair source on…

**Figure 2.** The main validation result includes bootstrap confidence intervals and corrected paired-test markers for the replay and class-conditional output candidates. Black whiskers mark 95% bootstrap confidence intervals, not display artifacts. Bonferroni-corrected p = 1.90e-08 for all mAP50 and p = 2.88e-08 for cz mAP50. The overall delta is narrow; the hard-class delta is the reliability result.

**Figure 3.** Five-fold held-out box plots show median lines, fold-level points, and whiskers for replay, class-conditional fusion, and the RCV check. Adjacent text: … comparison remains replay versus the output-decision candidate. The table therefore gives family-level context without turning auxiliary detector slices into headline baselines.

**Figure 4.** HCEC and BSR expose hard-category error concentration and branch-switch pressure. Adjacent text (Section 7.3, Reliability Gain Curve and Output Cost): The RGC view plots movement relative to replay. Final WBF, RCV-best, and CRC-best sit on the same measured point: +0.002939 all mAP50 and +0.020010 cz mAP50. The statistical audit ties the Wilcoxon-backed claim to the final-versus-replay pair because CRC and RCV use the same verified…

**Figure 5.** The reliability-gain curve and output-level deployment audit show the measured hard-class gain and the cost of post-processing the prediction JSON. Adjacent text: Standard WBF and Soft-NMS apply the same aggregation rule across all classes. ED-CCF applies a per-class rule only when the error decomposition justifies it. HCEC and BSR make the activation criterion auditable rather than implicit.

**Figure 6.** A representative qualitative panel compares ground truth, replay predictions, and final predictions while marking the HCRP hard-class check. Adjacent text (Section 10, Conclusion): This paper formally introduced HCRP, a rigorous decision-layer formulation that fundamentally addresses the Pareto tradeoff between hard-class breakthrough and stable-class preservation. ED-CCF elegantly decomposes errors, verifies branch-role asymmetry…

Original abstract

Aggregate object detection metrics inherently mask catastrophic and repeatable failures in operationally critical, long-tail minority classes. This paper formally defines this pervasive vulnerability as the Hard-Category Reliability Problem (HCRP): the fundamental architectural challenge of strictly rectifying vulnerable categories without compromising the performance boundaries of stable classes under stringent protocols. To systematically dismantle this limitation, we propose Error-Decomposed Class-Conditional Fusion (ED-CCF), an elegant decision-layer inference framework. Diverging from heuristic global post-processing, ED-CCF projects predictions into a sophisticated quad-state error taxonomy, dynamically activating calibration pathways exclusively upon rigorous empirical justification. On a highly constrained 600-image validation benchmark, isolating cz as the critical vulnerability (HCEC=0.86, BSR=0.14), our framework achieves a targeted breakthrough: it elevates cz mAP50 from 0.089343 to 0.109353 (a massive +22.4% relative surge) while flawlessly preserving the Pareto optimality of global stability (raising all mAP50 from 0.581925 to 0.584864). Backed by exhaustive validation across 50 paired subset trials demonstrating an overwhelming 96% win rate and strict Bonferroni-corrected Wilcoxon significance (p<0.05), this work fundamentally redefines output-level fusion as an auditable, statistically guaranteed paradigm for safety-critical visual perception.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ED-CCF shows targeted gains on one hard class via quad-state post-processing but the statistical guarantee does not hold up independently of the small shared benchmark.

The punchline is that this paper gives a post-processing method to improve detection on hard categories using a quad-state error taxonomy and class-conditional fusion, showing gains on one class in a small benchmark with some statistical tests.

What the paper does well is to name the Hard-Category Reliability Problem clearly and demonstrate a targeted fix that raises the mAP on the vulnerable class while slightly improving the global score. The 22 percent relative gain on cz and the 96 percent win rate across trials are specific enough to be useful for someone wanting to replicate the approach. The idea of only activating calibration when there is empirical justification is a practical way to keep the method from hurting other classes.

The soft spots are in the strength of the evidence for generalization. The whole thing rests on a 600-image validation benchmark, which is quite limited for claims about robust perception. The choice of class cz with its high error rate looks like it was identified within this data, and the quad-state taxonomy construction is not detailed beyond the results. Most importantly, the 50 paired subset trials come from the same images, making the Bonferroni-corrected Wilcoxon test dependent on the benchmark construction. This means the statistical significance does not establish an independent guarantee that the method would perform the same way on new data. The stress-test concern is accurate on this point.

This kind of work is for computer vision practitioners dealing with long-tail distributions in applications like autonomous systems. A reader could get value from the taxonomy as an idea to adapt, but it would need larger tests to be convincing. I recommend putting it through peer review so that the referees can examine the full details on the taxonomy and request additional benchmarks to test the transfer of the activation rules.

Referee Report

1 major / 2 minor

Summary. The manuscript defines the Hard-Category Reliability Problem (HCRP) as the challenge of rectifying vulnerable long-tail categories in object detection without compromising stable classes. It proposes Error-Decomposed Class-Conditional Fusion (ED-CCF), a decision-layer framework that projects predictions into a quad-state error taxonomy and dynamically activates calibration pathways upon empirical justification. On a 600-image validation benchmark isolating cz (HCEC=0.86), the method reports lifting cz mAP50 from 0.089343 to 0.109353 (+22.4% relative) while raising overall mAP50 from 0.581925 to 0.584864, backed by a 96% win rate across 50 paired subset trials and Bonferroni-corrected Wilcoxon p<0.05.

Significance. If the statistical guarantees can be shown to hold under independent validation, the approach could offer a practical output-level fusion technique for safety-critical perception by targeting repeatable failures in minority classes while preserving global Pareto optimality. The concrete numeric deltas, relative improvement, and explicit statistical test results constitute a strength that permits direct assessment of the claims.

major comments (1)

[Experimental Validation] The 50 paired subset trials and Bonferroni-corrected Wilcoxon test (p<0.05) are performed on partitions or resamples of the identical 600-image validation benchmark used both to identify cz as the critical vulnerability and to justify the calibration decisions and pathway activation rules. This dependence prevents the reported significance from establishing an independent statistical guarantee that the quad-state taxonomy and dynamic activation transfer beyond the specific benchmark construction.

minor comments (2)

[Method] Additional details on the precise construction of the quad-state error taxonomy, the empirical criteria for pathway activation, and any pseudocode or algorithmic description would improve reproducibility.
[Abstract] The abstract and results section could explicitly note the benchmark size (600 images) and the single-class focus when stating the statistical claims.
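The statistics named in the major comment (win rate over paired trials, a Wilcoxon signed-rank test, and a Bonferroni correction) can be sketched in a few lines. This is a standard-library illustration under stated assumptions, not the paper's audit script: the 50 deltas are synthetic and constructed only to give 48 wins out of 50, the test uses the normal approximation without tie or continuity correction, and `n_tests = 2` is an assumed family size (one test for all mAP50, one for cz mAP50).

```python
import math


def wilcoxon_signed_rank(deltas):
    """Two-sided Wilcoxon signed-rank test, normal approximation.

    Zero differences are dropped and tied |delta| values share average ranks.
    Returns (W_plus, z, p_value).
    """
    d = [x for x in deltas if x != 0.0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, z, p_value


# Synthetic paired deltas (final minus replay); made up, not the paper's data.
deltas = [0.001 * (i + 1) for i in range(48)] + [-0.0005, -0.0015]

win_rate = sum(x > 0 for x in deltas) / len(deltas)
w_plus, z, p = wilcoxon_signed_rank(deltas)
n_tests = 2  # assumed number of tests in the Bonferroni family
p_bonf = min(1.0, p * n_tests)
print(f"win rate {win_rate:.0%}, W+ = {w_plus:.0f}, z = {z:.2f}, corrected p = {p_bonf:.2e}")
```

The test treats the 50 pairs as independent draws. When the pairs are overlapping resamples of one 600-image set, as the major comment points out, that assumption does not hold, so a small corrected p-value describes stability within the benchmark rather than transfer to new data.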

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback, particularly on the experimental validation. We address the major comment below and describe the revisions planned for the next version of the manuscript.

Referee: [Experimental Validation] The 50 paired subset trials and Bonferroni-corrected Wilcoxon test (p<0.05) are performed on partitions or resamples of the identical 600-image validation benchmark used both to identify cz as the critical vulnerability and to justify the calibration decisions and pathway activation rules. This dependence prevents the reported significance from establishing an independent statistical guarantee that the quad-state taxonomy and dynamic activation transfer beyond the specific benchmark construction.

Authors: We agree that the current validation relies on resamples and partitions drawn from the same 600-image benchmark used to identify the hard category cz and to tune the activation rules. The 50 paired trials were constructed by repeatedly drawing random calibration/evaluation splits from this fixed benchmark to quantify variability and rule out single-split artifacts, with cz pre-identified on the full set. While this design provides evidence of stability within the benchmark, it does not constitute fully independent validation on a disjoint dataset. To strengthen the statistical guarantee of transfer, we will add a new experiment in the revised manuscript that applies the identical ED-CCF pipeline (including the same quad-state taxonomy and empirical justification thresholds) to a separate, larger held-out test collection and reports the corresponding mAP50 deltas together with the same Wilcoxon test. This addition will directly address the concern about benchmark-specific dependence.

revision: yes

Circularity Check

1 step flagged

Statistical significance and performance gains both derived from partitions of the identical 600-image validation benchmark

specific steps

fitted input called prediction [Abstract]
"dynamically activating calibration pathways exclusively upon rigorous empirical justification. On a highly constrained 600-image validation benchmark, isolating cz as the critical vulnerability (HCEC=0.86, BSR=0.14), our framework achieves a targeted breakthrough: it elevates cz mAP50 from 0.089343 to 0.109353 ... Backed by exhaustive validation across 50 paired subset trials demonstrating an overwhelming 96% win rate and strict Bonferroni-corrected Wilcoxon significance (p<0.05)"

The empirical justification for pathway activation is obtained from the 600-image benchmark; the same benchmark (and its subsets) is then used to compute the mAP improvements and the Wilcoxon p-value. Because the subsets share images, class distribution, and the pre-selected vulnerable class, the reported statistical guarantee is not independent of the data that defined the calibration rule.

full rationale

The paper's central claim of a 'statistically guaranteed' hard-category robustness rests on empirical justification for the quad-state taxonomy and dynamic pathway activation, followed by mAP lifts and 96% win-rate Wilcoxon tests. Both the justification and the reported metrics come from the same constrained 600-image set (with pre-chosen cz class). Subset trials are resamples of this single benchmark, so the significance test cannot establish independence from the data used to tune and validate the method. This matches the fitted-input-called-prediction pattern: the 'guarantee' reduces to performance on the data that selected the activation rule.
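A toy simulation can make the fitted-input-called-prediction pattern concrete. In the sketch below every candidate rule has zero true effect, the best-looking rule is selected on one set of noisy measurements, and its measured gain is then compared on that same set versus on fresh measurements. Everything here is an assumption for illustration (the number of candidate rules, the Gaussian noise model, the function name `selection_gap`); it says nothing about the size of any bias in the paper itself.

```python
import random
import statistics


def selection_gap(n_rules=20, n_trials=50, n_repeats=200, seed=0):
    """Average apparent gain of the selected rule, in-sample versus on fresh data."""
    rng = random.Random(seed)
    in_sample, fresh = [], []
    for _ in range(n_repeats):
        # Every candidate rule has zero true effect; observed deltas are pure noise.
        tuned = [[rng.gauss(0.0, 1.0) for _ in range(n_trials)] for _ in range(n_rules)]
        best = max(range(n_rules), key=lambda r: statistics.fmean(tuned[r]))
        # Gain of the chosen rule on the same data that was used to choose it.
        in_sample.append(statistics.fmean(tuned[best]))
        # Gain of the chosen rule on data that played no part in the choice.
        fresh.append(statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trials)))
    return statistics.fmean(in_sample), statistics.fmean(fresh)


gap_in, gap_fresh = selection_gap()
print(f"selected rule, same data:  {gap_in:+.3f}")
print(f"selected rule, fresh data: {gap_fresh:+.3f}")
```

The in-sample figure is positive by construction of the selection step, while the fresh-data figure stays near zero. This is why a disjoint held-out collection, as promised in the rebuttal, is the relevant check.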

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so free parameters, axioms, and invented entities cannot be audited in detail. The method appears to introduce a new error taxonomy and activation rule whose justification is described as empirical but not further specified.

pith-pipeline@v0.9.0 · 5810 in / 1278 out tokens · 51859 ms · 2026-05-20T14:05:14.009285+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: ED-CCF projects predictions into a sophisticated quad-state error taxonomy, dynamically activating calibration pathways exclusively upon rigorous empirical justification.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: $\mathrm{HCEC}(c) = \left(N_{PA}(c) + N_{WC}(c)\right) / \text{total errors}$; BSR(c) measures all-class drop from class-preferred branch.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Freeman, Frédo Durand, Eli Shechtman, and Xun Huang

Alexandridis, K.P., Elezi, I., Deng, J., Nguyen, A., Luo, S.: Fractal calibration for long-tailed object detection. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025). https://doi.org/10.1109/cvpr52734.2025.01410, https://doi.org/10.1109/cvpr52734.2025.01410

work page doi:10.1109/cvpr52734.2025.01410 2025

[2] [2]

In: Computer Vision (2024)

Bhowmik, M.K.: Real-time benchmark datasets for object detection. In: Computer Vision (2024). https://doi.org/10.1201/9781003432036-4, https://doi.org/10.1201/9781003432036-4

work page doi:10.1201/9781003432036-4 2024

[3] [3]

Soft-NMS -- Improving Object Detection With One Line of Code

Bodla, N., Singh, B., Chellappa, R., Davis, L.S.: Soft-NMS – improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision (2017),http://arxiv.org/abs/1704.04503

work page internal anchor Pith review Pith/arXiv arXiv 2017

[4] [4]

In: Proceedings of the AAAI Conference on Artificial Intelligence (2025)

Dai, T., Yang, L., Guo, H., Wang, J., Zhu, Z.: Dcsf-kd: Dynamic channel-wise spatial feature knowledge distillation for object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence (2025). https://doi.org/10.1609/aaai.v39i3.32266, https://doi.org/10.1609/aaai.v39i3.32266

work page doi:10.1609/aaai.v39i3.32266 2025

[5] [5]

In: Lecture Notes in ED-CCF for Hard-Category Robust Perception 13 Computer Science (2025)

Ding, Z., Zhang, Z., Yuan, M., Ma, G., Lv, G.: Cedp-yolo: Uav object detection based on context enhancement and dynamic perception. In: Lecture Notes in ED-CCF for Hard-Category Robust Perception 13 Computer Science (2025). https://doi.org/10.1007/978-981-97-8502-5_25, https://doi.org/10.1007/978-981-97-8502-5_25

work page doi:10.1007/978-981-97-8502-5_25 2025

[6] [6]

Journal of Big Data (2025)

Elsharkawy, Z.F., Kasban, H., Abbass, M.Y.: Efficient surface crack segmentation for industrial and civil applications based on an enhanced yolov8 model. Journal of Big Data (2025). https://doi.org/10.1186/s40537-025-01065-1, https://doi.org/10.1186/s40537-025-01065-1

work page doi:10.1186/s40537-025-01065-1 2025

[7] [7]

In: 2025 19th International Conference on Semantic Computing (ICSC) (2025)

Gaba, S.: Improving long-tailed object detection with balanced group softmax and metric learning. In: 2025 19th International Conference on Semantic Computing (ICSC) (2025). https://doi.org/10.1109/icsc64641.2025.00051, https://doi.org/10.1109/icsc64641.2025.00051

work page doi:10.1109/icsc64641.2025.00051 2025

[8] Han, R., Wang, C., Wang, Y., Zhang, Y., Guo, W., Zi, Y., Zhao, J.: Defect detection in ebsm components through selective box fusion of modern object detection. Scientific Reports (2025). https://doi.org/10.1038/s41598-025-96406-8

[9] Ho, C.H., Peng, K.C., Vasconcelos, N.: Long-tailed anomaly detection with learnable class names. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12435–12446 (2024), https://openaccess.thecvf.com/content/CVPR2024/html/Ho_Long-Tailed_Anomaly_Detection_with_Learnable_Class_Names_CVPR_2024_paper.html

[10] Hu, J.: Yolo-fda: Integrating hierarchical attention and detail enhancement for surface defect detection. In: Lecture Notes in Computer Science (2026). https://doi.org/10.1007/978-981-95-5758-5_15

[11] Hu, Y., Chen, N., Hou, Y., Lin, X., Jing, B., Liu, P.: Lightweight deep learning for real-time road distress detection on mobile devices. Nature Communications (2025). https://doi.org/10.1038/s41467-025-59516-5

[12] Huseljic, D., Herde, M., Hahn, P., Müjde, M., Sick, B.: Systematic evaluation of uncertainty calibration in pretrained object detectors. International Journal of Computer Vision 133, 1033–1047 (2025). https://doi.org/10.1007/s11263-024-02219-z, https://link.springer.com/article/10.1007/s11263-024-02219-z

[13] Kuzucu, S., Oksuz, K., Sadeghi, J., Dokania, P.K.: On calibration of object detectors: Pitfalls, evaluation and baselines. In: Lecture Notes in Computer Science (2024). https://doi.org/10.1007/978-3-031-72664-4_11

[14] Li, W., Luo, X., Yang, C., Fang, M., Liu, W.: A lightweight yolov11n-based framework for highway pavement distress detection under occlusion conditions. Applied Sciences (2025). https://doi.org/10.3390/app15179664

[15] Li, Y., Ling, Q., An, Y., Yin, H., Gao, X., Zhu, Z., Han, P.: Dhc-net: A remote sensing object detection under haze and class imbalance. IEEE Transactions on Geoscience and Remote Sensing (2025). https://doi.org/10.1109/tgrs.2025.3551286

[16] Popordanoska, T., Tiulpin, A., Blaschko, M.B.: Beyond classification: Definition and density-based estimation of calibration in object detection. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024). https://doi.org/10.1109/wacv57701.2024.00064

[17] Rattanaphan, S., Briassouli, A.: Evaluating generalization, bias, and fairness in deep learning for metal surface defect detection: A comparative study. Processes (2024). https://doi.org/10.3390/pr12030456

[18] Solovyev, R., Wang, W., Gabruseva, T.: Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing 107, 104117 (2021). https://doi.org/10.1016/j.imavis.2021.104117

[19] Tong, K., Wu, Y.: Small object detection using hybrid evaluation metric with context decoupling. Multimedia Systems (2025). https://doi.org/10.1007/s00530-025-01738-0

[20] Tran, P.V.: Simltd: Simple supervised and semi-supervised long-tailed object detection. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025). https://doi.org/10.1109/cvpr52734.2025.00440

[21] Tsai, C.M., Wu, L.L., Chen, T.Y.: Enhanced fisheye object detection via yolo ensemble learning and weighted box fusion. In: 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2025). https://doi.org/10.1109/iccvw69036.2025.00552

[22] Wang, X., Huang, T., Gonzalez, J., Darrell, T., Yu, F.: Frustratingly simple few-shot object detection. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 9919–9928. PMLR (2020), https://proceedings.mlr.press/v119/wang20j.html

[23] Yesmin, F.: Bias detection and fairness analysis in object detection and image classification using open images v7. SSRN Electronic Journal (2024). https://doi.org/10.2139/ssrn.5018209

[24] Zhang, C., Zhang, Y., Guan, J., Zhou, S.: Dumo: A dual-model framework for effective long-tailed object detection. In: 2025 IEEE International Conference on Multimedia and Expo (ICME) (2025). https://doi.org/10.1109/icme59968.2025.11209185

[25] Zhang, F.: Multiscale attention knowledge distillation for object detection. In: 2025 International Joint Conference on Neural Networks (IJCNN) (2025). https://doi.org/10.1109/ijcnn64981.2025.11227248

[26] Zhang, Y., Long, J., Li, C.: Knowledge distillation for object detection with diffusion model. Neurocomputing (2025). https://doi.org/10.1016/j.neucom.2025.130019

[27] Zhong, J., Kong, D., Wei, Y., Pan, B.: Yolov8 and point cloud fusion for enhanced road pothole detection and quantification. Scientific Reports (2025). https://doi.org/10.1038/s41598-025-94993-0

[28] Zhu, J., Sheng, J., Cai, Q.: Fd2-yolo: A frequency-domain dual-stream network based on yolo for crack detection. Sensors (2025). https://doi.org/10.3390/s25113427