M^2C-EvDet: Multi-Domain Multi-Order Cross-Modal Knowledge Distillation for Event-based Object Detection

Shouan Pan; Siqi Li; Wei Bao; Yi Xie; Yue Gao

arxiv: 2606.24248 · v1 · pith:FEF3QEMNnew · submitted 2026-06-23 · 💻 cs.CV


Wei Bao, Siqi Li, Shouan Pan, Yi Xie, Yue Gao

Pith reviewed 2026-06-26 00:30 UTC · model grok-4.3

classification 💻 cs.CV

keywords: event-based object detection · knowledge distillation · cross-modal learning · frequency learning · hypergraph computation · multi-order relations · adaptive feature distillation · event cameras


The pith

M^2C-EvDet improves event-based object detection by distilling frequency features and multi-order relations from frame-based models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that prior knowledge distillation methods for event-based detection fall short because they address only spatial semantics or pairwise relations, leaving a performance gap with frame-based systems in complex scenes. It proposes integrating two new modules into a single framework: one that adaptively decouples and distills frequency-domain information, and another that models multi-order relations through hypergraph computation. A sympathetic reader would care because event cameras provide high temporal resolution and wide dynamic range that frame cameras lack, yet their sparse data makes accurate detection harder without better cross-modal transfer of visual meaning. If the approach works, event-based detectors would close more of the accuracy gap while retaining their speed and robustness advantages.

Core claim

M^2C-EvDet is a Multi-domain and Multi-order Cross-modal knowledge distillation framework for Event-based object Detection. Built upon frequency learning and hypergraph computation, M^2C-EvDet integrates two specialized modules: Adaptive Frequency-Decoupled Feature Distillation (AF^2D^2) and Multi-Order Relational Distillation (MORD) to address limitations of prior distillation methods that only focus on spatial semantics or pair-wise relations.
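As a rough illustration of what frequency-decoupled feature distillation can look like, the sketch below splits teacher and student feature maps into low- and high-frequency parts with a fixed radial FFT mask and penalises each band separately. The fixed `cutoff`, the band weights `w_low` and `w_high`, and all function names are assumptions made for illustration. The paper's AF^2D^2 module uses adaptive filters whose formulation is not given on this page, so this is not the authors' method.

```python
import numpy as np

def radial_lowpass_mask(h, w, cutoff):
    """Boolean mask keeping centred FFT bins whose normalised radius is <= cutoff."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]  # cycles per pixel, centred
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    return np.sqrt(fx ** 2 + fy ** 2) <= cutoff

def split_frequency(feat, cutoff=0.1):
    """Split a real (C, H, W) feature map into low- and high-frequency parts."""
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    mask = radial_lowpass_mask(feat.shape[-2], feat.shape[-1], cutoff)
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask, axes=(-2, -1)), axes=(-2, -1)).real
    high = feat - low  # the two parts sum back to the input by construction
    return low, high

def frequency_distill_loss(student, teacher, cutoff=0.1, w_low=1.0, w_high=1.0):
    """Weighted sum of per-band mean squared errors (hypothetical loss form)."""
    s_low, s_high = split_frequency(student, cutoff)
    t_low, t_high = split_frequency(teacher, cutoff)
    low_term = np.mean((s_low - t_low) ** 2)
    high_term = np.mean((s_high - t_high) ** 2)
    return float(w_low * low_term + w_high * high_term)
```

Weighting the two bands separately is the point of decoupling: a sparse event stream may match the teacher's edges (high frequencies) more easily than its smooth appearance (low frequencies), and separate weights let the two errors be balanced independently.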

What carries the argument

The M^2C-EvDet framework, with AF^2D^2 for adaptive frequency-decoupled feature distillation and MORD for multi-order relational distillation via hypergraph computation.
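To make "multi-order relations via hypergraph computation" concrete, here is a minimal sketch, assuming a k-nearest-neighbour hypergraph and the standard normalised hypergraph propagation with unit hyperedge weights. The hyperedge construction, the choice to build the hypergraph from teacher features, and the MSE matching term are all assumptions for illustration; the page does not describe how MORD builds or distils its hyperedges.

```python
import numpy as np

def knn_incidence(x, k):
    """Incidence matrix H (N nodes x N hyperedges).

    Hyperedge e groups node e with its k nearest neighbours, so a single
    hyperedge links k + 1 nodes at once instead of one pair.
    """
    n = x.shape[0]
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self from the neighbour search
    neighbours = np.argsort(dist, axis=1)[:, :k]
    H = np.zeros((n, n))
    for e in range(n):
        H[e, e] = 1.0
        H[neighbours[e], e] = 1.0
    return H

def hypergraph_propagate(x, H):
    """Compute Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} x with unit hyperedge weights."""
    dv = H.sum(axis=1)                        # node degrees
    de = H.sum(axis=0)                        # hyperedge degrees
    A = (H / de) @ H.T                        # H De^{-1} H^T
    A = A / np.sqrt(dv)[:, None] / np.sqrt(dv)[None, :]
    return A @ x

def relational_distill_loss(student, teacher, k=3):
    """MSE between student and teacher features after propagation on the
    teacher-derived hypergraph (hypothetical loss form)."""
    H = knn_incidence(teacher, k)
    diff = hypergraph_propagate(student, H) - hypergraph_propagate(teacher, H)
    return float(np.mean(diff ** 2))
```

Both inputs are assumed to be `(N, D)` arrays of per-node features with the same shape, and `k` must be smaller than `N`.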

If this is right

Event-based detectors achieve higher accuracy in complex scenarios than methods limited to spatial or pairwise distillation.
Visual semantics missing from sparse event data are transferred more completely across modalities.
The performance disparity between EvDet and frame-based detection is reduced through combined frequency and relational knowledge transfer.
Distillation now operates effectively on both frequency domains and higher-order scene relations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same frequency-plus-hypergraph approach could be tested on other event-based tasks such as tracking or segmentation.
Combining the framework with existing large frame-based pretrained models might lower the data requirements for training event detectors.
Hypergraph modeling of multi-order relations may prove useful in other sparse sensing domains beyond vision.

Load-bearing premise

That the frequency-decoupled and multi-order relational modules will successfully capture and transfer the missing visual semantics from frame-based models to event data in complex scenarios beyond what prior spatial or pairwise methods achieved.

What would settle it

Experiments on standard event detection benchmarks showing no meaningful accuracy gain over prior distillation baselines specifically in complex or high-dynamic-range sequences.

Figures

Figures reproduced from arXiv: 2606.24248 by Shouan Pan, Siqi Li, Wei Bao, Yi Xie, Yue Gao.

**Figure 1.** Comparison of our proposed M^2C-EvDet with existing methods, including (a) whole spatial domain distillation and (b) low-order relational distillation. Our method (c) transfers low-frequency and high-frequency decoupled object semantics, and low-order and high-order object relations, through frequency learning and hypergraph computation.

**Figure 2.** Overview of the proposed M^2C-EvDet framework. During the training phase, our method is built on a teacher-student architecture and takes event streams and RGB frames as inputs. The M2CFD module is employed to implement low-order and high-order relational distillation, and low-frequency and high-frequency decoupled distillation. During the inference phase, our method solely takes event streams as inputs to perform o…

**Figure 3.** Illustration of the AF^2D^2 module, which leverages adaptive low-pass and high-pass filters to decouple low-frequency and high-frequency feature distillation.

**Figure 4.** Illustration of the MORD module, which leverages self-attention and hyper-attention to transfer low-order and high-order relations. … and $T_l - S_l$ to represent the differential modality. The final high-frequency and low-frequency components of the RGB teacher features and event student features can be obtained as: $S_l^{lf} = \sum_{t=1}^{T} A_l^{lf(t)} \cdot S_l^{lf(t)}$, $S_l^{hf} = \sum_{t=1}^{T} A_l^{hf(t)} \cdot S_l^{hf(t)}$, $T_l^{lf} = \sum_{t=1}^{T} A_l^{lf(\ldots}$

**Figure 5.** Experimental results of sensitivity analysis of distillation loss weight

**Figure 7.** Visual comparison of detection results between the baseline and …

**Figure 8.** Visualization of adaptive hyperedges for the MORD module on the DSEC-Detection dataset [10].

**Figure 9.** Visual results of a failure case for M2C-Det on the DSEC-Detection dataset [10]. … mitigates the cross-modal gap but cannot completely eliminate modal discrepancies, due to the low-quality RGB images and the inadequate model optimization process. As illustrated in the low-light scenarios of …
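The aggregation formula quoted alongside Figure 4 is a weighted sum over T terms. The sketch below shows that sum in NumPy, assuming each A is a weight map broadcast over channels and that "·" denotes element-wise multiplication. Neither assumption is confirmed by the excerpt, and the array layouts and the function name are hypothetical.

```python
import numpy as np

def aggregate_components(weights, components):
    """Return sum over t of weights[t] * components[t].

    weights:    (T, 1, H, W) weight maps (assumed layout)
    components: (T, C, H, W) frequency components (assumed layout)
    """
    weights = np.asarray(weights, dtype=float)
    components = np.asarray(components, dtype=float)
    return (weights * components).sum(axis=0)

# Toy example with T = 2 terms.
T, C, H, W = 2, 3, 4, 4
components = np.stack([np.ones((C, H, W)), 3.0 * np.ones((C, H, W))])
weights = np.stack([np.full((1, H, W), 0.25), np.full((1, H, W), 0.75)])
fused = aggregate_components(weights, components)  # each entry: 0.25 * 1 + 0.75 * 3 = 2.5
```

The same function would apply to the low- and high-frequency components of either the teacher or the student, since the quoted formulas share one form.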

Original abstract

Event-based object Detection (EvDet), as a biologically inspired visual perception paradigm, demonstrates superior performance in scenarios demanding high temporal resolution and a wide dynamic range. Nevertheless, the inherent sparse representations and inadequate visual semantics of event data result in a considerable performance disparity between EvDet and frame-based object detection. Previous works attempt to alleviate this cross-modal discrepancy through knowledge distillation, yet they only focus on spatial visual semantics or pair-wise relational information, thus limiting performance in more complex scenarios. To address this challenge, this paper proposes M^2C-EvDet, a Multi-domain and Multi-order Cross-modal knowledge distillation framework for EvDet. Built upon frequency learning and hypergraph computation, M^2C-EvDet integrates two specialized modules: Adaptive Frequency-Decoupled Feature Distillation (AF^2D^2) and Multi-Order Relational Distillation (MORD).
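The Figure 2 caption states that training uses a teacher-student architecture with both event streams and RGB frames, while inference takes event streams only. A minimal sketch of that control flow follows, with a plain MSE standing in for the paper's distillation losses. The class name, the callables, and `distill_weight` are hypothetical.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

class CrossModalDistiller:
    """Teacher sees RGB frames, student sees events; only the student runs at inference."""

    def __init__(self, student, teacher, distill_weight=1.0):
        self.student = student                # callable: events -> features
        self.teacher = teacher                # callable: frames -> features
        self.distill_weight = distill_weight  # balances task loss against distillation

    def training_loss(self, events, frames, detection_loss):
        s_feat = self.student(events)
        t_feat = self.teacher(frames)         # used only as a distillation target
        return detection_loss(s_feat) + self.distill_weight * mse(s_feat, t_feat)

    def infer(self, events):
        return self.student(events)           # no RGB input is needed here
```

Because the teacher appears only inside `training_loss`, the deployed model pays no extra cost for the RGB branch.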

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Two new distillation modules using frequency and hypergraphs for event-based detection, but the abstract supplies no equations, numbers, or ablations to check if they deliver.


The paper's core move is a distillation framework that adds frequency-decoupled features and multi-order relational transfer to close the gap between frame-based and event-based detectors. It names two modules, AF^2D^2 and MORD, built on frequency learning and hypergraph computation, and argues that prior distillation work stopped at spatial semantics or pairwise relations.

That framing is reasonable. The motivation matches a known limitation in event vision, and extending distillation to frequency domains plus higher-order relations is a logical next step rather than a reinvention.

The problem is that the abstract contains none of the supporting material needed to evaluate the claim. No equations for either module, no dataset details, no quantitative gains, and no ablation results appear. Without those, it is impossible to tell whether the new components actually transfer the missing semantics or simply increase model complexity. The full manuscript may contain the experiments, but they are not visible here.

This work sits squarely in the event-camera and cross-modal distillation corner of computer vision. Readers already working on knowledge transfer for sparse sensors could extract the module ideas for their own pipelines, but only after seeing whether the numbers hold up.

I would send it to peer review. The direction is coherent and the gap it targets is real; the experiments will decide whether the modules are worth adopting.

Referee Report

2 major / 0 minor

Summary. The paper proposes M^2C-EvDet, a Multi-domain Multi-Order Cross-Modal knowledge distillation framework for Event-based Object Detection (EvDet). It introduces two modules—Adaptive Frequency-Decoupled Feature Distillation (AF^2D^2) and Multi-Order Relational Distillation (MORD)—built on frequency learning and hypergraph computation to transfer richer visual semantics from frame-based models to sparse event data, addressing limitations of prior distillation approaches that focus only on spatial semantics or pairwise relations.

Significance. If the modules prove effective at capturing and transferring multi-domain and multi-order information, the work could meaningfully narrow the performance gap between EvDet and frame-based detection in complex, high-dynamic-range scenarios, contributing a new direction for cross-modal distillation in event-based vision.

major comments (2)

[Abstract] The central claim of improved performance in complex scenarios rests on the success of AF^2D^2 and MORD, yet the manuscript provides no equations, algorithmic details, or pseudocode for how frequency decoupling is performed adaptively or how hypergraph-based multi-order relations are computed and distilled.
[Abstract] No experimental results, ablation studies, or quantitative comparisons are presented, so the assertion that the proposed modules outperform prior spatial or pairwise distillation methods cannot be assessed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments on the abstract. The full manuscript contains the technical details and experimental results referenced in the body; we address each point below and indicate where revisions to the abstract may be appropriate.


Referee: [Abstract] The central claim of improved performance in complex scenarios rests on the success of AF^2D^2 and MORD, yet the manuscript provides no equations, algorithmic details, or pseudocode for how frequency decoupling is performed adaptively or how hypergraph-based multi-order relations are computed and distilled.

Authors: The abstract is a high-level summary. The full manuscript provides the requested equations for adaptive frequency decoupling in AF^2D^2 (Section 3.2, including the frequency-domain formulation and adaptive weighting) and the hypergraph construction plus multi-order distillation loss for MORD (Section 3.3, with explicit hyperedge definitions and message-passing steps). Algorithmic details and a pseudocode outline appear in the supplementary material. We can add one sentence to the abstract that points to these sections if the editor prefers. Revision: partial

Referee: [Abstract] No experimental results, ablation studies, or quantitative comparisons are presented, so the assertion that the proposed modules outperform prior spatial or pairwise distillation methods cannot be assessed.

Authors: The full manuscript reports comprehensive experiments in Section 4, including quantitative mAP comparisons on multiple event-based datasets, ablation studies isolating AF^2D^2 and MORD, and direct comparisons against prior spatial and pairwise distillation baselines. The abstract summarizes the outcome of these experiments without specific numbers due to length limits. No change to the abstract is required on this point. Revision: no

Circularity Check

0 steps flagged

No significant circularity detected


The paper proposes a new framework M^2C-EvDet integrating AF^2D^2 and MORD modules built on frequency learning and hypergraph computation for cross-modal distillation. No equations, derivations, or fitted parameters are described in the provided text that reduce by construction to the inputs. The central claim is an architectural proposal whose validity rests on empirical performance rather than any self-referential definition or self-citation chain. No load-bearing steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no details on free parameters, axioms, or invented entities; full text required to audit.

pith-pipeline@v0.9.1-grok · 5691 in / 1086 out tokens · 20776 ms · 2026-06-26T00:30:17.692908+00:00 · methodology


Reference graph

Works this paper leans on

56 extracted references · 9 canonical work pages

[1] G. Guillermo, D. Tobi, M. O. Garrick, B. Chiara, T. Brian, C. Andrea, L. Stefan, D. Andrew, C. Jorg, D. Kostas, and S. Davide, “Event-Based Vision: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
[2] Y. Jiang, Y. Wang, S. Li, Y. Zhang, Q. Guo, Q. Chu, and Y. Gao, “EVCSLR: Event-guided Continuous Sign Language Recognition and Benchmark,” IEEE Transactions on Multimedia, 2024
[3] S. Li, Z. Wu, Y. Li, Z. Xue, Y.-S. Liu, and Y. Gao, “RGB-D Visual Perception for Occluded Scenes via Event Camera,” International Journal of Computer Vision, pp. 1–22, 2025
[4] Y. Jiang, Y. Wang, S. Li, Y. Zhang, M. Zhao, and Y. Gao, “Event-based Low-illumination Image Enhancement,” IEEE Transactions on Multimedia, vol. 26, pp. 1920–1931, 2023
[5] S. Li, Z. Zhou, Z. Xue, Y. Li, S. Du, and Y. Gao, “3D Feature Tracking via Event Camera,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 18 974–18 983
[6] S. Schaefer, D. Gehrig, and D. Scaramuzza, “AEGNN: Asynchronous Event-based Graph Neural Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 12 371–12 381
[7] J. Zhang, B. Dong, H. Zhang, J. Ding, F. Heide, B. Yin, and X. Yang, “Spiking Transformers for Event-based Single Object Tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 8801–8810
[8] M. Gehrig and D. Scaramuzza, “Recurrent Vision Transformers for Object Detection with Event Cameras,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 884–13 893
[9] D. Torbunov, Y. Ren, A. Ghose, O. Dim, and Y. Cui, “EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision,” in Proceedings of the IEEE International Conference on Computer Vision, 2025, pp. 9812–9821
[10] D. Gehrig and D. Scaramuzza, “Low-Latency Automotive Vision with Event Cameras,” Nature, vol. 629, no. 8014, pp. 1034–1040, 2024
[11] D. Lu, L. Kong, G. Lee, C. Chane, and W. Ooi, “FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies,” arXiv preprint arXiv:2412.06708, 2025
[12] H. Zhang, X. Wang, C. Xu, X. Wang, F. Xu, H. Yu, L. Yu, and W. Yang, “Frequency-Adaptive Low-Latency Object Detection Using Events and Frames,” arXiv preprint arXiv:2412.04149, 2024
[13] J. Zhang, L. Gao, B. Hao, H. Huang, J. Song, and H. Shen, “From Global to Local: Multi-Scale Out-of-Distribution Detection,” IEEE Transactions on Image Processing, vol. 32, pp. 6115–6128, 2023
[14] J. Zhang, X. Luo, L. Gao, D. Zou, H. Shen, and J. Song, “From Channel Bias to Feature Redundancy: Uncovering the ‘Less is More’ Principle in Few-Shot Learning,” arXiv e-prints, pp. arXiv–2310, 2023
[15] J. Zhang, S. Wu, L. Gao, J. Song, N. Sebe, and H. T. Shen, “A Closer Look at Conditional Prompt Tuning for Vision-Language Models,” arXiv preprint arXiv:2506.23856, 2025
[16] J. Zhang, J. Song, L. Gao, N. Sebe, and H. T. Shen, “Reliable Few-shot Learning under Dual Noises,” 2025
[17] L. Li, A. Linger, M. Millhaeusler, V. Tsiminaki, Y. Li, and D. Dai, “Object-Centric Cross-Modal Feature Distillation for Event-based Object Detection,” in Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 2024, pp. 15 440–15 447
[18] D. Rossi, P. Vasseur, F. Morbidi, C. Demonceaux, and F. Rameau, “Event-Aware Distilled DETR for Object Detection in an Automotive Context,” in IEEE Intelligent Vehicles Symposium, 2025
[19] Q. Li, L. Shen, S. Guo, and Z. Lai, “Wavelet Integrated CNNs for Noise-robust Image Classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 7245–7254
[20] Y. Gao, Y. Feng, S. Ji, and R. Ji, “HGNN+: General Hypergraph Neural Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3181–3199, 2023
[21] Y. Burkhardt, S. Schaefer, and S. Leutenegger, “SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection,” arXiv preprint arXiv:2504.00139, 2025
[22] L. Bartolomei, E. Mannocci, F. Tosi, M. Poggi, and S. Mattoccia, “Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation,” in Proceedings of the IEEE International Conference on Computer Vision, 2025, pp. 19 669–19 678
[23] H. Liu, G. Yu, H. Cao, S. Qu, F. Lu, Y. Zhong, Z. Lu, L. Leng, and G. Chen, “I2EKD: Efficient and Versatile Image-to-Event Knowledge Distillation,” IEEE Transactions on Circuits and Systems for Video Technology, 2025
[24] Z. Li, P. Xu, X. Chang, L. Yang, Y. Zhang, L. Yao, and X. Chen, “When Object Detection Meets Knowledge Distillation: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 8, pp. 10 555–10 579, 2023
[25] A. Yang, S. Lin, C.-H. Yeh, M. Shu, Y. Yang, and X. Chang, “Context Matters: Distilling Knowledge Graph for Enhanced Object Detection,” IEEE Transactions on Multimedia, vol. 26, pp. 487–500, 2023
[26] S. Fu, J. Yan, Q. Yang, X. Wei, X. Xie, and W.-S. Zheng, “A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection,” IEEE Transactions on Multimedia, 2025
[27] L. Zhang and K. Ma, “Improve Object Detection with Feature-Based Knowledge Distillation: Towards Accurate and Efficient Detectors,” in Proceedings of the International Conference on Learning Representations, 2020
[28] Z. Yang, Z. Li, X. Jiang, Y. Gong, Z. Yuan, D. Zhao, and C. Yuan, “Focal and Global Knowledge Distillation for Detectors,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 4643–4652
[29] Z. Yang, Z. Li, M. Shao, D. Shi, Z. Yuan, and C. Yuan, “Masked Generative Distillation,” in Proceedings of the European Conference on Computer Vision. Springer, 2022, pp. 53–69
[30] X. Chen, J. Chen, Y. Liu, and G. Zeng, “D^3ETR: Decoder Distillation for Detection Transformer,” arXiv preprint arXiv:2211.09768, 2022
[31] J. Chang, S. Wang, H.-M. Xu, Z. Chen, C. Yang, and F. Zhao, “DETRDistill: A Universal Knowledge Distillation Framework for DETR-families,” in Proceedings of the IEEE International Conference on Computer Vision, 2023, pp. 6898–6908
[32] Y. Wang, X. Li, S. Weng, G. Zhang, H. Yue, H. Feng, J. Han, and E. Ding, “KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 16 016–16 025
[33] Q. Lan and Q. Tian, “CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs,” arXiv preprint arXiv:2502.10683, 2025
[34] L. Chen, Y. Fu, L. Gu, C. Yan, T. Harada, and G. Huang, “Frequency-aware Feature Fusion for Dense Image Prediction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
[35] Y. Cui, B. Gao, Y. Zhang, X. Dong, J. Xiang, D. Li, and Z. Tu, “WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras,” arXiv preprint arXiv:2506.09098, 2025
[36] C. Pham, V.-A. Nguyen, T. Le, D. Phung, G. Carneiro, and T.-T. Do, “Frequency Attention for Knowledge Distillation,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2024, pp. 2277–2286
[37] Y. Zhang, T. Huang, J. Liu, T. Jiang, K. Cheng, and S. Zhang, “FreeKD: Knowledge Distillation via Semantic Frequency Prompt,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 931–15 940
[38] X. Ma, W. Liu, Q. Tian, and Y. Gao, “Learning Representation on Optimized High-order Manifold for Visual classification,” IEEE Transactions on Multimedia, vol. 24, pp. 3989–4001, 2021
[39] Y. Zhang, H. Wang, S. Zhang, and B. Leng, “CrossHypergraph: Consistent High-order Semantic Network for Few-shot Image Classification,” IEEE Transactions on Multimedia, pp. 1–12, 2025
[40] Y. Xu, Y. Feng, X. Zhong, Y. Gao, and Z. Wu, “Hypergraph-Based Remaining Prototype Alignment for Open-Set Cross-Domain Image Retrieval,” IEEE Transactions on Multimedia, 2025
[41] Y. Xu, Y. Feng, X. Zhuang, J. Wang, Z. Wu, and Y. Gao, “Residual Fuzzy Alignment on Hypergraph for Open-Set 3D Cross-Modal Retrieval,” IEEE Transactions on Multimedia, 2025
[42] Y. Feng, J. Huang, S. Du, S. Ying, J.-H. Yong, Y. Li, G. Ding, R. Ji, and Y. Gao, “Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
[43] M. Lei, S. Li, Y. Wu, H. Hu, Y. Zhou, X. Zheng, G. Ding, S. Du, Z. Wu, and Y. Gao, “YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception,” arXiv preprint arXiv:2506.17733, 2025
[44] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125
[45] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768
[46] W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid Vision Transformer: A versatile Backbone for Dense Prediction without Convolutions,” in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 568–578
[47] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is All You Need,” in Proceedings of the Advances in Neural Information Processing Systems, vol. 30, 2017
[48] Y. Han, P. Wang, S. Kundu, Y. Ding, and Z. Wang, “Vision HGNN: An Image is More than a Graph of Nodes,” in Proceedings of the IEEE International Conference on Computer Vision, 2023, pp. 19 878–19 888
[49] B. Liu, C. Xu, W. Yang, H. Yu, and L. Yu, “Motion Robust High-Speed Light-Weighted Object Detection with Event Camera,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–13, 2023
[50] N. Zubic, M. Gehrig, and D. Scaramuzza, “State Space Models for Event Cameras,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 5819–5828
[51] Y. Peng, H. Li, P. Wu, Y. Zhang, X. Sun, and F. Wu, “D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement,” arXiv preprint arXiv:2410.13842, 2024
[52] A. Tomy, A. Paigwar, K. S. Mann, A. Renzaglia, and C. Laugier, “Fusing event-based and rgb camera for robust object detection in adverse conditions,” in IEEE International Conference on Robotics and Automation. IEEE, 2022, pp. 933–939
[53] D. Li, Y. Tian, and J. Li, “SODFormer: Streaming Object Detection with Transformer using Events and Frames,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 14 020–14 037, 2023
[54] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common Objects in Context,” in Proceedings of the European Conference on Computer Vision. Springer, 2014, pp. 740–755
[55] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLO,” Jan. 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[56] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, “Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection,” Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 21 002–21 012, 2020


work page arXiv 2025

[16] [16]

Reliable Few-shot Learning under Dual Noises,

J. Zhang, J. Song, L. Gao, N. Sebe, and H. T. Shen, “Reliable Few-shot Learning under Dual Noises,” 2025

2025

[17] [17]

Object-Centric Cross-Modal Feature Distillation for Event-based Ob- ject Detection,

L. Li, A. Linger, M. Millhaeusler, V . Tsiminaki, Y . Li, and D. Dai, “Object-Centric Cross-Modal Feature Distillation for Event-based Ob- ject Detection,” inProceedings of the IEEE International Conference on Robotics and Automation. IEEE, 2024, pp. 15 440–15 447

2024

[18] [18]

Event-Aware Distilled DETR for Object Detection in an Automotive Context,

D. Rossi, P. Vasseur, F. Morbidi, C. Demonceaux, and F. Rameau, “Event-Aware Distilled DETR for Object Detection in an Automotive Context,” inIEEE Intelligent V ehicles Symposium, 2025

2025

[19] [19]

Wavelet Integrated CNNs for Noise- robust Image Classification,

Q. Li, L. Shen, S. Guo, and Z. Lai, “Wavelet Integrated CNNs for Noise- robust Image Classification,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 7245–7254

2020

[20] [20]

HGNN +: General Hypergraph Neural Networks,

Y . Gao, Y . Feng, S. Ji, and R. Ji, “HGNN +: General Hypergraph Neural Networks,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3181–3199, 2023

2023

[21] [21]

SuperEvent: Cross- Modal Learning of Event-based Keypoint Detection,

Y . Burkhardt, S. Schaefer, and S. Leutenegger, “SuperEvent: Cross- Modal Learning of Event-based Keypoint Detection,”arXiv preprint arXiv:2504.00139, 2025

work page arXiv 2025

[22] [22]

Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event- Based Monocular Depth Estimation,

L. Bartolomei, E. Mannocci, F. Tosi, M. Poggi, and S. Mattoccia, “Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event- Based Monocular Depth Estimation,” inProceedings of the IEEE International Conference on Computer Vision, 2025, pp. 19 669–19 678

2025

[23] [23]

I2EKD: Efficient and Versatile Image-to-Event Knowledge Distillation,

H. Liu, G. Yu, H. Cao, S. Qu, F. Lu, Y . Zhong, Z. Lu, L. Leng, and G. Chen, “I2EKD: Efficient and Versatile Image-to-Event Knowledge Distillation,”IEEE Transactions on Circuits and Systems for Video Technology, 2025. 12

2025

[24] [24]

When Object Detection Meets Knowledge Distillation: A Survey,

Z. Li, P. Xu, X. Chang, L. Yang, Y . Zhang, L. Yao, and X. Chen, “When Object Detection Meets Knowledge Distillation: A Survey,”IEEE Trans- actions on Pattern Analysis and Machine Intelligence, vol. 45, no. 8, pp. 10 555–10 579, 2023

2023

[25] [25]

Context Matters: Distilling Knowledge Graph for Enhanced Object Detection,

A. Yang, S. Lin, C.-H. Yeh, M. Shu, Y . Yang, and X. Chang, “Context Matters: Distilling Knowledge Graph for Enhanced Object Detection,” IEEE Transactions on Multimedia, vol. 26, pp. 487–500, 2023

2023

[26] [26]

A Hier- archical Semantic Distillation Framework for Open-V ocabulary Object Detection,

S. Fu, J. Yan, Q. Yang, X. Wei, X. Xie, and W.-S. Zheng, “A Hier- archical Semantic Distillation Framework for Open-V ocabulary Object Detection,”IEEE Transactions on Multimedia, 2025

2025

[27] [27]

Improve Object Detection with Feature-Based Knowledge Distillation: Towards Accurate and Efficient Detectors,

L. Zhang and K. Ma, “Improve Object Detection with Feature-Based Knowledge Distillation: Towards Accurate and Efficient Detectors,” in Proceedings of the International Conference on Learning Representa- tions, 2020

2020

[28] [28]

Focal and Global Knowledge Distillation for Detectors,

Z. Yang, Z. Li, X. Jiang, Y . Gong, Z. Yuan, D. Zhao, and C. Yuan, “Focal and Global Knowledge Distillation for Detectors,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 4643–4652

2022

[29] [29]

Masked Generative Distillation,

Z. Yang, Z. Li, M. Shao, D. Shi, Z. Yuan, and C. Yuan, “Masked Generative Distillation,” inProceedings of the European Conference on Computer Vision. Springer, 2022, pp. 53–69

2022

[30] [30]

D 3ETR: Decoder Distillation for Detection Transformer,

X. Chen, J. Chen, Y . Liu, and G. Zeng, “D 3ETR: Decoder Distillation for Detection Transformer,”arXiv preprint arXiv:2211.09768, 2022

work page arXiv 2022

[31] [31]

DE- TRDistill: A Universal Knowledge Distillation Framework for DETR- families,

J. Chang, S. Wang, H.-M. Xu, Z. Chen, C. Yang, and F. Zhao, “DE- TRDistill: A Universal Knowledge Distillation Framework for DETR- families,” inProceedings of the IEEE International Conference on Computer Vision, 2023, pp. 6898–6908

2023

[32] [32]

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling,

Y . Wang, X. Li, S. Weng, G. Zhang, H. Yue, H. Feng, J. Han, and E. Ding, “KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 16 016–16 025

2024

[33] [33]

CLoCKDistill: Consistent Location-and- Context-aware Knowledge Distillation for DETRs,

Q. Lan and Q. Tian, “CLoCKDistill: Consistent Location-and- Context-aware Knowledge Distillation for DETRs,”arXiv preprint arXiv:2502.10683, 2025

work page arXiv 2025

[34] [34]

Frequency- aware Feature Fusion for Dense Image Prediction,

L. Chen, Y . Fu, L. Gu, C. Yan, T. Harada, and G. Huang, “Frequency- aware Feature Fusion for Dense Image Prediction,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

2024

[35] [35]

WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras,

Y . Cui, B. Gao, Y . Zhang, X. Dong, J. Xiang, D. Li, and Z. Tu, “WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras,”arXiv preprint arXiv:2506.09098, 2025

work page arXiv 2025

[36] [36]

Frequency Attention for Knowledge Distillation,

C. Pham, V .-A. Nguyen, T. Le, D. Phung, G. Carneiro, and T.-T. Do, “Frequency Attention for Knowledge Distillation,” inProceedings of the IEEE Winter Conference on Applications of Computer Vision, 2024, pp. 2277–2286

2024

[37] [37]

FreeKD: Knowledge Distillation via Semantic Frequency Prompt,

Y . Zhang, T. Huang, J. Liu, T. Jiang, K. Cheng, and S. Zhang, “FreeKD: Knowledge Distillation via Semantic Frequency Prompt,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 931–15 940

2024

[38] [38]

Learning Representation on Opti- mized High-order Manifold for Visual classification,

X. Ma, W. Liu, Q. Tian, and Y . Gao, “Learning Representation on Opti- mized High-order Manifold for Visual classification,”IEEE Transactions on Multimedia, vol. 24, pp. 3989–4001, 2021

2021

[39] [39]

CrossHypergraph: Consis- tent High-order Semantic Network for Few-shot Image Classification,

Y . Zhang, H. Wang, S. Zhang, and B. Leng, “CrossHypergraph: Consis- tent High-order Semantic Network for Few-shot Image Classification,” IEEE Transactions on Multimedia, pp. 1–12, 2025

2025

[40] [40]

Hypergraph-Based Remaining Prototype Alignment for Open-Set Cross-Domain Image Retrieval,

Y . Xu, Y . Feng, X. Zhong, Y . Gao, and Z. Wu, “Hypergraph-Based Remaining Prototype Alignment for Open-Set Cross-Domain Image Retrieval,”IEEE Transactions on Multimedia, 2025

2025

[41] [41]

Residual Fuzzy Alignment on Hypergraph for Open-Set 3D Cross-Modal Retrieval,

Y . Xu, Y . Feng, X. Zhuang, J. Wang, Z. Wu, and Y . Gao, “Residual Fuzzy Alignment on Hypergraph for Open-Set 3D Cross-Modal Retrieval,” IEEE Transactions on Multimedia, 2025

2025

[42] [42]

Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation,

Y . Feng, J. Huang, S. Du, S. Ying, J.-H. Yong, Y . Li, G. Ding, R. Ji, and Y . Gao, “Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

2024

[43] [43]

Yolov13: Real-time object detection with hypergraph- enhanced adaptive visual perception,

M. Lei, S. Li, Y . Wu, H. Hu, Y . Zhou, X. Zheng, G. Ding, S. Du, Z. Wu, and Y . Gao, “YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception,”arXiv preprint arXiv:2506.17733, 2025

work page arXiv 2025

[44] [44]

Feature Pyramid Networks for Object Detection,

T.-Y . Lin, P. Doll´ar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125

2017

[45] [45]

Path Aggregation Network for Instance Segmentation,

S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768

2018

[46] [46]

Pyramid Vision Transformer: A versatile Backbone for Dense Prediction without Convolutions,

W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid Vision Transformer: A versatile Backbone for Dense Prediction without Convolutions,” inProceedings of the IEEE International Conference on Computer Vision, 2021, pp. 568–578

2021

[47] [47]

Attention is All You Need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is All You Need,” inProceedings of the Advances in Neural Information Processing Systems, vol. 30, 2017

2017

[48] [48]

Vision HGNN: An Image is More than a Graph of Nodes,

Y . Han, P. Wang, S. Kundu, Y . Ding, and Z. Wang, “Vision HGNN: An Image is More than a Graph of Nodes,” inProceedings of the IEEE International Conference on Computer Vision, 2023, pp. 19 878–19 888

2023

[49] [49]

Motion Robust High- Speed Light-Weighted Object Detection with Event Camera,

B. Liu, C. Xu, W. Yang, H. Yu, and L. Yu, “Motion Robust High- Speed Light-Weighted Object Detection with Event Camera,”IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–13, 2023

2023

[50] [50]

State Space Models for Event Cameras,

N. Zubic, M. Gehrig, and D. Scaramuzza, “State Space Models for Event Cameras,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024, pp. 5819–5828

2024

[51] [51]

D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement,

Y . Peng, H. Li, P. Wu, Y . Zhang, X. Sun, and F. Wu, “D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement,” arXiv preprint arXiv:2410.13842, 2024

work page arXiv 2024

[52] [52]

Fusing event-based and rgb camera for robust object detection in adverse conditions,

A. Tomy, A. Paigwar, K. S. Mann, A. Renzaglia, and C. Laugier, “Fusing event-based and rgb camera for robust object detection in adverse conditions,” inIEEE International Conference on Robotics and Automation. IEEE, 2022, pp. 933–939

2022

[53] [53]

SODFormer: Streaming Object Detection with Transformer using Events and Frames,

D. Li, Y . Tian, and J. Li, “SODFormer: Streaming Object Detection with Transformer using Events and Frames,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 14 020–14 037, 2023

2023

[54] [54]

Microsoft COCO: Common Objects in Context,

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft COCO: Common Objects in Context,” inProceedings of the European Conference on Computer Vision. Springer, 2014, pp. 740–755

2014

[55] [55]

Ultralytics YOLO,

G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLO,” Jan. 2023. [Online]. Available: https://github.com/ultralytics/ultralytics

2023

[56] [56]

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection,

X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, “Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection,”Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 21 002–21 012, 2020

2020