OD3: Optimization-free Dataset Distillation for Object Detection

Ahmed ElHagry; Salwa K. Al Khatib; Shitong Shao; Zhiqiang Shen

arxiv: 2506.01942 · v2 · submitted 2025-06-02 · 💻 cs.CV

OD3: Optimization-free Dataset Distillation for Object Detection

Salwa K. Al Khatib , Ahmed ElHagry , Shitong Shao , Zhiqiang Shen This is my paper

Pith reviewed 2026-05-19 10:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords dataset distillationobject detectionsynthetic dataoptimization-freeCOCOPASCAL VOCdata compressioncomputer vision

0 comments

The pith

OD3 distills object detection datasets to tiny sizes without any optimization, beating prior methods by over 14% mAP50 on COCO at 1% compression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OD3 as a way to synthesize small training sets for object detectors by avoiding the optimization steps used in most dataset distillation work. It first places real object instances into new images at locations judged suitable, then applies a pre-trained observer model to discard placements the model does not confidently recognize. A reader would care because full-scale detection datasets like COCO demand enormous training compute, so a reliable compression technique could let models be trained on far less data while preserving accuracy.

Core claim

OD3 is an optimization-free dataset distillation framework for object detection that operates in two stages. The first stage performs candidate selection by iteratively placing object instances into synthesized images at suitable locations. The second stage performs candidate screening by passing the synthesized images through a pre-trained observer model and removing objects that receive low confidence scores. When tested on MS COCO and PASCAL VOC at compression ratios between 0.25% and 5%, the resulting datasets yield higher detection accuracy than the sole prior detection-specific distillation method and conventional core-set selection baselines, including an improvement of more than 14%m

What carries the argument

The two-stage pipeline of location-based instance placement for candidate selection followed by pre-trained observer model screening to remove low-confidence objects.

If this is right

Detectors trained on the distilled sets maintain higher accuracy than those trained on data from earlier distillation or selection techniques at the same compression level.
The approach works directly on standard benchmarks such as MS COCO and PASCAL VOC across a range of compression ratios from 0.25% to 5%.
No gradient-based optimization is required during the synthesis process, lowering the compute needed to create the distilled data.
Smaller synthetic datasets allow faster iteration and reduced resource use when training object detection models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The placement and screening steps could be adapted to other dense prediction tasks such as semantic segmentation if the observer model is swapped for a segmentation network.
Performance may degrade if the observer model was trained on data whose distribution differs strongly from the target detection task.
Combining the non-optimization placement heuristic with limited optimization on a subset of parameters might yield further accuracy gains at moderate extra cost.

Load-bearing premise

Suitable locations for placing object instances can be identified reliably and a pre-trained observer model can remove low-quality placements without introducing systematic errors.

What would settle it

Train a standard object detector on the OD3-synthesized 1% COCO dataset and compute mAP50; if the score falls below the previous best distillation or core-set result by more than a small margin, the performance claim is refuted.

Figures

Figures reproduced from arXiv: 2506.01942 by Ahmed ElHagry, Salwa K. Al Khatib, Shitong Shao, Zhiqiang Shen.

**Figure 1.** Figure 1: Performance Comparison of OD3 on COCO. We compare the mAP performance of our method to others with different compression rates. The upper bound is 39.8% on the full dataset. Deep neural networks have achieved remarkable performance across a wide range of computer vision tasks [3, 4, 5, 6], but training these models generally requires substantial computational and data resources. Conventional strategies of… view at source ↗

**Figure 2.** Figure 2: Illustration of the OD3 framework. In initial stage ①, each object in xi ∈ T is assigned a random location in the synthesized image ˆxj ∈ S for j = 1, ..., IPD, and its overlap with existing candidates is checked to decide placement. After IPD synthesized images are initially constructed, a pre-trained observer model produces predictions for screening. The observer iteratively evaluates the current canvas … view at source ↗

**Figure 3.** Figure 3: Illustration of our sampling controller. It ensures that the same object is not placed in different distilled images. The original dataset is divided into IPD segments. Each segment is distilled into a single image, resulting in a compact dataset S. Information Density. To quantify how thoroughly a canvas is occupied by valuable objects, we define an Information Density function Φ(x): Φ(x) = g (fθ(x)) a… view at source ↗

**Figure 4.** Figure 4: Extended bounding box for more object context. The figure shows a reconstructed image along with an extended object using the SADCE function to better capture the object’s context. where F is the SA-DCE function, r ∈ R is a scalar representing a small pre-determined number of pixels, a(oir ) is the area of r-th object in i-th image, and amax and amin represent the maximum and minimum areas of objects in… view at source ↗

**Figure 5.** Figure 5: Qualitative results of the synthesis process of OD3 on MS COCO. Initial backgrounds of the canvas are randomly selected from the training set, and objects are inserted using their boundingbox level labels. Those images are generated at 0.5% IPD. 8 [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation study of overlap threshold τ . mAP and mAP50 are evaluated at different thresholds used in candidate selection with compression ratios ranging from 0.25% to 5%. Cross-architecture generalization. To assess the generalization capability of our condensed datasets, we conduct experiments with Faster R-CNN [4], a two-stage detector and RetinaNet [23], a one-stage detector [PITH_FULL_IMAGE:figures/ful… view at source ↗

**Figure 7.** Figure 7: Percentage of images in the distilled datasets relative to the full dataset [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: further illustrates the relative probability distribution of supercategories across dataset versions. Despite significant compression that can be seen in [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

read the original abstract

Training large neural networks on large-scale datasets requires substantial computational resources, particularly for dense prediction tasks such as object detection. Although dataset distillation (DD) has been proposed to alleviate these demands by synthesizing compact datasets from larger ones, most existing work focuses solely on image classification, leaving the more complex detection setting largely unexplored. In this paper, we introduce OD3, a novel optimization-free data distillation framework specifically designed for object detection. Our approach involves two stages: first, a candidate selection process in which object instances are iteratively placed in synthesized images based on their suitable locations, and second, a candidate screening process using a pre-trained observer model to remove low-confidence objects. We perform our data synthesis framework on MS COCO and PASCAL VOC, two popular detection datasets, with compression ratios ranging from 0.25% to 5%. Compared to the prior solely existing dataset distillation method on detection and conventional core set selection methods, OD3 delivers superior accuracy, establishes new state-of-the-art results, surpassing prior best method by more than 14% on COCO mAP50 at a compression ratio of 1.0%. Code is available at: https://github.com/VILA-Lab/OD3.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OD3 gives a heuristic placement plus observer-screening recipe for optimization-free detection dataset distillation and claims a 14-point mAP50 lift at 1% compression, but the gains rest on how well those two steps actually work.

read the letter

OD3 is the first optimization-free distillation method built for object detection. It replaces the usual gradient-based synthesis with two heuristic stages: iteratively placing object instances into background images at chosen locations, then running a pre-trained observer to drop low-confidence boxes. The authors test this on COCO and VOC at compression ratios from 0.25 % to 5 % and report that it beats the single prior detection distillation paper plus standard core-set baselines by more than 14 % mAP50 on COCO at the 1 % setting. They also release code, which is useful for anyone who wants to try it quickly.

Referee Report

2 major / 2 minor

Summary. The paper introduces OD3, an optimization-free dataset distillation framework for object detection consisting of two stages: (1) iterative candidate selection that places object instances into synthesized images at heuristically determined suitable locations, and (2) candidate screening that uses a pre-trained observer model to filter low-confidence detections. Experiments are performed on MS COCO and PASCAL VOC at compression ratios 0.25%–5%; the central claim is that OD3 outperforms the sole prior detection-specific DD method and conventional core-set baselines, establishing new SOTA results including a >14% mAP50 gain on COCO at 1% compression.

Significance. If the performance claims hold under rigorous controls, the work would be significant because it extends dataset distillation to the more complex object-detection setting and demonstrates that an explicitly optimization-free pipeline can still produce usable distilled sets. The public code release at https://github.com/VILA-Lab/OD3 is a concrete strength that supports reproducibility.

major comments (2)

[Abstract and §4] Abstract and §4 (Experiments): the claim of >14% mAP50 improvement over the prior best method at 1% compression on COCO is load-bearing for the paper’s contribution, yet the manuscript supplies no description of the training protocol for the downstream detector, the exact baselines (including their hyper-parameters and implementation details), the number of random seeds, or any statistical significance tests. Without these elements the superiority cannot be assessed for post-hoc selection or missing controls.
[§3.1] §3.1 (Candidate Selection): the iterative placement of object instances into “suitable locations” is described only at the level of heuristic rules with no equations, saliency map formulation, or geometric constraints; because the entire framework is optimization-free, any weakness in this placement heuristic directly determines whether the reported accuracy gains are explained by the proposed mechanism rather than by the observer screening step alone.

minor comments (2)

[§4] The compression-ratio definition (images or instances per class) should be stated explicitly in the experimental setup to allow direct comparison with prior DD literature.
[Figure 3] Figure captions for the synthesized images should indicate the observer-model threshold used during screening so readers can reproduce the filtering step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below and will revise the manuscript to incorporate additional details and clarifications where appropriate.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4 (Experiments): the claim of >14% mAP50 improvement over the prior best method at 1% compression on COCO is load-bearing for the paper’s contribution, yet the manuscript supplies no description of the training protocol for the downstream detector, the exact baselines (including their hyper-parameters and implementation details), the number of random seeds, or any statistical significance tests. Without these elements the superiority cannot be assessed for post-hoc selection or missing controls.

Authors: We agree that these experimental details are essential for assessing the validity of the reported gains. In the revised manuscript, we will expand §4 to provide: a full specification of the downstream detector training protocol (architecture, optimizer, learning rate, epochs, and data augmentation); precise hyper-parameters and implementation notes for every baseline, including the prior detection-specific DD method; results averaged over multiple random seeds with standard deviations; and statistical significance tests (e.g., t-tests) comparing OD3 against baselines. The public code repository will be referenced explicitly for exact reproduction. revision: yes
Referee: [§3.1] §3.1 (Candidate Selection): the iterative placement of object instances into “suitable locations” is described only at the level of heuristic rules with no equations, saliency map formulation, or geometric constraints; because the entire framework is optimization-free, any weakness in this placement heuristic directly determines whether the reported accuracy gains are explained by the proposed mechanism rather than by the observer screening step alone.

Authors: We acknowledge that the current description of candidate selection remains at a high-level heuristic. In the revision we will formalize §3.1 by adding equations for the iterative placement procedure, explicit geometric constraints (non-overlap, border margins, scale consistency), and any saliency considerations used to score locations. We will also insert an ablation study that isolates the placement heuristic from the observer screening step, thereby demonstrating that both stages contribute to the final performance rather than screening alone. revision: yes

Circularity Check

0 steps flagged

No circularity: heuristic pipeline with independent empirical claims

full rationale

The paper describes an optimization-free two-stage procedure consisting of iterative candidate selection for placing object instances into synthesized images followed by screening via a pre-trained observer model. No equations, fitted parameters, or self-citations are presented that reduce the reported mAP improvements or SOTA claims to the method's own inputs by construction. The performance numbers are obtained by training detectors on the resulting distilled sets and evaluating on held-out test data from COCO and VOC, which constitutes an external benchmark rather than a tautological re-expression of any internal fit or prior self-result. The derivation chain is therefore self-contained as a sequence of heuristic steps whose validity is tested empirically rather than assumed.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on two domain assumptions about object placement and model-based filtering; no free parameters are explicitly fitted to target metrics and no new physical or mathematical entities are introduced.

axioms (2)

domain assumption Object instances can be iteratively placed into synthesized images at suitable locations without requiring optimization.
This premise underpins the candidate selection stage described in the abstract.
domain assumption A pre-trained observer model can effectively identify and remove low-confidence objects from the synthesized images.
This premise underpins the candidate screening stage described in the abstract.

pith-pipeline@v0.9.0 · 5753 in / 1435 out tokens · 37837 ms · 2026-05-19T10:42:34.599528+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

candidate selection process in which object instances are iteratively placed in synthesized images based on their suitable locations, and second, a candidate screening process using a pre-trained observer model to remove low-confidence objects
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Information Density Φ(x) = ... aggregate detection confidence scores ... combined area

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 7 internal anchors

[1]

A fast knowledge distillation framework for visual recognition

Zhiqiang Shen and Eric Xing. A fast knowledge distillation framework for visual recognition. In European conference on computer vision, pages 673–690. Springer, 2022

work page 2022
[2]

Pkd: General distillation framework for object detectors via pearson correlation coefficient

Weihan Cao, Yifan Zhang, Jianfei Gao, Anda Cheng, Ke Cheng, and Jian Cheng. Pkd: General distillation framework for object detectors via pearson correlation coefficient. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 15394–15406. Curran Associates, Inc., 2022

work page 2022
[3]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[4]

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[5]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[6]

Segment anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023

work page 2023
[7]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

work page 2009
[8]

Scaling vision transformers to 22 billion parameters

Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, An- dreas Peter Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, et al. Scaling vision transformers to 22 billion parameters. In International Conference on Machine Learning, pages 7480–7512. PMLR, 2023

work page 2023
[9]

Objects365: A large-scale, high-quality dataset for object detection

Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, and Jian Sun. Objects365: A large-scale, high-quality dataset for object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8430–8439, 2019

work page 2019
[10]

Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective

Zeyuan Yin, Eric Xing, and Zhiqiang Shen. Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective. Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[11]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[12]

Channel-wise knowledge distillation for dense prediction

Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, and Chunhua Shen. Channel-wise knowledge distillation for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5311–5320, 2021

work page 2021
[13]

Fetch and forge: Efficient dataset condensation for object detection

Ding Qi, Jian Li, Jinlong Peng, Bo Zhao, Shuguang Dou, Jialin Li, Jiangning Zhang, Yabiao Wang, Chengjie Wang, and Cairong Zhao. Fetch and forge: Efficient dataset condensation for object detection. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[14]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014

work page 2014
[15]

Williams, J

Mark Everingham, Luc Van Gool, C. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88:303–338, 2010. 10

work page 2010
[16]

MMDetection: Open MMLab Detection Toolbox and Benchmark

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. Mmdetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906
[17]

Openmmlab model compression toolbox and benchmark

MMRazor Contributors. Openmmlab model compression toolbox and benchmark. https://github. com/open-mmlab/mmrazor, 2021

work page 2021
[18]

icarl: Incremental classifier and representation learning

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. icarl: Incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017

work page 2001
[19]

Coreset selection for object detection

Hojun Lee, Suyoung Kim, Junhoo Lee, Jaeyoung Yoo, and Nojun Kwak. Coreset selection for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 7682–7691, 2024

work page 2024
[20]

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[21]

End-to-end incremental learning

Francisco M Castro, Manuel J Marín-Jiménez, Nicolás Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incremental learning. In Proceedings of the European conference on computer vision (ECCV), pages 233–248, 2018

work page 2018
[22]

Super-Samples from Kernel Herding

Yutian Chen, Max Welling, and Alex Smola. Super-samples from kernel herding. arXiv preprint arXiv:1203.3472, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012
[23]

Focal Loss for Dense Object Detection

T Lin. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[24]

Exploring plain vision transformer backbones for object detection

Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Exploring plain vision transformer backbones for object detection. In European conference on computer vision, pages 280–296. Springer, 2022

work page 2022
[25]

Deepcore: A comprehensive library for coreset selection in deep learning

Chengcheng Guo, Bo Zhao, and Yanbing Bai. Deepcore: A comprehensive library for coreset selection in deep learning. In International Conference on Database and Expert Systems Applications, pages 181–195. Springer, 2022

work page 2022
[26]

Coresets for ordered weighted clustering

Vladimir Braverman, Shaofeng H-C Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for ordered weighted clustering. In International Conference on Machine Learning, pages 744–753. PMLR, 2019

work page 2019
[27]

Coresets for clustering with fairness constraints

Lingxiao Huang, Shaofeng Jiang, and Nisheeth Vishnoi. Coresets for clustering with fairness constraints. Advances in neural information processing systems, 32, 2019

work page 2019
[28]

Training-free dataset pruning for instance segmentation

Anonymous. Training-free dataset pruning for instance segmentation. In Submitted to The Thirteenth International Conference on Learning Representations, 2024. under review

work page 2024
[29]

Dataset Distillation , journal =

Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation. arXiv preprint arXiv:1811.10959, 2018

work page arXiv 2018
[30]

Dataset distillation by matching training trajectories

George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4750–4759, 2022

work page 2022
[31]

On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm

Peng Sun, Bei Shi, Daiwei Yu, and Tao Lin. On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9390–9399, 2024

work page 2024
[32]

blank canvas

George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4750–4759, 2022. 11 Appendix A Limitations and Societal Impacts There are many potential societal impacts of our work, such...

work page 2022

[1] [1]

A fast knowledge distillation framework for visual recognition

Zhiqiang Shen and Eric Xing. A fast knowledge distillation framework for visual recognition. In European conference on computer vision, pages 673–690. Springer, 2022

work page 2022

[2] [2]

Pkd: General distillation framework for object detectors via pearson correlation coefficient

Weihan Cao, Yifan Zhang, Jianfei Gao, Anda Cheng, Ke Cheng, and Jian Cheng. Pkd: General distillation framework for object detectors via pearson correlation coefficient. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 15394–15406. Curran Associates, Inc., 2022

work page 2022

[3] [3]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016

[4] [4]

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[5] [5]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[6] [6]

Segment anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023

work page 2023

[7] [7]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

work page 2009

[8] [8]

Scaling vision transformers to 22 billion parameters

Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, An- dreas Peter Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, et al. Scaling vision transformers to 22 billion parameters. In International Conference on Machine Learning, pages 7480–7512. PMLR, 2023

work page 2023

[9] [9]

Objects365: A large-scale, high-quality dataset for object detection

Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, and Jian Sun. Objects365: A large-scale, high-quality dataset for object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8430–8439, 2019

work page 2019

[10] [10]

Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective

Zeyuan Yin, Eric Xing, and Zhiqiang Shen. Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective. Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[11] [11]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[12] [12]

Channel-wise knowledge distillation for dense prediction

Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, and Chunhua Shen. Channel-wise knowledge distillation for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5311–5320, 2021

work page 2021

[13] [13]

Fetch and forge: Efficient dataset condensation for object detection

Ding Qi, Jian Li, Jinlong Peng, Bo Zhao, Shuguang Dou, Jialin Li, Jiangning Zhang, Yabiao Wang, Chengjie Wang, and Cairong Zhao. Fetch and forge: Efficient dataset condensation for object detection. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[14] [14]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014

work page 2014

[15] [15]

Williams, J

Mark Everingham, Luc Van Gool, C. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88:303–338, 2010. 10

work page 2010

[16] [16]

MMDetection: Open MMLab Detection Toolbox and Benchmark

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. Mmdetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906

[17] [17]

Openmmlab model compression toolbox and benchmark

MMRazor Contributors. Openmmlab model compression toolbox and benchmark. https://github. com/open-mmlab/mmrazor, 2021

work page 2021

[18] [18]

icarl: Incremental classifier and representation learning

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. icarl: Incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017

work page 2001

[19] [19]

Coreset selection for object detection

Hojun Lee, Suyoung Kim, Junhoo Lee, Jaeyoung Yoo, and Nojun Kwak. Coreset selection for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 7682–7691, 2024

work page 2024

[20] [20]

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[21] [21]

End-to-end incremental learning

Francisco M Castro, Manuel J Marín-Jiménez, Nicolás Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incremental learning. In Proceedings of the European conference on computer vision (ECCV), pages 233–248, 2018

work page 2018

[22] [22]

Super-Samples from Kernel Herding

Yutian Chen, Max Welling, and Alex Smola. Super-samples from kernel herding. arXiv preprint arXiv:1203.3472, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012

[23] [23]

Focal Loss for Dense Object Detection

T Lin. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[24] [24]

Exploring plain vision transformer backbones for object detection

Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Exploring plain vision transformer backbones for object detection. In European conference on computer vision, pages 280–296. Springer, 2022

work page 2022

[25] [25]

Deepcore: A comprehensive library for coreset selection in deep learning

Chengcheng Guo, Bo Zhao, and Yanbing Bai. Deepcore: A comprehensive library for coreset selection in deep learning. In International Conference on Database and Expert Systems Applications, pages 181–195. Springer, 2022

work page 2022

[26] [26]

Coresets for ordered weighted clustering

Vladimir Braverman, Shaofeng H-C Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for ordered weighted clustering. In International Conference on Machine Learning, pages 744–753. PMLR, 2019

work page 2019

[27] [27]

Coresets for clustering with fairness constraints

Lingxiao Huang, Shaofeng Jiang, and Nisheeth Vishnoi. Coresets for clustering with fairness constraints. Advances in neural information processing systems, 32, 2019

work page 2019

[28] [28]

Training-free dataset pruning for instance segmentation

Anonymous. Training-free dataset pruning for instance segmentation. In Submitted to The Thirteenth International Conference on Learning Representations, 2024. under review

work page 2024

[29] [29]

Dataset Distillation , journal =

Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation. arXiv preprint arXiv:1811.10959, 2018

work page arXiv 2018

[30] [30]

Dataset distillation by matching training trajectories

George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4750–4759, 2022

work page 2022

[31] [31]

On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm

Peng Sun, Bei Shi, Daiwei Yu, and Tao Lin. On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9390–9399, 2024

work page 2024

[32] [32]

blank canvas

George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4750–4759, 2022. 11 Appendix A Limitations and Societal Impacts There are many potential societal impacts of our work, such...

work page 2022