Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression
Pith reviewed 2026-05-18 08:35 UTC · model grok-4.3
The pith
A framework gradually blends outputs from original and compressed neural networks during fine-tuning to stabilize iterative model compression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VCON executes both the original uncompressed model and the compressed model in parallel during fine-tuning; the contribution of the original model is progressively reduced while that of the compressed model is gradually increased via an affine combination, improving stability and mitigating accuracy degradation. In most settings this yields accuracy gains exceeding 1% over baselines, with some configurations showing improvements above 15%.
What carries the argument
The affine combination of outputs from the original and compressed models, which gradually shifts emphasis from the uncompressed network to the compressed one throughout fine-tuning.
If this is right
- The same blending process works with pruning, quantization, and low-rank decomposition without needing separate iterative schedules for each.
- Accuracy improves by more than 1 percent over post-shot and iterative baselines on computer vision and natural language processing benchmarks in most tested cases.
- Some compression configurations reach accuracy gains above 15 percent when the gradual transition is used.
- Training remains more stable because the network never experiences an instantaneous jump from full to compressed behavior.
Where Pith is reading between the lines
- The parallel-execution idea could extend to other sudden model changes such as architecture search or domain adaptation where abrupt shifts cause instability.
- Resource costs during the transition phase might be reduced by running the two models only on selected layers or batches rather than the full network.
- The blending schedule itself could be made data-driven instead of fixed, potentially removing the need for manual tuning across new datasets.
Load-bearing premise
An affine combination of outputs from the original and compressed models during fine-tuning will reliably allow the network to adapt smoothly without introducing new instabilities or requiring technique-specific tuning of the blending schedule.
What would settle it
An experiment applying VCON to a standard compression task and finding no reduction in accuracy degradation or stability improvement compared with direct replacement or existing iterative baselines would falsify the central claim.
Figures
read the original abstract
The increasing scale of Deep Neural Networks (DNNs) introduces the need for compression techniques such as pruning, quantization, and low-rank decomposition. While these methods are very effective at reducing memory, computation, and energy consumption, they may introduce severe accuracy degradation, which is often mitigated by using iterative, gradual compression. However, different compression techniques require distinct iterative approaches, and some result in unstable, discontinuous model fine-tuning. We introduce Vanishing Contributions (VCON), a unified framework for the smooth, iterative transition of DNNs into a compressed form. Rather than replacing the original network directly with its compressed version, VCON executes both in parallel during fine-tuning. The contribution of the original (uncompressed) model is progressively reduced, while that of the compressed model is gradually increased. This affine combination allows the network to slowly adapt, improving stability and mitigating accuracy degradation. We evaluate VCON on computer vision and natural language processing benchmarks, using multiple compression strategies. In most settings, our framework improves accuracy over post-shot and iterative baselines. Typical gains exceed 1%, while some configuration exhibits improvements above 15%. VCON is thus compatible with existing compression techniques and consistently improves performance across diverse tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Vanishing Contributions (VCON), a unified framework for smooth iterative compression of deep neural networks. Rather than directly substituting the original model with a compressed version (via pruning, quantization, or low-rank decomposition), VCON runs both models in parallel during fine-tuning and applies an affine combination to progressively reduce the original model's contribution while increasing the compressed model's. This is claimed to improve training stability and yield accuracy gains over post-shot and iterative baselines, with typical improvements exceeding 1% and some configurations above 15% on vision and NLP benchmarks.
Significance. If the central claims hold under rigorous validation, VCON could offer a practical, general-purpose technique for mitigating accuracy loss during compression transitions, reducing the need for bespoke iterative fine-tuning schedules per compression method. This would be valuable for efficient ML deployment, particularly if the approach proves robust across diverse architectures and compression types without introducing new instabilities.
major comments (3)
- [Abstract] Abstract: the description of the affine combination ((1-α)·f_orig + α·f_comp) does not specify the combination point (logits vs. intermediate features) or the functional form of the ramp schedule α(t). Without these, it is impossible to verify whether the same blending rule remains stable and technique-agnostic when the compressed model changes structure or precision, which is load-bearing for the unified-framework claim.
- [Experimental section] Experimental section (as referenced in the abstract's performance claims): the abstract asserts 'consistent accuracy improvements across vision and language benchmarks' and 'gains exceed 1%' (some >15%) but supplies no dataset sizes, number of runs, error bars, or statistical tests. This prevents verification of whether the reported gains are reliable or could be artifacts of single-run evaluation.
- [Method description] Method description: the central claim presupposes that a single, technique-agnostic schedule for α exists that avoids accuracy cliffs or new instabilities. No ablation on schedule sensitivity or per-technique retuning is referenced, which directly tests the unification premise.
minor comments (1)
- [Abstract] Abstract: the sentence 'some configuration exhibits improvements above 15%' should identify the specific compression method, dataset, and configuration for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our work. We provide point-by-point responses to the major comments below, indicating revisions where we have updated the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract: the description of the affine combination ((1-α)·f_orig + α·f_comp) does not specify the combination point (logits vs. intermediate features) or the functional form of the ramp schedule α(t). Without these, it is impossible to verify whether the same blending rule remains stable and technique-agnostic when the compressed model changes structure or precision, which is load-bearing for the unified-framework claim.
Authors: We agree with this observation and have revised the abstract to specify that the affine combination is performed on the logits and that α(t) is implemented as a linear ramp schedule. These details, along with the justification for their choice to maintain stability and technique-agnosticism, are elaborated in Section 3 of the paper. revision: yes
-
Referee: [Experimental section] Experimental section (as referenced in the abstract's performance claims): the abstract asserts 'consistent accuracy improvements across vision and language benchmarks' and 'gains exceed 1%' (some >15%) but supplies no dataset sizes, number of runs, error bars, or statistical tests. This prevents verification of whether the reported gains are reliable or could be artifacts of single-run evaluation.
Authors: The full paper in Section 4 includes dataset sizes, results from multiple runs with error bars, and statistical comparisons. To improve the abstract, we have added a sentence noting that the gains are consistent across multiple runs with reported standard deviations. This addresses the concern while maintaining abstract brevity. revision: yes
-
Referee: [Method description] Method description: the central claim presupposes that a single, technique-agnostic schedule for α exists that avoids accuracy cliffs or new instabilities. No ablation on schedule sensitivity or per-technique retuning is referenced, which directly tests the unification premise.
Authors: We have added an ablation study on different schedules for α in the revised manuscript, demonstrating that the linear schedule is robust and does not require per-technique retuning for the compression methods tested. This is now referenced in the method section to strengthen the unification claim. revision: yes
Circularity Check
No significant circularity; VCON defined independently and evaluated externally
full rationale
The paper introduces VCON by defining an affine combination of original and compressed model outputs during joint fine-tuning, with the original contribution progressively reduced via a ramped parameter. This construction is presented as a new technique and then tested on external CV and NLP benchmarks across pruning, quantization, and low-rank methods. No equation or claim reduces the performance gains to a self-definition, a fitted input relabeled as prediction, or a load-bearing self-citation chain. The derivation remains self-contained because the method is specified first and its benefits are measured against independent baselines rather than derived tautologically from its own parameters.
Axiom & Free-Parameter Ledger
free parameters (1)
- contribution schedule
axioms (1)
- domain assumption Affine combination of model outputs during fine-tuning permits stable adaptation across compression techniques
Reference graph
Works this paper leans on
-
[1]
Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, no. 7553, pp. 436–444, May 2015. doi:10.1038/nature14539 NIKIFOROSet al.: V ANISHING CONTRIBUTIONS: A UNIFIED APPROACH TO SMOOTHLY TRANSITION NEURAL MODELS INTO COMPRESSED FORM 9 TABLE VI TRAINING CONFIGURATIONS AND HYPERPARAMETERS Configuration ViT-T/16, ViT-S/16, ViT-B/16 BERT, dist...
-
[2]
A survey on deploying mobile deep learning applications: A systemic and technical perspective,
Y . Wang, J. Wang, W. Zhang, Y . Zhan, S. Guo, Q. Zheng, and X. Wang, “A survey on deploying mobile deep learning applications: A systemic and technical perspective,”Digital Communications and Networks, vol. 8, no. 1, pp. 1–17, Feb. 2022. doi:10.1016/j.dcan.2021.06.001
-
[3]
A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommenda- tions,
H. Cheng, M. Zhang, and J. Q. Shi, “A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommenda- tions,” Aug. 2024. doi:10.48550/arXiv.2308.06767
-
[4]
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers,
Z. Allen-Zhu, Y . Li, and Y . Liang, “Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers,” in Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., 2019
work page 2019
-
[5]
S. Han, H. Mao, and W. J. Dally, “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding,” Feb. 2016. doi:10.48550/arXiv.1510.00149
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1510.00149 2016
-
[6]
Methods for Pruning Deep Neu- ral Networks,
S. Vadera and S. Ameen, “Methods for Pruning Deep Neu- ral Networks,”IEEE Access, vol. 10, pp. 63 280–63 300, 2022. doi:10.1109/ACCESS.2022.3182659
-
[7]
A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions,
R. Mishra, H. P. Gupta, and T. Dutta, “A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions,” Oct
-
[8]
doi:10.48550/arXiv.2010.03954
-
[9]
A Survey on Methods and Theories of Quantized Neural Networks
Y . Guo, “A Survey on Methods and Theories of Quantized Neural Networks,” Dec. 2018. doi:10.48550/arXiv.1808.04752
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1808.04752 2018
-
[10]
[2112.06126] Neural Network Quantization for Efficient Inference: A Survey,
“[2112.06126] Neural Network Quantization for Efficient Inference: A Survey,” https://arxiv.org/abs/2112.06126
-
[11]
A Multiply-And-Max/Min Neuron Paradigm for Aggressively Prunable Deep Neural Networks,
L. Prono, P. Bich, C. Boretti, M. Mangia, F. Pareschi, R. Rovatti, and G. Setti, “A Multiply-And-Max/Min Neuron Paradigm for Aggressively Prunable Deep Neural Networks,”IEEE Transactions on Neural Net- works and Learning Systems, vol. 36, no. 8, pp. 14 414–14 427, Aug
-
[12]
doi:10.1109/TNNLS.2025.3527644
-
[13]
Pruning neural networks without any data by iteratively conserving synaptic flow,
H. Tanaka, D. Kunin, D. L. Yamins, and S. Ganguli, “Pruning neural networks without any data by iteratively conserving synaptic flow,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 6377–6389
work page 2020
-
[14]
Adaptive Iterative Pruning for Accelerating Deep Neural Networks,
Y . Gordienko, Y . Kochura, V . Taran, N. Gordienko, A. Bugaiov, and S. Stirenko, “Adaptive Iterative Pruning for Accelerating Deep Neural Networks,” in2019 XIth International Scientific and Practical Confer- ence on Electronics and Information Technologies (ELIT), Sep. 2019, pp. 173–178. doi:10.1109/ELIT.2019.8892346
-
[15]
DropNet: Reducing Neural Network Com- plexity via Iterative Pruning,
C. M. J. Tan and M. Motani, “DropNet: Reducing Neural Network Com- plexity via Iterative Pruning,” inProceedings of the 37th International Conference on Machine Learning. PMLR, Nov. 2020, pp. 9356–9366
work page 2020
-
[16]
Progressive Channel-Shrinking Network,
J. Pan, S. Yang, L. G. Foo, Q. Ke, H. Rahmani, Z. Fan, and J. Liu, “Progressive Channel-Shrinking Network,”IEEE Transactions on Mul- timedia, vol. 26, pp. 2016–2026, 2024. doi:10.1109/TMM.2023.3291197
-
[17]
Towards Higher Ranks via Adversarial Weight Pruning,
Y . Tian, H. Chen, T. Guo, C. Xu, and Y . Wang, “Towards Higher Ranks via Adversarial Weight Pruning,” Nov. 2023. doi:10.48550/arXiv.2311.17493
-
[18]
Embedding Com- pression with Isotropic Iterative Quantization,
S. Liao, J. Chen, Y . Wang, Q. Qiu, and B. Yuan, “Embedding Com- pression with Isotropic Iterative Quantization,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, pp. 8336–8343, Apr. 2020. doi:10.1609/aaai.v34i05.6350
-
[19]
Gradient-Aware In- cremental Network Quantization,
J. Meng, Z. Qu, W. Zhou, S. Hu, and B. Ye, “Gradient-Aware In- cremental Network Quantization,” inNetwork and Parallel Computing, X. Chen, G. Min, D. Guo, X. Xie, and L. Pu, Eds. Singapore: Springer Nature, 2025, pp. 430–441. doi:10.1007/978-981-96-2864-3 34
-
[20]
Iterative Low-Rank Approximation for CNN Com- pression,
M. Kholiavchenko, “Iterative Low-Rank Approximation for CNN Com- pression,” Nov. 2019. doi:10.48550/arXiv.1803.08995
-
[21]
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
Y . He, P. Liu, Z. Wang, Z. Hu, and Y . Yang, “Filter Pruning via Geo- metric Median for Deep Convolutional Neural Networks Acceleration,” Jul. 2019. doi:10.48550/arXiv.1811.00250
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1811.00250 2019
-
[22]
NISP: Pruning Networks using Neuron Importance Score Propagation
R. Yu, A. Li, C.-F. Chen, J.-H. Lai, V . I. Morariu, X. Han, M. Gao, C.-Y . Lin, and L. S. Davis, “NISP: Pruning Networks using Neuron Impor- tance Score Propagation,” Mar. 2018. doi:10.48550/arXiv.1711.05908
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1711.05908 2018
-
[23]
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
J.-H. Luo, J. Wu, and W. Lin, “ThiNet: A Filter Level Prun- ing Method for Deep Neural Network Compression,” Jul. 2017. doi:10.48550/arXiv.1707.06342
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.06342 2017
-
[24]
Pruning Filters for Efficient ConvNets
H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning Fil- ters for Efficient ConvNets,” Mar. 2017. doi:10.48550/arXiv.1608.08710
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1608.08710 2017
-
[25]
A Simple and Effective Pruning Approach for Large Language Models
M. Sun, Z. Liu, A. Bair, and J. Z. Kolter, “A Simple and Ef- fective Pruning Approach for Large Language Models,” May 2024. doi:10.48550/arXiv.2306.11695
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.11695 2024
-
[26]
Group Fisher Pruning for Practical Network Compression,
L. Liu, S. Zhang, Z. Kuang, A. Zhou, J.-H. Xue, X. Wang, Y . Chen, W. Yang, Q. Liao, and W. Zhang, “Group Fisher Pruning for Practical Network Compression,” Aug. 2021. doi:10.48550/arXiv.2108.00708
-
[27]
Manifold Regularized Dynamic Network Pruning,
Y . Tang, Y . Wang, Y . Xu, Y . Deng, C. Xu, D. Tao, and C. Xu, “Manifold Regularized Dynamic Network Pruning,” Mar. 2021. doi:10.48550/arXiv.2103.05861
-
[28]
Sparsegpt: Massive language models can be accurately pruned in one-shot, 2023 b
E. Frantar and D. Alistarh, “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot,” Mar. 2023. doi:10.48550/arXiv.2301.00774
-
[29]
Predicting Parameters in Deep Learning
M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. de Freitas, “Predicting Parameters in Deep Learning,” Oct. 2014. doi:10.48550/arXiv.1306.0543
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1306.0543 2014
-
[30]
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition
V . Lebedev, Y . Ganin, M. Rakhuba, I. Oseledets, and V . Lempitsky, “Speeding-up Convolutional Neural Networks Using Fine-tuned CP- Decomposition,” Apr. 2015. doi:10.48550/arXiv.1412.6553
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1412.6553 2015
-
[31]
Constrained Optimization Based Low-Rank Ap- proximation of Deep Neural Networks,
C. Li and C. J. R. Shi, “Constrained Optimization Based Low-Rank Ap- proximation of Deep Neural Networks,” inProceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 732–747
work page 2018
-
[32]
Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets,
T. N. Sainath, B. Kingsbury, V . Sindhwani, E. Arisoy, and B. Ramabhad- ran, “Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets,” in2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 6655–6659. doi:10.1109/ICASSP.2013.6638949
-
[33]
Accelerating Very Deep Convo- lutional Networks for Classification and Detection,
X. Zhang, J. Zou, K. He, and J. Sun, “Accelerating Very Deep Convo- lutional Networks for Classification and Detection,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 1943– 1955, Oct. 2016. doi:10.1109/TPAMI.2015.2502579
-
[34]
A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, “Tensorizing Neural Networks,” Dec. 2015. doi:10.48550/arXiv.1509.06569
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1509.06569 2015
-
[35]
AdaBin: Improving Bi- nary Neural Networks with Adaptive Binary Sets,
Z. Tu, X. Chen, P. Ren, and Y . Wang, “AdaBin: Improving Bi- nary Neural Networks with Adaptive Binary Sets,” Oct. 2022. doi:10.48550/arXiv.2208.08084
-
[36]
BiPer: Binary Neural Networks using a Periodic Function,
E. Vargas, C. V . Correa, C. Hinojosa, and H. Arguello, “BiPer: Binary Neural Networks using a Periodic Function,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5684–5693
work page 2024
-
[37]
TernaryLLM: Ternarized Large Language Model,
T. Chen, Z. Li, W. Xu, Z. Zhu, D. Li, L. Tian, E. Barsoum, P. Wang, and J. Cheng, “TernaryLLM: Ternarized Large Language Model,” Jun
-
[38]
doi:10.48550/arXiv.2406.07177
-
[39]
TerViT: An Efficient Ternary Vision Transformer,
S. Xu, Y . Li, T. Ma, B. Zeng, B. Zhang, P. Gao, and J. Lv, “TerViT: An Efficient Ternary Vision Transformer,” Jan. 2022. doi:10.48550/arXiv.2201.08050
-
[40]
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,
M. Rastegari, V . Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,” NIKIFOROSet al.: V ANISHING CONTRIBUTIONS: A UNIFIED APPROACH TO SMOOTHLY TRANSITION NEURAL MODELS INTO COMPRESSED FORM 10 inComputer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer Inte...
-
[41]
Low-bit Quantization of Neural Networks for Efficient Inference
Y . Choukroun, E. Kravchik, F. Yang, and P. Kisilev, “Low-bit Quan- tization of Neural Networks for Efficient Inference,” Mar. 2019. doi:10.48550/arXiv.1902.06822
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1902.06822 2019
-
[42]
Only Train Once: A One-Shot Neural Network Training And Pruning Framework,
T. Chen, B. Ji, T. Ding, B. Fang, G. Wang, Z. Zhu, L. Liang, Y . Shi, S. Yi, and X. Tu, “Only Train Once: A One-Shot Neural Network Training And Pruning Framework,” inAdvances in Neural Information Processing Systems, vol. 34. Curran Associates, Inc., 2021, pp. 19 637–19 651
work page 2021
-
[43]
SLiM: One- shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression,
M. Mozaffari, A. Yazdanbakhsh, and M. M. Dehnavi, “SLiM: One- shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression,” Aug. 2025. doi:10.48550/arXiv.2410.09615
-
[44]
OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization,
P. Hu, X. Peng, H. Zhu, M. M. S. Aly, and J. Lin, “OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization,” May 2022. doi:10.48550/arXiv.2205.11141
-
[45]
Iterative clustering pruning for convolutional neural networks,
J. Chang, Y . Lu, P. Xue, Y . Xu, and Z. Wei, “Iterative clustering pruning for convolutional neural networks,”Knowledge-Based Systems, vol. 265, p. 110386, Apr. 2023. doi:10.1016/j.knosys.2023.110386
-
[46]
S. Ye, X. Feng, T. Zhang, X. Ma, S. Lin, Z. Li, K. Xu, W. Wen, S. Liu, J. Tang, M. Fardad, X. Lin, Y . Liu, and Y . Wang, “Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quanti- zation Rates using ADMM,” Mar. 2019. doi:10.48550/arXiv.1903.09769
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1903.09769 2019
-
[47]
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
“[1308.3432] Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation,” https://arxiv.org/abs/1308.3432
work page internal anchor Pith review Pith/arXiv arXiv
-
[48]
M. Huh, B. Cheung, P. Agrawal, and P. Isola, “Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks,” May 2023. doi:10.48550/arXiv.2305.08842
-
[49]
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training,
Y . Hu, J. Zhu, and J. Chen, “S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training,” Dec. 2024. doi:10.48550/arXiv.2409.09099
-
[50]
Learning Multiple Layers of Features from Tiny Im- ages,
A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Im- ages,” 2009
work page 2009
-
[51]
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848
-
[52]
Transforming Question Answering Datasets Into Natural Language Inference Datasets
D. Demszky, K. Guu, and P. Liang, “Transforming Question Answer- ing Datasets Into Natural Language Inference Datasets,” Sep. 2018. doi:10.48550/arXiv.1809.02922
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1809.02922 2018
-
[53]
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference,
A. Williams, N. Nangia, and S. Bowman, “A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference,” inProceedings of the 2018 Conference of the North American Chapter of the Asso- ciation for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), M. Walker, H. Ji, and A. Stent, Eds. New Orleans, Louisiana: Ass...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.