arxiv: 2508.03949 · v2 · submitted 2025-08-05 · 💻 cs.SE

Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code

Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

Pith reviewed 2026-05-18 23:57 UTC · model grok-4.3

classification 💻 cs.SE

keywords model compression, adversarial robustness, language models for code, pruning, quantization, knowledge distillation, software analytics, empirical evaluation

The pith

Compressing language models for code preserves task performance but sharply reduces resistance to adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how common compression methods affect the adversarial robustness of transformer models used for code. It evaluates pruned, quantized, and distilled versions of three standard code language models on three software analytics tasks. The study applies four classical adversarial attacks and measures outcomes with six metrics. Compressed models match the accuracy of their larger counterparts on clean inputs yet suffer substantially larger performance drops under attack. This points to a practical trade-off between smaller model size and security in adversarial settings.

Core claim

The central claim is that model compression techniques such as pruning, quantization, and knowledge distillation applied to language models for code produce versions that maintain comparable performance to uncompressed models on standard tasks, yet exhibit significantly reduced robustness when exposed to classical adversarial attacks. This trade-off holds across the tested models, tasks, attacks, and metrics, indicating that size reduction comes at the expense of adversarial resilience in code-related applications.

What carries the argument

Empirical evaluation comparing uncompressed and compressed variants (via pruning, quantization, and knowledge distillation) of code language models under four adversarial attacks using six performance metrics on three software analytics tasks.
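
To make the comparison concrete, the sketch below scores one model variant on clean inputs and on their attacked counterparts. It is illustrative only: this summary does not name the paper's six metrics, so `attack_success_rate`, the `RobustnessReport` type, and the toy predictions are assumptions chosen for the example, not the authors' metrics or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RobustnessReport:
    clean_accuracy: float
    adversarial_accuracy: float
    attack_success_rate: float


def evaluate(labels: Sequence[int],
             clean_preds: Sequence[int],
             adv_preds: Sequence[int]) -> RobustnessReport:
    """Compare predictions on clean inputs and on their attacked counterparts."""
    if not labels or not (len(labels) == len(clean_preds) == len(adv_preds)):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels)
    clean_correct = [p == y for p, y in zip(clean_preds, labels)]
    adv_correct = [p == y for p, y in zip(adv_preds, labels)]
    # An attack counts as successful when an input the model classified
    # correctly is misclassified after perturbation.
    attackable = sum(clean_correct)
    flipped = sum(c and not a for c, a in zip(clean_correct, adv_correct))
    return RobustnessReport(
        clean_accuracy=sum(clean_correct) / n,
        adversarial_accuracy=sum(adv_correct) / n,
        attack_success_rate=flipped / attackable if attackable else 0.0,
    )


# Hypothetical toy predictions: both variants are equally accurate on clean
# inputs, but the smaller one loses more predictions under attack.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
clean_full = [1, 0, 1, 1, 0, 1, 0, 1]
adv_full = [1, 0, 1, 0, 0, 1, 0, 1]
clean_small = [1, 0, 1, 1, 0, 1, 0, 1]
adv_small = [0, 0, 1, 0, 1, 1, 0, 1]

full = evaluate(labels, clean_full, adv_full)
small = evaluate(labels, clean_small, adv_small)
print(full)
print(small)
```

Measuring the success rate only over inputs the model originally got right keeps the clean-accuracy comparison separate from the robustness comparison, which is the separation the reported trade-off depends on.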

If this is right

Deploying compressed code models in security-sensitive applications requires extra robustness safeguards beyond standard compression.
Compression choices must be assessed jointly on efficiency and adversarial performance rather than efficiency alone.
New compression methods should target preservation of robustness alongside size reduction.
Task-specific robustness testing becomes necessary when moving from uncompressed to compressed code models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The observed trade-off could apply to language models outside the code domain if similar compression and attack protocols are used.
Post-compression fine-tuning or ensemble defenses might mitigate the robustness loss without sacrificing efficiency gains.
Different attack strengths or adaptive attacks could reveal whether the robustness drop is attack-specific or fundamental.

Load-bearing premise

The four classical adversarial attacks and six metrics are representative enough to establish a general robustness trade-off across compression strategies and code tasks.

What would settle it

An experiment showing that at least one compression strategy preserves or improves robustness scores under the same four attacks and six metrics on the same models and tasks would falsify the reported trade-off.
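
The falsification condition can be phrased as a simple decision rule over a grid of robustness scores. The sketch below assumes every score is oriented so that higher means more robust, and it uses placeholder strategy, attack, and metric names with made-up numbers; none of these come from the paper.

```python
from typing import Dict, List, Tuple

# Key: (attack, metric). Value: a robustness score, assumed "higher is better".
Scores = Dict[Tuple[str, str], float]


def preserves_robustness(baseline: Scores, compressed: Scores,
                         tol: float = 0.0) -> bool:
    """True if the compressed variant is at least as robust as the baseline
    on every (attack, metric) cell, within an optional tolerance."""
    if baseline.keys() != compressed.keys():
        raise ValueError("variants must be scored on the same attack/metric grid")
    return all(compressed[k] >= baseline[k] - tol for k in baseline)


def counterexamples(baseline: Scores, variants: Dict[str, Scores],
                    tol: float = 0.0) -> List[str]:
    """Names of strategies that would contradict the reported trade-off."""
    return sorted(name for name, scores in variants.items()
                  if preserves_robustness(baseline, scores, tol))


# Made-up numbers, for illustration only.
baseline = {("attack_a", "metric_1"): 0.70, ("attack_b", "metric_1"): 0.64}
variants = {
    "strategy_x": {("attack_a", "metric_1"): 0.52, ("attack_b", "metric_1"): 0.49},
    "strategy_y": {("attack_a", "metric_1"): 0.71, ("attack_b", "metric_1"): 0.60},
    "strategy_z": {("attack_a", "metric_1"): 0.70, ("attack_b", "metric_1"): 0.66},
}
found = counterexamples(baseline, variants)
print(found)
```

A real check would run the full grid of four attacks and six metrics for every model and task, and would pair the comparison with a significance test rather than a raw threshold.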

Original abstract

Transformer-based language models for code have shown remarkable performance in various software analytics tasks, but their adoption is hindered by high computational costs, slow inference speeds, and substantial environmental impact. Model compression techniques such as pruning, quantization, and knowledge distillation have gained traction in addressing these challenges. However, the impact of these strategies on the robustness of compressed language models for code in adversarial scenarios remains poorly understood. Understanding how these compressed models behave under adversarial attacks is essential for their safe and effective deployment in real-world applications. To bridge this knowledge gap, we conduct a comprehensive evaluation of how common compression strategies affect the adversarial robustness of compressed models. We assess the robustness of compressed versions of three widely used language models for code across three software analytics tasks, using six evaluation metrics and four commonly used classical adversarial attacks. Our findings indicate that compressed models generally maintain comparable performance to their uncompressed counterparts. However, when subjected to adversarial attacks, compressed models exhibit significantly reduced robustness. These results reveal a trade-off between model size reduction and adversarial robustness, underscoring the need for careful consideration when deploying compressed models in security-critical software applications. Our study highlights the need for further research into compression strategies that strike a balance between computational efficiency and adversarial robustness, which is essential for deploying reliable language models for code in real-world software applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study evaluating the effects of model compression techniques (pruning, quantization, and knowledge distillation) on the adversarial robustness of transformer-based language models for code. Using three widely used code models across three software analytics tasks, six evaluation metrics, and four classical adversarial attacks, the authors report that compressed models maintain comparable task performance to their uncompressed counterparts but exhibit significantly reduced robustness under adversarial attacks, revealing a trade-off between compression and robustness that has implications for security-critical deployments.

Significance. If the central empirical findings hold after addressing validity concerns with the attacks, this work is significant for software engineering and AI security research. It fills a gap in understanding robustness implications of compression for code models and provides a broad multi-model, multi-task evaluation that can guide practitioners. The study correctly identifies the need for balanced compression strategies, though its impact depends on demonstrating that the observed robustness drop reflects genuine model vulnerabilities rather than artifacts of the attack methods.

major comments (2)

[Abstract and Section on Adversarial Attacks] The abstract and methods description of the four classical adversarial attacks provide no indication of adaptations, post-attack filtering, or checks to ensure generated examples preserve code syntax and semantics (e.g., compiler acceptance or functional equivalence). This is load-bearing for the central claim of 'significantly reduced robustness' because standard gradient-based or substitution attacks from NLP frequently yield syntactically invalid or semantically altered code; without explicit validation steps, the robustness gap could be an artifact of attacking malformed inputs rather than a compression-induced vulnerability.
[Evaluation and Results] The evaluation lacks detail on statistical testing (e.g., significance tests for the 'significantly reduced' robustness claim) or explicit baseline comparisons beyond uncompressed models. This weakens assessment of the trade-off finding across compression strategies and tasks, as noted in the low-confidence soundness assessment.

minor comments (2)

[Evaluation Metrics] Clarify the exact implementation details of the six metrics and how they align with standard practices in code model evaluation.
[Experimental Setup] Ensure all model variants, compression hyperparameters, and attack parameters are fully specified in a reproducibility section or appendix.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the manuscript without altering its core empirical findings.

Point-by-point responses

Referee: [Abstract and Section on Adversarial Attacks] The abstract and methods description of the four classical adversarial attacks provide no indication of adaptations, post-attack filtering, or checks to ensure generated examples preserve code syntax and semantics (e.g., compiler acceptance or functional equivalence). This is load-bearing for the central claim of 'significantly reduced robustness' because standard gradient-based or substitution attacks from NLP frequently yield syntactically invalid or semantically altered code; without explicit validation steps, the robustness gap could be an artifact of attacking malformed inputs rather than a compression-induced vulnerability.

Authors: We agree that the current description lacks sufficient detail on validation procedures, which is important for substantiating the robustness claims. Our experiments did apply code-specific adaptations of the four attacks (e.g., syntax-preserving substitutions and variable renaming that respect AST structure) along with post-attack filtering to retain only examples that compile successfully and pass functional equivalence checks via provided test suites. However, these steps were not explicitly documented in the methods. We will add a dedicated subsection to the methods describing the adaptations, filtering criteria, compiler acceptance rates, and the fraction of generated examples retained after validation. This revision will make clear that the reported robustness reductions are not artifacts of malformed inputs.

revision: yes

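
A post-attack filter of the kind described in this response might look like the sketch below. It is a stand-in, not the authors' tooling: it uses Python's standard `ast` module on Python snippets (the studied models may target other languages), and `structural_fingerprint` and `filter_candidates` are hypothetical helper names. It keeps a candidate only if it parses and its syntax tree matches the original up to consistent identifier renaming. It does not run test suites, so functional equivalence beyond tree shape is not checked.

```python
import ast
from typing import Dict, Iterable, List


class _NameNormalizer(ast.NodeTransformer):
    """Replace identifiers with positional placeholders so that consistent
    renamings yield identical trees. Limitation: built-in names such as
    `len` are normalized too, so renaming a built-in is not detected."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}

    def _alias(self, name: str) -> str:
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._alias(node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._alias(node.arg)
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node


def structural_fingerprint(source: str) -> str:
    """Dump of the name-normalized AST; raises SyntaxError on invalid code."""
    tree = ast.parse(source)
    return ast.dump(_NameNormalizer().visit(tree))


def filter_candidates(original: str, candidates: Iterable[str]) -> List[str]:
    """Keep candidates that parse and match the original's structure."""
    reference = structural_fingerprint(original)
    kept = []
    for cand in candidates:
        try:
            if structural_fingerprint(cand) == reference:
                kept.append(cand)
        except SyntaxError:
            continue  # malformed adversarial example: discard
    return kept


original = "def add(a, b):\n    total = a + b\n    return total\n"
candidates = [
    "def add(x, y):\n    s = x + y\n    return s\n",          # pure renaming
    "def add(a, b):\n    total = a - b\n    return total\n",  # operator changed
    "def add(a, b)\n    return a + b\n",                      # syntax error
    "def add(a, b):\n    total = a + b\n    return a\n",      # different return
]
kept = filter_candidates(original, candidates)
print(len(kept), "of", len(candidates), "candidates retained")
```

Reporting the retained fraction alongside the attack results is what would let a reader judge whether the robustness gap survives validation.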
Referee: [Evaluation and Results] The evaluation lacks detail on statistical testing (e.g., significance tests for the 'significantly reduced' robustness claim) or explicit baseline comparisons beyond uncompressed models. This weakens assessment of the trade-off finding across compression strategies and tasks, as noted in the low-confidence soundness assessment.

Authors: We concur that adding statistical tests and clearer baseline framing will improve the rigor of the trade-off analysis. We will incorporate paired statistical tests (e.g., Wilcoxon signed-rank or t-tests with multiple random seeds) on the robustness metrics to support the 'significantly reduced' statements. We will also expand the presentation of baseline comparisons by more explicitly tabulating results against the uncompressed models and, space permitting, across compression intensities. These changes directly address the soundness concerns while preserving the multi-model, multi-task scope of the study.

revision: yes
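
For readers unfamiliar with the paired test named in this response, here is a minimal standard-library sketch of an exact two-sided Wilcoxon signed-rank test. The per-seed scores are hypothetical and exist only to exercise the function. In practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice; the hand-rolled version is shown only to expose the mechanics.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank_exact(x: Sequence[float],
                               y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. All 2**n sign patterns
    are enumerated, so this is only practical for small n.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired (equal length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)

    # Under the null hypothesis each rank is equally likely to carry a
    # positive or a negative sign; count patterns at least as extreme.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical accuracy under attack for eight random seeds.
uncompressed = [0.71, 0.68, 0.74, 0.70, 0.66, 0.73, 0.69, 0.72]
compressed = [0.60, 0.55, 0.65, 0.58, 0.50, 0.66, 0.52, 0.57]
w_plus, p_value = wilcoxon_signed_rank_exact(uncompressed, compressed)
print(w_plus, p_value)
```

With eight pairs that all favor one side, the smallest achievable two-sided p-value is 2/256, which shows why the number of seeds limits how strong a significance claim can be.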

Circularity Check

0 steps flagged

No significant circularity in this empirical measurement study

Full rationale

The paper conducts a direct empirical evaluation by applying standard compression techniques (pruning, quantization, distillation) to three code language models, then measuring performance and robustness on three software analytics tasks using six metrics and four classical adversarial attacks. No equations, parameter fitting, derivations, or self-citation chains appear in the provided text. All claims rest on observed measurement differences rather than any reduction of outputs to inputs by construction. The study is therefore self-contained against external benchmarks, with the central robustness trade-off claim arising from experimental results rather than definitional or fitted equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study is purely empirical and introduces no mathematical axioms, free parameters, or new postulated entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We assess the robustness of compressed versions of three widely used language models for code across three software analytics tasks, using six evaluation metrics and four commonly used classical adversarial attacks.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Our findings indicate that compressed models generally maintain comparable performance to their uncompressed counterparts. However, when subjected to adversarial attacks, compressed models exhibit significantly reduced robustness.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 13 internal anchors

[1]

Advances in neural information processing systems 30 (2017)

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

work page 2017
[2]

Queen’s School of computing TR 541(115), 64–68 (2007) 8Replication-packages 9https://app.grammarly.com/ 10https://chat.openai.com/ 27

Roy, C.K., Cordy, J.R.: A survey on software clone detection research. Queen’s School of computing TR 541(115), 64–68 (2007) 8Replication-packages 9https://app.grammarly.com/ 10https://chat.openai.com/ 27

work page 2007
[3]

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., et al.: Codebert: A pre-trained model for programming and natural languages. arXiv:2002.08155 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2002
[4]

GraphCodeBERT: Pre-training Code Representations with Data Flow

Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svy- atkovskiy, A., Fu, S., et al.: Graphcodebert: Pre-training code representations with data flow. arXiv:2009.08366 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2009
[5]

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., Clement, C., Drain, D., Jiang, D., Tang, D., et al.: Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv:2102.04664 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[6]

arXiv:2103.06333 (2021)

Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.-W.: Unified pre-training for program understanding and generation. arXiv:2103.06333 (2021)

work page arXiv 2021
[7]

In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

Shi, J., Yang, Z., Xu, B., Kang, H.J., Lo, D.: Compressing pre-trained mod- els of code into 3 mb. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. ASE ’22. Association for Com- puting Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3551349. 3556964 . https://doi.org/10.1145/3551349.3556964

work page doi:10.1145/3551349 2023
[8]

In: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Society

Shi, J., Yang, Z., Kang, H.J., Xu, B., He, J., Lo, D.: Greening large lan- guage models of code. In: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Society. ICSE-SEIS’24, pp. 142–153. Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3639475.3640097 . https://doi-org.l...

work page doi:10.1145/3639475.3640097 2024
[9]

ACM Transactions on Software Engineering and Methodology (2024)

Shi, J., Yang, Z., Lo, D.: Efficient and green large language models for soft- ware engineering: Vision and the road ahead. ACM Transactions on Software Engineering and Methodology (2024)

work page 2024
[10]

Communications of the ACM 63(12), 54–63 (2020)

Schwartz, R., Dodge, J., Smith, N.A., Etzioni, O.: Green ai. Communications of the ACM 63(12), 54–63 (2020)

work page 2020
[11]

arXiv preprint arXiv:2412.13737 (2024)

d’Aloisio, G., Traini, L., Sarro, F., Di Marco, A.: On the compression of language models for code: An empirical study on codebert. arXiv preprint arXiv:2412.13737 (2024)

work page arXiv 2024
[12]

arXiv preprint arXiv:2407.04147 (2024)

Saad, M., L´ opez, J.A.H., Chen, B., Varr´ o, D., Sharma, T.: Alpine: An adaptive language-agnostic pruning method for language models for code. arXiv preprint arXiv:2407.04147 (2024)

work page arXiv 2024
[13]

Proceedings of the ACM on Software Engineering 2(FSE), 3057–3080 (2025) 28

Chen, Y., Ye, Y., Li, Z., Ma, Y., Gao, C.: Smaller but better: Self-paced knowledge distillation for lightweight yet effective lcms. Proceedings of the ACM on Software Engineering 2(FSE), 3057–3080 (2025) 28

work page 2025
[14]

In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp

Hellendoorn, V.J., Proksch, S., Gall, H.C., Bacchelli, A.: When code comple- tion fails: A case study on real-world completions. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 960–970 (2019). IEEE

work page 2019
[15]

Advances in neural information processing systems 33, 20378–20389 (2020)

Sanh, V., Wolf, T., Rush, A.: Movement pruning: Adaptive sparsity by fine- tuning. Advances in neural information processing systems 33, 20378–20389 (2020)

work page 2020
[16]

In: 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS), pp

Zafrir, O., Boudoukh, G., Izsak, P., Wasserblat, M.: Q8bert: Quantized 8bit bert. In: 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS), pp. 36–39 (2019). IEEE

work page 2019
[17]

Distilling the Knowledge in a Neural Network

Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[18]

In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp

Wei, X., Gonugondla, S.K., Wang, S., Ahmad, W., Ray, B., Qian, H., Li, X., Kumar, V., Wang, Z., Tian, Y., et al.: Towards greener yet powerful code genera- tion via quantization: An empirical study. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 224–236 (2023)

work page 2023
[19]

Empirical Software Engineering 21, 159–182 (2016)

Guo, Y., Sp´ ınola, R.O., Seaman, C.: Exploring the costs of technical debt management–a case study. Empirical Software Engineering 21, 159–182 (2016)

work page 2016
[20]

Cast report Charette RN (1989) Software engineering, risk analysis and management Intertext publications (2012)

McGraw-Hill Book Co, N.Y.: Cast worldwide application software quality study: summary of key findings. Cast report Charette RN (1989) Software engineering, risk analysis and management Intertext publications (2012)

work page 1989
[21]

Journal of Systems and Software 158, 110407 (2019)

Mondal, M., Roy, B., Roy, C.K., Schneider, K.A.: An empirical study on bug propagation through code cloning. Journal of Systems and Software 158, 110407 (2019)

work page 2019
[22]

In: Proceedings of the 44th ICSE, pp

Yang, Z., Shi, J., He, J., Lo, D.: Natural attack for pre-trained models of code. In: Proceedings of the 44th ICSE, pp. 1482–1493 (2022)

work page 2022
[23]

In: 31st FSE, pp

Du, X., Wen, M., Wei, Z., Wang, S., Jin, H.: An extensive study on adversarial attack against pre-trained models of code. In: 31st FSE, pp. 489–501 (2023)

work page 2023
[24]

In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp

Tian, Z., Chen, J., Jin, Z.: Code difference guided adversarial example generation for deep code models. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 850–862 (2023). IEEE

work page 2023
[25]

ACM Transactions on Software Engineering and Methodology (TOSEM) 31(3), 1–40 (2022) 29

Zhang, H., Fu, Z., Li, G., Ma, L., Zhao, Z., Yang, H., Sun, Y., Liu, Y., Jin, Z.: Towards robustness of deep program processing models—detection, estimation, and enhancement. ACM Transactions on Software Engineering and Methodology (TOSEM) 31(3), 1–40 (2022) 29

work page 2022
[26]

arXiv preprint arXiv:2109.03228 (2021)

Xu, C., Zhou, W., Ge, T., Xu, K., McAuley, J., Wei, F.: Beyond preserved accuracy: Evaluating loyalty and robustness of bert compression. arXiv preprint arXiv:2109.03228 (2021)

work page arXiv 2021
[27]

model compression, or both? In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Ye, S., Xu, K., Liu, S., Cheng, H., Lambrechts, J.-H., Zhang, H., Zhou, A., Ma, K., Wang, Y., Lin, X.: Adversarial robustness vs. model compression, or both? In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 111–120 (2019)

work page 2019
[28]

In: Proceedings of the AAAI Conference on AI, vol

Zhang, H., Li, Z., Li, G., Ma, L., Liu, Y., Jin, Z.: Generating adversarial examples for holding robustness of source code processing models. In: Proceedings of the AAAI Conference on AI, vol. 34, pp. 1169–1176 (2020)

work page 2020
[29]

In: 31st ACM SIGSOFT ISSTA, pp

Zeng, Z., Tan, H., Zhang, H., Li, J., Zhang, Y., Zhang, L.: An extensive study on pre-trained models for program understanding and generation. In: 31st ACM SIGSOFT ISSTA, pp. 39–51 (2022)

work page 2022
[30]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, J.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[31]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Xu, C., McAuley, J.: A survey on model compression and acceleration for pre- trained language models. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 10566–10575 (2023)

work page 2023
[32]

Transactions of the Association for Computational Linguistics 12, 1556–1577 (2024)

Zhu, X., Li, J., Liu, Y., Ma, C., Wang, W.: A survey on model compression for large language models. Transactions of the Association for Computational Linguistics 12, 1556–1577 (2024)

work page 2024
[33]

In: 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp

Casta˜ no, J., Mart´ ınez-Fern´ andez, S., Franch, X., Bogner, J.: Exploring the car- bon footprint of hugging face’s ml models: A repository mining study. In: 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 1–12 (2023). IEEE

work page 2023
[34]

In: 2023 ACM/IEEE Interna- tional Symposium on Empirical Software Engineering and Measurement (ESEM), pp

Hort, M., Grishina, A., Moonen, L.: An exploratory literature study on sharing and energy use of language models for source code. In: 2023 ACM/IEEE Interna- tional Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 1–12 (2023). IEEE

work page 2023
[35]

arXiv preprint arXiv:2402.09748 (2024)

Wang, W., Chen, W., Luo, Y., Long, Y., Lin, Z., Zhang, L., Lin, B., Cai, D., He, X.: Model compression and efficient inference for large language models: A survey. arXiv preprint arXiv:2402.09748 (2024)

work page arXiv 2024
[36]

A Survey on Knowledge Distillation of Large Language Models

Xu, X., Li, M., Tao, C., Shen, T., Cheng, R., Li, J., Xu, C., Tao, D., Zhou, T.: A survey on knowledge distillation of large language models. arXiv preprint arXiv:2402.13116 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[37]

arXiv preprint arXiv:2401.08092 (2024)

Xu, M., Yin, W., Cai, D., Yi, R., Xu, D., Wang, Q., Wu, B., Zhao, Y., Yang, 30 C., Wang, S., et al.: A survey of resource-efficient llm and multimodal foundation models. arXiv preprint arXiv:2401.08092 (2024)

work page arXiv 2024
[38]

Intriguing properties of neural networks

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv:1312.6199 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[39]

In: 2014 IEEE International Conference on Software Maintenance and Evolution, pp

Svajlenko, J., Islam, J.F., Keivanloo, I., Roy, C.K., Mia, M.M.: Towards a big data curated benchmark of inter-project code clones. In: 2014 IEEE International Conference on Software Maintenance and Evolution, pp. 476–480 (2014). IEEE

work page 2014
[40]

In: 2020 IEEE 27th Interna- tional Conference on Software Analysis, Evolution and Reengineering (SANER), pp

Wang, W., Li, G., Ma, B., Xia, X., Jin, Z.: Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In: 2020 IEEE 27th Interna- tional Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 261–271 (2020). IEEE

work page 2020
[41]

In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pp

Hough, K., Welearegai, G., Hammer, C., Bell, J.: Revealing injection vulner- abilities by leveraging existing tests. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pp. 284–296 (2020)

work page 2020
[42]

ACM Transactions on Software Engineering and Methodology 32(1), 1–45 (2023)

Sayar, I., Bartel, A., Bodden, E., Le Traon, Y.: An in-depth study of java deseri- alization remote-code execution exploits and vulnerabilities. ACM Transactions on Software Engineering and Methodology 32(1), 1–45 (2023)

work page 2023
[43]

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

Husain, H., Wu, H.-H., Gazit, T., Allamanis, M., Brockschmidt, M.: Codesearch- net challenge: Evaluating the state of semantic code search. arXiv:1909.09436 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1909
[45]

In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp

Ahmed, T., Pai, K.S., Devanbu, P., Barr, E.: Automatic semantic augmentation of language model prompts (for code summarization). In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1–13 (2024)

work page 2024
[46]

Advances in neural information processing systems 2 (1989)

LeCun, Y., Denker, J., Solla, S.: Optimal brain damage. Advances in neural information processing systems 2 (1989)

work page 1989
[47]

IEEE transactions on information theory 44(6), 2325–2383 (1998)

Gray, R.M., Neuhoff, D.L.: Quantization. IEEE transactions on information theory 44(6), 2325–2383 (1998)

work page 1998
[48]

In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp

Sainath, T.N., Kingsbury, B., Sindhwani, V., Arisoy, E., Ramabhadran, B.: Low- rank matrix factorization for deep neural network training with high-dimensional output targets. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6655–6659 (2013). IEEE 31

work page 2013
[49]

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1909
[50]

The annals of mathematical statistics 22(1), 79–86 (1951)

Kullback, S., Leibler, R.A.: On information and sufficiency. The annals of mathematical statistics 22(1), 79–86 (1951)

work page 1951
[51]

arXiv preprint arXiv:2002.08307 (2020)

Gordon, M.A., Duh, K., Andrews, N.: Compressing bert: Studying the effects of weight pruning on transfer learning. arXiv preprint arXiv:2002.08307 (2020)

work page arXiv 2002
[52]

arXiv preprint arXiv:2505.19433 (2025)

Dong, P., Tang, Z., Liu, X., Li, L., Chu, X., Li, B.: Can compressed llms truly act? an empirical evaluation of agentic capabilities in llm compression. arXiv preprint arXiv:2505.19433 (2025)

work page arXiv 2025
[53]

In: ICML, pp

Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Black-box adversarial attacks with limited queries and information. In: ICML, pp. 2137–2146 (2018). PMLR

work page 2018
[54]

HotFlip: White-Box Adversarial Examples for Text Classification

Ebrahimi, J., Rao, A., Lowd, D., Dou, D.: Hotflip: White-box adversarial examples for text classification. arXiv arXiv:1712.06751 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[55]

Awal, M.A., Rochan, M., Roy, C.K.: Large language models as robust data gen- erators in software analytics: Are we there yet? arXiv preprint arXiv:2411.10565 (2024)

work page arXiv 2024
[56]

Journal of the american statistical association 32(200), 675–701 (1937)

Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the american statistical association 32(200), 675–701 (1937)

work page 1937
[57]

Wiley encyclopedia of clinical trials, 1–3 (2007)

Woolson, R.F.: Wilcoxon signed-rank test. Wiley encyclopedia of clinical trials, 1–3 (2007)

work page 2007
[58]

arXiv preprint arXiv:2110.08419 (2021)

Du, M., Mukherjee, S., Cheng, Y., Shokouhi, M., Hu, X., Awadallah, A.H.: Robustness challenges in model distillation and pruning for natural language understanding. arXiv preprint arXiv:2110.08419 (2021)

work page arXiv 2021
[59]

In: 2024 IEEE 10th Interna- tional Conference on Edge Computing and Scalable Cloud (EdgeCom), pp

Gourtani, S.K., Meratnia, N.: Improving robustness of compressed models with weight sharing through knowledge distillation. In: 2024 IEEE 10th Interna- tional Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 13–21 (2024). IEEE

work page 2024
[60]

In: Proceed- ings of the 37th IEEE/ACM International Conference on Automated Software Engineering, pp

Zhu, J., Wang, L., Han, X.: Safety and performance, why not both? bi-objective optimized model compression toward ai software deployment. In: Proceed- ings of the 37th IEEE/ACM International Conference on Automated Software Engineering, pp. 1–13 (2022)

work page 2022
[61]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Goldblum, M., Fowl, L., Feizi, S., Goldstein, T.: Adversarially robust distillation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 32 3996–4003 (2020)

work page 2020
[62]

ACM Computing Surveys 57(5), 1–39 (2025)

Xu, M., Cai, D., Yin, W., Wang, S., Jin, X., Liu, X.: Resource-efficient algorithms and systems of foundation models: A survey. ACM Computing Surveys 57(5), 1–39 (2025)

[63]

Xu, J., Zhou, W., Fu, Z., Zhou, H., Li, L.: A survey on green deep learning. arXiv preprint arXiv:2111.05193 (2021)

[64]

Sanh, V.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)

[65]

Sun, S., Cheng, Y., Gan, Z., Liu, J.: Patient knowledge distillation for bert model compression. arXiv preprint arXiv:1908.09355 (2019)

[66]

Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., Liu, Q.: Tinybert: Distilling bert for natural language understanding. arXiv preprint arXiv:1909.10351 (2019)

[67]

Buciluǎ, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 535–541 (2006)

[68]

Jiang, Y., Chan, C., Chen, M., Wang, W.: Lion: Adversarial distillation of proprietary large language models. arXiv preprint arXiv:2305.12870 (2023)

[69]

Zhang, T., Ye, S., Zhang, K., Ma, X., Liu, N., Zhang, L., Tang, J., Ma, K., Lin, X., Fardad, M., et al.: StructADMM: A systematic, high-efficiency framework of structured weight pruning for DNNs. arXiv preprint arXiv:1807.11091 (2018)

[70]

Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., Lin, J.: Distilling task-specific knowledge from BERT into simple neural networks. arXiv preprint arXiv:1903.12136 (2019)

[71]

Xu, C., Zhou, W., Ge, T., Wei, F., Zhou, M.: Bert-of-theseus: Compressing bert by progressive module replacing. arXiv preprint arXiv:2002.02925 (2020)

[72]

Fan, A., Grave, E., Joulin, A.: Reducing transformer depth on demand with structured dropout. arXiv preprint arXiv:1909.11556 (2019)

[73]

Michel, P., Levy, O., Neubig, G.: Are sixteen heads really better than one? Advances in neural information processing systems 32 (2019)

[74]

Sun, Z., Du, X., Song, F., Wang, S., Li, L.: When neural code completion models size up the situation: Attaining cheaper and faster completion through dynamic model inference. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1–12 (2024)

[75]

Zhang, Z., Zhang, H., Shen, B., Gu, X.: Diet code is healthy: Simplifying programs for pre-trained models of code. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1073–1084 (2022)

[76]

Dong, J., Koniusz, P., Chen, J., Wang, Z.J., Ong, Y.-S.: Robust distillation via untargeted and targeted intermediate adversarial samples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 28432–28442 (2024)

[77]

Bai, T., Zhao, J., Wen, B.: Guided adversarial contrastive distillation for robust students. IEEE Transactions on Information Forensics and Security (2023)

[78]

Kuang, H., Liu, H., Wu, Y., Satoh, S., Ji, R.: Improving adversarial robustness via information bottleneck distillation. Advances in Neural Information Processing Systems 36, 10796–10813 (2023)
