pith. the verified trust layer for science.

arxiv: 2508.03949 · v2 · submitted 2025-08-05 · 💻 cs.SE

Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code

Pith reviewed 2026-05-18 23:57 UTC · model grok-4.3

classification 💻 cs.SE
keywords model compression · adversarial robustness · language models for code · pruning · quantization · knowledge distillation · software analytics · empirical evaluation

The pith

Compressing language models for code preserves task performance but sharply reduces resistance to adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how common compression methods affect the adversarial robustness of transformer models used for code. It evaluates pruned, quantized, and distilled versions of three standard code language models on three software analytics tasks. The study applies four classical adversarial attacks and measures outcomes with six metrics. Compressed models match the accuracy of their larger counterparts on clean inputs yet suffer substantially larger performance drops under attack. This points to a practical trade-off between smaller model size and security in adversarial settings.

Core claim

The central claim is that model compression techniques such as pruning, quantization, and knowledge distillation applied to language models for code produce versions that maintain comparable performance to uncompressed models on standard tasks, yet exhibit significantly reduced robustness when exposed to classical adversarial attacks. This trade-off holds across the tested models, tasks, attacks, and metrics, indicating that size reduction comes at the expense of adversarial resilience in code-related applications.

What carries the argument

Empirical evaluation comparing uncompressed and compressed variants (via pruning, quantization, and knowledge distillation) of code language models under four adversarial attacks using six performance metrics on three software analytics tasks.
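
The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the page does not name the six metrics, so attack success rate is used here as an assumed stand-in, and the names `EvalResult`, `attack_success_rate`, and `robustness_gap` as well as the toy predictions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Predictions of one model variant on clean and attacked inputs."""
    labels: list[int]        # ground-truth labels
    clean_preds: list[int]   # predictions on the original inputs
    adv_preds: list[int]     # predictions on the adversarial inputs


def accuracy(preds: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(r: EvalResult) -> float:
    """Fraction of originally correct predictions that the attack flips."""
    correct = [i for i, (p, y) in enumerate(zip(r.clean_preds, r.labels)) if p == y]
    if not correct:
        return 0.0
    flipped = sum(r.adv_preds[i] != r.labels[i] for i in correct)
    return flipped / len(correct)


def robustness_gap(full: EvalResult, compressed: EvalResult) -> float:
    """Positive values mean the compressed model is easier to attack."""
    return attack_success_rate(compressed) - attack_success_rate(full)


# Toy data (made up for illustration, not results from the paper).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
full = EvalResult(labels, [1, 0, 1, 1, 0, 0, 1, 1], [1, 0, 0, 1, 0, 0, 1, 1])
compressed = EvalResult(labels, [1, 0, 1, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0, 1, 1])

print("clean accuracy (full):      ", accuracy(full.clean_preds, labels))
print("clean accuracy (compressed):", accuracy(compressed.clean_preds, labels))
print("robustness gap:             ", round(robustness_gap(full, compressed), 3))
```

In this toy case both variants have the same clean accuracy, yet the compressed one loses more of its correct predictions under attack. That is the shape of the trade-off the paper reports.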

If this is right

  • Deploying compressed code models in security-sensitive applications requires extra robustness safeguards beyond standard compression.
  • Compression choices must be assessed jointly on efficiency and adversarial performance rather than efficiency alone.
  • New compression methods should target preservation of robustness alongside size reduction.
  • Task-specific robustness testing becomes necessary when moving from uncompressed to compressed code models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed trade-off could apply to language models outside the code domain if similar compression and attack protocols are used.
  • Post-compression fine-tuning or ensemble defenses might mitigate the robustness loss without sacrificing efficiency gains.
  • Different attack strengths or adaptive attacks could reveal whether the robustness drop is attack-specific or fundamental.

Load-bearing premise

The four classical adversarial attacks and six metrics are representative enough to establish a general robustness trade-off across compression strategies and code tasks.

What would settle it

An experiment showing that at least one compression strategy preserves or improves robustness scores under the same four attacks and six metrics on the same models and tasks would falsify the reported trade-off.

Original abstract

Transformer-based language models for code have shown remarkable performance in various software analytics tasks, but their adoption is hindered by high computational costs, slow inference speeds, and substantial environmental impact. Model compression techniques such as pruning, quantization, and knowledge distillation have gained traction in addressing these challenges. However, the impact of these strategies on the robustness of compressed language models for code in adversarial scenarios remains poorly understood. Understanding how these compressed models behave under adversarial attacks is essential for their safe and effective deployment in real-world applications. To bridge this knowledge gap, we conduct a comprehensive evaluation of how common compression strategies affect the adversarial robustness of compressed models. We assess the robustness of compressed versions of three widely used language models for code across three software analytics tasks, using six evaluation metrics and four commonly used classical adversarial attacks. Our findings indicate that compressed models generally maintain comparable performance to their uncompressed counterparts. However, when subjected to adversarial attacks, compressed models exhibit significantly reduced robustness. These results reveal a trade-off between model size reduction and adversarial robustness, underscoring the need for careful consideration when deploying compressed models in security-critical software applications. Our study highlights the need for further research into compression strategies that strike a balance between computational efficiency and adversarial robustness, which is essential for deploying reliable language models for code in real-world software applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study evaluating the effects of model compression techniques (pruning, quantization, and knowledge distillation) on the adversarial robustness of transformer-based language models for code. Using three widely used code models across three software analytics tasks, six evaluation metrics, and four classical adversarial attacks, the authors report that compressed models maintain comparable task performance to their uncompressed counterparts but exhibit significantly reduced robustness under adversarial attacks, revealing a trade-off between compression and robustness that has implications for security-critical deployments.

Significance. If the central empirical findings hold after addressing validity concerns with the attacks, this work is significant for software engineering and AI security research. It fills a gap in understanding robustness implications of compression for code models and provides a broad multi-model, multi-task evaluation that can guide practitioners. The study correctly identifies the need for balanced compression strategies, though its impact depends on demonstrating that the observed robustness drop reflects genuine model vulnerabilities rather than artifacts of the attack methods.

major comments (2)
  1. [Abstract and Section on Adversarial Attacks] The abstract and methods description of the four classical adversarial attacks provide no indication of adaptations, post-attack filtering, or checks to ensure generated examples preserve code syntax and semantics (e.g., compiler acceptance or functional equivalence). This is load-bearing for the central claim of 'significantly reduced robustness' because standard gradient-based or substitution attacks from NLP frequently yield syntactically invalid or semantically altered code; without explicit validation steps, the robustness gap could be an artifact of attacking malformed inputs rather than a compression-induced vulnerability.
  2. [Evaluation and Results] The evaluation lacks detail on statistical testing (e.g., significance tests for the 'significantly reduced' robustness claim) or explicit baseline comparisons beyond uncompressed models. This weakens assessment of the trade-off finding across compression strategies and tasks, as noted in the low-confidence soundness assessment.
minor comments (2)
  1. [Evaluation Metrics] Clarify the exact implementation details of the six metrics and how they align with standard practices in code model evaluation.
  2. [Experimental Setup] Ensure all model variants, compression hyperparameters, and attack parameters are fully specified in a reproducibility section or appendix.
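
The validity check requested in major comment 1 could look roughly like the sketch below. It assumes the attacked programs are Python functions, which the page does not state, and it uses Python's standard `ast` module as a stand-in for "compiler acceptance". The helper names `is_syntactically_valid`, `behaves_the_same`, and `filter_adversarial` are hypothetical, and the input/output comparison is only a weak proxy for functional equivalence.

```python
import ast


def is_syntactically_valid(source: str) -> bool:
    """Return True if the source parses as Python (proxy for compiler acceptance)."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def behaves_the_same(original: str, candidate: str, func_name: str, test_inputs) -> bool:
    """Run both versions on the same inputs and compare their outputs."""
    def load(src: str):
        namespace = {}
        # exec is acceptable only for trusted toy snippets; a real pipeline
        # would run untrusted code in a sandbox.
        exec(src, namespace)
        return namespace[func_name]

    try:
        f, g = load(original), load(candidate)
        return all(f(*args) == g(*args) for args in test_inputs)
    except Exception:
        # Any load or runtime failure counts as "not equivalent".
        return False


def filter_adversarial(original: str, candidates, func_name: str, test_inputs):
    """Keep only candidates that parse and match the original on all test inputs."""
    return [
        cand for cand in candidates
        if is_syntactically_valid(cand)
        and behaves_the_same(original, cand, func_name, test_inputs)
    ]


original = "def add(a, b):\n    return a + b\n"
candidates = [
    "def add(x, y):\n    return x + y\n",  # identifiers renamed, behavior kept
    "def add(a, b):\n    return a - b\n",  # parses, but behavior changed
    "def add(a, b)\n    return a + b\n",   # missing colon, does not parse
]
test_inputs = [(1, 2), (0, 0), (-3, 5)]

kept = filter_adversarial(original, candidates, "add", test_inputs)
print(f"retained {len(kept)} of {len(candidates)} candidates")
```

Reporting the retained fraction alongside the robustness numbers would show how many attack outputs were valid code in the first place.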

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the manuscript without altering its core empirical findings.

Point-by-point responses
  1. Referee: [Abstract and Section on Adversarial Attacks] The abstract and methods description of the four classical adversarial attacks provide no indication of adaptations, post-attack filtering, or checks to ensure generated examples preserve code syntax and semantics (e.g., compiler acceptance or functional equivalence). This is load-bearing for the central claim of 'significantly reduced robustness' because standard gradient-based or substitution attacks from NLP frequently yield syntactically invalid or semantically altered code; without explicit validation steps, the robustness gap could be an artifact of attacking malformed inputs rather than a compression-induced vulnerability.

    Authors: We agree that the current description lacks sufficient detail on validation procedures, which is important for substantiating the robustness claims. Our experiments did apply code-specific adaptations of the four attacks (e.g., syntax-preserving substitutions and variable renaming that respect AST structure) along with post-attack filtering to retain only examples that compile successfully and pass functional equivalence checks via provided test suites. However, these steps were not explicitly documented in the methods. We will add a dedicated subsection to the methods describing the adaptations, filtering criteria, compiler acceptance rates, and the fraction of generated examples retained after validation. This revision will make clear that the reported robustness reductions are not artifacts of malformed inputs. revision: yes

  2. Referee: [Evaluation and Results] The evaluation lacks detail on statistical testing (e.g., significance tests for the 'significantly reduced' robustness claim) or explicit baseline comparisons beyond uncompressed models. This weakens assessment of the trade-off finding across compression strategies and tasks, as noted in the low-confidence soundness assessment.

    Authors: We concur that adding statistical tests and clearer baseline framing will improve the rigor of the trade-off analysis. We will incorporate paired statistical tests (e.g., Wilcoxon signed-rank or t-tests with multiple random seeds) on the robustness metrics to support the 'significantly reduced' statements. We will also expand the presentation of baseline comparisons by more explicitly tabulating results against the uncompressed models and, space permitting, across compression intensities. These changes directly address the soundness concerns while preserving the multi-model, multi-task scope of the study. revision: yes
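
One of the paired tests named in the second response, the Wilcoxon signed-rank test, can be sketched with the standard library alone. This is an illustrative exact two-sided version for small samples, not the authors' analysis. The function names and the paired scores (attack success rates in percent for six hypothetical model and task pairs) are made up for the example.

```python
from itertools import product


def signed_ranks(diffs):
    """Rank |d| ascending with average ranks for ties, dropping zero differences."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return nonzero, ranks


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) with W = min(W+, W-). The p-value enumerates all 2**n sign
    assignments, so this is practical only for small n.
    """
    diffs = [a - b for a, b in zip(x, y)]
    nonzero, ranks = signed_ranks(diffs)
    if not ranks:
        return 0.0, 1.0
    rank_sum = sum(ranks)
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w = min(w_plus, rank_sum - w_plus)
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        total += 1
        if min(s, rank_sum - s) <= w:
            extreme += 1
    return w, extreme / total


# Toy paired scores (illustrative only, not taken from the paper).
uncompressed = [31, 28, 40, 35, 22, 30]
compressed = [45, 39, 52, 41, 37, 44]

w, p = wilcoxon_exact(compressed, uncompressed)
print(f"W = {w}, exact two-sided p = {p}")
```

With only six pairs that all move in the same direction, the smallest achievable two-sided p-value is 2/64, which is one reason multiple seeds or more model and task pairs matter for the "significantly reduced" claim.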

Circularity Check

0 steps flagged

No significant circularity in this empirical measurement study

Full rationale

The paper conducts a direct empirical evaluation by applying standard compression techniques (pruning, quantization, distillation) to three code language models, then measuring performance and robustness on three software analytics tasks using six metrics and four classical adversarial attacks. No equations, parameter fitting, derivations, or self-citation chains appear in the provided text. All claims rest on observed measurement differences rather than any reduction of outputs to inputs by construction. The study is therefore self-contained against external benchmarks, with the central robustness trade-off claim arising from experimental results rather than definitional or fitted equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study is purely empirical and introduces no mathematical axioms, free parameters, or new postulated entities.

pith-pipeline@v0.9.0 · 5770 in / 917 out tokens · 55038 ms · 2026-05-18T23:57:17.339416+00:00 · methodology


