pith. machine review for the scientific record.

arxiv: 2604.25903 · v1 · submitted 2026-04-28 · 💻 cs.SE · cs.LG

Recognition: unknown

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-07 15:48 UTC · model grok-4.3

classification 💻 cs.SE cs.LG
keywords LLM compression · carbon efficiency · software engineering · code clone detection · code summarization · code generation · model optimization · green AI

The pith

Ordering compression techniques by a carbon-tax principle yields up to 49x memory reduction in language models for software engineering tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Carbon-Taxed Transformers as a pipeline that orders compression steps according to an economic carbon-tax analogy to penalize inefficient model structures and reward deployment-ready ones. It applies this across encoder-only, encoder-decoder, and decoder-only architectures on code clone detection, summarization, and generation. A sympathetic reader would care because the approach delivers large cuts in memory, runtime, and emissions while retaining most accuracy, addressing the environmental and scalability barriers to using large models in software engineering. Two ablation studies confirm that the specific ordering and component choices drive the gains rather than any single technique alone.

Core claim

CTT operationalizes a computational carbon tax to order compression techniques, producing up to 49x memory reduction; inference speedups of 8-10x on clone detection, 3x on summarization, and 4-7x on generation; up to 81% lower CO2 emissions; and accuracy retention of around 98% on clone detection, around 89% on summarization, and up to 91% on textual metrics with 68% pass@1 on generation.

What carries the argument

The carbon-tax ordering principle in the CTT pipeline, which systematically sequences compression steps to penalize architectural inefficiencies and reward efficient ones across model types.

If this is right

  • Large language models become practical to deploy on modest hardware for clone detection, summarization, and generation in software engineering.
  • The carbon emissions associated with running these models drop substantially, supporting more sustainable AI use in the field.
  • The same pipeline ordering applies to encoder-only, encoder-decoder, and decoder-only architectures without architecture-specific redesign.
  • Accuracy remains high enough on standard benchmarks to support real-world SE applications after compression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The carbon-tax ordering might extend to non-SE language model tasks if the penalty metric is recalibrated for different objectives.
  • Testing the pipeline on models larger than those evaluated here could reveal whether the gains scale or plateau.
  • Economic metaphors like taxation could guide other efficiency optimizations in AI by making tradeoffs explicit and quantifiable.

Load-bearing premise

The assumption that ordering compression techniques according to a computational carbon-tax principle will reliably produce multiplicative efficiency gains without unacceptable accuracy loss across the tested architectures and SE tasks.

What would settle it

An experiment that applies the same compression components in random or alternative orderings and measures whether the efficiency-accuracy tradeoffs match or exceed those of the carbon-tax ordering on the same tasks and models.
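
To make that experiment concrete, here is a minimal Python sketch of the kind of harness it would require: the same compression components applied in every ordering, measured under identical conditions. The component names, `apply_step`, and `evaluate` are hypothetical placeholders standing in for whatever techniques and benchmarks CTT actually uses; nothing below comes from the paper.

```python
from itertools import permutations

# Hypothetical component names; the paper does not enumerate its techniques here.
COMPONENTS = ("quantize", "prune", "distill")

def apply_step(model, step):
    """Placeholder: apply one compression technique and return the compressed model."""
    return model  # stub; a real harness would call the actual compressor

def evaluate(model):
    """Placeholder: return (accuracy, latency_s, memory_gb, co2_kg) on a fixed benchmark."""
    return (0.0, 0.0, 0.0, 0.0)  # stub values, not measurements

def run_ordering(base_model, ordering):
    model = base_model
    for step in ordering:
        model = apply_step(model, step)
    return evaluate(model)

# Compare every ordering of the same components under identical measurement conditions.
results = {order: run_ordering(base_model=None, ordering=order)
           for order in permutations(COMPONENTS)}
```

The carbon-tax ordering is supported only if no alternative permutation matches or exceeds its efficiency-accuracy tradeoff on the same tasks and models.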

read the original abstract

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. This reality threatens not only the scalability and accessibility of AI-powered SE, but also its long-term environmental sustainability. The research challenge is clear: we must go beyond accuracy and address efficiency and environmental cost as first-class design constraints. To meet this challenge, we introduce Carbon-Taxed Transformers (CTT), a systematic multi-architectural compression principled pipeline ordering inspired by economic carbon taxation principles. Drawing from the economic concept of carbon pricing, CTT operationalizes a computational carbon tax that penalizes architectural inefficiencies and rewards deployment-ready compression. We evaluate CTT across three core SE tasks: code clone detection, code summarization, and code generation, with models spanning encoder-only, encoder-decoder, and decoder-only architecture. Our results show that CTT delivers on inference: (1) up to 49x memory reduction, (2) time reduction up to 8-10x for clone detection, up to 3x for summarization, and 4-7x for generation, (3) up to 81% reduction in CO2 emissions and (4) CTT retains around 98% accuracy on clone detection, around 89% on summarization, and up to 91% (textual metrics) and 68% (pass@1) for generation. Two ablation studies show that pipeline ordering and individual component contributions are both essential, providing empirical justification for CTT's design and effectiveness. This work establishes a viable path toward responsible AI in SE through aggressive yet performance-preserving compression.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Carbon-Taxed Transformers (CTT), a systematic compression pipeline for LLMs in software engineering that orders techniques according to a computational carbon tax principle inspired by economic carbon pricing. It evaluates the pipeline on code clone detection, code summarization, and code generation tasks using encoder-only, encoder-decoder, and decoder-only models. Reported outcomes include up to 49x memory reduction, inference speedups of 8-10x (clone detection), 3x (summarization), and 4-7x (generation), up to 81% CO2 reduction, and accuracy retention of ~98% (clone detection), ~89% (summarization), and up to 91% textual / 68% pass@1 (generation). Two ablation studies are cited to establish that both the ordering and the individual components are essential.

Significance. If the empirical results prove robust, the work would offer a practical, environmentally aware approach to LLM compression tailored to SE tasks, with the multi-architecture and multi-task evaluation providing useful breadth. The framing of compression as a 'carbon-taxed' pipeline is a novel heuristic that could stimulate further research on sustainability-driven design choices. Strengths include the focus on inference metrics and the attempt to justify the pipeline via ablations. However, the absence of a formal definition for the carbon tax, detailed baselines, and statistical validation limits the immediate significance and generalizability of the claimed multiplicative gains.

major comments (2)
  1. [Section 3 and Section 5] Section 3 (CTT Pipeline) and Section 5 (Ablation Studies): The manuscript provides no formula, metric, or operational definition for the 'computational carbon tax' used to order compression techniques. This is load-bearing for the central claim, because the paper attributes the reported multiplicative efficiency gains (49x memory, 3-10x time, 81% CO2) specifically to this principled ordering, yet the ablations supply no quantitative comparison against alternative sequences or random orderings of the same components.
  2. [Section 4] Section 4 (Results): The accuracy and efficiency claims (e.g., 98% clone detection accuracy, 68% pass@1 for generation) are stated without error bars, confidence intervals, statistical tests, or per-model baseline tables comparing CTT against uncompressed models and against the same techniques applied in non-carbon-tax orderings. This directly affects verification of whether the carbon-tax ordering is necessary for acceptable accuracy retention across architectures.
minor comments (2)
  1. [Abstract and Section 4] The abstract and results sections use approximate phrasing ('around 98%', 'up to 49x') without accompanying tables that list exact values, standard deviations, or hardware/measurement details for CO2 and time metrics (a sketch of one common measurement setup follows these comments).
  2. [Section 3] No description is given of the specific compression techniques included in the pipeline or how their individual carbon costs were estimated prior to ordering.
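
As context for the first minor comment, the snippet below shows one widely used way to log wall time and estimated CO2 for an inference run, using the open-source CodeCarbon tracker and Python's timer. The inference loop is a placeholder; this is not a description of the paper's actual measurement setup.

```python
import time
from codecarbon import EmissionsTracker  # pip install codecarbon

def run_inference(batch):
    """Placeholder for the compressed model's forward pass; not the paper's code."""
    pass

tracker = EmissionsTracker()   # tracks energy draw and applies regional carbon intensity
tracker.start()
start = time.perf_counter()

for batch in range(100):       # stand-in for a real evaluation loop
    run_inference(batch)

wall_time_s = time.perf_counter() - start
emissions_kg = tracker.stop()  # estimated kg CO2-equivalent for the tracked interval

print(f"wall time: {wall_time_s:.2f} s, emissions: {emissions_kg:.6f} kg CO2eq")
```

Reporting the tracker configuration, hardware, and number of repeated runs alongside such numbers is the kind of detail the comment asks for.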

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We address each major comment point by point below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Section 3 and Section 5] Section 3 (CTT Pipeline) and Section 5 (Ablation Studies): The manuscript provides no formula, metric, or operational definition for the 'computational carbon tax' used to order compression techniques. This is load-bearing for the central claim, because the paper attributes the reported multiplicative efficiency gains (49x memory, 3-10x time, 81% CO2) specifically to this principled ordering, yet the ablations supply no quantitative comparison against alternative sequences or random orderings of the same components.

    Authors: We agree that a formal operational definition is needed to make the central claim fully verifiable. Section 3 currently describes the carbon tax as a heuristic inspired by economic carbon pricing that prioritizes techniques with lower computational and environmental cost, but it lacks an explicit formula. In the revised manuscript we will add a precise metric: a carbon-tax score defined as a normalized weighted sum of memory footprint, inference latency, and estimated CO2 emissions for each technique, with techniques ordered by ascending score (a minimal illustrative sketch of such a score appears after these responses). For the ablation studies, the existing experiments already compare the full ordered pipeline against versions that omit ordering or individual components; however, we acknowledge the value of additional controls. We will include new results comparing the carbon-tax ordering against random permutations of the same techniques and against alternative heuristics (e.g., ordering by model size alone or by latency alone) to quantify the benefit of the proposed ordering. revision: yes

  2. Referee: [Section 4] Section 4 (Results): The accuracy and efficiency claims (e.g., 98% clone detection accuracy, 68% pass@1 for generation) are stated without error bars, confidence intervals, statistical tests, or per-model baseline tables comparing CTT against uncompressed models and against the same techniques applied in non-carbon-tax orderings. This directly affects verification of whether the carbon-tax ordering is necessary for acceptable accuracy retention across architectures.

    Authors: We accept this criticism on the presentation of empirical results. The current version reports point estimates without measures of variability or formal statistical comparisons. In the revision we will add standard deviations across repeated runs, 95% confidence intervals, and appropriate statistical tests (paired t-tests or Wilcoxon signed-rank tests) for all accuracy and efficiency metrics. We will also expand the baseline tables to show, for each model and task, the uncompressed baseline, the CTT pipeline, and the same compression techniques applied in non-carbon-tax orderings. These additions will allow direct assessment of whether the carbon-tax ordering is required to retain acceptable accuracy while delivering the reported efficiency gains. revision: yes
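
The score promised in the first response above is described but not written out. As a purely illustrative sketch, here is one way a normalized weighted sum of memory footprint, inference latency, and estimated CO2 could be computed and used to order techniques; every technique name, cost figure, and weight below is invented for illustration and does not come from the paper.

```python
# Hypothetical carbon-tax score: min-max normalize each cost dimension across
# techniques, then take a weighted sum. Lower score = apply earlier in the pipeline.
techniques = {
    # name: (memory_gb, latency_ms, co2_g) -- illustrative costs, not measurements
    "8-bit quantization": (4.0, 120.0, 0.8),
    "structured pruning": (6.0, 150.0, 1.1),
    "knowledge distillation": (2.5, 90.0, 2.4),
}
weights = (0.4, 0.3, 0.3)  # assumed relative importance of memory, latency, CO2

def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

names = list(techniques)
dimensions = list(zip(*techniques.values()))      # one tuple of values per cost dimension
normed = [normalize(dim) for dim in dimensions]   # scale each dimension to [0, 1]
scores = {name: sum(w * normed[d][i] for d, w in enumerate(weights))
          for i, name in enumerate(names)}

pipeline_order = sorted(names, key=scores.get)    # ascending carbon-tax score
print(pipeline_order)
```

Whether such a score reproduces the paper's ordering depends entirely on the actual per-technique measurements and weights, which the revised manuscript would need to report.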

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline without derivations or self-referential reductions

full rationale

The paper presents CTT as an empirical multi-stage compression pipeline for LLMs on SE tasks, with results from experiments and two ablation studies. No equations, derivations, or formal definitions appear in the abstract or described structure. The carbon-tax ordering is described as an inspirational principle operationalized into a pipeline, but without any quoted formula, fitted parameter, or self-citation chain that reduces a claimed result to its own inputs by construction. Ablations are invoked to support ordering and components, yet the absence of mathematical steps means no load-bearing claim reduces to a tautology or renamed fit. This is a standard empirical evaluation whose central claims rest on measured metrics rather than internal theoretical circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the empirical effectiveness of the ordered compression pipeline; the carbon-tax analogy is treated as a guiding principle rather than a derived result.

axioms (1)
  • domain assumption: A principled ordering of standard compression techniques can produce multiplicative gains in memory, speed, and emissions while preserving task performance.
    The paper states that pipeline ordering is essential and justifies it via ablations.
invented entities (1)
  • Computational carbon tax (no independent evidence)
    purpose: To operationalize penalization of architectural inefficiencies in the compression ordering
    The tax is an analogy used to motivate the pipeline design; no actual economic mechanism or external validation is provided.

pith-pipeline@v0.9.0 · 5647 in / 1527 out tokens · 68780 ms · 2026-05-07T15:48:47.336348+00:00 · methodology

