pith. machine review for the scientific record.

arxiv: 2604.25903 · v1 · submitted 2026-04-28 · 💻 cs.SE · cs.LG

Recognition: unknown

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-07 15:48 UTC · model grok-4.3

classification 💻 cs.SE cs.LG
keywords LLM compression · carbon efficiency · software engineering · code clone detection · code summarization · code generation · model optimization · green AI

The pith

Ordering compression techniques by a carbon-tax principle yields up to 49x memory reduction in language models for software engineering tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Carbon-Taxed Transformers as a pipeline that orders compression steps according to an economic carbon-tax analogy to penalize inefficient model structures and reward deployment-ready ones. It applies this across encoder-only, encoder-decoder, and decoder-only architectures on code clone detection, summarization, and generation. A sympathetic reader would care because the approach delivers large cuts in memory, runtime, and emissions while retaining most accuracy, addressing the environmental and scalability barriers to using large models in software engineering. Two ablation studies confirm that the specific ordering and component choices drive the gains rather than any single technique alone.

Core claim

CTT operationalizes a computational carbon tax to order compression techniques, producing up to 49x memory reduction; inference speedups of 8-10x on clone detection, 3x on summarization, and 4-7x on generation; up to 81% lower CO2 emissions; and accuracy retention of around 98% on clone detection, around 89% on summarization, and up to 91% on textual metrics with 68% pass@1 on generation.

What carries the argument

The carbon-tax ordering principle in the CTT pipeline, which systematically sequences compression steps to penalize architectural inefficiencies and reward efficient ones across model types.

If this is right

  • Large language models become practical to deploy on modest hardware for clone detection, summarization, and generation in software engineering.
  • The carbon emissions associated with running these models drop substantially, supporting more sustainable AI use in the field.
  • The same pipeline ordering applies to encoder-only, encoder-decoder, and decoder-only architectures without architecture-specific redesign.
  • Accuracy remains high enough on standard benchmarks to support real-world SE applications after compression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The carbon-tax ordering might extend to non-SE language model tasks if the penalty metric is recalibrated for different objectives.
  • Testing the pipeline on models larger than those evaluated here could reveal whether the gains scale or plateau.
  • Economic metaphors like taxation could guide other efficiency optimizations in AI by making tradeoffs explicit and quantifiable.

Load-bearing premise

The assumption that ordering compression techniques according to a computational carbon-tax principle will reliably produce multiplicative efficiency gains without unacceptable accuracy loss across the tested architectures and SE tasks.

What would settle it

An experiment that applies the same compression components in random or alternative orderings and measures whether the efficiency-accuracy tradeoffs match or exceed those of the carbon-tax ordering on the same tasks and models.
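
To make that experiment concrete, here is a minimal Python sketch of the kind of harness it would require: the same compression components applied in every ordering, measured under identical conditions. The component names, `apply_step`, and `evaluate` are hypothetical placeholders standing in for whatever techniques and benchmarks CTT actually uses; nothing below comes from the paper.

```python
from itertools import permutations

# Hypothetical component names; the paper does not enumerate its techniques here.
COMPONENTS = ("quantize", "prune", "distill")

def apply_step(model, step):
    """Placeholder: apply one compression technique and return the compressed model."""
    return model  # stub; a real harness would call the actual compressor

def evaluate(model):
    """Placeholder: return (accuracy, latency_s, memory_gb, co2_kg) on a fixed benchmark."""
    return (0.0, 0.0, 0.0, 0.0)  # stub values, not measurements

def run_ordering(base_model, ordering):
    model = base_model
    for step in ordering:
        model = apply_step(model, step)
    return evaluate(model)

# Compare every ordering of the same components under identical measurement conditions.
results = {order: run_ordering(base_model=None, ordering=order)
           for order in permutations(COMPONENTS)}
```

The carbon-tax ordering is supported only if no alternative permutation matches or exceeds its efficiency-accuracy tradeoff on the same tasks and models.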

read the original abstract

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable capabilities in different SE tasks, they are unmanageably large, slow to deploy, memory-intensive, and carbon-heavy. This reality threatens not only the scalability and accessibility of AI-powered SE, but also its long-term environmental sustainability. The research challenge is clear: we must go beyond accuracy and address efficiency and environmental cost as first-class design constraints. To meet this challenge, we introduce Carbon-Taxed Transformers (CTT), a systematic multi-architectural compression principled pipeline ordering inspired by economic carbon taxation principles. Drawing from the economic concept of carbon pricing, CTT operationalizes a computational carbon tax that penalizes architectural inefficiencies and rewards deployment-ready compression. We evaluate CTT across three core SE tasks: code clone detection, code summarization, and code generation, with models spanning encoder-only, encoder-decoder, and decoder-only architecture. Our results show that CTT delivers on inference: (1) up to 49x memory reduction, (2) time reduction up to 8-10x for clone detection, up to 3x for summarization, and 4-7x for generation, (3) up to 81% reduction in CO2 emissions and (4) CTT retains around 98% accuracy on clone detection, around 89% on summarization, and up to 91% (textual metrics) and 68% (pass@1) for generation. Two ablation studies show that pipeline ordering and individual component contributions are both essential, providing empirical justification for CTT's design and effectiveness. This work establishes a viable path toward responsible AI in SE through aggressive yet performance-preserving compression.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Carbon-Taxed Transformers (CTT), a systematic compression pipeline for LLMs in software engineering that orders techniques according to a computational carbon tax principle inspired by economic carbon pricing. It evaluates the pipeline on code clone detection, code summarization, and code generation tasks using encoder-only, encoder-decoder, and decoder-only models. Reported outcomes include up to 49x memory reduction, inference speedups of 8-10x (clone detection), 3x (summarization), and 4-7x (generation), up to 81% CO2 reduction, and accuracy retention of ~98% (clone detection), ~89% (summarization), and up to 91% textual / 68% pass@1 (generation). Two ablation studies are cited to establish that both the ordering and the individual components are essential.

Significance. If the empirical results prove robust, the work would offer a practical, environmentally aware approach to LLM compression tailored to SE tasks, with the multi-architecture and multi-task evaluation providing useful breadth. The framing of compression as a 'carbon-taxed' pipeline is a novel heuristic that could stimulate further research on sustainability-driven design choices. Strengths include the focus on inference metrics and the attempt to justify the pipeline via ablations. However, the absence of a formal definition for the carbon tax, detailed baselines, and statistical validation limits the immediate significance and generalizability of the claimed multiplicative gains.

major comments (2)
  1. [Section 3 and Section 5] Section 3 (CTT Pipeline) and Section 5 (Ablation Studies): The manuscript provides no formula, metric, or operational definition for the 'computational carbon tax' used to order compression techniques. This is load-bearing for the central claim, because the paper attributes the reported multiplicative efficiency gains (49x memory, 3-10x time, 81% CO2) specifically to this principled ordering, yet the ablations supply no quantitative comparison against alternative sequences or random orderings of the same components.
  2. [Section 4] Section 4 (Results): The accuracy and efficiency claims (e.g., 98% clone detection accuracy, 68% pass@1 for generation) are stated without error bars, confidence intervals, statistical tests, or per-model baseline tables comparing CTT against uncompressed models and against the same techniques applied in non-carbon-tax orderings. This directly affects verification of whether the carbon-tax ordering is necessary for acceptable accuracy retention across architectures.
minor comments (2)
  1. [Abstract and Section 4] The abstract and results sections use approximate phrasing ('around 98%', 'up to 49x') without accompanying tables that list exact values, standard deviations, or hardware/measurement details for CO2 and time metrics (a sketch of one common measurement setup follows these comments).
  2. [Section 3] No description is given of the specific compression techniques included in the pipeline or how their individual carbon costs were estimated prior to ordering.
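
As context for the first minor comment, the snippet below shows one widely used way to log wall time and estimated CO2 for an inference run, using the open-source CodeCarbon tracker and Python's timer. The inference loop is a placeholder; this is not a description of the paper's actual measurement setup.

```python
import time
from codecarbon import EmissionsTracker  # pip install codecarbon

def run_inference(batch):
    """Placeholder for the compressed model's forward pass; not the paper's code."""
    pass

tracker = EmissionsTracker()   # tracks energy draw and applies regional carbon intensity
tracker.start()
start = time.perf_counter()

for batch in range(100):       # stand-in for a real evaluation loop
    run_inference(batch)

wall_time_s = time.perf_counter() - start
emissions_kg = tracker.stop()  # estimated kg CO2-equivalent for the tracked interval

print(f"wall time: {wall_time_s:.2f} s, emissions: {emissions_kg:.6f} kg CO2eq")
```

Reporting the tracker configuration, hardware, and number of repeated runs alongside such numbers is the kind of detail the comment asks for.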

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We address each major comment point by point below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Section 3 and Section 5] Section 3 (CTT Pipeline) and Section 5 (Ablation Studies): The manuscript provides no formula, metric, or operational definition for the 'computational carbon tax' used to order compression techniques. This is load-bearing for the central claim, because the paper attributes the reported multiplicative efficiency gains (49x memory, 3-10x time, 81% CO2) specifically to this principled ordering, yet the ablations supply no quantitative comparison against alternative sequences or random orderings of the same components.

    Authors: We agree that a formal operational definition is needed to make the central claim fully verifiable. Section 3 currently describes the carbon tax as a heuristic inspired by economic carbon pricing that prioritizes techniques with lower computational and environmental cost, but it lacks an explicit formula. In the revised manuscript we will add a precise metric: a carbon-tax score defined as a normalized weighted sum of memory footprint, inference latency, and estimated CO2 emissions for each technique, with techniques ordered by ascending score (a minimal illustrative sketch of such a score appears after these responses). For the ablation studies, the existing experiments already compare the full ordered pipeline against versions that omit ordering or individual components; however, we acknowledge the value of additional controls. We will include new results comparing the carbon-tax ordering against random permutations of the same techniques and against alternative heuristics (e.g., ordering by model size alone or by latency alone) to quantify the benefit of the proposed ordering. revision: yes

  2. Referee: [Section 4] Section 4 (Results): The accuracy and efficiency claims (e.g., 98% clone detection accuracy, 68% pass@1 for generation) are stated without error bars, confidence intervals, statistical tests, or per-model baseline tables comparing CTT against uncompressed models and against the same techniques applied in non-carbon-tax orderings. This directly affects verification of whether the carbon-tax ordering is necessary for acceptable accuracy retention across architectures.

    Authors: We accept this criticism on the presentation of empirical results. The current version reports point estimates without measures of variability or formal statistical comparisons. In the revision we will add standard deviations across repeated runs, 95% confidence intervals, and appropriate statistical tests (paired t-tests or Wilcoxon signed-rank tests) for all accuracy and efficiency metrics. We will also expand the baseline tables to show, for each model and task, the uncompressed baseline, the CTT pipeline, and the same compression techniques applied in non-carbon-tax orderings. These additions will allow direct assessment of whether the carbon-tax ordering is required to retain acceptable accuracy while delivering the reported efficiency gains. revision: yes
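
The score promised in the first response above is described but not written out. As a purely illustrative sketch, here is one way a normalized weighted sum of memory footprint, inference latency, and estimated CO2 could be computed and used to order techniques; every technique name, cost figure, and weight below is invented for illustration and does not come from the paper.

```python
# Hypothetical carbon-tax score: min-max normalize each cost dimension across
# techniques, then take a weighted sum. Lower score = apply earlier in the pipeline.
techniques = {
    # name: (memory_gb, latency_ms, co2_g) -- illustrative costs, not measurements
    "8-bit quantization": (4.0, 120.0, 0.8),
    "structured pruning": (6.0, 150.0, 1.1),
    "knowledge distillation": (2.5, 90.0, 2.4),
}
weights = (0.4, 0.3, 0.3)  # assumed relative importance of memory, latency, CO2

def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

names = list(techniques)
dimensions = list(zip(*techniques.values()))      # one tuple of values per cost dimension
normed = [normalize(dim) for dim in dimensions]   # scale each dimension to [0, 1]
scores = {name: sum(w * normed[d][i] for d, w in enumerate(weights))
          for i, name in enumerate(names)}

pipeline_order = sorted(names, key=scores.get)    # ascending carbon-tax score
print(pipeline_order)
```

Whether such a score reproduces the paper's ordering depends entirely on the actual per-technique measurements and weights, which the revised manuscript would need to report.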

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline without derivations or self-referential reductions

full rationale

The paper presents CTT as an empirical multi-stage compression pipeline for LLMs on SE tasks, with results from experiments and two ablation studies. No equations, derivations, or formal definitions appear in the abstract or described structure. The carbon-tax ordering is described as an inspirational principle operationalized into a pipeline, but without any quoted formula, fitted parameter, or self-citation chain that reduces a claimed result to its own inputs by construction. Ablations are invoked to support ordering and components, yet the absence of mathematical steps means no load-bearing claim reduces to a tautology or renamed fit. This is a standard empirical evaluation whose central claims rest on measured metrics rather than internal theoretical circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the empirical effectiveness of the ordered compression pipeline; the carbon-tax analogy is treated as a guiding principle rather than a derived result.

axioms (1)
  • domain assumption: A principled ordering of standard compression techniques can produce multiplicative gains in memory, speed, and emissions while preserving task performance.
    The paper states that pipeline ordering is essential and justifies it via ablations.
invented entities (1)
  • Computational carbon tax (no independent evidence)
    purpose: To operationalize penalization of architectural inefficiencies in the compression ordering
    The tax is an analogy used to motivate the pipeline design; no actual economic mechanism or external validation is provided.

pith-pipeline@v0.9.0 · 5647 in / 1527 out tokens · 68780 ms · 2026-05-07T15:48:47.336348+00:00 · methodology

