Quantifying the Carbon Emissions of Machine Learning
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-15 05:47 UTC · model grok-4.3
The pith
A tool called the Machine Learning Emissions Calculator approximates the carbon emissions of training neural networks based on server location, energy grid, training duration, and hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. The calculator approximates emissions using factors including the location of the server and its energy grid, the length of the training procedure, and the make and model of hardware. We accompany this tool with an explanation of these factors as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
What carries the argument
The Machine Learning Emissions Calculator, which estimates carbon output by combining inputs on server location, energy grid carbon intensity, training length, and hardware specifications.
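Under the standard approximation such calculators use, the estimate is a simple product of these inputs. The sketch below is illustrative, not the tool's exact internals; the function name, parameter names, and example values are assumptions for exposition.

```python
def estimate_co2e_kg(tdp_watts, utilization, hours, grid_kg_per_kwh, pue=1.0):
    """Approximate training emissions in kg CO2e.

    tdp_watts:        rated power of the accelerator (e.g. 250 W)
    utilization:      assumed fraction of TDP actually drawn (0..1)
    hours:            wall-clock training time
    grid_kg_per_kwh:  carbon intensity of the regional grid (kg CO2e per kWh)
    pue:              data-center power usage effectiveness (cooling overhead)
    """
    energy_kwh = tdp_watts / 1000 * utilization * hours * pue
    return energy_kwh * grid_kg_per_kwh

# 100 h on one 250 W GPU at full utilization, on a 0.5 kg/kWh grid:
print(estimate_co2e_kg(250, 1.0, 100, 0.5))  # 12.5 kg CO2e
```

Each factor enters multiplicatively, which is why moving a run to a cleaner grid or halving training time scales the estimate linearly.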
If this is right
- Individual practitioners can use the calculator to estimate emissions for their training runs and identify high-impact factors to adjust.
- Organizations can incorporate the tool into their decision-making to select lower-emission servers or optimize training procedures.
- Greater awareness of energy grid differences may encourage training in regions with cleaner electricity sources.
- Concrete mitigation steps include shortening training times through better algorithms and using more efficient hardware.
- Reporting emissions alongside model performance could become a standard practice in machine learning research.
Where Pith is reading between the lines
- Integrating such calculators into popular ML frameworks could make emission tracking automatic during training.
- This work could support the development of benchmarks that include environmental cost alongside accuracy metrics.
- Future extensions might account for the full lifecycle of models, including inference and data collection phases.
- Policy makers could use aggregated data from such tools to regulate data center energy use.
Load-bearing premise
The listed factors of server location, energy grid, training length, and hardware are sufficient to accurately approximate emissions in a way that reliably guides mitigation decisions.
What would settle it
A side-by-side comparison in which actual metered carbon emissions from a real training run deviate substantially from the calculator's prediction for the same inputs.
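Such a validation reduces to a relative-error check against metered ground truth. The helper and the numbers below are hypothetical, shown only to make the test concrete.

```python
def relative_deviation(measured_kg, predicted_kg):
    """Relative error of a calculator prediction vs. a metered ground truth."""
    return abs(predicted_kg - measured_kg) / measured_kg

# Hypothetical run: 30 kg CO2e metered vs. 25 kg predicted.
dev = relative_deviation(30.0, 25.0)
print(f"{dev:.1%}")  # roughly 17% deviation
```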
Original abstract
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Machine Learning Emissions Calculator, a practical tool to approximate the carbon emissions of training ML models. It identifies key influencing factors (server location and regional energy grid, training duration, and hardware type), explains how these affect emissions, and outlines mitigation actions for individual practitioners and organizations.
Significance. If the calculator's approximations can be shown to be reliable, the work would provide a timely, community-facing resource for quantifying and reducing the environmental impact of machine learning. The emphasis on actionable mitigation steps is a constructive contribution to an emerging area of concern.
Major comments (2)
- [Machine Learning Emissions Calculator] The section presenting the Machine Learning Emissions Calculator derives estimates from hardware TDP ratings, assumed utilization, regional average carbon intensities, and user-supplied runtime, yet reports no side-by-side comparison of these estimates against metered power draw or time-resolved grid emissions for any actual training run. This absence directly affects the central claim that the listed factors suffice for reliable approximations that can guide mitigation decisions.
- [Explanation of factors] No sensitivity analysis or error bounds are provided for the approximation method (e.g., impact of dynamic power draw, cooling overhead, or intra-day grid variation), leaving open whether systematic over- or under-estimation by tens of percent occurs in practice.
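The sensitivity analysis the referee asks for is straightforward for a multiplicative model: perturb one factor at a time around a baseline. The baseline and the ranges below are illustrative assumptions, not values from the paper.

```python
# Hypothetical baseline: one 250 W GPU, 100 h, 0.5 kg CO2e/kWh grid.
baseline = dict(tdp_w=250, util=1.0, hours=100, pue=1.0, grid=0.5)

def co2e(tdp_w, util, hours, pue, grid):
    """Multiplicative emissions model: kW x utilization x hours x PUE x intensity."""
    return tdp_w / 1000 * util * hours * pue * grid

base = co2e(**baseline)
# One-at-a-time sweep over assumed plausible ranges for each factor.
for name, lo, hi in [("util", 0.4, 1.0), ("pue", 1.1, 1.6), ("grid", 0.02, 0.9)]:
    low = co2e(**{**baseline, name: lo})
    high = co2e(**{**baseline, name: hi})
    print(f"{name}: {low / base:.2f}x .. {high / base:.2f}x of baseline")
```

Because every factor enters linearly, the ratio columns read off directly how much systematic over- or under-estimation each unmodeled effect could introduce.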
Minor comments (1)
- [Abstract] The abstract would benefit from an explicit statement of the tool's intended scope and known limitations.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and limitations of the presented calculator. We address each major comment below and outline planned revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Machine Learning Emissions Calculator] The section presenting the Machine Learning Emissions Calculator derives estimates from hardware TDP ratings, assumed utilization, regional average carbon intensities, and user-supplied runtime, yet reports no side-by-side comparison of these estimates against metered power draw or time-resolved grid emissions for any actual training run. This absence directly affects the central claim that the listed factors suffice for reliable approximations that can guide mitigation decisions.
  Authors: We agree that direct empirical validation against metered power draw would provide stronger evidence of reliability. The manuscript presents the calculator as a practical approximation tool based on standard methods (TDP values, average grid intensities, and assumed utilization) drawn from existing literature, rather than a precision measurement instrument. The central claim is that these factors are the dominant drivers and can be used for actionable estimates to guide mitigation, not that the tool matches real-time metering exactly. In revision we will add an explicit limitations subsection that discusses approximation error sources, cites prior studies performing such comparisons, and clarifies that the tool is intended for order-of-magnitude guidance and awareness rather than precise auditing.
  Revision: partial
- Referee: [Explanation of factors] No sensitivity analysis or error bounds are provided for the approximation method (e.g., impact of dynamic power draw, cooling overhead, or intra-day grid variation), leaving open whether systematic over- or under-estimation by tens of percent occurs in practice.
  Authors: We accept this point and will incorporate a new sensitivity analysis subsection. The revision will quantify the effect of varying utilization rates, PUE values for cooling overhead, and temporal fluctuations in regional carbon intensity, providing explicit error bounds and showing how these propagate to the final emission estimate. This will allow readers to assess the robustness of the approximations for different use cases.
  Revision: yes
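The promised error-bound propagation is simple for this class of model: since the estimate is a pure product of positive factors, worst-case interval endpoints just multiply. The helper and the factor ranges below are illustrative assumptions, not values from the paper or the planned revision.

```python
def product_bounds(factor_ranges):
    """Worst-case interval for a product of positive factor ranges.

    factor_ranges: list of (low, high) multipliers for each term in the
    emissions product. Because every factor is positive and enters
    multiplicatively, the interval endpoints simply multiply.
    """
    lo = hi = 1.0
    for a, b in factor_ranges:
        lo *= a
        hi *= b
    return lo, hi

# Assumed ranges: utilization 0.4-1.0 of TDP, PUE 1.1-1.6,
# intra-day grid intensity 0.8x-1.2x of the regional average.
lo, hi = product_bounds([(0.4, 1.0), (1.1, 1.6), (0.8, 1.2)])
print(lo, hi)  # multipliers to apply to the point estimate
```

With these (assumed) ranges the point estimate could be off by a factor of roughly 0.35x to 1.9x, which is exactly the tens-of-percent regime the referee flags.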
Circularity Check
No circularity in emissions calculator derivation
Full rationale
The paper presents a practical estimation tool whose inputs are server location and grid carbon intensity, training duration, and hardware TDP ratings drawn from external public data sources. No equations, fitted parameters, or predictions are defined that reduce to the tool's own outputs by construction, and no self-citations are invoked to establish uniqueness or to smuggle in ansatzes. The central claim is therefore an independent aggregation of standard factors rather than a self-referential loop, making the derivation self-contained.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 20 Pith papers
- An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization
  The paper introduces the Amortized Efficiency Threshold (AET) to identify the deployment volume at which neural combinatorial optimization solvers become more energy-efficient overall than heuristic baselines after am...
- Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints
  Nearly every arXiv submission leaks hidden sensitive information through its source files, existing cleaners fail, and ALC-NG provides a more reliable fix.
- Segment Anything
  A promptable model trained on 1B masks achieves competitive zero-shot segmentation performance across tasks and is released publicly with its dataset.
- OPT: Open Pre-trained Transformer Language Models
  OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
- Multitask Prompted Training Enables Zero-Shot Task Generalization
  Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
- EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization
  EnergyLens predicts multi-GPU LLM inference energy consumption with 9-13% MAPE and identifies configurations with up to 52x energy efficiency differences.
- PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
  PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
- Decomposing the Generalization Gap in PROTAC Activity Prediction: Variance Attribution and the Inter-Laboratory Ceiling
  Inter-laboratory measurement variance dominates the generalization gap in PROTAC activity prediction, capping LOTO AUROC near 0.67 across models and architectures.
- SAM 2: Segment Anything in Images and Videos
  SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation datas...
- StarCoder 2 and The Stack v2: The Next Generation
  StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
- Multi-Dimensional Model Integrity and Responsibility Assessment Index and Scoring Framework
  MIRAI is a unified index that combines five responsibility dimensions into one score for tabular models, demonstrating that predictive performance does not ensure high overall integrity.
- Position: LLM Inference Should Be Evaluated as Energy-to-Token Production
  LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.
- UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
  UniSD unifies complementary self-distillation mechanisms for autoregressive LLMs and achieves up to +5.4 point gains over base models and +2.8 over baselines across six benchmarks and six models.
- Agentic Insight Generation in VSM Simulations
  A two-step agentic system for extracting insights from VSM simulations achieves up to 86% accuracy with top LLMs by using progressive data discovery and slim context.
- Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds
  A frugal zero-shot local-LLM pipeline extracts relations at F1 0.70 and reaches 0.55 EM on multi-hop QA through self-consistency, cross-model oracles, and confidence routing, while identifying an agreement paradox whe...
- ChatGPT, is this real? The influence of generative AI on writing style in top-tier cybersecurity papers
  Top-tier cybersecurity papers exhibit a post-2022 increase in AI marker words and higher lexical complexity, suggesting generative AI is influencing academic writing style.
- StarCoder: may the source be with you!
  StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
- From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint
  A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
Reference graph
Works this paper leans on
- [1] Emma Strubell, Ananya Ganesh, and Andrew McCallum. Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243, 2019.
- [2]
- [3] Simon Eggleston, Leandro Buendia, Kyoko Miwa, Todd Ngara, and Kiyoto Tanabe. 2006 IPCC guidelines for national greenhouse gas inventories, volume 5. Institute for Global Environmental Strategies, Hayama, Japan, 2006.
- [4] WM To and Peter KC Lee. GHG emissions from electricity consumption: A case study of Hong Kong from 2002 to 2015 and trends to 2030. Journal of Cleaner Production, 165:589–598, 2017.
- [5] Matthew Brander, Aman Sood, Charlotte Wylie, Amy Haughton, and Jessica Lovell. Electricity-specific emission factors for grid electricity. Ecometrica, Emissionfactors.com, 2011.
- [6] Ari Harju, Topi Siro, Filippo Federici Canova, Samuli Hakala, and Teemu Rantalaiho. Computational physics on graphics processing units. In Proceedings of the 11th International Conference on Applied Parallel and Scientific Computing, pages 3–26. Springer-Verlag, 2012.
- [7] Nima Tajbakhsh, Jae Y Shin, Suryakanth R Gurudu, R Todd Hurst, Christopher B Kendall, Michael B Gotway, and Jianming Liang. Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging, 35(5):1299–1312, 2016.
- [8] Keiji Yanai and Yoshiyuki Kawano. Food image recognition using deep convolutional network with pre-training and fine-tuning. In 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pages 1–6. IEEE, 2015.
- [9] Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018.
- [10] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
- [12] Matthias Feurer and Frank Hutter. Hyperparameter optimization. In Automated Machine Learning, pages 3–33. Springer, 2019.
- [13] Google. Google environmental report 2018, 2018.
- [14]
- [15]
- [16] Jim Gao. Machine learning applications for data center optimization, 2014.
- [17] Google Data Centers efficiency: How we do it. https://www.google.com/about/datacenters/efficiency/internal/, 2019. Accessed: 2019-08-23.
- [18] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- [19] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- [20] Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560, 2016.
- [21] Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018.
- [22] Stefan Falkner, Aaron Klein, and Frank Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774, 2018.
- [23] Paul Teich. Tearing apart Google's TPU 3.0 AI coprocessor. https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/, 2018.
- [24] Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, and Diana Marculescu. NeuralPower: Predict and deploy energy-efficient convolutional neural networks. arXiv preprint arXiv:1710.05420, 2017.
- [25] David Rolnick, Priya L Donti, Lynn H Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, et al. Tackling climate change with machine learning. arXiv preprint arXiv:1906.05433, 2019.
- [26] Victor Schmidt, Alexandra Luccioni, S. Karthik Mukkavilli, Narmada Balasooriya, Kris Sankaran, Jennifer Chayes, and Yoshua Bengio. Visualizing the consequences of climate change using cycle-consistent adversarial networks. CoRR, abs/1905.03709, 2019.