pith. machine review for the scientific record.

arxiv: 1910.09700 · v2 · submitted 2019-10-21 · 💻 cs.CY · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Quantifying the Carbon Emissions of Machine Learning

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 05:47 UTC · model grok-4.3

classification 💻 cs.CY cs.LG
keywords machine learning · carbon emissions · emissions calculator · energy consumption · neural network training · environmental impact · sustainability
0 comments

The pith

A tool called the Machine Learning Emissions Calculator approximates the carbon emissions of training neural networks based on server location, energy grid, training duration, and hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a new online tool for estimating the carbon emissions produced when training machine learning models. It identifies server location and the carbon intensity of the local energy grid as major factors, along with how long the training runs and what hardware is used. The authors provide explanations of these factors and suggest specific actions that researchers and companies can take to lower their emissions. This matters because training large models consumes significant energy, yet the environmental cost has not been easy to quantify before. Users of the tool can input their details to get an estimate and then choose greener options.

Core claim

We present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. The calculator approximates emissions using factors including the location of the server and its energy grid, the length of the training procedure, and the make and model of hardware. We accompany this tool with an explanation of these factors as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.

What carries the argument

The Machine Learning Emissions Calculator, which estimates carbon output by combining inputs on server location, energy grid carbon intensity, training length, and hardware specifications.
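The abstract names the inputs but not the formula. A minimal sketch of how such an estimate is typically assembled from these factors (all parameter values below are illustrative assumptions, not figures from the paper):

```python
def estimate_emissions_kg(tdp_watts, utilization, hours,
                          grid_kg_co2_per_kwh, pue=1.0):
    """Rough CO2e estimate for one training run.

    tdp_watts: hardware thermal design power (nameplate rating, not metered draw)
    utilization: assumed average fraction of TDP actually drawn (0..1)
    hours: wall-clock training duration
    grid_kg_co2_per_kwh: average carbon intensity of the server's energy grid
    pue: data-center power usage effectiveness (cooling/overhead multiplier)
    """
    energy_kwh = tdp_watts * utilization * hours / 1000.0 * pue
    return energy_kwh * grid_kg_co2_per_kwh

# e.g. a GPU rated at 250 W, 80% utilization, 100 h, 0.4 kgCO2/kWh grid, PUE 1.1:
print(round(estimate_emissions_kg(250, 0.8, 100, 0.4, pue=1.1), 2))  # -> 8.8
```

The multiplicative structure is why the referee's call for validation matters: an error in any one factor propagates proportionally into the final estimate.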

If this is right

  • Individual practitioners can use the calculator to estimate emissions for their training runs and identify high-impact factors to adjust.
  • Organizations can incorporate the tool into their decision-making to select lower-emission servers or optimize training procedures.
  • Greater awareness of energy grid differences may encourage training in regions with cleaner electricity sources.
  • Concrete mitigation steps include shortening training times through better algorithms and using more efficient hardware.
  • Reporting emissions alongside model performance could become a standard practice in machine learning research.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integrating such calculators into popular ML frameworks could make emission tracking automatic during training.
  • This work could support the development of benchmarks that include environmental cost alongside accuracy metrics.
  • Future extensions might account for the full lifecycle of models, including inference and data collection phases.
  • Policy makers could use aggregated data from such tools to regulate data center energy use.

Load-bearing premise

The listed factors of server location, energy grid, training length, and hardware are sufficient to accurately approximate emissions in a way that reliably guides mitigation decisions.

What would settle it

A side-by-side comparison where actual measured carbon emissions from a real training run deviate substantially from the calculator's prediction for the same inputs.

read the original abstract

From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Machine Learning Emissions Calculator, a practical tool to approximate the carbon emissions of training ML models. It identifies key influencing factors (server location and regional energy grid, training duration, and hardware type), explains how these affect emissions, and outlines mitigation actions for individual practitioners and organizations.

Significance. If the calculator's approximations can be shown to be reliable, the work would provide a timely, community-facing resource for quantifying and reducing the environmental impact of machine learning. The emphasis on actionable mitigation steps is a constructive contribution to an emerging area of concern.

major comments (2)
  1. [Machine Learning Emissions Calculator] The section presenting the Machine Learning Emissions Calculator derives estimates from hardware TDP ratings, assumed utilization, regional average carbon intensities, and user-supplied runtime, yet reports no side-by-side comparison of these estimates against metered power draw or time-resolved grid emissions for any actual training run. This absence directly affects the central claim that the listed factors suffice for reliable approximations that can guide mitigation decisions.
  2. [Explanation of factors] No sensitivity analysis or error bounds are provided for the approximation method (e.g., impact of dynamic power draw, cooling overhead, or intra-day grid variation), leaving open whether systematic over- or under-estimation by tens of percent occurs in practice.
minor comments (1)
  1. [Abstract] The abstract would benefit from an explicit statement of the tool's intended scope and known limitations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the scope and limitations of the presented calculator. We address each major comment below and outline planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Machine Learning Emissions Calculator] The section presenting the Machine Learning Emissions Calculator derives estimates from hardware TDP ratings, assumed utilization, regional average carbon intensities, and user-supplied runtime, yet reports no side-by-side comparison of these estimates against metered power draw or time-resolved grid emissions for any actual training run. This absence directly affects the central claim that the listed factors suffice for reliable approximations that can guide mitigation decisions.

    Authors: We agree that direct empirical validation against metered power draw would provide stronger evidence of reliability. The manuscript presents the calculator as a practical approximation tool based on standard methods (TDP values, average grid intensities, and assumed utilization) drawn from existing literature, rather than a precision measurement instrument. The central claim is that these factors are the dominant drivers and can be used for actionable estimates to guide mitigation, not that the tool matches real-time metering exactly. In revision we will add an explicit limitations subsection that discusses approximation error sources, cites prior studies performing such comparisons, and clarifies that the tool is intended for order-of-magnitude guidance and awareness rather than precise auditing. revision: partial

  2. Referee: [Explanation of factors] No sensitivity analysis or error bounds are provided for the approximation method (e.g., impact of dynamic power draw, cooling overhead, or intra-day grid variation), leaving open whether systematic over- or under-estimation by tens of percent occurs in practice.

    Authors: We accept this point and will incorporate a new sensitivity analysis subsection. The revision will quantify the effect of varying utilization rates, PUE values for cooling overhead, and temporal fluctuations in regional carbon intensity, providing explicit error bounds and showing how these propagate to the final emission estimate. This will allow readers to assess the robustness of the approximations for different use cases. revision: yes
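The promised sensitivity analysis can be sketched by propagating plausible bounds on each factor through the same multiplicative estimate. The ranges below are illustrative assumptions for a hypothetical 250 W, 100 h run, not values from the paper:

```python
from itertools import product

def estimate_kg(tdp_w, util, hours, intensity, pue):
    # CO2e in kg: energy (kWh) scaled by overhead and grid intensity
    return tdp_w * util * hours / 1000.0 * pue * intensity

# Illustrative uncertainty ranges (assumed, not paper-supplied):
utils = (0.5, 1.0)        # dynamic power draw vs. full TDP
pues = (1.1, 1.6)         # cooling overhead (power usage effectiveness)
intensities = (0.3, 0.5)  # intra-day grid variation, kgCO2/kWh

estimates = [estimate_kg(250, u, 100, i, p)
             for u, p, i in product(utils, pues, intensities)]
lo, hi = min(estimates), max(estimates)
print(f"estimate range: {lo:.1f}-{hi:.1f} kg CO2e (ratio {hi/lo:.1f}x)")
```

Even these modest ranges span roughly a factor of five between the lowest and highest estimates, which is the kind of bound the referee is asking the revision to make explicit.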

Circularity Check

0 steps flagged

No circularity in emissions calculator derivation

full rationale

The paper presents a practical estimation tool whose inputs are server location and grid carbon intensity, training duration, and hardware TDP ratings drawn from external public data sources. No equations, fitted parameters, or predictions are defined that reduce to the tool's own outputs by construction, and no self-citations are invoked to establish uniqueness or to smuggle in ansatzes. The central claim is therefore an independent aggregation of standard factors rather than a self-referential loop, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5408 in / 935 out tokens · 31203 ms · 2026-05-15T05:47:52.538039+00:00 · methodology

discussion (0)


Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization

    cs.LG 2026-05 unverdicted novelty 7.0

    The paper introduces the Amortized Efficiency Threshold (AET) to identify the deployment volume at which neural combinatorial optimization solvers become more energy-efficient overall than heuristic baselines after am...

  2. Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints

    cs.CR 2026-04 unverdicted novelty 7.0

    Nearly every arXiv submission leaks hidden sensitive information through its source files, existing cleaners fail, and ALC-NG provides a more reliable fix.

  3. Segment Anything

    cs.CV 2023-04 unverdicted novelty 7.0

    A promptable model trained on 1B masks achieves competitive zero-shot segmentation performance across tasks and is released publicly with its dataset.

  4. OPT: Open Pre-trained Transformer Language Models

    cs.CL 2022-05 unverdicted novelty 7.0

    OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

  5. Multitask Prompted Training Enables Zero-Shot Task Generalization

    cs.LG 2021-10 conditional novelty 7.0

    Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

  6. EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

    cs.LG 2026-05 unverdicted novelty 6.0

    EnergyLens predicts multi-GPU LLM inference energy consumption with 9-13% MAPE and identifies configurations with up to 52x energy efficiency differences.

  7. PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

    cs.CL 2026-05 unverdicted novelty 6.0

    PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

  8. Decomposing the Generalization Gap in PROTAC Activity Prediction: Variance Attribution and the Inter-Laboratory Ceiling

    cs.LG 2026-05 accept novelty 6.0

    Inter-laboratory measurement variance dominates the generalization gap in PROTAC activity prediction, capping LOTO AUROC near 0.67 across models and architectures.

  9. SAM 2: Segment Anything in Images and Videos

    cs.CV 2024-08 conditional novelty 6.0

    SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation datas...

  10. StarCoder 2 and The Stack v2: The Next Generation

    cs.SE 2024-02 accept novelty 6.0

    StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.

  11. DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

    cs.LG 2023-09 accept novelty 6.0

    DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.

  12. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    cs.CL 2022-11 unverdicted novelty 6.0

    BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

  13. Multi-Dimensional Model Integrity and Responsibility Assessment Index and Scoring Framework

    cs.LG 2026-05 unverdicted novelty 5.0

    MIRAI is a unified index that combines five responsibility dimensions into one score for tabular models, demonstrating that predictive performance does not ensure high overall integrity.

  14. Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

    cs.CE 2026-05 unverdicted novelty 5.0

    LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.

  15. UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

    cs.CL 2026-05 unverdicted novelty 5.0

    UniSD unifies complementary self-distillation mechanisms for autoregressive LLMs and achieves up to +5.4 point gains over base models and +2.8 over baselines across six benchmarks and six models.

  16. Agentic Insight Generation in VSM Simulations

    cs.CL 2026-04 unverdicted novelty 5.0

    A two-step agentic system for extracting insights from VSM simulations achieves up to 86% accuracy with top LLMs by using progressive data discovery and slim context.

  17. Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds

    cs.AI 2026-04 unverdicted novelty 5.0

    A frugal zero-shot local-LLM pipeline extracts relations at F1 0.70 and reaches 0.55 EM on multi-hop QA through self-consistency, cross-model oracles, and confidence routing, while identifying an agreement paradox whe...

  18. ChatGPT, is this real? The influence of generative AI on writing style in top-tier cybersecurity papers

    cs.CR 2026-04 unverdicted novelty 5.0

    Top-tier cybersecurity papers exhibit a post-2022 increase in AI marker words and higher lexical complexity, suggesting generative AI is influencing academic writing style.

  19. StarCoder: may the source be with you!

    cs.CL 2023-05 accept novelty 5.0

    StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

  20. From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint

    cs.CY 2026-05 unverdicted novelty 4.0

    A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 20 Pith papers · 8 internal anchors

  1. [1]

    Energy and Policy Considerations for Deep Learning in NLP

    Emma Strubell, Ananya Ganesh, and Andrew McCallum. Energy and policy considerations for deep learning in nlp. arXiv preprint arXiv:1906.02243, 2019

  2. [2]

Green AI

    Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. Green AI. arXiv preprint arXiv:1907.10597, 2019

  3. [3]

2006 IPCC Guidelines for National Greenhouse Gas Inventories

    Simon Eggleston, Leandro Buendia, Kyoko Miwa, Todd Ngara, and Kiyoto Tanabe. 2006 IPCC guidelines for national greenhouse gas inventories, volume 5. Institute for Global Environmental Strategies, Hayama, Japan, 2006

  4. [4]

    Ghg emissions from electricity consumption: A case study of hong kong from 2002 to 2015 and trends to 2030

    WM To and Peter KC Lee. Ghg emissions from electricity consumption: A case study of hong kong from 2002 to 2015 and trends to 2030. Journal of cleaner production, 165:589–598, 2017

  5. [5]

Electricity-specific emission factors for grid electricity

    Matthew Brander, Aman Sood, Charlotte Wylie, Amy Haughton, and Jessica Lovell. Electricity-specific emission factors for grid electricity. Ecometrica, Emissionfactors.com, 2011

  6. [6]

Computational physics on graphics processing units

    Ari Harju, Topi Siro, Filippo Federici Canova, Samuli Hakala, and Teemu Rantalaiho. Computational physics on graphics processing units. In Proceedings of the 11th international conference on Applied Parallel and Scientific Computing, pages 3–26. Springer-Verlag, 2012

  7. [7]

Convolutional neural networks for medical image analysis: Full training or fine tuning?

    Nima Tajbakhsh, Jae Y Shin, Suryakanth R Gurudu, R Todd Hurst, Christopher B Kendall, Michael B Gotway, and Jianming Liang. Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging, 35(5):1299–1312, 2016

  8. [8]

    Food image recognition using deep convolutional network with pre-training and fine-tuning

    Keiji Yanai and Yoshiyuki Kawano. Food image recognition using deep convolutional network with pre-training and fine-tuning. In 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pages 1–6. IEEE, 2015

  9. [9]

    Universal Language Model Fine-tuning for Text Classification

Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018

  10. [10]

    Random search for hyper-parameter optimization

    James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012

  11. [12]

    Hyperparameter optimization

    Matthias Feurer and Frank Hutter. Hyperparameter optimization. In Automated Machine Learning, pages 3–33. Springer, 2019

  12. [13]

Google environmental report 2018

    Google. Google environmental report 2018, 2018

  13. [14]

    Beyond carbon neutral

    Microsoft. Beyond carbon neutral. white paper, 2018

  14. [15]

AWS & sustainability

    Amazon Web Services. AWS & sustainability, 2019

  15. [16]

    Machine learning applications for data center optimization, 2014

    Jim Gao. Machine learning applications for data center optimization, 2014

  16. [17]

Google Data Centers efficiency: How we do it

    Google Data Centers efficiency: How we do it. https://www.google.com/about/datacenters/efficiency/internal/, 2019. Accessed: 2019-08-23

  17. [18]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

  18. [19]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018

  19. [20]

    Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560, 2016

  20. [21]

    Massively parallel hyperparameter tuning

    Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018

  21. [22]

    BOHB: Robust and Efficient Hyperparameter Optimization at Scale

    Stefan Falkner, Aaron Klein, and Frank Hutter. Bohb: Robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774, 2018

  22. [23]

    Tearing apart google’s tpu 3.0 ai coprocessor

Paul Teich. Tearing apart google's tpu 3.0 ai coprocessor. https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/, 2018

  23. [24]

    NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, and Diana Marculescu. Neuralpower: Predict and deploy energy-efficient convolutional neural networks. arXiv preprint arXiv:1710.05420, 2017

  24. [25]

    Tackling climate change with machine learning

David Rolnick, Priya L Donti, Lynn H Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, et al. Tackling climate change with machine learning. arXiv preprint arXiv:1906.05433, 2019

  25. [26]

    Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks

Victor Schmidt, Alexandra Luccioni, S. Karthik Mukkavilli, Narmada Balasooriya, Kris Sankaran, Jennifer Chayes, and Yoshua Bengio. Visualizing the consequences of climate change using cycle-consistent adversarial networks. CoRR, abs/1905.03709, 2019