arxiv: 2606.10660 · v1 · pith:L7CNH4RHnew · submitted 2026-06-09 · 💻 cs.CY · cs.AI

Accounting for AI Inference in Corporate GHG Inventories: A Four-Tier Methodology for Scope 3 Category 1 Reporting

Pith reviewed 2026-06-27 11:55 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI inference emissions · Scope 3 Category 1 · corporate GHG inventories · carbon accounting · data centre location · water-carbon trade-off · CSRD reporting · EEIO models

The pith

A four-tier framework lets companies report AI inference emissions in Scope 3 Category 1 by matching method precision to the usage data they hold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a four-tier method for placing AI inference services, such as API calls and embedded SaaS features, into corporate greenhouse-gas inventories under Scope 3 Category 1. The tiers begin with direct calculations that multiply token counts by peer-reviewed GPU energy figures and regional grid carbon intensities, then step down through intermediate data levels to a final spend-based economic input-output proxy when no usage records exist. Current practice either skips the category or applies a single broad ICT-sector factor that overstates results by a factor of ten to forty. When the tiers are applied to a 200-person European firm, the total falls below one tonne of CO2 equivalent. The same calculation surfaces a location-dependent trade-off in which low-carbon grids can carry higher water footprints.

Core claim

We propose a four-tier framework that matches estimation precision to the data organisations can realistically obtain, progressing from direct token-based physical estimation using GPU energy benchmarks and regional grid carbon intensities down to a spend-based EEIO fallback for services where no usage data exists. Applied to a 200-person European firm, the framework yields a total below 1 tCO2e, illustrating that the compliance challenge is methodological rather than magnitude-driven. We further document a water-carbon trade-off that current ESG tools do not surface.

What carries the argument

The four-tier estimation framework that scales from token-based physical calculations using GPU benchmarks and grid intensities to spend-based EEIO fallbacks.

If this is right

  • AI inference can be included in Scope 3 Category 1 inventories without omitting the category or applying sector-wide overestimates.
  • Firms that hold token or usage data obtain estimates far lower than those produced by generic economic input-output factors.
  • Data-centre location decisions must weigh both carbon intensity and water use, since hydro-heavy grids can increase water footprints.
  • The overall contribution of AI services to corporate totals remains small once appropriate methods replace broad proxies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Companies may begin logging token volumes or API calls to move their estimates into higher-precision tiers.
  • The tiered structure could be adapted to other digital services whose emissions currently rely on the same broad economic factors.
  • Regulators writing future disclosure rules might treat tiered physical-to-economic methods as an acceptable standard for AI-related Scope 3 items.

Load-bearing premise

Peer-reviewed GPU energy benchmarks and published grid carbon intensities give accurate values for real-world AI inference workloads across services and regions.

What would settle it

Direct meter readings of energy draw for a commercial AI inference endpoint that fall outside the range reported by the GPU energy benchmarks would show the physical tier is not representative.

Figures

Figures reproduced from arXiv: 2606.10660 by Guillermo Llopis (SOMA AI, Barcelona).

Figure 1
Figure 1: Four-tier framework decision flowchart. Assign one tier per AI service starting from the top. Most inventories will operate at Tier 2a or 2b. If yes, use it and document the provider’s methodology. If no, are exact token counts available from an API billing portal? If yes, Tier 2a. If token counts are unavailable but message or session data exists (for example, from an enterprise admin dashboard), Tier 2b…
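
The decision flow in the Figure 1 caption can be sketched as a small function. This is a minimal illustration, not the paper's implementation. The caption is truncated, so two parts are inferred: the first check (that the provider supplies its own emissions figure) and the final spend-based EEIO fallback named in the abstract. All field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceData:
    """Data a firm holds for one AI service (field names are hypothetical)."""
    provider_reported_figure: bool = False  # assumed first check; the caption omits the question
    exact_token_counts: bool = False        # e.g. from an API billing portal
    message_or_session_data: bool = False   # e.g. from an enterprise admin dashboard


def assign_tier(data: ServiceData) -> str:
    """Walk the checks from the top and return the first option that applies."""
    if data.provider_reported_figure:
        # The caption says to use the figure and document the provider's methodology.
        return "provider-reported figure"
    if data.exact_token_counts:
        return "Tier 2a"
    if data.message_or_session_data:
        return "Tier 2b"
    # The caption is cut off after Tier 2b, so any intermediate tiers are not
    # modelled here; the abstract names a spend-based EEIO method as the fallback.
    return "spend-based EEIO fallback"


# One tier is assigned per service, so a firm would call this once per service.
chat_tool = ServiceData(message_or_session_data=True)
print(assign_tier(chat_tool))
```

The checks are ordered so that the most precise available data wins, which mirrors the caption's instruction to start from the top.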
Figure 2
Figure 2: Emission factor derivation chain. GPU-level benchmark energy (ML.ENERGY v3) is scaled to facility level, multiplied by regional grid carbon intensity, and used as an inventory input.

| Class | Hardware scenario | GPU Wh / 1K tokens | Facility Wh / 1K tokens | Uncertainty |
|---|---|---|---|---|
| A - Small | H100 industry-avg | 0.033 | 0.040 | ±40% |
| A - Small | B200 best-practice | 0.013 | 0.014 | ±30% |
| B - Mid | H100 industry-avg (est.) | 0.135 | 0.162 | ±50% |
| B - Mi… | | | | |
Figure 3
Figure 3: Carbon intensity of AI inference by model class and cloud region (H100-central, kg CO2e per million tokens, 2023 grid data). Sweden (0.006) is 13× lower than Singapore (0.076) for Class B. Grid intensities from EPA eGRID 2023 [5a] for US regions and Ember Carbon Emissions Intensity Data Explorer [5b], calendar year 2023, for EU and Asia-Pacific.

Table columns: Region, Cloud examples, Grid (kg CO2e/kWh), Class A, Class B, Clas…
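
The per-token carbon figures in Figure 3 come from multiplying facility energy by grid carbon intensity. The sketch below shows that arithmetic using the Class B, H100 facility value from Figure 2 (0.162 Wh per 1,000 tokens). The function names and the grid intensity in the example call are illustrative assumptions, not values from the paper.

```python
BLOCKS_PER_MILLION_TOKENS = 1000  # one million tokens = 1,000 blocks of 1K tokens
WH_PER_KWH = 1000


def kg_co2e_per_million_tokens(facility_wh_per_1k: float, grid_kg_per_kwh: float) -> float:
    """Carbon intensity of inference in kg CO2e per million tokens."""
    kwh_per_million = facility_wh_per_1k * BLOCKS_PER_MILLION_TOKENS / WH_PER_KWH
    return kwh_per_million * grid_kg_per_kwh


def inventory_kg_co2e(tokens: int, kg_per_million: float) -> float:
    """Scale a per-million-token factor to a measured token volume."""
    return tokens / 1_000_000 * kg_per_million


CLASS_B_H100_FACILITY_WH_PER_1K = 0.162  # from Figure 2
PLACEHOLDER_GRID_KG_PER_KWH = 0.40       # hypothetical value, not from the paper

factor = kg_co2e_per_million_tokens(
    CLASS_B_H100_FACILITY_WH_PER_1K, PLACEHOLDER_GRID_KG_PER_KWH
)
print(f"{factor:.4f} kg CO2e per million tokens")
```

Because the energy factor is the same across regions for a given model class and hardware scenario, the ratio between two regions' results equals the ratio of their grid intensities.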
Figure 4
Figure 4: Water–carbon trade-off across cloud regions (Class B, H100-central). Sweden has the lowest carbon intensity in Europe but the highest water footprint. Ireland performs best on both dimensions simultaneously.
Figure 5
Figure 5: Facility-level energy by model class (Wh per 1,000 tokens; error bars ±50%). H100-central vs. B200-optimistic.

S1.1 GPU Energy to Facility Energy. GPU energy measurements from ML.ENERGY Leaderboard v3 (joules per output token) are converted to facility Wh per 1,000 tokens using:

    GPU Wh/1K tokens = (GPU J/token × 1000) / 3600
    Facility Wh/1K tokens = GPU Wh/1K tokens × PUE

where PUE = 1.20 for H100 industry average…
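
The two S1.1 conversions can be written out directly. This is a minimal sketch with hypothetical function names. The PUE of 1.20 and the Class B GPU value of 0.135 Wh per 1,000 tokens are taken from the captions above. The PUE for the B200 scenario is not visible in the truncated text, so it is left as a parameter.

```python
JOULES_PER_WH = 3600


def gpu_wh_per_1k_tokens(gpu_joules_per_token: float) -> float:
    """GPU Wh/1K tokens = (GPU J/token x 1000) / 3600."""
    return gpu_joules_per_token * 1000 / JOULES_PER_WH


def facility_wh_per_1k_tokens(gpu_wh_per_1k: float, pue: float) -> float:
    """Facility Wh/1K tokens = GPU Wh/1K tokens x PUE."""
    return gpu_wh_per_1k * pue


H100_INDUSTRY_AVG_PUE = 1.20       # from S1.1
CLASS_B_H100_GPU_WH_PER_1K = 0.135  # from Figure 2

facility = facility_wh_per_1k_tokens(CLASS_B_H100_GPU_WH_PER_1K, H100_INDUSTRY_AVG_PUE)
print(f"{facility:.3f} Wh per 1K tokens")  # 0.162, the Class B H100 facility value in Figure 2
```

Running the same check on the Class A H100 row (0.033 × 1.20 ≈ 0.040) reproduces the facility column of Figure 2 to the stated rounding.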
Figure 6
Figure 6: Water intensity by model class and region (mL per 1,000 tokens, scope-1 + scope-2). S1.4 Water Factor Derivation: Scope-1 WUE (L/kWh IT energy) from Li et al. [6]

Original abstract

AI inference services -- API subscriptions, enterprise chat tools, and SaaS products with embedded AI features -- fall unambiguously within Scope 3 Category 1 under the Corporate Sustainability Reporting Directive (CSRD), which requires disclosure for fiscal years starting January 2024. Yet no standardised methodology exists for including them in corporate GHG inventories. Current practice either omits the category entirely or applies a generic economic input-output (EEIO) factor calibrated to the ICT sector as a whole, overestimating AI inference emissions by 10-40x relative to physically derived alternatives. We propose a four-tier framework that matches estimation precision to the data organisations can realistically obtain, progressing from direct token-based physical estimation -- using GPU energy benchmarks and regional grid carbon intensities -- down to a spend-based EEIO fallback for services where no usage data exists. Emission factors are derived from peer-reviewed GPU energy benchmarks (ML.ENERGY Leaderboard v3), confirmed grid carbon intensities (EPA eGRID 2023; Ember 2023), and published water use effectiveness data (Li et al., 2025). Applied to a 200-person European firm, the framework yields a total below 1 tCO2e, illustrating that the compliance challenge is methodological rather than magnitude-driven. We further document a water-carbon trade-off that current ESG tools do not surface: Sweden's hydro-dominated grid delivers the lowest carbon intensity in our dataset but the highest water footprint, with direct implications for data centre location strategy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a four-tier methodology for reporting greenhouse gas emissions from AI inference services (API subscriptions, enterprise chat tools, SaaS with embedded AI) under Scope 3 Category 1, as required by the Corporate Sustainability Reporting Directive. Tiers progress from direct token-based physical estimation (using GPU energy benchmarks and regional grid carbon intensities) to a spend-based EEIO fallback. The authors claim generic EEIO methods overestimate emissions by 10-40x and apply the framework to a 200-person European firm to obtain a total below 1 tCO2e. They also document a water-carbon trade-off in grid choices using water use effectiveness data.

Significance. If the framework and case study hold, the work provides a practical, data-matched approach to a timely compliance gap in corporate ESG reporting for AI services. It demonstrates that properly estimated emissions can be small, reframing the issue as methodological. Reliance on peer-reviewed public benchmarks (ML.ENERGY Leaderboard v3) and grid data (EPA eGRID 2023, Ember 2023) supports reproducibility. The water footprint discussion adds a dimension often missing from carbon-only tools and has implications for data center strategy.

major comments (2)
  1. §4 (Case Study): The central claim that the framework yields a total below 1 tCO2e (and that the compliance challenge is methodological rather than magnitude-driven) rests on ML.ENERGY Leaderboard v3 benchmarks and EPA eGRID 2023/Ember 2023 intensities accurately representing the firm's actual AI services, including model sizes, hardware utilization, batching, and provider data-center locations. The manuscript provides no validation, sensitivity analysis, or comparison to provider-reported values, which directly affects the <1 tCO2e result and the 10-40x overestimation comparison.
  2. Methodology (four-tier description): The 10-40x overestimation range is asserted relative to generic EEIO factors calibrated to the ICT sector, but the manuscript does not detail the exact EEIO factor applied to the case-study services or provide a side-by-side physical vs. EEIO calculation for the same services, leaving the quantitative claim difficult to verify independently.
minor comments (2)
  1. Ensure the full reference list includes complete citations for all sources mentioned in the abstract (e.g., Li et al., 2025) with consistent formatting and DOIs where available.
  2. Figure or table presenting the case-study breakdown (if present) should explicitly list which services fell into which tier and the token or spend volumes used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to strengthen the work.

Point-by-point responses
  1. Referee: [§4 (Case Study)] The central claim that the framework yields a total below 1 tCO2e (and that the compliance challenge is methodological rather than magnitude-driven) rests on ML.ENERGY Leaderboard v3 benchmarks and EPA eGRID 2023/Ember 2023 intensities accurately representing the firm's actual AI services, including model sizes, hardware utilization, batching, and provider data-center locations. The manuscript provides no validation, sensitivity analysis, or comparison to provider-reported values, which directly affects the <1 tCO2e result and the 10-40x overestimation comparison.

    Authors: The case study applies the framework to estimated usage for a representative 200-person European firm using the best publicly available benchmarks and grid data; it is illustrative rather than based on proprietary firm-specific telemetry. We agree a sensitivity analysis on parameters such as utilization and model size would improve the presentation and will add this in revision. Direct validation against provider-reported values is not feasible without access to confidential data, which is a limitation of any methodology relying on open benchmarks, but the sources remain reproducible and peer-reviewed.

    Revision: partial

  2. Referee: [Methodology (four-tier description)] The 10-40x overestimation range is asserted relative to generic EEIO factors calibrated to the ICT sector, but the manuscript does not detail the exact EEIO factor applied to the case-study services or provide a side-by-side physical vs. EEIO calculation for the same services, leaving the quantitative claim difficult to verify independently.

    Authors: The 10-40x range reflects literature comparisons between AI-specific physical factors (from ML.ENERGY benchmarks) and generic ICT-sector EEIO factors. For the case study we prioritized physical tiers. We will revise to include an explicit side-by-side table for the case-study services, stating the precise EEIO factor (source and value) and the resulting emissions under a pure spend-based approach.

    Revision: yes
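
The side-by-side comparison requested here amounts to computing the same service's emissions twice and taking the ratio. The sketch below shows one way that could be laid out. All function and parameter names are hypothetical, and no numeric inputs are supplied because the manuscript does not state the EEIO factor or the case-study volumes.

```python
def physical_kg_co2e(tokens: int, kg_co2e_per_million_tokens: float) -> float:
    """Token-based estimate: token volume times a physically derived factor."""
    return tokens / 1_000_000 * kg_co2e_per_million_tokens


def spend_based_kg_co2e(spend: float, eeio_kg_co2e_per_currency_unit: float) -> float:
    """Spend-based estimate: spend times a sector-level EEIO factor."""
    return spend * eeio_kg_co2e_per_currency_unit


def overestimation_ratio(spend_based: float, physical: float) -> float:
    """How many times larger the spend-based estimate is than the physical one."""
    if physical <= 0:
        raise ValueError("physical estimate must be positive")
    return spend_based / physical


def side_by_side(tokens: int, kg_per_million: float, spend: float, eeio_factor: float) -> dict:
    """Return both estimates and their ratio for one service."""
    physical = physical_kg_co2e(tokens, kg_per_million)
    spend_est = spend_based_kg_co2e(spend, eeio_factor)
    return {
        "physical_kg_co2e": physical,
        "spend_based_kg_co2e": spend_est,
        "ratio": overestimation_ratio(spend_est, physical),
    }
```

Reporting both estimates for every service, with the EEIO factor's source and vintage, would let a reader check where in the 10-40x range each service falls.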

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external benchmarks.

Full rationale

The paper's four-tier framework derives emission estimates by applying independent external inputs—ML.ENERGY Leaderboard v3 GPU benchmarks, EPA eGRID 2023 and Ember 2023 grid intensities, and Li et al. 2025 water data—to usage data or spend-based fallbacks. The <1 tCO2e case-study result for the 200-person firm is produced by direct substitution of these public sources into the tiered methodology, with no parameter fitting to the paper's own outputs, no self-definitional equations, and no load-bearing self-citations. The derivation chain remains self-contained against verifiable external data and does not reduce to any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only. Relies on external benchmarks and grid data as inputs; no free parameters or invented entities introduced in the proposal itself.

axioms (2)
  • Domain assumption: AI inference services fall unambiguously within Scope 3 Category 1 under CSRD.
    Stated directly in the abstract as the regulatory premise for the framework.
  • Domain assumption: GPU energy benchmarks and regional grid intensities accurately represent inference workloads.
    Used to derive emission factors; stated in the abstract where the physical estimation tier is described.



Reference graph

Works this paper leans on

16 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    International Energy Agency. Energy and AI. IEA, Paris, April 2025. https://www.iea.org/reports/energy-and-ai

  2. [2]

    European Commission. Corporate Sustainability Reporting Directive (CSRD), Directive 2022/2464/EU, and European Sustainability Reporting Standard E1 (Climate Change). Official Journal of the European Union, 2022.

  3. [3]

    ML.ENERGY Leaderboard. ML.ENERGY Leaderboard v3: GPU-level inference energy benchmarks. University of Michigan, May 2026. https://ml.energy/leaderboard

  4. [4]

    Niu, C., et al. TokenPowerBench: A Benchmark for Measuring Per-Token Energy Consumption of Large Language Model Inference. AAAI 2026. arXiv:2512.03024. https://ojs.aaai.org/index.php/AAAI/article/view/40535

    [5a] U.S. Environmental Protection Agency. Emissions & Generation Resource Integrated Database (eGRID) 2023. EPA, Washington, DC, 2024.

    [5b] Ember. Elec...

  5. [5]

    Li, P., Yang, J., Islam, M. A., & Ren, S. Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models. Communications of the ACM, 2025. https://doi.org/10.1145/3724499

  6. [6]

    Strubell, E., Ganesh, A., & McCallum, A. Energy and Policy Considerations for Deep Learning in NLP. ACL 2019. https://doi.org/10.18653/v1/P19-1355

  7. [7]

    Patterson, D., et al. Carbon Emissions and Large Neural Network Training. Communications of the ACM, 65(6), 52–57, 2022. https://doi.org/10.1145/3520312

  8. [8]

    Lannelongue, L., Grealey, J., & Inouye, M. Green Algorithms: Quantifying the Carbon Footprint of Computation. Advanced Science, 8(12), 2021. https://doi.org/10.1002/advs.202100707

  9. [9]

    Rincé, S. & Banse, A. EcoLogits: Evaluating the Environmental Impacts of Generative AI. Journal of Open Source Software, 10(111), 7471, 2025. https://doi.org/10.21105/joss.07471

  10. [10]

    Lottick, K., Susai, S., Friedler, S.A., & Wilson, J.P. Energy Usage Reports: Environmental awareness as part of algorithmic accountability. NeurIPS 2019 Workshop on Tackling Climate Change with ML.

  11. [11]

    European Parliament and Council. EU Artificial Intelligence Act, Regulation 2024/1689/EU, Article 53 (Obligations for providers of general-purpose AI models). Official Journal of the European Union, 2024.

  12. [12]

    World Resources Institute & World Business Council for Sustainable Development. GHG Protocol Corporate Value Chain (Scope 3) Accounting and Reporting Standard. WRI/WBCSD, 2011.

  13. [13]

    Stadler, K., et al. EXIOBASE 3: Developing a Time Series of Detailed Environmentally Extended Multi-Regional Input-Output Tables. Journal of Industrial Ecology, 22(3), 502–515, 2018. https://doi.org/10.1111/jiec.12715

  14. [14]

    Reig, P., Luo, T., Christensen, E., & Sinistore, J. Guidance for Calculating Water Use Embedded in Purchased Electricity. World Resources Institute, 2020. https://doi.org/10.46830/wrirpt.20.00003

  15. [15]

    Green Software Foundation. Software Carbon Intensity (SCI) Specification, v1.1. GSF, 2023. Standardised as ISO/IEC 21031:2024. https://sci.greensoftware.foundation

  16. [16]

    Energy Market Authority (EMA). Singapore Energy Statistics 2025, Chapter 2: Energy Transformation. Singapore: EMA, 2025. https://www.ema.gov.sg/resources/singapore-energy-statistics/chapter2

    Supplementary Material. The following sections contain technical derivations, detailed limitations, and the p...