hub Mixed citations

Carbon Emissions and Large Neural Network Training

Mixed citation behavior. Most common role is background (69%).

79 Pith papers citing it
Background 69% of classified citations

abstract

The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
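
The abstract gives no formula, so the following is only a rough sketch of the kind of accounting such estimates usually rest on. It assumes energy is training time times processor count times average power per processor, scaled by the datacenter's power usage effectiveness (PUE), and that CO2e is energy times the grid's carbon intensity. The function names and every number below are hypothetical and are not taken from the paper.

```python
def training_energy_kwh(hours, num_processors, avg_power_watts, pue):
    """Estimated energy in kWh, including datacenter overhead via PUE (assumed model)."""
    return hours * num_processors * avg_power_watts * pue / 1000.0


def co2e_tonnes(energy_kwh, kg_co2e_per_kwh):
    """Estimated emissions in metric tonnes of CO2e for a given grid carbon intensity."""
    return energy_kwh * kg_co2e_per_kwh / 1000.0


# Hypothetical training run: 100 hours on 512 accelerators drawing 300 W each,
# in a datacenter with a PUE of 1.1.
energy = training_energy_kwh(hours=100, num_processors=512,
                             avg_power_watts=300, pue=1.1)

# The same run on two hypothetical grids that differ only in carbon intensity.
cleaner_grid = co2e_tonnes(energy, kg_co2e_per_kwh=0.08)
dirtier_grid = co2e_tonnes(energy, kg_co2e_per_kwh=0.60)

print(f"energy: {energy:.0f} kWh")
print(f"CO2e on cleaner grid: {cleaner_grid:.2f} t")
print(f"CO2e on dirtier grid: {dirtier_grid:.2f} t")
print(f"location factor: {dirtier_grid / cleaner_grid:.1f}x")
```

Because the terms multiply, independent improvements in model, processor, datacenter, and location compound, which is consistent with the abstract's point that combined choices can change the footprint by orders of magnitude.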

citation-role summary

background 12 · method 1

representative citing papers

A generative pre-trained transformer with Kerr-soliton attention

physics.optics · 2026-05-22 · unverdicted · novelty 7.0

Kerr-soliton attention realizes transformer attention in physical hardware via Kerr solitons in a resonator, with analytic training and experimental inference showing high-fidelity agreement between hardware and model.

Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

TokenArena is a continuous benchmark for AI inference endpoints that measures output speed, time to first token, blended price, effective context, quality, and modeled energy to produce composites of joules per correct answer, dollars per correct answer, and endpoint fidelity.

Stochastic Thermodynamics of Associative Memory

cond-mat.stat-mech · 2026-01-03 · unverdicted · novelty 7.0

DenseAMs show tradeoffs between entropy production, retrieval accuracy, and speed at intermediate loads, with a new failure mode in higher-order networks at finite temperature.

SAM 3: Segment Anything with Concepts

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.

Segment Anything

cs.CV · 2023-04-05 · unverdicted · novelty 7.0

A promptable model trained on 1B masks achieves competitive zero-shot segmentation performance across tasks and is released publicly with its dataset.

Mass-Editing Memory in a Transformer

cs.CL · 2022-10-13 · conditional · novelty 7.0

MEMIT scales direct memory editing in transformers from single facts to thousands of associations by optimizing MLP weight updates.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

High-Resolution Image Synthesis with Latent Diffusion Models

cs.CV · 2021-12-20 · conditional · novelty 7.0

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

Recasting AI Data Centers as Engines for Carbon Removal

math.OC · 2026-05-13 · unverdicted · novelty 6.0

AI data center waste heat upgraded by heat pumps can drive direct air capture to achieve net CO2 removal and offset operational emissions in several US states under current and 2030 scenarios.

citing papers explorer

Showing 7 of 7 citing papers after filters.

  • Mass-Editing Memory in a Transformer cs.CL · 2022-10-13 · conditional · none · ref 13 · internal anchor

    MEMIT scales direct memory editing in transformers from single facts to thousands of associations by optimizing MLP weight updates.

  • High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 65 · internal anchor

    Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

  • Multitask Prompted Training Enables Zero-Shot Task Generalization cs.LG · 2021-10-15 · conditional · none · ref 40 · internal anchor

    Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

  • When Should Users Check? Modeling Confirmation Frequency in Multi-Step Agentic AI Tasks cs.HC · 2025-10-06 · conditional · none · ref 62 · internal anchor

    A decision-theoretic model based on the observed Confirmation-Diagnosis-Correction-Redo user pattern places intermediate confirmations in AI agent tasks, yielding 81% user preference and 13.54% faster completion versus confirm-at-end.

  • SAM 2: Segment Anything in Images and Videos cs.CV · 2024-08-01 · conditional · none · ref 21 · internal anchor

    SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.

  • Selective Memory Retention for Long-Horizon LLM Agents cs.AI · 2026-06-28 · conditional · none · ref 7 · internal anchor

    TraceRetain applies feature-based scoring to evict low-value entries from bounded external memory in frozen LLM agents, preserving task success under 75% synthetic distractors on ALFWorld where unbounded memory degrades.

  • minAction.net: Energy-First Neural Architecture Design -- From Biological Principles to Systematic Validation cs.LG · 2026-04-27 · conditional · none · ref 8 · internal anchor

    Large-scale experiments show architecture performance depends on task type, not universality, and a single-parameter energy penalty reduces computational energy by ~1000x with negligible accuracy cost.