{"paper":{"title":"EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"EnergyLens predicts multi-GPU LLM inference energy with 9-13 percent error to identify efficient configurations without exhaustive profiling.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Anantha P. Chandrakasan, Eun Kyung Lee, Kyungmi Lee, Tamar Eilam, Xin Zhang, Zhiye Song","submitted_at":"2026-05-14T01:37:26Z","abstract_excerpt":"We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter operations, yet existing approaches either require production-level code and expensive profiling or fail to accurately capture multi-GPU energy behavior. As a result, practitioners lack tools for deciding which optimizations to prioritize and for selecting among existing deployment configurations when exhaustive profiling is impractical. 
EnergyLens addresses this gap"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"EnergyLens achieves mean absolute percentage errors (MAPEs) between 9.25% and 13.19% for multi-GPU prefill and decode energy, and correctly identifies Pareto-optimal overlap configurations.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The empirically driven communication energy model and load-imbalance-aware MoE modeling generalize accurately to unseen multi-GPU configurations and model scales beyond the validation set.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"EnergyLens predicts multi-GPU LLM inference energy consumption with 9-13% MAPE and identifies configurations with up to 52x energy efficiency differences.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"EnergyLens predicts multi-GPU LLM inference energy with 9-13 percent error to identify efficient configurations without exhaustive profiling.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a735860f05419c7295ba1c9d68c3ee32f7488ec477d357fee66ce57959e8e6c8"},"source":{"id":"2605.14249","kind":"arxiv","version":1},"verdict":{"id":"27cda7c1-8df0-423b-90db-dfcbc449b912","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:44:12.207532Z","strongest_claim":"EnergyLens achieves mean absolute percentage errors (MAPEs) between 9.25% and 13.19% for multi-GPU prefill and decode energy, and correctly identifies Pareto-optimal overlap configurations.","one_line_summary":"EnergyLens predicts multi-GPU LLM inference energy consumption with 9-13% MAPE and identifies configurations with up to 52x energy efficiency 
differences.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The empirically driven communication energy model and load-imbalance-aware MoE modeling generalize accurately to unseen multi-GPU configurations and model scales beyond the validation set.","pith_extraction_headline":"EnergyLens predicts multi-GPU LLM inference energy with 9-13 percent error to identify efficient configurations without exhaustive profiling."},"references":{"count":22,"sample":[{"doi":"10.1109/sc41404.2022.00051","year":2022,"title":"El-rec: Efficient large-scale recommendation model training via tensor-train embedding table","work_id":"0625ddd0-436d-4c7a-b66a-97409cfc830c","ref_index":1,"cited_arxiv_id":"1404.2022","is_internal_anchor":true},{"doi":"10.1109/iiswc63097.2024.00012","year":2024,"title":"Enhanced system-level coherence for heterogeneous unified memory architectures","work_id":"56fc806b-3fa8-47de-a622-00d820ed3b49","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv:2602.23036 [cs]","work_id":"957c3fcf-9ae9-46cb-b3b2-5d35a07edca9","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv: 2309.14393","work_id":"45f12f1f-db66-42b7-be6d-96e165162089","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2002,"title":"arXiv: 2002.05651","work_id":"be04efe9-971a-4803-9eae-552d48683244","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":22,"snapshot_sha256":"7db9af77c5f3e349fe81b381155a7067ac46d4bf7b6e8ddee9ed47f4c9b67666","internal_anchors":5},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}