Available: https://arxiv.org/abs/2511.07885

Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J · 2026 · arXiv 2511.07885

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

cs.DC · 2026-05-20 · conditional · novelty 7.0

LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.

The xPU-athalon: Quantifying the Competition of AI Acceleration

cs.AR · 2026-04-12 · unverdicted · novelty 6.0

Quantitative benchmarks across recent AI accelerators reveal that optimal hardware choice varies with workload parameters and that several platforms incur substantially higher idle power than GPUs.

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

cs.LG · 2026-05-01 · unverdicted · novelty 4.0

AgentStop uses execution signals to early-terminate failing local LLM agent trajectories, cutting energy use 15-20% with minimal utility loss.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

cs.DC · 2026-05-20 · conditional · none · ref 60 · internal anchor

LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.
