The cost of dynamic reasoning: Demystifying ai agents and test- time scaling from an ai infrastructure perspective.arXiv preprint arXiv:2506.04301

· 2025 · arXiv 2506.04301

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation

cs.DC · 2026-04-11 · unverdicted · novelty 8.0

Tessera performs kernel-granularity disaggregation on heterogeneous GPUs, achieving up to 2.3x throughput and 1.6x cost efficiency gains for large model inference while generalizing beyond prior methods.

Specialize Roles, Mix Deployments: Pushing the Cost-Accuracy Frontier of LLM Agent Teams

cs.MA · 2026-05-28 · unverdicted · novelty 7.0

AgentCARD benchmark shows heterogeneous LLM agent teams with mixed deployments reach the cost-accuracy frontier, delivering up to 44% higher accuracy or 12x lower cost than uniform teams, with domain-specific role bottlenecks.

Inference-Time Budget Control for LLM Search Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.

Agentic AI Workload Characteristics

cs.DC · 2026-05-25 · unverdicted · novelty 5.0

Agentic workloads with context caching become decode-dominated with high KV-cache reuse and show tool use shifting from early read/explore to later execute/write phases.

Same Voice, Different Lab: On the Homogenization of Frontier LLM Personalities

cs.HC · 2026-03-20 · unverdicted · novelty 5.0

Frontier LLMs homogenize toward systematic and analytical personalities, suppressing emotional traits like remorseful or sycophantic, indicating an implicit consensus on optimal assistant behavior.

Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective

cs.AI · 2025-11-01 · conditional · novelty 5.0

The paper analyzes CPU bottlenecks in agentic AI serving, selects representative workloads, and demonstrates that CPU-aware scheduling optimizations COMB and MAS can reduce P50 latency by up to 1.7x and total latency by up to 2.49x on two hardware systems.

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

cs.LG · 2026-05-01 · unverdicted · novelty 4.0

AgentStop uses execution signals to early-terminate failing local LLM agent trajectories, cutting energy use 15-20% with minimal utility loss.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

citing papers explorer

Showing 8 of 8 citing papers.

Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation cs.DC · 2026-04-11 · unverdicted · none · ref 6
Tessera performs kernel-granularity disaggregation on heterogeneous GPUs, achieving up to 2.3x throughput and 1.6x cost efficiency gains for large model inference while generalizing beyond prior methods.
Specialize Roles, Mix Deployments: Pushing the Cost-Accuracy Frontier of LLM Agent Teams cs.MA · 2026-05-28 · unverdicted · none · ref 28
AgentCARD benchmark shows heterogeneous LLM agent teams with mixed deployments reach the cost-accuracy frontier, delivering up to 44% higher accuracy or 12x lower cost than uniform teams, with domain-specific role bottlenecks.
Inference-Time Budget Control for LLM Search Agents cs.AI · 2026-05-07 · unverdicted · none · ref 18
A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.
Agentic AI Workload Characteristics cs.DC · 2026-05-25 · unverdicted · none · ref 14
Agentic workloads with context caching become decode-dominated with high KV-cache reuse and show tool use shifting from early read/explore to later execute/write phases.
Same Voice, Different Lab: On the Homogenization of Frontier LLM Personalities cs.HC · 2026-03-20 · unverdicted · none · ref 19
Frontier LLMs homogenize toward systematic and analytical personalities, suppressing emotional traits like remorseful or sycophantic, indicating an implicit consensus on optimal assistant behavior.
Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective cs.AI · 2025-11-01 · conditional · none · ref 15
The paper analyzes CPU bottlenecks in agentic AI serving, selects representative workloads, and demonstrates that CPU-aware scheduling optimizations COMB and MAS can reduce P50 latency by up to 1.7x and total latency by up to 2.49x on two hardware systems.
AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices cs.LG · 2026-05-01 · unverdicted · none · ref 10
AgentStop uses execution signals to early-terminate failing local LLM agent trajectories, cutting energy use 15-20% with minimal utility loss.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence cs.AI · 2025-07-28 · accept · none · ref 199
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

The cost of dynamic reasoning: Demystifying ai agents and test- time scaling from an ai infrastructure perspective.arXiv preprint arXiv:2506.04301

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer