A Context-Contaminated Restart Model derives exact success probabilities and an optimal pipeline depth T* = sqrt(B * log(1/(1-ε1)) / log(1/(1-ε0))) for fixed budget B, validated on SWE-bench where it fits data far better than IID assumptions.
Budget-aware tool-use enables effective agent scaling
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6roles
background 3polarities
background 3representative citing papers
MCPP uses Monte Carlo simulations of workflow executions to dynamically allocate resources and replan, raising constrained completion probability over baselines on CodeFlow and ProofFlow.
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
Derives unique closed-form decentralized policy minimizing worst-agent online regret that asymptotically converges to centralized Nash-optimal policy in mean-field limit, with added online mixture weighting.
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
citing papers explorer
-
Why Retrying Fails: Context Contamination in LLM Agent Pipelines
A Context-Contaminated Restart Model derives exact success probabilities and an optimal pipeline depth T* = sqrt(B * log(1/(1-ε1)) / log(1/(1-ε0))) for fixed budget B, validated on SWE-bench where it fits data far better than IID assumptions.
-
On Time, Within Budget: Constraint-Driven Online Resource Allocation for Agentic Workflows
MCPP uses Monte Carlo simulations of workflow executions to dynamically allocate resources and replan, raising constrained completion probability over baselines on CodeFlow and ProofFlow.
-
Evaluation-driven Scaling for Scientific Discovery
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
-
MEMOA: Massive Mixtures of Online Agents via Mean-Field Decentralized Nash Equilibria
Derives unique closed-form decentralized policy minimizing worst-agent online regret that asymptotically converges to centralized Nash-optimal policy in mean-field limit, with added online mixture weighting.
-
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
- To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling