Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.
12 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (12)
roles: background (3)
polarities: background (3)
citing papers explorer
- The Context Gathering Decision Process: A POMDP Framework for Agentic Search
  Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.
- Screening Is Enough
  Multiscreen replaces softmax attention with screening to provide absolute query-key relevance, resulting in models with 30% fewer parameters that maintain stable performance at long contexts.
- APWA: A Distributed Architecture for Parallelizable Agentic Workflows
  APWA is a distributed multi-agent architecture that decomposes parallelizable agentic workflows into non-interfering subproblems for scalable execution on heterogeneous resources.
- PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
  PAI-2 improves factual correctness in LLM answers by 4% on average across benchmarks using adaptive graph traversal and planning, with 6% gains from traversal algorithms and 18% from enabled planning.
- CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution
  CANTANTE uses contrastive rollouts to attribute system rewards to individual agents, enabling better prompt optimization than prior methods on programming, math, and QA benchmarks.
- Self-Consolidating Language Models: Continual Knowledge Incorporation from Context
  SCoL trains LLMs via meta-reinforcement learning to generate layer-specific update instructions that improve knowledge acquisition and retention from context streams over standard baselines.
- ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants
  Argus generates GPU kernels achieving 99-104% of hand-optimized throughput on key LLM kernels by enforcing compile-time data-flow invariants via a tag-based DSL and an in-context RL planner.
- Gym-Anything: Turn any Software into an Agent Environment
  Gym-Anything turns arbitrary software into agent environments via multi-agent setup and auditing, creating CUA-World with 10K+ long-horizon tasks and showing that trajectory distillation plus test-time auditing improves small VLMs.
- CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
  CUE-R uses REMOVE, REPLACE, and DUPLICATE interventions on individual evidence items to quantify their per-item utility in RAG along correctness, grounding faithfulness, and confidence axes.
- MeMo: Memory as a Model
  MeMo encodes new knowledge into a separate memory model that integrates with frozen LLMs, showing strong performance on QA benchmarks while avoiding catastrophic forgetting and working without access to model weights.
- Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
  SLIM dynamically optimizes the active external skill set in agentic RL via leave-one-skill-out marginal contribution estimates and lifecycle operations, delivering a 7.1% average gain over baselines on ALFWorld and SearchQA while showing some skills remain externally useful.
- Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction
  Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.