A glovebox robotic lab with agentic LLM reasoning synthesized 352 lithium halide spinel samples, covering 72% of possible metal combinations and raising the rate of good ionic conductors from 1.33% to 5.33% over the campaign.
Canonical reference
Cascade: Cumulative agentic skill creation through autonomous development and evolution
Canonical reference. 100% of citing Pith papers cite this work as background.
citation-role summary
citation-polarity summary
years
2026 8roles
background 4polarities
background 4representative citing papers
Identifies library drift as a failure mode in self-evolving LLM skill libraries and shows a governance recipe improves pass@1 from 0.258 to 0.584 on MBPP+ hard-100.
SearchSkill improves exact match scores and retrieval efficiency on open-domain QA by conditioning LLM actions on skills from an evolving SkillBank updated from failure patterns via two-stage SFT.
Ratchet provides a minimal hygiene recipe for self-managing skill libraries in frozen LLM agents, delivering +0.328 rolling-mean pass@1 gain on MBPP+ hard-100 and +0.22 peak lift on SWE-bench Verified.
The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
The paper introduces the KDR task, HKA multi-agent framework, and KDR-Bench to enable LLM agents to integrate structured knowledge into deep research reports, with experiments showing outperformance over prior agents.
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
citing papers explorer
-
Agentic LLM Reasoning in a Self-Driving Laboratory for Air-Sensitive Lithium Halide Spinel Conductors
A glovebox robotic lab with agentic LLM reasoning synthesized 352 lithium halide spinel samples, covering 72% of possible metal combinations and raising the rate of good ionic conductors from 1.33% to 5.33% over the campaign.
-
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries
Identifies library drift as a failure mode in self-evolving LLM skill libraries and shows a governance recipe improves pass@1 from 0.258 to 0.584 on MBPP+ hard-100.
-
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks
SearchSkill improves exact match scores and retrieval efficiency on open-domain QA by conditioning LLM actions on skills from an evolving SkillBank updated from failure patterns via two-stage SFT.
-
Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents
Ratchet provides a minimal hygiene recipe for self-managing skill libraries in frozen LLM agents, delivering +0.328 rolling-mean pass@1 gain on MBPP+ hard-100 and +0.22 peak lift on SWE-bench Verified.
-
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
-
Towards Knowledgeable Deep Research: Framework and Benchmark
The paper introduces the KDR task, HKA multi-agent framework, and KDR-Bench to enable LLM agents to integrate structured knowledge into deep research reports, with experiments showing outperformance over prior agents.
-
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications