On the tool manipulation capability of open-source large language models

Canonical reference. 100% of citing Pith papers cite this work as background.

13 Pith papers citing it
Background 100% of classified citations

citation-role summary

background 5

citation-polarity summary

polarities

background 5

representative citing papers

Continual Model Routing in Evolving Model Hubs

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

Formalizes continual model routing (CMR), releases CMRBench with over 2000 models, and presents CARvE, which outperforms retrieval, fine-tuning, and adapter-merging baselines on model, family, and domain accuracy.

Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

Claw-Eval is a new trajectory-aware benchmark for LLM agents that records execution traces, audit logs, and environment snapshots to evaluate completion, safety, and robustness across 300 tasks, revealing that opaque grading misses 44% of safety issues.

Memory in the Age of AI Agents

cs.CL · 2025-12-15 · unverdicted · novelty 6.0

The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.

Capability Self-Assessment: Teaching LLMs to Know Their Limits

cs.AI · 2026-05-29 · unverdicted · novelty 5.0

Reinforcement learning teaches LLMs to assess their own capabilities more effectively than supervised fine-tuning, preserves original skills, generalizes out of distribution, and aids local-cloud routing and data selection.
