API-Bank: A comprehensive benchmark for tool-augmented LLMs

Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

dataset 2

citation-polarity summary

use dataset 2

representative citing papers

When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

Tool-use agents suffer large accuracy drops from reward and transition perturbations but domain-randomized RL on static perturbations closes about 27% of the unseen transition gap while retaining most clean performance.

Sequential Behavioral Watermarking for LLM Agents

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

SeqWM embeds watermarks into history-conditioned action transitions in LLM agent trajectories and verifies them position-agnostically, achieving robust detection under perturbations where prior per-step methods fail.

citing papers explorer

Showing 2 of 2 citing papers.

When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents cs.AI · 2026-05-12 · unverdicted · none · ref 16
Tool-use agents suffer large accuracy drops from reward and transition perturbations but domain-randomized RL on static perturbations closes about 27% of the unseen transition gap while retaining most clean performance.
Sequential Behavioral Watermarking for LLM Agents cs.CR · 2026-05-11 · unverdicted · none · ref 19
SeqWM embeds watermarks into history-conditioned action transitions in LLM agent trajectories and verifies them position-agnostically, achieving robust detection under perturbations where prior per-step methods fail.

API-Bank: A comprehensive benchmark for tool-augmented LLMs

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer