pith. the verified trust layer for science.

Autobench-v: Can large vision-language models benchmark themselves? arXiv preprint arXiv:2410.21259

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields: cs.CL (1), cs.LG (1)

years: 2026 (2)

verdicts: UNVERDICTED (2)

representative citing papers

SkillGen: Verified Inference-Time Agent Skill Synthesis

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

SkillGen synthesizes auditable skills from agent trajectories via contrastive induction on successes and failures, then verifies net performance impact by comparing outcomes with and without the skill on identical tasks.
