hub Canonical reference

Agent-RewardBench: Towards a unified benchmark for reward modeling across perception, planning, and safety in real-world multimodal agents

Canonical reference. 90% of citing Pith papers cite this work as background.

24 Pith papers citing it
Background: 90% of classified citations

citation-role summary

background: 9 · dataset: 1

citation-polarity summary

years

2026: 22 · 2025: 2

polarities

background: 9 · support: 1

representative citing papers

Code Generation by Differential Test Time Scaling

cs.SE · 2026-05-19 · unverdicted · novelty 7.0

DiffCodeGen clusters code candidates by behavioral similarity on fuzzing-synthesized inputs and selects the medoid of the largest cluster, matching or exceeding prior test-time scaling methods at far lower token and time cost.

Evaluation of Agents under Simulated AI Marketplace Dynamics

cs.IR · 2026-04-15 · unverdicted · novelty 6.0

Marketplace Evaluation uses repeated-interaction simulations to assess information access systems with marketplace-level metrics such as retention and market share that complement traditional accuracy measures.

ReflectCAP: Detailed Image Captioning with Reflective Memory

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

ReflectCAP distills model-specific hallucination and oversight patterns into Structured Reflection Notes that steer LVLMs toward more factual and complete image captions, reaching the Pareto frontier on factuality-coverage trade-offs.
