Canonical reference

Title resolution pending

73% of citing Pith papers cite this work as background.

56 Pith papers citing it
Background 73% of classified citations

citation-role summary

background 7 · baseline 2 · dataset 2

representative citing papers

Security in LLM-as-a-Judge: A Comprehensive SoK

cs.CR · 2026-03-31 · accept · novelty 8.0

The first SoK on LLM-as-a-Judge security organizes attacks targeting judges, attacks using judges, defenses leveraging judges, and security-domain applications while flagging vulnerabilities.

AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

AgroTools is a new benchmark for tool-augmented multimodal agents in agriculture featuring 539 QA pairs, 1,097 images, five task families, and 14 tools, with evaluations showing major limitations in current models' tool planning and execution.

Dependency-Aware Privacy for Multi-turn Agents

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

RootGuard delivers turn-invariant privacy for multi-turn agents by noising root private attributes once and applying deterministic post-processing to all derived releases.

A-MBER: Affective Memory Benchmark for Emotion Recognition

cs.AI · 2026-04-08 · unverdicted · novelty 7.0

A-MBER is a new benchmark for evaluating AI models on using interaction history to recognize and explain a user's present affective state across judgment, retrieval, and explanation tasks.

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.

Generating Statistical Charts with Validation-Driven LLM Workflows

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A validation-driven LLM workflow generates 1,500 charts from 74 UCI datasets with 30,003 aligned QA pairs, revealing that current multimodal models handle chart syntax well but struggle with value extraction and reasoning.

citing papers explorer

Showing 50 of 56 citing papers.