pith. sign in

ReportBench: Evaluating deep research agents via academic survey tasks

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

background 1 dataset 1

citation-polarity summary

fields

cs.AI 3 cs.CL 1

years

2026 4

verdicts

UNVERDICTED 4

representative citing papers

How Adversarial Environments Mislead Agentic AI?

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

citing papers explorer

Showing 4 of 4 citing papers.

  • Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents? cs.CL · 2026-05-18 · unverdicted · none · ref 24

    REFLECT benchmark shows current LLM judges achieve below 55% accuracy detecting failures in evidence-based research agents, especially on evidence verification.

  • Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics cs.AI · 2026-05-09 · unverdicted · none · ref 12

    Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.

  • How Adversarial Environments Mislead Agentic AI? cs.AI · 2026-04-20 · unverdicted · none · ref 19

    Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.

  • AI for Auto-Research: Roadmap & User Guide cs.AI · 2026-05-18 · unverdicted · none · ref 103

    The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.