Canonical reference

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, et al. · 2024 · arXiv 2401.05507

Canonical reference. 100% of citing Pith papers cite this work as background.

Cited by 7 Pith papers

Background 100% of classified citations

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

cs.CR · 2025-07-14 · unverdicted · novelty 8.0

ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

cs.MA · 2025-06-05 · accept · novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.

How to Interpret Agent Behavior

cs.AI · 2026-05-13 · conditional · novelty 6.0

ACT*ONOMY is a Grounded-Theory-derived hierarchical taxonomy and open repository that enables systematic comparison and characterization of autonomous agent behavior across trajectories.

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

cs.AI · 2025-04-19 · unverdicted · novelty 6.0

InfiGUI-R1 uses Reasoning Injection via spatial distillation followed by Deliberation Enhancement via RL to evolve GUI agents from reactive actors to deliberative reasoners, reporting strong performance on grounding and trajectory tasks.

Auditing and Controlling AI Agent Actions in Spreadsheets

cs.HC · 2026-04-22 · unverdicted · novelty 5.0

Pista decomposes AI agent actions in spreadsheets into auditable steps, enabling real-time user intervention that improves task outcomes, user comprehension, agent perception, and sense of co-ownership over baseline agents.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

citing papers explorer

Showing 7 of 7 citing papers.

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation
cs.CR · 2025-07-14 · unverdicted · none · ref 16

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
cs.CL · 2026-05-20 · unverdicted · none · ref 46

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
cs.MA · 2025-06-05 · accept · none · ref 59

How to Interpret Agent Behavior
cs.AI · 2026-05-13 · conditional · none · ref 18

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
cs.AI · 2025-04-19 · unverdicted · none · ref 19

Auditing and Controlling AI Agent Actions in Spreadsheets
cs.HC · 2026-04-22 · unverdicted · none · ref 22

AI for Auto-Research: Roadmap & User Guide
cs.AI · 2026-05-18 · unverdicted · none · ref 70