Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan

Sparkme: Adaptive semi-structured interviewing for qualitative insight discovery · 2025 · arXiv 2602.21136

2 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

The Momento benchmark reveals that current agents fail at multi-session tasks mainly by misestimating user state and by treating old session history as current context rather than as stale data that needs re-validation.

T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

T1-Bench introduces a multi-domain benchmark for agentic LLM systems featuring 25 domains, interleaved scenarios, and both automatic and human evaluation.

citing papers explorer


Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations · cs.CL · 2026-05-30 · unverdicted · none · ref 1
T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains · cs.CL · 2026-06-09 · unverdicted · none · ref 1
