Laura Hanu and Unitary

Shao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, et al · 2025 · arXiv 2502.11882

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.

When Should Users Check? Modeling Confirmation Frequency in Multi-Step Agentic AI Tasks

cs.HC · 2025-10-06 · conditional · novelty 6.0

A decision-theoretic model based on the observed Confirmation-Diagnosis-Correction-Redo user pattern places intermediate confirmations in AI agent tasks, yielding 81% user preference and 13.54% faster completion versus confirm-at-end.

Structured In-context Environment Scaling for Large Language Model Reasoning

cs.CL · 2025-09-27 · conditional · novelty 6.0

The SIE framework automatically constructs scalable, verifiable reasoning environments from structured data, improving in-domain performance and enabling generalization to out-of-domain math and logic tasks.

citing papers explorer

Showing 3 of 3 citing papers.

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents cs.CR · 2026-05-11 · unverdicted · none · ref 7
When Should Users Check? Modeling Confirmation Frequency in Multi-Step Agentic AI Tasks cs.HC · 2025-10-06 · conditional · none · ref 101
Structured In-context Environment Scaling for Large Language Model Reasoning cs.CL · 2025-09-27 · conditional · none · ref 30
