To trust or to think: Cognitive forcing functions can reduce over-reliance on AI in AI-assisted decision-making
Mixed citation behavior. Most common role is background (47%).
Representative citing papers
Older Korean immigrants rely on pragmatic disengagement, avoiding stressful technologies, and on interdependent navigation, in which digital skills are shared family resources; non-use is treated as culturally grounded data refusal.
Citing papers
-
From Role to Person: Trust Calibration Challenges in Twin Agents
Twin agents as personal digital representations create distinct trust calibration challenges because they dissolve the boundary between AI and human decision-makers, unlike existing frameworks designed for clear separation.
-
HANSEL: Extracting Breadcrumbs from Web Agent Trajectories for Interactive Verification
HANSEL extracts navigable evidence from agent trajectories with 83.7% precision and 88.8% recall on 45 tasks, reduces volume by 61.6%, and improves verification metrics in a 14-participant study.
-
Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents
An exploratory interview study with 17 developers identifies four forms of emergent oversight work for software agents and documents the situated challenges and heuristics developers encounter.
-
Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition
Summary reasoning traces from LLMs maintain task performance and increase trust and appeal relative to answer-only or full-trace conditions, but none of the formats improve users' metacognitive calibration on reasoning tasks.
-
"It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps
Users entangle their lived experiences with AI predictions in menstrual tracking apps, leading to self-fulfilling prophecies, interfaces that do little to support critical awareness, and isolation for non-normative users.
-
Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive
AI-authored goals produce higher SMART quality scores but lower psychological ownership, commitment, perceived importance, and goal-directed behavior than self-authored goals, with ownership as the mediating mechanism.
-
Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs
The authors prove a folk theorem for LLMs: all feasible and individually rational outcomes can be sustained as ε-equilibria in repeated games where LLMs advise client populations, even though play is observed only indirectly.
-
What Did They Mean? How LLMs Resolve Ambiguous Social Situations across Perspectives and Roles
LLMs produce interpretive closure in 87.5% of ambiguous social scenarios through narrative alignment, reversal, or normative advice, with first-person perspectives increasing alignment tendencies.
-
Gradual Voluntary Participation: A Framework for Participatory AI Governance in Journalism
The study proposes the Gradual Voluntary Participation (GVP) framework to reconceptualize participatory AI governance in journalism as a gradual and voluntary process using a bidimensional matrix.
-
Agentivism: a learning theory for the age of artificial intelligence
The authors introduce Agentivism as a learning theory for human-AI interaction that explains how durable capability develops through selective delegation, epistemic monitoring, reconstructive internalization, and transfer under reduced support.
-
Effects of Generative AI Errors on User Reliance Across Task Difficulty
Higher generative AI error rates reduce user reliance, but task difficulty does not significantly moderate this effect.
-
Beyond Compliance: How AI Could Help Creative Writers by Refusing Them
A qualitative study with 22 creative writers finds that the reflective value of AI refusals depends on alignment with users' situational thinking phases, cognitive beliefs, and views of AI roles.
-
Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models
A new benchmark exposes food-safety gaps in current LLMs and guardrails, and a fine-tuned 4B-parameter model is offered as a domain-specific fix.
-
Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems
LLM chat systems differ widely in the quantity and quality of the references they provide, but users rarely click on or engage with these references.
-
The Impact of Response Latency and Task Type on Human-LLM Interaction and Perception
Shorter LLM response latencies reduce perceived output thoughtfulness and usefulness, while task type affects prompting frequency independently of latency.
-
AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
Thirteen participants became convinced that AI understands human values after chatbot interactions evaluated with the VAPT toolkit.
-
Investigating LLM-Powered Dissenting Minority Support in Power-Imbalanced Group Decision-Making: Counterargument and Mediation as Intervention Strategies
An experiment found that LLM counterarguments improved group flexibility and satisfaction, while AI mediation boosted minority participation but lowered psychological safety.
-
When LLM Rationales Become User-Facing: Effects on Trust Perception, Decision-Making, and Gaze Behaviors
Two linked user studies find that LLM rationale correctness and certainty framing affect trust and decision confidence while presentation format does not, and incorrect rationales increase gaze attention and pupil size.
-
Adversarial Co-Thinking: Calibration and Triangulation Across Multiple GenAI Tools in HCI Writing
The author proposes adversarial co-thinking, a method of calibrating and triangulating across multiple GenAI tools to generate critique while drafting academic papers, based on personal parallel use of Claude, ChatGPT, and Gemini.
-
Scaling Expert Feedback with Reflective Edit Propagation in Compositional Knowledge Bases
RAID is a reflective agent system that infers intent from single expert edits and propagates corrections across compositional knowledge bases through a three-step architecture.
-
"It Felt a Bit Eerie": Exploring Humanlike Interactions During Collaborative Writing with an Artificial Agent
An exploratory user study with 48 participants finds trade-offs in efficiency, contextual alignment, and social comfort when AI writing assistance varies along synchronous and visual dimensions.
-
Overreliance in Writing Tasks: Exploring Similarity-Based Measures of AI Influence on Writing and Proposing a Reflective Writing Interface Intervention
A mixed-methods study finds that AI assistance is linked to higher textual overlap with AI suggestions in writing tasks, and that a reflective interface prototype increases users' awareness of how they incorporate AI content.
-
Evaluating the False Trust Engendered by LLM Explanations
LLM reasoning traces and post-hoc explanations increase false trust in incorrect predictions, whereas contrastive dual explanations enhance users' ability to distinguish correct from incorrect AI outputs.
-
Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows
Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.
-
Auditing and Controlling AI Agent Actions in Spreadsheets
Pista decomposes AI agent actions in spreadsheets into auditable steps, enabling real-time user intervention that improves task outcomes, user comprehension, agent perception, and sense of co-ownership over baseline agents.
-
Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research
AVA is a specialized GenAI platform for development policy research that provides verifiable syntheses from World Bank reports and is associated with 2.4-3.9 hours of weekly time savings in a large-scale user evaluation.
-
Toward Human-AI Complementarity Across Diverse Tasks
Human-AI hybrids achieve only a 0.4-percentage-point gain over AI alone on diverse tasks because confidence-based routing fails to identify the small set of cases where humans can correct AI errors.
-
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
-
Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction
Analysis of 1,223 AI-HCI papers shows declining focus on human epistemic sovereignty and rising optimization of autonomous agents, leading to a proposal for scaffolded cognitive friction via multi-agent systems to preserve human cognitive agency.
-
Hallucinations in Organization-backed AI advisors: Evidence about Skepticism, Verification, and Reliance in Goal-Directed Use
A literature review synthesizes evidence on user skepticism, verification, and reliance when AI advisors hallucinate, noting that output-related cues such as warnings show weak effects and that content category has not been experimentally varied.
-
VArify: A Visual Analytics System for Verifying Knowledge Enhanced Large Language Model Responses in Food Science
VArify introduces a tree visualization to support human verification of GraphRAG evidence for LLM responses in food science, evaluated in a study with six domain experts.
-
A Model of Integrated Information Processing in Human-AI Interaction
The IIP model is a cybernetic framework, intended to guide interface design, that represents humans and AI as coupled control loops whose efficacy depends on input adequacy, reference consonance, and output operativity.
-
Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks
An online experiment finds that showing users an overview of an AI's values reduces reliance on AI suggestions during writing tasks.
-
Exploring Instant Photography using Generative AI: A Design Probe with the UnReality Camera
The UnReality Camera augments instant photos with generative AI from spoken input, and a design probe found users balancing artistic control with appreciation for unpredictability, suspense during printing, and ownership from the physical form.
-
From Trust to Appropriate Reliance: Measurement Constructs in Human-AI Decision-Making
A literature review shows that constructs for appropriate reliance on AI are fragmented, presents three views on the topic, and calls for consensus on objective metrics to enable better comparisons across studies.
-
Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions