Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
6 Pith papers cite this work.
citation roles: background (2)
representative citing papers
Medium personality expression in LLM agents yields the most positive user perceptions in goal-oriented tasks, further improved by trait alignment.
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
An LLM-driven role-playing game with implicit input enhancement and explicit practice tracking produced 21-27% better retention of slang phrases after one week than a standard AI-led virtual classroom.
Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specific designs are needed.
Direct research on AI consciousness is intractable, so the field should prioritize studying perceived AI consciousness and its societal consequences.
citing papers explorer
- Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
  Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
- Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks
  Medium personality expression in LLM agents yields the most positive user perceptions in goal-oriented tasks, further improved by trait alignment.
- Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
  Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
- Game Master LLM: Task-Based Role-Playing for Natural Slang Learning
  An LLM-driven role-playing game with implicit input enhancement and explicit practice tracking produced 21-27% better retention of slang phrases after one week than a standard AI-led virtual classroom.
- What if AI systems weren't chatbots?
  Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specific designs are needed.
- AI and Consciousness: Shifting Focus Towards Tractable Questions
  Direct research on AI consciousness is intractable, so the field should prioritize studying perceived AI consciousness and its societal consequences.