arxiv: 2605.22509 · v1 · pith:XUXOXJ5Rnew · submitted 2026-05-21 · 💻 cs.HC · cs.CL

Reflecti-Mate: A Conversational Agent for Adaptive Decision-Making Support Through System 1 and System 2 Thinking

Pith reviewed 2026-05-22 03:58 UTC · model grok-4.3

classification 💻 cs.HC cs.CL
keywords conversational agents · decision making · reflective thinking · system 1 · system 2 · adaptive systems · human-computer interaction · holistic reflection

The pith

A conversational agent that adapts to individual thinking patterns promotes better integration of intuitive and analytical processes during personal decision making.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

High-stakes decisions require blending cognitive analysis with emotional and intuitive insights, yet most support systems focus only on the cognitive side. This paper tests an agent named Reflecti-Mate that monitors a user's language in real time and adjusts its questions to encourage a wider range of thinking. In experiments with 128 people, those using the adaptive agent showed more unique reflection paths, used language that mixed different thought types more often, and rated the support as more holistic. A comparison agent without adaptation produced similar, mostly analytical responses from everyone. If this approach holds, it suggests decision tools could become more effective by matching how each person naturally thinks rather than imposing one style.

Core claim

The central discovery is that the Reflecti-Mate agent, by adapting to users' thought patterns, enables more personalized reflective trajectories, elicits more integrative reflective language, and is perceived as providing stronger support for holistic reflection, whereas a baseline agent leads to homogenized profiles dominated by cognitive language.

What carries the argument

Reflecti-Mate, a conversational agent that adapts its support based on detected patterns in the user's reflective language to balance System 1 intuitive and System 2 analytical thinking.
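The adaptation idea can be illustrated with a toy sketch. It is not the authors' method: the marker lexicons, the prompts, and the "ask about the least-represented mode" rule are all assumptions made up for illustration, since the summary above does not specify how Reflecti-Mate detects patterns or chooses prompts.

```python
import re

# Hypothetical marker lexicons (assumption: the paper's real dictionaries are not given here).
MARKERS = {
    "cognitive": {"because", "think", "reason", "analyze", "consider", "compare"},
    "emotional": {"feel", "afraid", "happy", "worried", "excited", "sad"},
    "intuitive": {"gut", "sense", "instinct", "hunch", "somehow"},
}

# Hypothetical follow-up prompts, one per thinking mode.
PROMPTS = {
    "cognitive": "What are the main trade-offs between your options?",
    "emotional": "How do you feel when you picture each option?",
    "intuitive": "What is your gut telling you right now?",
}


def mode_profile(text: str) -> dict[str, float]:
    """Share of marker words per mode in one user message (all zeros if none match)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {mode: sum(1 for t in tokens if t in words) for mode, words in MARKERS.items()}
    total = sum(counts.values())
    if total == 0:
        return {mode: 0.0 for mode in MARKERS}
    return {mode: counts[mode] / total for mode in MARKERS}


def next_prompt(text: str) -> str:
    """Assumed adaptation rule: prompt for the mode the user has used least."""
    profile = mode_profile(text)
    least_used = min(MARKERS, key=lambda mode: profile[mode])
    return PROMPTS[least_used]


message = "I think I should compare the offers because the salary matters, and I feel worried."
print(mode_profile(message))
print(next_prompt(message))
```

A non-adaptive baseline in this toy setup would simply ignore `mode_profile` and pick prompts on a fixed schedule; the comparison in the paper is between agents, not between these particular rules.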

If this is right

  • Users follow more personalized paths in their reflection instead of uniform ones.
  • Reflective language becomes more integrative across cognitive and non-cognitive elements.
  • Participants rate the agent higher for supporting complete, holistic reflection.
  • Without adaptation, user outputs converge to similar cognitive-heavy styles.
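One possible way to make "integrative" and "homogenized" measurable is sketched below. These are not the paper's metrics: `balance` (normalized Shannon entropy of one participant's mode proportions) and `dispersion` (mean pairwise distance between participants' profiles) are hypothetical stand-ins chosen for illustration.

```python
import math
from itertools import combinations


def balance(profile: list[float]) -> float:
    """Normalized Shannon entropy of mode proportions (assumed to sum to 1).

    0 means all language falls in one mode; 1 means it is spread evenly.
    """
    probs = [p for p in profile if p > 0]
    if len(profile) < 2 or not probs:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(profile))


def dispersion(profiles: list[list[float]]) -> float:
    """Mean pairwise Euclidean distance between profiles.

    Values near 0 would suggest participants converge on similar profiles.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Made-up profiles in the order (cognitive, emotional, intuitive).
print(balance([0.8, 0.1, 0.1]))   # skewed toward cognitive language
print(balance([0.4, 0.3, 0.3]))   # more evenly mixed
print(dispersion([[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]))
```

Under these assumed definitions, the bullets above would correspond to higher `balance` and higher `dispersion` in the adaptive condition, and lower values of both in the baseline condition.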

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Decision-support tools in fields like finance or medicine could benefit from similar adaptation to capture emotional considerations.
  • Long-term studies tracking whether integrated reflection leads to better actual decisions would test the practical value.
  • Interface designers might incorporate language monitoring to detect when users are stuck in one mode of thinking.

Load-bearing premise

The language analysis measures and perception scales truly indicate integration of System 1 and System 2 thinking rather than being artifacts of how the agents were designed or what participants believed the researchers expected.

What would settle it

Re-running the study with the same agents but with a different set of language metrics or with blinded evaluators, and finding no increase in integrative language or perceived holism for the adaptive agent.

Figures

Figures reproduced from arXiv: 2605.22509 by Catharine Oertel, Catholijn M. Jonker, Hayley Hung, Morita Tarvirdians, Senthil Chandrasegaran.

Figure 1: ReflectiMate. A possible interaction with ReflectiMate.
Figure 2: Overview of the Conditions. The baseline agent selects the next prompt based on the conversational history, including … (caption truncated in source)
Figure 3: Transformation of Linguistic Markers by Condition.
Figure 5: Reflection trajectories across clusters and conditions.
Original abstract

Making high-stakes personal decisions involves cognitive, emotional, and intuitive processes, and individuals differ in how they allocate attention across these modes. Integration of these processes has been shown to benefit decision making. Yet, most current decision-support systems focus primarily on supporting cognitive aspects, rather than adapting to the individual's thinking profile to support integration of different types of thoughts. In this study, we investigate an agent designed to encourage integration by adapting to the individual user's thought patterns. We explore its effects on participants' perceptions of the agent and their reflective behavior, in comparison with unaided pre-reflection and a baseline agent. In a between-subjects study (N = 128), our agent, which fostered broad and elaborated thinking, enabled more personalized reflective trajectories, elicited more integrative reflective language, and was perceived as providing stronger support for holistic reflection. In contrast, the baseline agent produced homogenized profiles dominated by cognitive language across participants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Reflecti-Mate, an adaptive conversational agent designed to support integration of System 1 (intuitive/emotional) and System 2 (analytical) thinking during high-stakes personal decisions. It reports a between-subjects study (N=128) comparing the adaptive agent against unaided pre-reflection and a non-adaptive baseline agent. Key claims are that the adaptive agent produces more personalized reflective trajectories, elicits more integrative reflective language, and is perceived as offering stronger holistic support, while the baseline yields homogenized, cognitively dominated profiles.

Significance. If the central results hold under scrutiny, the work offers a useful contribution to HCI research on adaptive decision-support systems by shifting focus from purely cognitive aids to agents that promote balanced cognitive-emotional integration. The between-subjects design and dual measurement approach (language analysis plus perception scales) are positive elements that could inform future personalized reflection tools. The paper would benefit from stronger validation of its language metrics to realize this potential.

major comments (2)
  1. [Methods] Methods section: The paper must supply precise details on the language-analysis pipeline used to quantify 'integrative reflective language' and 'personalized reflective trajectories' (e.g., specific dictionaries, coding rubrics, inter-rater reliability, or automated feature extraction). Without these, it is impossible to determine whether observed differences reflect genuine System 1/2 integration or simply surface-level compliance with the agent's prompts. This is load-bearing for the headline claim.
  2. [Results] Results section: The abstract asserts directional superiority on integrative language and personalization, yet the provided text supplies no statistical tests, effect sizes, confidence intervals, or descriptive statistics for the between-condition comparisons. Include a results table (or §5) reporting means, SDs, and inferential tests so readers can evaluate the magnitude and reliability of the reported effects.
minor comments (2)
  1. [System Design] Clarify the exact differences in prompt structure between the adaptive and baseline agents in the system description to help readers assess the degree of adaptation.
  2. [Abstract] The abstract would be strengthened by a brief parenthetical mention of the primary statistical outcomes or effect sizes supporting the main claims.
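The kind of reporting the referee asks for in major comment 2 can be sketched as follows. This is an illustrative assumption, not the paper's analysis: the scores are made-up placeholders, and Cohen's d with a pooled standard deviation plus Welch's t statistic are just one common choice for a two-condition between-subjects comparison.

```python
import math
from statistics import mean, stdev


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic, which does not assume equal variances across groups."""
    standard_error = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error


# Made-up placeholder scores, NOT data from the paper.
adaptive = [2.0, 4.0, 6.0]
baseline = [1.0, 3.0, 5.0]

print(f"adaptive: M={mean(adaptive):.2f}, SD={stdev(adaptive):.2f}")
print(f"baseline: M={mean(baseline):.2f}, SD={stdev(baseline):.2f}")
print(f"Cohen's d = {cohens_d(adaptive, baseline):.2f}")
print(f"Welch's t = {welch_t(adaptive, baseline):.2f}")
```

A full report would also give degrees of freedom, p-values, and confidence intervals, which typically come from a statistics package rather than hand-rolled code.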

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us identify areas for improvement in our manuscript. We address each major comment below and plan to incorporate the suggested changes in the revised version.

  1. Referee: [Methods] Methods section: The paper must supply precise details on the language-analysis pipeline used to quantify 'integrative reflective language' and 'personalized reflective trajectories' (e.g., specific dictionaries, coding rubrics, inter-rater reliability, or automated feature extraction). Without these, it is impossible to determine whether observed differences reflect genuine System 1/2 integration or simply surface-level compliance with the agent's prompts. This is load-bearing for the headline claim.

    Authors: We fully agree with the referee that a precise description of the language-analysis pipeline is crucial for validating our findings and ensuring the results are not merely due to prompt compliance. In the revised manuscript, we will provide detailed information on the specific dictionaries, coding rubrics, inter-rater reliability metrics, and automated feature extraction methods used to measure integrative reflective language and personalized reflective trajectories. This addition will strengthen the methodological transparency and allow for better assessment of the claims regarding System 1 and System 2 integration. revision: yes

  2. Referee: [Results] Results section: The abstract asserts directional superiority on integrative language and personalization, yet the provided text supplies no statistical tests, effect sizes, confidence intervals, or descriptive statistics for the between-condition comparisons. Include a results table (or §5) reporting means, SDs, and inferential tests so readers can evaluate the magnitude and reliability of the reported effects.

    Authors: We acknowledge that the current manuscript does not include the requested statistical details in the Results section. We will revise the paper to include a results table or expanded section reporting means, standard deviations, statistical tests (e.g., appropriate inferential statistics for the between-subjects design), effect sizes, and confidence intervals for the comparisons on integrative language and personalization measures. This will enable readers to properly evaluate the magnitude and reliability of the effects. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical evaluation

The paper reports results from a between-subjects experiment (N=128) that directly compares participant outcomes across three conditions using standard language-analysis metrics and perception scales. No derivations, equations, fitted parameters, or first-principles predictions are present; the reported effects on reflective trajectories, integrative language, and perceived support are measured outcomes rather than quantities constructed from the inputs by definition. The work contains no self-citation chains that bear the central claim, no uniqueness theorems imported from prior author work, and no ansatz smuggled via citation. The analysis is therefore self-contained as an empirical comparison whose validity rests on the chosen measures rather than on any internal reduction to its own premises.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a purely empirical HCI user study; it introduces no mathematical derivations, new theoretical entities, or fitted parameters that the central claim depends on.

Venue (from the paper's running header): UMAP '26, June 08–11, 2026, Gothenburg, Sweden.